Redacting Data from Scanned Documents with Next.js

Eugene Musebe

Introduction

It is estimated that 66 percent of workers say their firms have few or no explicit data protection policies or technologies in place, despite expanding data privacy requirements for how information is gathered, saved, used, and shared. As a result, most organizations continue to use non-standard techniques to obfuscate content and rely on manual methods to anonymize sensitive data, which nullifies the chance for analysis and insights. This article demonstrates one such method using Next.js. We will focus on data redaction: a user uploads a scanned file and redacts any words they choose inside it.

Codesandbox

The final version of this project can be viewed on Codesandbox.

You can find the full source code on my GitHub repo.

Prerequisites

Entry-level JavaScript and React/Next.js knowledge.

Setting Up the Sample Project

Create a new Next.js app by running npx create-next-app imgredact in your terminal, then move into the project directory: cd imgredact

We will also use an online storage service to store the processed file whenever necessary. We will use Cloudinary for this, integrating it in the Next.js server-side backend.

Use this link to create a new account or log in to an existing one. Once logged in, you should see a dashboard containing the environment variables needed for our backend integration.

Add Cloudinary to your project dependencies: npm install cloudinary

Create a new file named .env.local in your root directory and paste the following:

CLOUDINARY_CLOUD_NAME=
CLOUDINARY_API_KEY=
CLOUDINARY_API_SECRET=

Fill in the blanks with the values from your Cloudinary dashboard and restart your project using: npm run dev.

Create a new file named pages/api/upload.js and begin by configuring the environment keys and libraries.

const cloudinary = require("cloudinary").v2;

// Read the credentials defined in .env.local.
cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});
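A missing or misspelled variable only shows up later as a failed upload, so it can help to check the configuration when the route loads. The sketch below assumes a hypothetical helper, findMissingEnv, which is not part of the Cloudinary SDK or of this project; it simply lists which of the expected names are unset.

```javascript
// Hypothetical helper (not part of Cloudinary or Next.js): returns the names
// of required environment variables that are missing or empty.
function findMissingEnv(env, names) {
  return names.filter((name) => !env[name] || String(env[name]).trim() === "");
}

const REQUIRED = [
  "CLOUDINARY_CLOUD_NAME",
  "CLOUDINARY_API_KEY",
  "CLOUDINARY_API_SECRET",
];

// Example with a stand-in object; in the API route you would pass process.env.
const missing = findMissingEnv(
  { CLOUDINARY_CLOUD_NAME: "demo", CLOUDINARY_API_KEY: "" },
  REQUIRED
);
console.log(missing); // [ 'CLOUDINARY_API_KEY', 'CLOUDINARY_API_SECRET' ]
```

In the route file, a non-empty result could be logged as a warning before any upload is attempted.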

Use the Next.js API route handler to process the POST request, upload the media file to Cloudinary, and extract the text inside it through the adv_ocr option. The text is sent back to the front end as the response, ready to be redacted.

export default async function handler(req, res) {
  // Only POST requests are handled by this route.
  if (req.method !== "POST") {
    return res.status(405).json({ error: "Method not allowed" });
  }

  try {
    // The front end sends the image as a base64 data URL.
    const fileStr = req.body.data;
    console.log("backend received");

    // Without a callback, upload() returns a promise, so it can be awaited.
    const result = await cloudinary.uploader.upload(fileStr, {
      ocr: "adv_ocr",
    });

    // The first text annotation holds the full text detected in the image.
    const text = result.info.ocr.adv_ocr.data[0].textAnnotations[0].description;
    console.log(text);

    res.status(200).json(text);
  } catch (error) {
    // Send exactly one response: a 500 when the upload or OCR step fails.
    console.log("error", error);
    res.status(500).json({ error: "Something wrong" });
  }
}
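The handler reads the text from a deeply nested property, which throws if the image contains no recognizable text or the OCR step did not run. Here is a minimal sketch of a more defensive read, assuming the response shape used in the handler. The name extractOcrText is a hypothetical helper, and the sample objects are made up for illustration; a real Cloudinary response carries many more fields.

```javascript
// Hypothetical helper: safely read the full OCR text from an upload result.
// The property path mirrors the one used in the handler.
function extractOcrText(result) {
  const description =
    result?.info?.ocr?.adv_ocr?.data?.[0]?.textAnnotations?.[0]?.description;
  return typeof description === "string" ? description : "";
}

// Stand-in objects for illustration only.
const withText = {
  info: {
    ocr: {
      adv_ocr: {
        data: [{ textAnnotations: [{ description: "Invoice 123" }] }],
      },
    },
  },
};
const withoutText = { info: { ocr: { adv_ocr: { data: [{}] } } } };

console.log(extractOcrText(withText));           // Invoice 123
console.log(extractOcrText(withoutText) === ""); // true
console.log(extractOcrText(undefined) === "");   // true
```

Returning an empty string keeps the front end's redaction step working even when nothing was detected.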

The code above concludes our backend. Let us now create the front end.

Include html2canvas in your dependencies, as we will use it later: npm install html2canvas

In your pages/index.js file, include the following imports:

// pages/index.js
import html2canvas from "html2canvas";
import React, { useState, createRef } from "react";

const HTTP_SUCCESS = 200;


Notice the variable HTTP_SUCCESS. We will use it to identify successful API responses.

Declare the following React hooks inside the component:

// pages/index.js
const [extractText, setExtractText] = useState("");
const [selectedFile, setSelectedFile] = useState(null);
const [cloudinaryResponse, setCloudinaryResponse] = useState("");
const [image, setImage] = useState();
const [output, setOutput] = useState();
const [result, setResult] = useState(false);
const capture = createRef();

Before we continue, fill the return statement with the following. You can get the CSS files from the GitHub repo.

// pages/index.js
return (
  <div className="container">
    <h3>Reduce Data from Scanned documents with nextjs</h3>
    <div className="row">
      <div className="column">
        <input
          type="file"
          accept="image/png, image/gif, image/jpeg"
          onChange={(e) => setSelectedFile(e.target.files[0])}
        />

        {selectedFile && (
          <div className="section">
            <input
              type="text"
              onChange={(e) => setExtractText(e.target.value)}
              value={extractText}
              className="input"
            />{' '}

            <button onClick={reduceText}>Reduce text</button>
            <br /><br />
            <img src={image} /><br /><br />
          </div>
        )}
      </div>
    </div>
    {result &&
      <div className="row">
        <div className="column">
          {output && <img src={output} />}
          {cloudinaryResponse && <div style={{ color: "black" }} ref={capture}>{cloudinaryResponse}</div>}
          {cloudinaryResponse ? <button onClick={showOutput}>Show output</button> : <img src="https://res.cloudinary.com/dogjmmett/image/upload/v1652789616/loading_fjpxay.gif" alt="this slowpoke moves" width="250" />}
        </div>
      </div>
    }
  </div>
);

When a user first opens the UI, they will be required to select a scanned document from their local machine. Create a function reduceText that uses a file reader to convert the selected file to base64, saves the encoded image to the image state hook, and passes it to the uploadHandler function.

const readFile = (file) => {
  console.log("readFile()=>", file);
  // Wrap FileReader in a promise that resolves to a base64 data URL.
  return new Promise((resolve, reject) => {
    const fr = new FileReader();
    fr.onload = () => resolve(fr.result);
    fr.onerror = () => reject(fr.error);
    fr.readAsDataURL(file);
  });
};

const reduceText = async () => {
  // Proceed only when a file is selected and a word to redact is provided.
  if (selectedFile && extractText) {
    const encodedFile = await readFile(selectedFile);
    setImage(encodedFile);
    uploadHandler(encodedFile);
  }
};
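readAsDataURL encodes the file as base64, which makes the payload roughly a third larger than the file on disk. This matters because Next.js API routes limit the parsed request body, by default to 1 MB. The sketch below estimates the encoded size so that oversized scans can be rejected before uploading. Both base64Length and fitsBodyLimit are hypothetical helper names, and the estimate ignores the short data URL prefix and JSON wrapper.

```javascript
// Base64 turns every 3 bytes into 4 characters, padding the last group.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical check: would a file of this size fit in the body limit once
// encoded? limitBytes defaults to 1 MB, assumed to match the Next.js default.
function fitsBodyLimit(fileSizeBytes, limitBytes = 1024 * 1024) {
  return base64Length(fileSizeBytes) <= limitBytes;
}

console.log(base64Length(3));           // 4
console.log(base64Length(4));           // 8
console.log(fitsBodyLimit(700 * 1024)); // true
console.log(fitsBodyLimit(900 * 1024)); // false
```

In reduceText, a check like fitsBodyLimit(selectedFile.size) could run before readFile. If larger scans are expected, the limit can instead be raised in the API route through its exported config object (api.bodyParser.sizeLimit).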

The uploadHandler function will upload the encoded file to the backend, which forwards it to Cloudinary, and use the HTTP_SUCCESS variable created earlier to detect a successful response carrying the text extracted from the image. The user specifies which word to redact, and every occurrence is replaced with the string XXX using the replaceAll() method. The redacted text is assigned to the cloudinaryResponse state hook and becomes visible to the user.

const uploadHandler = async (base64) => {
  console.log("uploading to backend...");
  setResult(true);
  try {
    // Awaiting fetch lets the surrounding try/catch see network errors.
    const response = await fetch("/api/upload", {
      method: "POST",
      body: JSON.stringify({ data: base64 }),
      headers: { "Content-Type": "application/json" },
    });
    console.log("successful session", response.status);
    if (response.status === HTTP_SUCCESS) {
      // The backend returns the extracted text as a JSON-encoded string.
      const text = await response.json();
      setCloudinaryResponse(text.replaceAll(extractText, "XXX"));
    }
  } catch (error) {
    console.error(error);
  }
};
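Note that replaceAll() with a string argument matches exactly and is case-sensitive, so redacting "John" leaves "JOHN" untouched. The sketch below contrasts that behavior with a case-insensitive variant. The function names redactExact and redactIgnoreCase are hypothetical and not part of the project; the second one escapes regular expression metacharacters so that input such as "1.5" is matched literally.

```javascript
// Exact, case-sensitive redaction, the same approach uploadHandler uses.
// An empty word is ignored, since replaceAll("") would insert the mask
// between every character.
function redactExact(text, word, mask = "XXX") {
  return word ? text.replaceAll(word, mask) : text;
}

// Hypothetical variant: escape regex metacharacters in the word, then
// replace every match regardless of case.
function redactIgnoreCase(text, word, mask = "XXX") {
  if (!word) return text;
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return text.replace(new RegExp(escaped, "gi"), mask);
}

const sample = "Pay John Doe. JOHN signed on 1.5.2022.";
console.log(redactExact(sample, "John"));
// Pay XXX Doe. JOHN signed on 1.5.2022.
console.log(redactIgnoreCase(sample, "john"));
// Pay XXX Doe. XXX signed on 1.5.2022.
console.log(redactIgnoreCase(sample, "1.5"));
// Pay John Doe. JOHN signed on XXX.2022.
```

Which variant fits depends on the document: OCR output often varies in capitalization, so the case-insensitive form may catch more occurrences.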

Finally, the showOutput function uses html2canvas to capture the redacted text as an image and show it to the user.

const showOutput = async () => {
  // Render the redacted text element to a canvas.
  const canvas = await html2canvas(capture.current, {
    scale: 1,
    logging: true,
  });
  console.log(canvas);
  // Convert the canvas to a data URL and display it through the output state.
  setOutput(canvas.toDataURL());
};
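Both readFile and showOutput produce data URLs: a short header naming the media type, followed by the base64 payload. Called without arguments, canvas.toDataURL() produces a PNG. If you later want to save or upload the redacted image, it can help to split the two parts. The following is a minimal sketch; parseDataUrl is a hypothetical helper, and the sample payload is a placeholder rather than a real image.

```javascript
// Hypothetical helper: split a base64 data URL into its MIME type and payload.
// Returns null when the input is not a base64 data URL.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) return null;
  return { mimeType: match[1], base64: match[2] };
}

// Placeholder payload for illustration only.
const parsed = parseDataUrl("data:image/png;base64,iVBORw0KGgo=");
console.log(parsed.mimeType); // image/png
console.log(parsed.base64);   // iVBORw0KGgo=
console.log(parseDataUrl("not a data url")); // null
```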

A sample of the UI is as below:

That's it! Be sure to try out the project yourself to enjoy the experience.

Eugene Musebe

Software Developer

I’m a full-stack software developer, content creator, and tech community builder based in Nairobi, Kenya. I am addicted to learning new technologies and love working with like-minded people.