Document Understanding - How to read colors from a (scanned) table

T0Bi · February 18, 2021, 2:45pm

Hey,

this might be an unusual problem but maybe you’re able to help me / point me in the right direction.

We’re working a lot with document understanding and have (re)trained a fair amount of models to fit our needs.
But now we’ve got a new kind of document, and so far I have no idea how to process it without writing my own Python model.

Imagine people use printed plans/tables and the “data” is simply the color of the field. They either color a field red or green, or leave it blank.

Here’s an example of what this might look like:

Just imagine it being a scanned document, not a screenshot.

How/Where would I start processing this document?

I basically need the output to be a table (datatable/excel/whatever) which I can use for further processing.

Questions:

Is this even remotely possible with UiPath?
If I were to write my own model/logic, where would I start? Do you know of any libraries doing anything similar?

So far I find it pretty hard to even google this problem…

Thanks in advance,
T0Bi

moenk · February 18, 2021, 5:16pm

Can you rely on the pixel positions of the coloured boxes? Then this may work quite easily. But if the positions may change, it is not so easy any more. Think about where the data comes from and ask there. In the end, though, this will not make for a stable and beautiful process.
Now to my approach: I’d search the image for the coloured boxes and replace them with words in the image. Then run the OCR to get the content as usual and remove spaces.
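
A minimal sketch of the "turn coloured boxes into words" step, in plain Python. It assumes the pixels of each cell have already been extracted as (r, g, b) tuples with values 0-255 (for example with an imaging library), and that fills are reasonably saturated after scanning. The function names, hue ranges and thresholds are illustrative assumptions that would need tuning on real scans; none of this is a UiPath feature.

```python
import colorsys
from collections import Counter


def classify_pixel(rgb, min_saturation=0.35, min_value=0.25):
    """Label one (r, g, b) pixel (0-255 per channel) as 'red', 'green' or 'blank'."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    # Washed-out or very dark pixels are treated as paper, ink or grid lines.
    if saturation < min_saturation or value < min_value:
        return "blank"
    degrees = hue * 360.0
    # Red sits at both ends of the hue circle; the ranges are assumed, not measured.
    if degrees <= 20.0 or degrees >= 340.0:
        return "red"
    if 80.0 <= degrees <= 160.0:
        return "green"
    return "blank"


def classify_cell(pixels, min_fraction=0.3):
    """Label a cell from its pixels; a color must cover min_fraction of the cell."""
    counts = Counter(classify_pixel(p) for p in pixels)
    total = sum(counts.values())
    if total == 0:
        return "blank"
    red, green = counts["red"], counts["green"]
    label, hits = ("red", red) if red >= green else ("green", green)
    return label if hits / total >= min_fraction else "blank"


def label_table(cells):
    """Map a 2D list of cells (each a list of pixels) to a 2D list of labels."""
    return [[classify_cell(cell) for cell in row] for row in cells]
```

Working on hue rather than raw RGB makes the labels less sensitive to the brightness shifts a scanner introduces. The resulting list of lists can be written out with the standard `csv` module or loaded into a datatable for further processing.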

T0Bi · February 19, 2021, 8:23am

As the documents are scanned, nothing’s reliable unfortunately.

That was one of my ideas as well. Do you know of any library/framework which can search/replace colored boxes?

Because it’s not as simple as it sounds.

moenk · February 19, 2021, 8:49am

No, you will have to write the code yourself, and how reliably it works depends on the quality of the input data.
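
As a rough starting point for such hand-written code, the sketch below locates table cells without relying on fixed pixel positions. It assumes the scan is roughly deskewed, the table has printed dark grid lines, and the image is available as a list of rows of (r, g, b) tuples. All function names and thresholds are hypothetical and would need tuning on real scans.

```python
def dark_fraction(line, dark_below=100):
    """Fraction of pixels in one row or column that are near-black."""
    if not line:
        return 0.0
    # Using max() keeps saturated red/green fills from counting as dark ink.
    dark = sum(1 for (r, g, b) in line if max(r, g, b) < dark_below)
    return dark / len(line)


def find_gaps(profile, line_threshold=0.6, min_size=3):
    """Return (start, stop) index ranges enclosed between two detected grid lines."""
    gaps = []
    start = None
    seen_line = False
    for i, fraction in enumerate(profile):
        if fraction >= line_threshold:
            # A grid line closes the current gap, if it is large enough.
            if start is not None and i - start >= min_size:
                gaps.append((start, i))
            start = None
            seen_line = True
        elif seen_line and start is None:
            start = i
    return gaps


def find_cells(image):
    """Return a 2D list of (top, bottom, left, right) boxes, stop indices exclusive."""
    if not image:
        return []
    row_profile = [dark_fraction(row) for row in image]
    col_profile = [dark_fraction(col) for col in zip(*image)]
    rows = find_gaps(row_profile)
    cols = find_gaps(col_profile)
    return [[(top, bottom, left, right) for (left, right) in cols]
            for (top, bottom) in rows]
```

Each box can then be cropped, ideally with a small inner margin to avoid the grid lines, and its pixels checked for a red or green fill. Skewed or warped scans would break the row/column projection, so a deskew step would have to come first.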
