Getting hidden data from PDF using DU

ankur.kaushik · April 3, 2022, 9:09am

I have an issue here. This is the Validation Station, and I am trying to extract the CIN number, but the value shown in the Validation Station differs from the one in the PDF.
What kind of PDF is this, and why is it showing extra hidden text?

ridvanucok · April 3, 2022, 6:13pm

Hi @ankur.kaushik ,
Have you tried extracting data from PDF with Regex?
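A rough sketch of the regex idea, written in Python outside of UiPath. It assumes that "CIN" here means the Indian Corporate Identification Number (21 characters: L or U, 5 digits, 2 letters, 4 digits, 3 letters, 6 digits). That format, the `find_cins` helper, and the sample text are assumptions made for illustration and are not taken from the original document.

```python
import re

# Assumed format (not confirmed in this thread): Indian Corporate
# Identification Number, e.g. U12345MH2010PTC123456.
CIN_PATTERN = re.compile(r"\b[LU]\d{5}[A-Z]{2}\d{4}[A-Z]{3}\d{6}\b")


def find_cins(ocr_text):
    """Return every CIN-shaped token found in the OCR text, in order."""
    # Upper-case first so lower-case OCR output still matches.
    return CIN_PATTERN.findall(ocr_text.upper())


# Hypothetical OCR output from a scanned page.
sample = "Company: Example Pvt Ltd\nCIN: U12345MH2010PTC123456\nDate: 01/04/2022"
print(find_cins(sample))
```

If the pattern returns more than one match, that may explain why the value in the Validation Station differs from the one expected: a similar-looking value elsewhere in the document could have been picked up instead.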

suraj.setty · April 4, 2022, 3:44am

Hi @ankur.kaushik

Could you please let us know which extractor is being used?

Thanks.

ushu · April 4, 2022, 6:02am

@ankur.kaushik I guess it is not extracting hidden data. Possible reasons:

It is matching other or similar data present on the same page or on a different page
Because of the low quality of the PDF, it is reading the data incorrectly

Question: which extractor are you using?

If you are using the Intelligent Form Extractor or the Form Extractor, why don't you try using anchors?
If you are using the ML Extractor, more retraining is required to extract the data properly

ankur.kaushik · April 4, 2022, 10:49am

It's a scanned-image PDF.

ankur.kaushik · April 4, 2022, 10:49am

Form Extractor.

ankur.kaushik · April 4, 2022, 10:50am

Okay, I will try it.

suraj.setty · April 4, 2022, 11:51am

Hi @ankur.kaushik

Try using the Intelligent Form Extractor or the Machine Learning Extractor and check the results.

Thanks.

system · April 7, 2022, 12:36pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
