Getting hidden data from a PDF using DU

I have an issue in the Validation Station: I am trying to extract the CIN number, but the value shown in the Validation Station is different from the one in the PDF.
What kind of PDF is this, and why is it showing extra hidden text?

Hi @ankur.kaushik ,
Have you tried extracting the data from the PDF with Regex?
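
As a rough illustration of the Regex idea, here is a minimal Python sketch that pulls CIN-shaped tokens out of the text returned by OCR. It assumes the CIN is an Indian Corporate Identification Number (21 characters: L or U, 5 digits, 2 letters, 4 digits, 3 letters, 6 digits), which the thread does not confirm. The names `CIN_PATTERN` and `find_cins` and the sample value are made up for the example. The same pattern could be adapted for a regex-based extractor in DU.

```python
import re
from typing import List

# ASSUMPTION: CIN = Indian Corporate Identification Number, 21 characters:
# L or U, 5 digits, 2 letters (state), 4 digits (year), 3 letters, 6 digits.
CIN_PATTERN = re.compile(r"\b[LU]\d{5}[A-Z]{2}\d{4}[A-Z]{3}\d{6}\b")


def find_cins(text: str) -> List[str]:
    """Return every CIN-shaped token found in the given OCR text."""
    # Upper-case first so lower-case OCR output still matches.
    return CIN_PATTERN.findall(text.upper())


# Made-up sample value, used only to show the call.
sample = "Company CIN: U12345MH2020PTC123456 Registered office: Mumbai"
print(find_cins(sample))
```

If the pattern finds the expected value in the raw OCR text but the Validation Station shows something else, the extractor is probably picking up a different region of the page.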

Hi @ankur.kaushik

Could you please let us know which extractor is being used?

Thanks.

@ankur.kaushik I don't think it is extracting hidden data. Possible reasons:

  • It is matching other, similar data present on the same page or on a different page
  • Because of the low quality of the PDF, it is reading the data incorrectly
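
To illustrate the second reason, here is a small Python sketch of how a low-quality scan can yield a value that differs from the printed one, and how a position-aware cleanup might repair it. It assumes the CIN follows the 21-character Indian Corporate Identification Number layout (not confirmed in this thread). The confusion pairs (O/0, I/1, S/5, B/8, Z/2) are only a guess at common OCR mix-ups, and `repair_cin` is a hypothetical helper, not part of DU.

```python
from typing import Optional

# ASSUMPTION: 21-character layout, A = letter position, D = digit position.
LAYOUT = "A" + "D" * 5 + "A" * 2 + "D" * 4 + "A" * 3 + "D" * 6

# Guessed OCR look-alikes; adjust to what your scans actually produce.
TO_DIGIT = str.maketrans("OISBZ", "01582")
TO_LETTER = str.maketrans("01582", "OISBZ")


def repair_cin(raw: str) -> Optional[str]:
    """Swap look-alike characters according to the expected position type."""
    token = "".join(raw.split()).upper()  # drop whitespace, normalise case
    if len(token) != len(LAYOUT):
        return None  # wrong length: do not guess
    return "".join(
        ch.translate(TO_DIGIT if kind == "D" else TO_LETTER)
        for ch, kind in zip(token, LAYOUT)
    )


# Made-up misread value: "S" read instead of "5", "O" instead of "0".
print(repair_cin("U1234SMH2O20PTC123456"))
```

A cleanup like this only helps with character-level misreads. If the extractor is matching the wrong field altogether, the fix has to come from the extractor configuration.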

Question: which extractor are you using?

  • If you are using the Intelligent Form Extractor or the Form Extractor, why not try Anchors?

  • If you are using the ML Extractor, more retraining is needed to extract the data properly

It's a scanned-image PDF.

Form Extractor.

Okay, I will try it.

Hi @ankur.kaushik

Try using the Intelligent Form Extractor or the Machine Learning Extractor and check the results.

Thanks.
