How to extract data from a scanned and unclear PDF

MiaZhou · September 5, 2023, 5:34am

I want to get the account number (2,288,000) and the No. value, but the scanned PDF is unclear. The scanned PDF is below:

I used some activities in UiPath Studio:

The resulting output data is below:

What bothers me is that the output data is obviously incorrect.
How can I get the specific data I want? The approach I have in mind is below:

Read PDF With OCR → txt → extract the data with a regular expression. The problem is that when I read the PDF with OCR into text, the results are not correct.
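To illustrate the regex step of this pipeline, here is a minimal sketch in Python (not UiPath code) that tolerates a few common OCR misreads, such as the letter O for the digit 0. The names `OCR_DIGIT_FIXES`, `find_amounts` and `find_after_label` are hypothetical, the confusion map is an assumption rather than a complete list, and the sample string is made up, not taken from test.txt.

```python
import re

# Assumed, non-exhaustive map of letters that OCR often confuses with digits.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# A grouped number such as 2,288,000. OCR may emit '.' instead of ',',
# and may swap a digit for a look-alike letter, so both are allowed here.
AMOUNT_RE = re.compile(
    r"(?<![\w,.])[\dOolISB]{1,3}(?:[,.][\dOolISB]{3})+(?!\w|[,.]\w)"
)


def find_amounts(text):
    """Return grouped numbers found in OCR text as integers."""
    amounts = []
    for match in AMOUNT_RE.finditer(text):
        token = match.group()
        # Skip candidates made only of look-alike letters (e.g. "SOS,ISO").
        if not any(ch.isdigit() for ch in token):
            continue
        digits = re.sub(r"[,.]", "", token).translate(OCR_DIGIT_FIXES)
        amounts.append(int(digits))
    return amounts


def find_after_label(text, label):
    """Return the first token that follows a label such as 'No.', or None."""
    pattern = re.escape(label) + r"\s*[:：]?\s*([A-Za-z0-9-]+)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "Amount: 2,28B,O00 No. 12345"  # made-up OCR output
    print(find_amounts(sample))
    print(find_after_label(sample, "No."))
```

This only repairs misreads inside something that already looks like a grouped number. It cannot recover text the OCR engine dropped or scrambled, so improving the scan quality or the OCR settings still matters.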

I really need your help.

test.txt (1.9 KB)

Anil_G · September 5, 2023, 5:42am

@MiaZhou

Try using Document Understanding models, and train the invoice model or another model that fits your document.

Cheers

muthuerd · September 5, 2023, 9:37am

@MiaZhou

Is your format fixed or dynamic? If it is fixed, we can try OCR on a fixed region (X and Y coordinates).
If you have UiPath DU, you can try that; it has a predefined model for Invoice China that handles Chinese characters.

Let me know if you have any questions
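The fixed-region idea above can be sketched in Python (not UiPath code), assuming the OCR engine returns each word with a bounding box. The `Word` class, the `words_in_zone` function, and all coordinates below are hypothetical; real zones would have to be measured on the actual scanned page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box in page pixels (hypothetical shape)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def words_in_zone(words, left, top, right, bottom):
    """Join the words whose box center lies inside the given rectangle."""
    hits = []
    for word in words:
        center_x = word.x + word.w / 2
        center_y = word.y + word.h / 2
        if left <= center_x <= right and top <= center_y <= bottom:
            hits.append(word)
    # Simple reading order; assumes words on the same line share the same y.
    hits.sort(key=lambda word: (word.y, word.x))
    return " ".join(word.text for word in hits)


if __name__ == "__main__":
    # Made-up OCR output for a fixed-layout page.
    page = [
        Word("No.", 10, 20, 25, 12),
        Word("12345", 40, 20, 45, 12),
        Word("Amount", 10, 100, 60, 12),
        Word("2,288,000", 80, 100, 70, 12),
    ]
    print(words_in_zone(page, 70, 95, 160, 115))
    print(words_in_zone(page, 0, 15, 100, 35))
```

Testing the box center rather than the whole box keeps a word in the zone even when the scan is shifted by a few pixels, which is common with scanned pages.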
