How to extract data from a scanned, unclear PDF

I want to extract the account number (2,288,000) and the No. field, but the scanned PDF is unclear. The scanned PDF is below:

I use some activities in UiPath Studio:

The resulting output data is below:

What bothers me is that the output data is obviously incorrect.
How can I get the specific data I want? The approach I have in mind is:

Read PDF With OCR → txt → extract data with a regular expression. The problem is that when I read the PDF with OCR into text, the results are not correct.
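To illustrate the regex step of that pipeline, here is a minimal Python sketch of a pattern that tolerates a few typical OCR mistakes (a `.` or a stray space in place of a thousands separator, letters read in place of digits). The sample text, the confusion map, and the function names `extract_amounts` and `extract_doc_no` are all assumptions for illustration; they are not taken from the attached test.txt, and the same patterns would need adapting to the real OCR output.

```python
import re

# Assumed, non-exhaustive map of letters that OCR often produces instead of digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

# Grouped amount such as 2,288,000. OCR may emit '.' instead of ',' or add a space
# around the separator. Note that a decimal like 1.500 would also match.
AMOUNT_RE = re.compile(r"\b\d{1,3}(?:\s?[,.]\s?\d{3})+\b")

# "No." label where the 'o' may be read as '0' or 'O', followed by a value of
# four or more characters that are digits or digit look-alikes.
DOC_NO_RE = re.compile(r"\bN[o0O]\s*[.:]\s*([0-9OolIS]{4,})")


def extract_amounts(text):
    """Return every grouped amount found in the text as an integer."""
    return [int(re.sub(r"\D", "", m.group(0))) for m in AMOUNT_RE.finditer(text)]


def extract_doc_no(text):
    """Return the value after the 'No.' label with look-alike letters fixed, or None."""
    m = DOC_NO_RE.search(text)
    return m.group(1).translate(OCR_DIGIT_FIXES) if m else None


# Hypothetical OCR output, not the real document.
sample = "N0. 1234S678\nTotal 2.288, 000 CNY"
print(extract_doc_no(sample))
print(extract_amounts(sample))
```

A tolerant pattern like this only helps when the OCR output is close to correct; if the characters are badly garbled, improving the OCR input (resolution, contrast, engine) matters more than the regex.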

I really need your help.

test.txt (1.9 KB)


Try using Document Understanding models: train the invoice model, or another model that fits your document.




  1. Is your format fixed or dynamic? If it is fixed, we can try OCR on a fixed region (X and Y coordinates).
  2. If you have UiPath DU, you can try that; it has a predefined Invoice China model to handle Chinese characters.
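
As a rough illustration of the fixed-region idea in point 1, the Python sketch below keeps only the OCR words whose centre falls inside a known rectangle. It assumes the OCR engine returns each word with a bounding box; the `Word` type, the `words_in_region` helper, and all coordinates are hypothetical and not part of any UiPath API.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """One OCR word with its bounding box in page pixels (assumed layout)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def words_in_region(words, left, top, right, bottom):
    """Join the words whose centre lies inside the rectangle.

    Words are ordered top-to-bottom, then left-to-right, which is a
    simplification that assumes words on one line share the same y value.
    """
    hits = [
        wd for wd in words
        if left <= wd.x + wd.w / 2 <= right and top <= wd.y + wd.h / 2 <= bottom
    ]
    hits.sort(key=lambda wd: (wd.y, wd.x))
    return " ".join(wd.text for wd in hits)


# Hypothetical OCR result for a fixed-layout page.
words = [
    Word("No.", 400, 50, 30, 12),
    Word("12345678", 440, 50, 80, 12),
    Word("Total", 50, 300, 40, 12),
]
print(words_in_region(words, 380, 40, 560, 70))
```

This only works when every scan has the same layout and alignment; skewed or shifted scans would need the region to be located relative to an anchor word instead.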

Let me know if you have any questions