How to extract data from a scanned, unclear PDF

I want to extract the account number (2,288,000) and the No. field, but the scanned PDF is unclear. The scanned PDF is below:

I used the following activities in UiPath Studio:

The resulting output data is below:

What bothers me is that the output data is obviously incorrect.
How can I get the specific data I want? The approach I have in mind is:

Read PDF With OCR → txt → extract the data with a regular expression. The problem is that when I read the PDF with OCR into text, the results are not correct.
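To make the regex step concrete, here is a rough Python sketch of what I mean. The sample text is made up (the real OCR output will look different), the function names `extract_amount` and `extract_doc_no` are just placeholders, and I am assuming the only OCR confusion inside the number is the letter O read in place of zero. It only helps if the OCR text is close to correct; it cannot recover badly garbled text.

```python
import re

# Hypothetical OCR text, NOT the real output; note the letter "O" inside the amount.
sample = "N0. 00123456\nAmount 2,288,O00"

# Assumption: the letter O/o is the only character misread inside numbers.
OCR_FIXES = str.maketrans({"O": "0", "o": "0"})

# A digit group followed by one or more separator + 3-character groups,
# not glued to a preceding word character or separator.
AMOUNT_RE = re.compile(r"(?<![\w,.])\d{1,3}(?:[,.][\dOo]{3})+(?!\w)")

# "No" (or the misread "N0"), optional "." or ":", then a run of digits.
DOC_NO_RE = re.compile(r"\bN[o0]\s*[.:]?\s*(\d+)")


def extract_amount(text):
    """Return the first thousands-separated number as an int, or None."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    cleaned = match.group(0).translate(OCR_FIXES)
    return int(re.sub(r"[,.]", "", cleaned))


def extract_doc_no(text):
    """Return the digits following 'No.' as a string, or None."""
    match = DOC_NO_RE.search(text)
    return match.group(1) if match else None


print(extract_amount(sample))
print(extract_doc_no(sample))
```

The document number is kept as a string so that leading zeros are not lost. In UiPath the same patterns could presumably be used in a regex-matching activity, but I have not verified that.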

I really need your help.

test.txt (1.9 KB)


Try using Document Understanding models: train the invoice model, or another model that fits your document.


  1. Is your format fixed or dynamic? If it is fixed, we can try OCR on fixed regions (X and Y coordinates).
  2. If you have UiPath DU (Document Understanding), you can try that; it has a predefined model for Invoice China that handles Chinese characters.

Let me know if you have any questions.