Image type PDF

Hi,
I need to extract data from an image-type PDF. The OCR activities are not working and I am getting errors. Please help me with that…
Thanks

@abivanth.r

Use Read PDF with OCR. Change the Scale and the image DPI, and try different OCR engines such as Tesseract and OmniPage.

Hi @abivanth.r

Use the Read PDF with OCR activity. By default the Scale is 2, and you can enter a value from 0 to 5. You can change the Profile too. For scanned PDFs, keeping Profile as None and Scale at 2 usually works, but not in every case. So start the Scale at 0 and increase it in steps of 0.5, with Profile set to None or Scan. Try both profiles and check which one extracts the data most accurately.
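To keep the trial runs organised, the sweep described above can be listed out up front. This is only a rough Python sketch for planning the runs, not UiPath code: the function name `candidate_settings` is made up for illustration, and the range (0 to 5, step 0.5) and the two profiles are taken from the advice above.

```python
from itertools import product


def candidate_settings(start=0.0, stop=5.0, step=0.5, profiles=("None", "Scan")):
    """Return every (scale, profile) pair to try, in increasing scale order.

    The defaults mirror the suggestion in this thread; they are not read
    from UiPath itself.
    """
    # Build the scales from an integer count to avoid floating-point drift.
    count = int(round((stop - start) / step)) + 1
    scales = [round(start + i * step, 2) for i in range(count)]
    return list(product(scales, profiles))


settings = candidate_settings()
for scale, profile in settings:
    # Set these two values by hand in the activity's properties for each run.
    print(f"Scale={scale}, Profile={profile}")
```

Each printed line is one run to try in Studio, so you can tick them off and note the result next to each.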

If the Tesseract OCR engine doesn’t work, you can go with the OmniPage OCR engine. For this you need to install the dependency UiPath.OmniPage.Activities. Here too you have the Profile property, including Scan; set it accordingly and check. Note that larger Scale values fail with the OmniPage OCR engine.

Hope you understand!!

What do the Profile property and the Scale property do? I mean, I can read the PDF now. It is in portrait orientation, though, and the data is coming out wrong.

@abivanth.r

Scaling values start from 1. You can increase it (1.2, 1.3, …) until you get the data properly. Also change the image DPI, for example to 96, 150, or 256.

This should improve your data extraction.

Hi,
The PDF contains Chinese or a related language, but I am only getting some English data… How can I fix that?

Hi @abivanth.r

Check out this docs

Regards

@abivanth.r

Use the OmniPage OCR. It extracts the PDF data more accurately than the other OCR engines, and it can extract both Chinese and English data if you slightly adjust the image DPI and Scale.

Install the UiPath.OmniPage.Activities package from Manage Packages.
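If you want a quick check of whether a run actually returned any Chinese text, rather than English only, you can paste the OCR output into something like the sketch below. It is plain Python and not part of UiPath; the function names and sample strings are made up for illustration, and it only looks at the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer characters are not counted.

```python
def count_cjk(text):
    """Count characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def has_chinese(text, minimum=1):
    """Return True if the text contains at least `minimum` CJK characters."""
    return count_cjk(text) >= minimum


# Hypothetical OCR outputs, standing in for what the activity returned.
sample_english_only = "Invoice total 1200"
sample_mixed = "发票 Invoice 总计 1200"

print(has_chinese(sample_english_only))
print(has_chinese(sample_mixed))
```

If the check comes back false for every setting, the engine is probably not recognising the Chinese text at all, and changing Scale or DPI alone may not be enough.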

Thanks man, it worked. But the data is still inaccurate.

@abivanth.r

Try different Scale values until you get the data properly. But remember that OCR activities do not extract data with 100% accuracy. Keep track of which Scale gives you the best extraction and stick with that one.
I hope it helps you.
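One way to keep track of which Scale works best is to type out a short piece of the document by hand and compare each OCR result against it. The sketch below is an assumption about how you might do that outside UiPath, using Python's standard `difflib`; the function names, the sample text, and the OCR outputs are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(expected, actual):
    """Return a score between 0 and 1; 1 means the two strings are identical."""
    return SequenceMatcher(None, expected, actual).ratio()


def best_setting(expected, results):
    """Return the key in `results` whose text is closest to `expected`."""
    return max(results, key=lambda setting: similarity(expected, results[setting]))


# Hand-typed reference text for a small region of the page (hypothetical).
expected = "Invoice No 12345"

# OCR output per Scale value, pasted in from each run (hypothetical).
results = {
    1.0: "lnvoice N0 I2345",
    1.5: "Invoice No 12845",
    2.0: "Invoice No 12345",
}

best = best_setting(expected, results)
print(best, similarity(expected, results[best]))
```

The score is only a rough guide, so it is still worth checking the winning output by eye on a few different pages.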

