Image type PDF

abivanth.r · February 28, 2024, 10:27am

Hi,
I need to extract data from an image-type PDF. The OCR activities are not working and I am getting errors. Please help me with that…
Thanks

rlgandu · February 28, 2024, 10:33am

@abivanth.r

Use Read PDF With OCR, change the Scale and image DPI values, and try different OCR engines such as Tesseract and OmniPage.

Parvathy · February 28, 2024, 10:41am

Hi @abivanth.r

Use the Read PDF With OCR activity. By default the scale is 2. You can enter a scale from 0 to 5, and you can change the Profile too. For scanned PDFs, keeping Profile as None with a scale of 2 usually works, but this does not hold in all cases. So start the scale at 0 and increase it in steps of 0.5, with Profile set to None or Scan. Try both profiles and check which one extracts the data accurately.

If the Tesseract OCR engine doesn't work, you can try the OmniPage OCR engine. For this you need to install the UiPath.OmniPage.Activities dependency. It also has Profile and Scale properties; set them in the same way and check. Larger Scale values fail with the OmniPage OCR engine.

Hope you understand!!

abivanth.r · February 28, 2024, 10:42am

What do the Profile and Scale properties do? I can read the PDF now, but even though the page is in portrait orientation, the extracted data is wrong.

rlgandu · February 28, 2024, 10:46am

@abivanth.r

Scale values start from 1; you can increase them to 1.2, 1.3, … until you get the data properly. Also change the image DPI, for example to 96, 150, or 256.

This improves the accuracy of the data extraction.

abivanth.r · February 28, 2024, 10:48am

Hi,
The PDF contains Chinese or a related language, but I am getting only some English data. How can I fix that?

Parvathy · February 28, 2024, 10:50am

Hi @abivanth.r

Check out this doc.

Regards

rlgandu · February 28, 2024, 10:55am

@abivanth.r

Use the OmniPage OCR engine. It extracts PDF data more accurately than the other OCR engines, and it can extract both Chinese and English data if you slightly adjust the image DPI and Scale.

Install the UiPath.OmniPage.Activities package from Manage Packages.
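
To notice quickly whether the Chinese text is being dropped, one rough check is to measure how much of the OCR output falls in the CJK Unified Ideographs range (U+4E00 to U+9FFF). This is only a sketch in plain Python, outside UiPath; the function name `cjk_ratio` is made up for illustration and is not part of any UiPath package.

```python
def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are
    CJK Unified Ideographs (U+4E00..U+9FFF).

    Hypothetical helper: a value near 0 for a document you know
    contains Chinese suggests the OCR engine skipped those characters.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)
```

If the ratio stays at 0 for a page that clearly has Chinese on it, the engine or its language setting is the problem rather than the Scale or DPI.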

abivanth.r · February 28, 2024, 11:18am

Thanks man, that worked. But the data is still inaccurate.

rlgandu · February 28, 2024, 11:21am

@abivanth.r

Try different Scale values until you get the data properly. But remember that OCR activities do not extract data with 100% accuracy, so keep track of which scale gives you the best extraction and stick with that.
I hope it helps you.
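
The trial-and-error over Scale and DPI can be written down as a small sweep. The sketch below is plain Python, not a UiPath workflow: `run_ocr` and `score` are placeholders you would supply yourself (for example, a call that runs your OCR with the given settings, and a count of the expected fields found in the text). All names here are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def sweep_ocr_settings(
    run_ocr: Callable[[float, int], str],
    score: Callable[[str], float],
    scales: Iterable[float],
    dpis: Iterable[int],
) -> List[Tuple[float, float, int]]:
    """Try every (scale, dpi) pair and rank them by extraction quality.

    run_ocr(scale, dpi) -> extracted text   (hypothetical, user supplied)
    score(text)         -> higher is better (hypothetical, user supplied)

    Returns (score, scale, dpi) tuples, best first.
    """
    results = []
    for scale, dpi in product(scales, dpis):
        text = run_ocr(scale, dpi)
        results.append((score(text), scale, dpi))
    # Sort on the score only, so ties keep the order they were tried in.
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

The first entry of the returned list is the setting to stick with, e.g. `best_score, best_scale, best_dpi = sweep_ocr_settings(my_ocr, my_score, [1.0, 1.5, 2.0], [96, 150, 256])[0]`.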

system · March 2, 2024, 11:21am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
