How to perform pdf automation with images

BNK · December 26, 2023, 7:21am

I want to extract some data from a PDF, but the PDF contains images, so I need to extract some information from those images. I tried Read PDF With OCR (Tesseract OCR), but it is not extracting the data properly.

Anil_G · December 26, 2023, 7:23am

@BNK

Can you share a sample and tell us what you want to extract from it?

cheers

vrdabberu · December 26, 2023, 7:26am

Hi @BNK

Try changing the Scale and Profile properties of the Tesseract OCR engine. Scale starts at 0 and can go up to 5. In Read PDF With OCR, also try changing the ImageDpi property.

Regards
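To see why ImageDpi and Scale matter, it can help to work out how many pixels the OCR engine actually receives. Below is a minimal Python sketch. It assumes that PDF page sizes are given in points (1/72 inch). It also assumes that Scale simply multiplies the image rendered at ImageDpi. UiPath does not spell out that exact interaction, so treat the model as an approximation. `rendered_size` and `candidate_settings` are hypothetical helpers for planning a tuning grid, not UiPath APIs, and only positive Scale values are modelled.

```python
from __future__ import annotations

from itertools import product

POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def rendered_size(page_w_pt: float, page_h_pt: float, dpi: int, scale: float = 1.0) -> tuple[int, int]:
    """Pixel size of a page rasterized at `dpi`, then enlarged by `scale`.

    Assumption: the OCR Scale factor compounds with the render DPI.
    """
    factor = dpi / POINTS_PER_INCH * scale
    return round(page_w_pt * factor), round(page_h_pt * factor)


def candidate_settings(page_w_pt, page_h_pt, dpis, scales, max_pixels):
    """Yield (dpi, scale, (w, h)) combinations whose image stays within a pixel budget."""
    for dpi, scale in product(dpis, scales):
        w, h = rendered_size(page_w_pt, page_h_pt, dpi, scale)
        if w * h <= max_pixels:  # skip combinations that would make OCR very slow
            yield dpi, scale, (w, h)
```

For an A4 page (595 × 842 pt), `rendered_size(595, 842, 300)` gives `(2479, 3508)`. Under the assumption above, that is the same pixel count as ImageDpi 150 with Scale 2. The pixel budget keeps the grid from proposing images so large that OCR becomes impractically slow.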

BNK · December 26, 2023, 7:30am

RPA_autocadd.pdf (219.1 KB)

Sample PDF attached

Anil_G · December 26, 2023, 7:31am

@BNK

The resolution looks really poor.

Also, what do you want to extract from it?

It is a totally unstructured layout, so extraction will be difficult if the image keeps changing.

cheers
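Poor resolution is often a property of the image embedded in the PDF, not of the render settings. The minimal sketch below assumes you can read two values: the embedded image's pixel width, and the width in points that the image occupies on the page (for example, from a PDF inspection tool). It estimates the image's native DPI and checks whether rendering at a given ImageDpi can add real detail. Above the native DPI, the renderer only interpolates existing pixels. The 300 DPI target follows Tesseract's general guidance that it works best at roughly 300 DPI or more. The helper names are hypothetical.

```python
POINTS_PER_INCH = 72  # 1 PDF point = 1/72 inch


def effective_dpi(image_px: int, displayed_pt: float) -> float:
    """Native resolution of an embedded image: its pixels divided by the inches it spans on the page."""
    if displayed_pt <= 0:
        raise ValueError("displayed_pt must be positive")
    return image_px / (displayed_pt / POINTS_PER_INCH)


def render_adds_detail(image_px: int, displayed_pt: float, render_dpi: int) -> bool:
    """True if render_dpi does not exceed the image's native resolution.

    Beyond that point the renderer only interpolates, so raising ImageDpi cannot recover detail.
    """
    return render_dpi <= effective_dpi(image_px, displayed_pt)


def is_low_resolution(image_px: int, displayed_pt: float, target_dpi: int = 300) -> bool:
    """Flag embedded images whose native DPI is below the assumed OCR target."""
    return effective_dpi(image_px, displayed_pt) < target_dpi
```

Consider a 1000 px wide image stretched across 720 pt (10 inches). Its effective DPI is 100.0, so `is_low_resolution(1000, 720)` is `True`, and rendering it at 300 DPI only enlarges the same pixels. In that case, requesting a higher-quality source file may help more than tuning OCR settings.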

BNK · December 26, 2023, 8:58am

108416-crude-oil_schematic.pdf (251.5 KB)
Hi Anil,

I've added one more sample. I need to extract all the information from the PDF.
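In schematics like these, labels are scattered rather than laid out in rows, so it may help to keep each word's position instead of producing a flat text dump. The minimal sketch below assumes you can run Tesseract directly, either as `tesseract page.png out tsv` or through `pytesseract.image_to_data`. Both produce Tesseract's tab-separated word table. The sketch keeps word rows (level 5) above a confidence threshold and groups them into lines with their top-left coordinates. `parse_tesseract_tsv` and the 60.0 threshold are illustrative assumptions, not part of UiPath's activities.

```python
import csv
import io
from collections import defaultdict


def parse_tesseract_tsv(tsv: str, min_conf: float = 60.0) -> list:
    """Return (text, left, top) for each OCR line, keeping only confident word rows."""
    lines = defaultdict(list)  # (page, block, paragraph, line) -> [(left, top, word), ...]
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["level"] != "5":  # level 5 rows are individual words
            continue
        text = (row.get("text") or "").strip()
        if not text or float(row["conf"]) < min_conf:
            continue  # drop empty cells and low-confidence guesses
        key = tuple(int(row[k]) for k in ("page_num", "block_num", "par_num", "line_num"))
        lines[key].append((int(row["left"]), int(row["top"]), text))

    result = []
    for key in sorted(lines):  # Tesseract's own page/block/paragraph/line order
        words = sorted(lines[key])  # left to right within a line
        text = " ".join(word for _, _, word in words)
        top = min(t for _, t, _ in words)
        result.append((text, words[0][0], top))
    return result
```

The coordinates let you match labels to regions of the drawing afterwards. On a fully unstructured page this is only a starting point, since deciding which label belongs to which field still depends on the document's layout.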
