How to read a very big PDF file

Aggi · January 18, 2018, 9:40am

Hi team,

Can you suggest how to read a very large PDF file? I need to read the file and check whether it contains certain keywords. For example, I have PDF reports with 231 pages and 602 pages. “Read PDF with OCR” or “Read PDF Text” works well when a report has 40 pages. What should I do with a big file? How can I reduce the processing time? Maybe somebody already has an example?

Thank you for your ideas and support!

vvaidya · January 19, 2018, 2:40pm

How about reading the PDF in ranges (1-50, 51-100, 101-150 … 551-602) and merging everything at the end? Also, run your Read PDF Text in an isolated workflow.

This should give you the page count, which you can then loop over:

Page Count
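
For illustration, the range-splitting idea can be sketched outside UiPath in a few lines of Python. This is only a sketch under assumptions: `page_ranges` and `chunk_size` are hypothetical names, and the "start-end" strings are assumed to match what the Range property of the read activity expects. The ranges do not overlap, so no page is read twice.

```python
def page_ranges(page_count, chunk_size=50):
    """Return non-overlapping page-range strings such as "1-50", "51-100"."""
    if page_count < 1 or chunk_size < 1:
        raise ValueError("page_count and chunk_size must be positive")
    ranges = []
    for start in range(1, page_count + 1, chunk_size):
        # Clamp the last range to the real page count.
        end = min(start + chunk_size - 1, page_count)
        # A one-page tail is written as a single page number.
        ranges.append(f"{start}-{end}" if end > start else str(start))
    return ranges


if __name__ == "__main__":
    ranges = page_ranges(602)
    print(ranges[:3])   # first three ranges
    print(ranges[-1])   # last range, clamped to page 602
```

Each string would then be passed to the read activity inside the loop, and the results appended to one variable.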

vvaidya · January 19, 2018, 3:10pm

This workflow splits the PDF pages into ranges.

pdfread_range.xaml (8.8 KB)

ClaytonM · January 19, 2018, 3:52pm

If it takes too long, you may be able to cut the time down by opening the PDF, using Select All and then Copy, and storing the text from the clipboard in a variable or pasting it into another application. I have used the copy method successfully: I store the text in a string variable and use string manipulation to double-check that I got all of it. The PDFs usually have around 3000 pages, which take about 5 minutes to copy.

Aggi · January 23, 2018, 1:20pm

vvaidya, would you be so kind as to explain how to set up the read activity for the PDF so that it reads each range?

vvaidya · January 24, 2018, 9:33pm

I used the file below as an example of reading a PDF in ranges of 10 pages. You can do the same for a very large PDF by specifying your own range. If you have to use OCR, try running it in an isolated invoked workflow.

pdfread_range.xaml (12.9 KB)
uip.pdf (3.0 MB)
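
Since the original goal was to check for keywords, reading range by range also allows the loop to stop early once every keyword has been found. The Python sketch below is not taken from the attached workflow. `find_keywords` and `read_range` are hypothetical names, and `read_range` stands in for whatever actually extracts the text of one page range (for example, Read PDF Text with its Range set).

```python
def find_keywords(ranges, read_range, keywords):
    """Map each keyword to the first page range whose text contains it."""
    # Compare case-insensitively, but report keywords as the caller wrote them.
    remaining = {k.lower(): k for k in keywords}
    found = {}
    for page_range in ranges:
        text = read_range(page_range).lower()
        for lowered, original in list(remaining.items()):
            if lowered in text:
                found[original] = page_range
                del remaining[lowered]
        if not remaining:
            break  # every keyword located, so skip the remaining pages
    return found


if __name__ == "__main__":
    # Stand-in for a real PDF: range string -> extracted text (made-up data).
    fake_pdf = {
        "1-10": "Summary of results",
        "11-20": "Invoice total: 42",
        "21-30": "Appendix",
    }
    print(find_keywords(list(fake_pdf), fake_pdf.__getitem__, ["invoice", "summary"]))
```

One caveat of this design: a phrase that straddles the boundary between two ranges would be missed, so multi-word keywords may need the ranges merged before searching.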

Aggi · January 25, 2018, 8:47am

Thank you very much!
