How to read a very large PDF file

Hi team,

Can you suggest how to read a very large PDF file? I need to read the file and check whether it contains certain keywords. For example, I have PDF reports with 231 pages and 602 pages. “Read PDF with OCR” or “Read PDF Text” works well when a report has 40 pages. What should I do with a large file? How can I reduce the processing time? Maybe somebody already has an example?

Thank you for your ideas and support!

How about reading the PDF in ranges (1-50,50-100,100-150…550-602) and merging everything at the end? Also, run your Read PDF Text activity in an isolated workflow.
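To illustrate the range idea outside of UiPath, here is a minimal Python sketch that computes the page ranges for a given page count. The function names `page_ranges` and `as_range_strings` are hypothetical, and the "start-end" string format is an assumption about what you would type into the activity's range field. Note that the ranges here do not overlap (1-50, 51-100, ...), so no page is read twice.

```python
def page_ranges(total_pages, chunk_size):
    """Return inclusive, non-overlapping (start, end) ranges covering pages 1..total_pages."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    start = 1
    while start <= total_pages:
        # The last range may be shorter than chunk_size.
        end = min(start + chunk_size - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges


def as_range_strings(ranges):
    """Format ranges as "start-end" strings (assumed format for a range field)."""
    return [f"{start}-{end}" for start, end in ranges]


if __name__ == "__main__":
    # 602 pages in chunks of 50 gives 13 ranges; the last one is 601-602.
    print(as_range_strings(page_ranges(602, 50)))
```

You would loop over these strings, read each range separately, and append the text to one result.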

This should give you the page count, and then you can loop:

Page Count

This is to split the pdf pages into ranges.

pdfread_range.xaml (8.8 KB)

If it takes too long, you may be able to cut the time down by opening the PDF, using Select All, and then copying the text from the clipboard into a variable or pasting it into another application. I have used this copy method successfully: I store the text in a string variable and double-check with string manipulation that I got all of it. The PDFs usually have around 3000 pages, which takes about 5 minutes to copy.

vvaidya, would you be so kind as to explain how to set up the read activity for the PDF so that it reads each range?

I used the file below as an example of reading a PDF in ranges of 10 pages. You can do the same for a very large PDF by specifying your own range. If you have to use OCR, try an isolated invoked workflow.

pdfread_range.xaml (12.9 KB)
uip.pdf (3.0 MB)
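Since the original goal is only to check for keywords, reading in ranges also lets you stop as soon as everything has been found. Here is a rough Python sketch of that loop, not taken from the attached workflow. `find_keywords` is a hypothetical helper, and `chunks` stands for the text of each page range in order. The match is case-insensitive, and a keyword split across two ranges would be missed.

```python
def find_keywords(chunks, keywords):
    """Map each keyword that was found to the index of the first chunk containing it."""
    # Compare in lowercase, but report keywords in their original spelling.
    remaining = {kw.lower(): kw for kw in keywords}
    found = {}
    for index, text in enumerate(chunks):
        lowered = text.lower()
        for key in list(remaining):
            if key in lowered:
                found[remaining.pop(key)] = index
        if not remaining:
            # Every keyword is accounted for, so the later ranges are never read.
            break
    return found


if __name__ == "__main__":
    sample = ["Annual report, total revenue", "Net loss for the year", "Appendix"]
    print(find_keywords(sample, ["revenue", "LOSS"]))
```

If `chunks` is produced lazily (one read per range), the early exit is where the time saving comes from on a 602-page report.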

Thank you very much!