Extract number from PDF


#1

Hi all,
I have to extract a specific number from PDFs that all have the same layout.
However, the PDFs have no specific UI elements. What is the best option to extract the number from the PDF?

Thanks
Tobi


#2

Hey @tobschroer

You can use Get OCR Text or Screen Scraping :slight_smile:

Refer to these links:


#3

Thanks @Rishabh_Lakhera.
I have tried it with the Get Text activity, but my message box is always empty.


#4

Try Screen Scraping and play around with the OCR, FullText, and Native options!
Always works for me :slight_smile:


#5

@Rishabh_Lakhera, it's working now, but only once. When the PDF isn't open in the background, a workflow exception appears. And if I close the PDF, open it a second time, and then start the workflow, I get completely different output in my message box. Do you have any idea?
Thanks


#6

If the PDF layout is fixed, we can use get PDF text to read the whole PDF content into a string variable. You can then try string manipulation to extract the required number.
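
To illustrate the string manipulation step, here is a minimal sketch in Python (not a UiPath workflow). It assumes the PDF text is already in a string and that the number follows a fixed label. The label "Invoice No", the sample text, and the helper name `extract_number` are all made up for illustration, so adapt them to your own layout.

```python
import re


def extract_number(text, label):
    """Return the first number that follows `label` in `text`, or None.

    `label` is a hypothetical anchor such as "Invoice No"; an optional
    colon and whitespace may sit between the label and the number.
    """
    # Escape the label so it is matched literally, then capture digits
    # with optional "." or "," separated groups (e.g. 1.234,56).
    pattern = re.escape(label) + r"\s*:?\s*(\d+(?:[.,]\d+)*)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Made-up sample standing in for the text read from the PDF.
sample = "Invoice\nCustomer: ACME\nInvoice No: 4711\nTotal: 1.234,56 EUR\n"

print(extract_number(sample, "Invoice No"))
print(extract_number(sample, "Total"))
```

The same idea should carry over to a regular expression or Split/Substring calls inside the workflow. Anchoring on a label rather than a character position is just one choice, but it tends to survive small shifts in the extracted text.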


#7

Hi @tobschroer,
Perhaps try changing the scale in the Screen Scraping wizard. For example, when I need to OCR-scrape more complex text (including capital letters and dots in dates), I increase the scale to 5. Does this help?


#8

Check this out :slight_smile: