How to extract text from pdf files placed in a folder

lissynikkytha · June 15, 2017, 9:17am

Hi,

I get the files using Directory.GetFiles() and tried to use Read PDF with OCR inside the “For Each” loop. Since the file names are dynamic (they come from the folder), how can we specify the file name in “Read PDF”?
Is there an activity equivalent to Scrape Relative for fetching the value of a specific field? I couldn’t find a Scrape Relative option for extracting data from a PDF.

ddpadil · June 15, 2017, 9:50am

Hi,
1. After getting the files from the folder with Directory.GetFiles(), pass the resulting array to a For Each loop, then pass item as the PDF path in the Read PDF activity. The loop will then iterate through all the dynamic files.
2. Either open the PDF and scrape the specific field using the relative scrape option, or, after reading the PDF, do the following: to extract a specific value, find the start index and end index of that value in the text that was read, then use Substring with these indexes to get the value. Note that in .NET, Substring takes a start index and a length, so the length is the end index minus the start index.

Here is the solution file.
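
For readers who want to see the two steps outside of a workflow, here is a minimal sketch of the same logic in Python rather than UiPath. It is only an illustration under assumptions: `extract_between` and `extract_from_folder` are hypothetical helper names, and `read_pdf_text` is a placeholder for whatever actually reads the PDF (the role Read PDF with OCR plays in the workflow); it is not a real library call. The folder listing stands in for Directory.GetFiles(), and the index arithmetic stands in for the start index / end index / Substring approach.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def extract_between(text: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between start_marker and end_marker, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # the value begins right after the label
    end = text.find(end_marker, start)  # look for the end marker only after the start
    if end == -1:
        return None
    return text[start:end].strip()


def extract_from_folder(
    folder: str,
    read_pdf_text: Callable[[str], str],  # hypothetical reader: file path -> text
    start_marker: str,
    end_marker: str,
) -> Dict[str, Optional[str]]:
    """Loop over every PDF in the folder and extract one value per file."""
    results: Dict[str, Optional[str]] = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        text = read_pdf_text(str(path))   # the path changes on each iteration
        results[path.name] = extract_between(text, start_marker, end_marker)
    return results


if __name__ == "__main__":
    sample = "Invoice No: 12345\nDate: 2017-06-15"
    print(extract_between(sample, "Invoice No:", "\n"))
```

The point of passing the path from the loop into the reader is the same as passing item into the PDF path field: nothing about the file name is hard-coded. The marker strings ("Invoice No:" and the line break) are made-up examples and would need to match the labels in the actual documents.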
