PDF data reading and extraction

shreyaank · September 10, 2018, 6:52am

Hi all,

I'm trying to read a PDF using the Read PDF Text activity.

I need to extract the line just below the heading “Employment Verification”, i.e. “Not Applicable” in the sample, or whatever value is present below that heading. Below is a sample of the PDF text from which the data has to be extracted. Can anyone help with this?

Level of Check Client Level Colour Code GREEN
FINAL BACKGROUND REPORT
Executive Summary
Education Verification
Bachelor of Science Verified
Employment Verification
Not Applicable
Criminal Verification
Bangalore (Police) No record
Anantapur (Police) No record
Database Checks
India-specific Database Checks Not Applicable
Global Database Checks Not Applicable
Major Discrepancy Clear Report
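The request is to take the line that follows the “Employment Verification” heading. Below is a minimal Python sketch of that line-based idea. It is not a UiPath activity. It assumes the Read PDF Text output keeps each heading on its own line, as in the sample above, although real PDF extraction can merge or reorder lines. The function name `line_after_heading` is hypothetical. It matches the heading exactly and case-sensitively, so a numbered heading such as “6. Employment Verification” would need a looser comparison (for example `endswith`).

```python
from typing import Optional


def line_after_heading(text: str, heading: str) -> Optional[str]:
    """Return the first non-empty line after the line equal to `heading`.

    Hypothetical helper: assumes the heading sits alone on its own line.
    Returns None if the heading is missing or nothing follows it.
    """
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line == heading:
            # Skip blank lines that PDF text extraction sometimes inserts.
            for following in lines[i + 1:]:
                if following:
                    return following
            return None
    return None


# Lines taken from the sample report in the question.
sample = """Executive Summary
Education Verification
Bachelor of Science Verified
Employment Verification
Not Applicable
Criminal Verification
Bangalore (Police) No record"""

print(line_after_heading(sample, "Employment Verification"))  # Not Applicable
```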

pathrudu · September 10, 2018, 6:59am

@shreyaank

Assign the output of Read PDF Text to a String variable (str_extractedData), then use the expression below (23 is the length of "Employment Verification"):
str_extractedData.Substring(str_extractedData.IndexOf("Employment Verification")+23, str_extractedData.IndexOf("Criminal Verification")-str_extractedData.IndexOf("Employment Verification")-23)
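The expression returns everything between the end of “Employment Verification” and the start of “Criminal Verification”. Below is a rough Python equivalent of the same index arithmetic, for illustration only. In UiPath, the VB.NET expression itself goes into an Assign activity. The name `between_markers` is hypothetical. One behavioral difference to note is that Python's `str.index` raises `ValueError` when a marker is missing, whereas .NET's `IndexOf` returns -1.

```python
def between_markers(text: str, start_marker: str, end_marker: str) -> str:
    # Same arithmetic as the VB.NET expression:
    #   startIndex = IndexOf(start_marker) + 23   (23 == len("Employment Verification"))
    #   length     = IndexOf(end_marker) - IndexOf(start_marker) - 23
    start = text.index(start_marker) + len(start_marker)
    end = text.index(end_marker)
    # strip() drops the line breaks around the value; in VB.NET you would
    # append .Trim() to the Substring result for the same effect.
    return text[start:end].strip()


sample = (
    "Education Verification\n"
    "Bachelor of Science Verified\n"
    "Employment Verification\n"
    "Not Applicable\n"
    "Criminal Verification\n"
    "Bangalore (Police) No record\n"
)

print(between_markers(sample, "Employment Verification", "Criminal Verification"))
```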

Hope this gives you the output.

Regards,
Pathrudu

shreyaank · September 10, 2018, 7:11am

Hi Pathrudu,
I tried this:

Text.ToString.Substring(Text.ToString.IndexOf("Employment Verification")+23, Text.ToString.IndexOf("Criminal Verification")-Text.ToString.IndexOf("Employment Verification")-23)

Below is the error:

pathrudu · September 10, 2018, 7:17am

With the same code I was able to extract the information I needed.

Here is my workflow:

test.xaml (6.0 KB)

Regards,
Pathrudu

shreyaank · September 10, 2018, 7:34am

When I tried the same, it says "Index out of range". Maybe that is because the text extracted from my PDF contains more lines than the sample I supplied above. That was just a sample, and the actual PDF is much longer. However, the data I want lies between "Employment Verification" and "Criminal Verification".
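A longer document should not cause this error by itself. With .NET's `String.IndexOf`, the usual cause is that a marker is not found, for example because the casing or spacing in the extracted text differs. In that case `IndexOf` returns -1, and the computed start index or length becomes invalid for `Substring`. Below is a hedged Python sketch of a guarded version. It returns None when a marker is missing and looks for the end marker only after the start marker. The helper `extract_between` is hypothetical, and matching is exact and case-sensitive.

```python
from typing import Optional


def extract_between(text: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between two markers, or None if either is missing.

    str.find returns -1 for a missing marker, like .NET's IndexOf; checking
    for -1 before slicing avoids the invalid start/length that makes
    Substring throw an out-of-range error.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Search for the end marker only after the start marker, so an earlier
    # occurrence elsewhere in a long document cannot give a negative length.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end].strip()


report = (
    "Criminal Verification (summary below)\n"
    "Employment Verification\n"
    "Not Applicable\n"
    "Criminal Verification\n"
    "Bangalore (Police) No record\n"
)
print(extract_between(report, "Employment Verification", "Criminal Verification"))
```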

shreyaank · September 10, 2018, 7:34am

Is there any other way to get the required text?

pathrudu · September 10, 2018, 8:04am

Do you have any other fields with the same name? If not, it should work even though the data is longer.

shreyaank · September 10, 2018, 8:37am

reference.xaml (29.3 KB)

No, there is just one occurrence each of "Employment Verification" and "Criminal Verification".

Can you please review my code?

pathrudu · September 10, 2018, 8:48am

Please go through this code; it may give you a better understanding. It's not necessary to generate a DataTable. You split the text line by line and applied Substring to each row, so each row holds only a single line of text.

Based on your condition, you are applying the Substring function to individual rows such as "6. Employment Verification" instead of to the full text that contains both "Employment Verification" and "Criminal Verification". That is the reason you are getting the error.
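The point about rows can be reproduced with a small experiment. Once the text is split into lines, the row holding the start heading does not also contain the end heading. The end index therefore comes back as -1 and the computed length is negative, which .NET's `Substring` rejects with an `ArgumentOutOfRangeException`. The Python sketch below only computes the start index and length that the VB.NET expression would pass to `Substring`. It does not call .NET. The helper name `substring_args` is hypothetical.

```python
from typing import Tuple


def substring_args(text: str, start_marker: str, end_marker: str) -> Tuple[int, int]:
    # Mirrors the VB.NET arguments:
    #   startIndex = IndexOf(start_marker) + len(start_marker)
    #   length     = IndexOf(end_marker) - IndexOf(start_marker) - len(start_marker)
    s = text.find(start_marker)
    e = text.find(end_marker)
    return s + len(start_marker), e - s - len(start_marker)


full_text = "Employment Verification\nNot Applicable\nCriminal Verification"
rows = full_text.splitlines()

# Whole text: both markers are found, so the length is positive.
print(substring_args(full_text, "Employment Verification", "Criminal Verification"))  # (23, 16)
# Single row: the end marker is missing (-1), so the length is negative.
print(substring_args(rows[0], "Employment Verification", "Criminal Verification"))  # (23, -24)
```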

Here is the modified code; it should work for you.

reference.xaml (19.3 KB)

shreyaank · September 10, 2018, 8:53am

Oh yes! Thanks, it works perfectly. I had used similar logic for other fields, so I thought I would use the same approach for this one as well.
