Data extraction from PDF

Hi,

I am trying to extract data from a PDF. I used the Read PDF activity to get the entire PDF content into a string. The extracted data has the following format:
01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
013004744524 16D293577700
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
01166419A
CP98081
YASMINE ABDUSSAMAD/
233029641221
233029641221 16D287324000
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
From the text above, I want to extract the names KALILA ABDURRAHMANN and YASMINE ABDUSSAMAD. Please help!

Hi,
To extract a specific value, find the start index and the end index of that value in the string, then pass those indexes to Substring to get the value.
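
The start-index/end-index idea can be sketched as follows. This is a minimal illustration in Python rather than a UiPath expression. The helper name `text_between` and the marker strings are assumptions chosen to fit the sample data in the question; they are not taken from any workflow in this thread.

```python
def text_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the start marker itself
    end = text.find(end_marker, start)  # look for the end marker after the start
    if end == -1:
        return None
    return text[start:end].strip()


sample = "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n"
# Assumption: the name follows the "CP98133" line and ends at the first "/".
name = text_between(sample, "CP98133", "/")
```

This only works when a known marker sits on each side of the value, which is why it breaks down once the surrounding codes change from record to record.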

Hi,

Thank you for your suggestion.
I tried it this way:
stringToExtract.Substring(stringToExtract.IndexOf("0"), stringToExtract.IndexOf("/") - stringToExtract.IndexOf("/") + 19)
But the output I am getting is: 01166355A CP98133 K
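
A likely reason for that output: the two IndexOf("/") terms in the length argument cancel each other, so the length is always 19 and the result is simply the first 19 characters from the first "0". The sketch below reproduces the same arithmetic in Python (not a UiPath expression) and shows one possible correction that starts at the beginning of the line holding the first "/". It assumes the lines are separated by `\n`; the variable names are illustrative.

```python
text = "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n"

# The original attempt: the two "/" indexes cancel, so the length is always 19.
start = text.index("0")
length = text.index("/") - text.index("/") + 19
attempt = text[start:start + length]

# One possible correction: begin at the start of the line that holds the first "/".
slash = text.index("/")
line_start = text.rfind("\n", 0, slash) + 1  # rfind gives -1 if no earlier newline
name = text[line_start:slash]
```

The correction only returns the first name, because it stops at the first "/" in the whole string.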

Thanks for your quick help! I will try this.
However, your readPDF file contains no workflow. I guess you attached the wrong one.

Regards,
Pooja P

Oops, sorry about that. Anyway, you can just follow the step above.
It works. :)

Here it is:
readPDF.zip (23.5 KB)

How can I extract a dynamic value from a PDF? I need to extract the "name" value from all the available PDFs, and its length and start/end index will not be known in advance. Please provide a solution.
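
When the position and length are unknown, one option is to match on the shape of the line instead of on indexes. The sketch below is in Python and assumes, based only on the sample at the top of this thread, that a name line contains capital letters and spaces and ends with "/". The pattern and the function name `extract_names` are illustrative assumptions; real PDFs may need a different pattern.

```python
import re

# Assumption: a name line holds only capital letters, spaces, dots, apostrophes
# or hyphens, and ends with "/". Date lines start with digits, so they do not match.
NAME_LINE = re.compile(r"^([A-Z][A-Z .'\-]*)/\s*$", re.MULTILINE)


def extract_names(text):
    """Return every name found on a line of the form 'NAME/'."""
    return [match.strip() for match in NAME_LINE.findall(text)]


sample = """01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
05/14/2016-
05/14/2016
01166419A
CP98081
YASMINE ABDUSSAMAD/
233029641221
"""
names = extract_names(sample)
```

Because the pattern describes the line rather than its position, it returns every matching name in the text, however many records there are.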

Yes, @Sravenco, I am facing the same issue. Could someone kindly help with the questions above?
Is it possible to extract dynamic content from a PDF?
@badita @ddpadil I need some help as soon as possible.

You can use OCR to read the complete file, write the output to a text file, and then use the "find" option to get the values you want.

@ddpadil I have extracted the PDF data, and now I want to put the extracted data under different columns. How can I do that? Please help.
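
One way to get columns, sketched in Python under a strong assumption: every record in the extracted text has the same number of lines, as the nine-line records in the sample at the top of this thread do. The column names below are hypothetical, since the thread does not say what each line means.

```python
import csv
import io

# Hypothetical column names; the thread does not say what each line means.
COLUMNS = ["record_id", "code", "name", "number", "number_pair",
           "date_from", "date_to", "detail", "subtotal"]


def to_rows(text, lines_per_record=9):
    """Group non-empty lines into fixed-size records, one line per column."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = []
    # An incomplete trailing record is dropped rather than guessed at.
    for i in range(0, len(lines) - lines_per_record + 1, lines_per_record):
        rows.append(dict(zip(COLUMNS, lines[i:i + lines_per_record])))
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = """01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
013004744524 16D293577700
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
"""
rows = to_rows(sample)
csv_text = to_csv(rows)
```

If records vary in length, fixed-size grouping will misalign the columns, and each record would need to be split on a recognisable first or last line instead.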

No solution on this one? I have the same problem.

Does this work for scanned PDFs? If yes, how can I capture data from the PDF by its margins, or by what follows a field label, for example the value after "Name : xxxxx"? Please suggest. Thanks in advance.
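
For the label case, a minimal Python sketch is shown below. It assumes the text has already been obtained (for a scanned PDF that would mean OCR output, which may contain recognition errors that stop the label from matching) and that the field appears on one line as "Name : value". The sample text and the function name `values_after_label` are made up for illustration.

```python
import re

# Assumption: the field appears as "Name : value" on a single line.
LABEL = re.compile(r"^[ \t]*Name[ \t]*:[ \t]*(.+?)[ \t]*$",
                   re.MULTILINE | re.IGNORECASE)


def values_after_label(text):
    """Return the text that follows each 'Name :' label."""
    return [value.strip() for value in LABEL.findall(text)]


# Made-up sample; only the "Name : xxxxx" line comes from the question.
sample = "Invoice 17\nName : xxxxx\nDate : 05/14/2016\n"
values = values_after_label(sample)
```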
