Data extraction from PDF

Pooja1 · June 1, 2017, 9:49am

Hi,

I am trying to extract data from a PDF. I used the Read PDF activity to get the entire PDF content into a string. The extracted data looks like this:
01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
013004744524 16D293577700
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
01166419A
CP98081
YASMINE ABDUSSAMAD/
233029641221
233029641221 16D287324000
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
From the text above, I want to extract the names KALILA ABDURRAHMANN and YASMINE ABDUSSAMAD. Please help!

ddpadil · June 1, 2017, 10:19am

Hi,
To extract a specific value, find its start index and end index, then pass the start index and the length (end index minus start index) to Substring to get the value.
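Here is a minimal sketch of that start/end-index approach. It is written in Python rather than the UiPath/VB.NET expressions used in this thread. Based only on the sample above, it assumes that the name sits on its own line and ends with "/". `extract_name` is a hypothetical helper name.

```python
def extract_name(text: str) -> str:
    """Return the text from the start of the line containing '/' up to that '/'."""
    # End index: position of the first "/" (the name line ends with it in the sample).
    end = text.index("/")
    # Start index: one character past the last newline before that "/".
    # rfind returns -1 when there is no newline, so +1 gives 0 (start of text).
    start = text.rfind("\n", 0, end) + 1
    # Same idea as Substring(start, end - start) in .NET.
    return text[start:end].strip()


sample = "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n"
print(extract_name(sample))  # KALILA ABDURRAHMANN
```

In .NET terms this is roughly `LastIndexOf` for the start, `IndexOf("/")` for the end, and `Substring(start, end - start)`.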

Pooja1 · June 1, 2017, 12:21pm

Hi,

Thank you for your suggestion.
I tried this:
stringToExtract.Substring(stringToExtract.IndexOf("0"),stringToExtract.IndexOf("/")-stringToExtract.IndexOf("/")+19)
But the output I am getting is 01166355A CP98133 K
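The length argument in that call is `IndexOf("/") - IndexOf("/") + 19`. The two `IndexOf` terms cancel, so the length is always 19, and Substring returns exactly 19 characters starting at the first "0": `01166355A`, a line break, `CP98133`, a line break, and `K`. The Python sketch below reproduces that result and then collects every name. It assumes, from the sample only, that lines are separated by `\n` and that each name is the only line ending in "/". The date lines also contain "/", but they do not end with it.

```python
import re

sample = (
    "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n05/14/2016-\n05/14/2016\n"
    "01166419A\nCP98081\nYASMINE ABDUSSAMAD/\n233029641221\n05/14/2016-\n05/14/2016\n"
)

# Reproduce the original call: Substring(IndexOf("0"), IndexOf("/") - IndexOf("/") + 19).
start = sample.index("0")
length = sample.index("/") - sample.index("/") + 19  # always 19: the IndexOf terms cancel
print(repr(sample[start:start + length]))  # '01166355A\nCP98133\nK'

# Collect every name: a whole line that ends with "/" (dates like 05/14/2016 do not).
names = re.findall(r"^(.+?)/\s*$", sample, flags=re.MULTILINE)
print(names)  # ['KALILA ABDURRAHMANN', 'YASMINE ABDUSSAMAD']
```

In the original expression, the length should be an end index minus the start index, not the same index subtracted from itself.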

Pooja1 · June 2, 2017, 6:54am

Thanks for your quick help! I will try this.
However, your readPDF file has no workflow in it. I guess you attached the wrong one!

Regards,
Pooja P

ddpadil · June 2, 2017, 9:04am

Oops, sorry about that. Anyway, you can just follow the steps above; it works.

Here it is:
readPDF.zip (23.5 KB)

Sravenco · December 28, 2017, 6:25am

How can I extract a dynamic value from a PDF? I need to extract the "name" value from all the available PDFs, and its length, start index, and end index will not be known. Please provide a solution.
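When the position and length vary, one option is to anchor on a label instead of fixed indices. Here is a minimal sketch in Python. It assumes each PDF's text has already been extracted into a string (for example with the Read PDF activity mentioned above) and that the value follows a label such as `Name:` on the same line. The label format and the `documents` contents are hypothetical and do not come from the PDF in this thread.

```python
import re
from typing import Optional


def extract_field(text: str, label: str) -> Optional[str]:
    """Return the rest of the line after 'label:' (case-insensitive), or None."""
    # \b stops "Name" matching inside "Username"; [ \t]* allows "Name:" and "Name :"
    # without letting the match run onto the next line.
    pattern = re.compile(rf"\b{re.escape(label)}[ \t]*:[ \t]*(.+)", re.IGNORECASE)
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Hypothetical text already extracted from several PDFs.
documents = {
    "a.pdf": "Invoice 1\nName : KALILA ABDURRAHMANN\nDate: 05/14/2016",
    "b.pdf": "NAME: YASMINE ABDUSSAMAD\nAmount: $52.38",
    "c.pdf": "No label here",
}

for filename, text in documents.items():
    print(filename, extract_field(text, "Name"))
```

Because the capture runs to the end of the line, the value's length never needs to be known in advance.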

prathamesh.c · January 8, 2018, 7:42am

Yeah, @Sravenco, I am facing the same issue. Could someone please help with this?
Is it possible to extract dynamic content from a PDF?
@badita @ddpadil, I need some help ASAP.

Sravenco · January 8, 2018, 10:34am

You can use OCR to read the complete file, write the output to a text file, and use the "find" option to get the values you want.

SHAISTA · February 3, 2018, 7:16am

@ddpadil I have extracted the PDF data, and now I want to put the extracted data into separate columns. How can I do that? Please help.
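One way to get columns is to split the text into records and map each line position to a column. Here is a minimal sketch in Python that writes CSV. It assumes every record follows the same fixed line order as Pooja1's sample and ends with a `Subtotal` line. The column names are hypothetical. In UiPath, the same rows could instead be added to a DataTable.

```python
import csv
import io
from typing import Dict, List

# Two records in the layout shown at the top of the thread.
raw = """01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
013004744524 16D293577700
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
01166419A
CP98081
YASMINE ABDUSSAMAD/
233029641221
233029641221 16D287324000
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00"""


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split the text into records at each 'Subtotal' line and map lines to columns."""
    rows, block = [], []
    for line in text.splitlines():
        if line.startswith("Subtotal"):
            # Fixed positions assumed from the sample layout.
            rows.append({
                "id": block[0],
                "code": block[1],
                "name": block[2].rstrip("/"),
                "account": block[3],
                "from_date": block[5].rstrip("-"),
                "to_date": block[6],
                "service": block[7].split()[0],
                "subtotal": line.split()[1],
            })
            block = []
        elif line.strip():
            block.append(line.strip())
    return rows


rows = parse_records(raw)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Fixed line positions are brittle. If the layout varies between records, label- or pattern-based matching is safer.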

sannasmajl · April 5, 2018, 6:00am

No solution on this one? I have the same problem.

abhiseky93 · September 27, 2018, 12:21pm

Does this work for scanned PDFs? If so, how can I capture data from the PDF by its position (margin), or by what follows a field label, such as the value after "Name : xxxxx"? Please suggest. Thanks in advance.
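A scanned PDF is an image, so the text typically has to come from OCR first, as suggested above. Some OCR engines can also return each word with its position on the page. The sketch below assumes such output in a hypothetical `Word(text, left, top)` format and picks the words to the right of a label on the same line. The word format, the coordinates, and the `line_tolerance` default are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR word with its bounding-box position in pixels (hypothetical format)."""
    text: str
    left: int
    top: int


def value_right_of(words: List[Word], label: str, line_tolerance: int = 5) -> Optional[str]:
    """Join the words to the right of `label` that sit on the same text line."""
    anchor = next((w for w in words if w.text.rstrip(":").lower() == label.lower()), None)
    if anchor is None:
        return None
    same_line = [
        w for w in words
        if abs(w.top - anchor.top) <= line_tolerance  # same row, allowing small skew
        and w.left > anchor.left                      # to the right of the label
        and w.text != ":"                             # skip a detached colon
    ]
    same_line.sort(key=lambda w: w.left)
    return " ".join(w.text for w in same_line) or None


# Hypothetical OCR result for "Name : KALILA ABDURRAHMANN" plus a line below it.
words = [
    Word("KALILA", 120, 52),
    Word("Name", 10, 50),
    Word(":", 60, 50),
    Word("ABDURRAHMANN", 200, 51),
    Word("Date", 10, 80),
    Word("05/14/2016", 120, 80),
]
print(value_right_of(words, "Name"))  # KALILA ABDURRAHMANN
```

To capture by margin instead, filter on a fixed `left` range rather than on the label's position.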
