Data extraction from PDF


#1

Hi,

I am trying to extract data from a PDF. I used the Read PDF activity to get the entire PDF content into a string. The extracted data has this format:
01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
013004744524 16D293577700
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
01166419A
CP98081
YASMINE ABDUSSAMAD/
233029641221
233029641221 16D287324000
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
From the text above, I want to extract the names KALILA ABDURRAHMANN and YASMINE ABDUSSAMAD. Please help!


#2

Hi,
To extract a specific value, find the start index and the end index of the value, then call Substring with the start index and the length (end index minus start index) to get that value.
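A minimal sketch of that idea, written in Python for illustration (in a UiPath workflow the equivalent .NET calls are String.IndexOf and String.Substring). The helper name text_between and the markers used below are hypothetical, picked from the sample record. Note that a Python slice takes an end index, whereas .NET Substring takes a length.

```python
def text_between(text: str, start_marker: str, end_marker: str) -> str:
    """Return the text between the first start_marker and the next end_marker."""
    start = text.find(start_marker)
    if start == -1:
        raise ValueError(f"start marker {start_marker!r} not found")
    start += len(start_marker)  # the value begins right after the marker
    end = text.find(end_marker, start)  # look for the end marker only after the start
    if end == -1:
        raise ValueError(f"end marker {end_marker!r} not found")
    # Python slices take an end index; .NET Substring(start, length) would use end - start.
    return text[start:end]


record = "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n"
print(text_between(record, "CP98133\n", "/"))  # KALILA ABDURRAHMANN
```

Searching for the end marker only after the start index avoids picking up an earlier "/" (for example inside a date) that happens to come before the value.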


#3

Hi,

Thank you for your suggestion.
I tried this way.
stringToExtract.Substring(stringToExtract.IndexOf("0"),stringToExtract.IndexOf("/")-stringToExtract.IndexOf("/")+19)
But the output I am getting is: 01166355A CP98133 K
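The reason for that output: IndexOf("/") - IndexOf("/") is always 0, so the second argument of Substring, which is a length rather than an end index, is always 19. Starting at index 0, 19 characters cover exactly "01166355A", a line break, "CP98133", a line break and "K" (assuming the lines are separated by a single \n). The Python sketch below reproduces the attempted expression and then computes the start and the length from the line breaks and the "/" instead; it is an illustration of the arithmetic, not the attached workflow.

```python
record = "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n"

# The attempted expression, translated. IndexOf("/") - IndexOf("/") cancels out,
# so the length is always 19 no matter where the name is.
start = record.find("0")
length = record.find("/") - record.find("/") + 19
buggy = record[start:start + length]  # same as .NET Substring(start, length)

# Corrected: the name starts after the second line break and ends just before "/".
first_break = record.find("\n")
name_start = record.find("\n", first_break + 1) + 1
name_length = record.find("/") - name_start
name = record[name_start:name_start + name_length]

print(repr(buggy))  # '01166355A\nCP98133\nK'
print(name)         # KALILA ABDURRAHMANN
```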


#5

Thanks for your quick help! I will try this.
However, your readPDF file has no workflow in it. I guess you attached the wrong one!

Regards,
Pooja P


#6

Oops, sorry about that. Anyway, you can just follow the steps above.
It works. 🙂

Here it is:
readPDF.zip (23.5 KB)


#7

How can I extract a dynamic value from a PDF? I need to extract the “name” value from all the available PDFs, and its length and start/end indexes will not be known in advance. Please suggest a solution.
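When the positions are not known, a regular expression can find the value by its shape instead of its index. A minimal Python sketch, assuming (from the sample at the top of this thread only) that each name sits on its own line of capital letters and ends with "/". The names NAME_LINE and extract_names are hypothetical. If you want to do this inside a workflow, .NET's Regex.Matches with RegexOptions.Multiline accepts the same pattern.

```python
import re

# Assumption based on the sample only: a name is a line of capital letters
# (spaces, dots, apostrophes and hyphens allowed) that ends with "/".
NAME_LINE = re.compile(r"^([A-Z][A-Z .'-]*?)[ \t]*/[ \t\r]*$", re.MULTILINE)


def extract_names(text: str) -> list[str]:
    """Return every name found in the extracted PDF text, in order."""
    return [match.group(1) for match in NAME_LINE.finditer(text)]


sample = (
    "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n05/14/2016-\n"
    "01166419A\nCP98081\nYASMINE ABDUSSAMAD/\n233029641221\n05/14/2016-\n"
)
print(extract_names(sample))  # ['KALILA ABDURRAHMANN', 'YASMINE ABDUSSAMAD']
```

Lines such as "CP98133" or "05/14/2016-" do not match because they contain digits, so only the name lines are returned. If your PDFs use a different layout, the pattern has to be adapted to it.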


#8

Yes, @Sravenco, I am facing the same issue. Could someone please help with the issues above?
Is it possible to extract dynamic content from a PDF?
@badita @ddpadil, I need some help as soon as possible.


#9

You can use OCR to read the complete file, write the output to a text file, and then use the “find” option to get the values you want.


#10

@ddpadil I have extracted the PDF data, and now I want to put the extracted data into different columns. How can I do that? Please help.
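One way to get columns is to split the text into records and map each line of a record to a field. A rough Python sketch, assuming every record ends with a "Subtotal" line and keeps the line order of the sample at the top of this thread. The column names and function names are hypothetical. In UiPath the same rows could go into a DataTable instead of a CSV string.

```python
import csv
import io

# Hypothetical column names; the line positions follow the sample record layout.
COLUMNS = ["RecordId", "Code", "Name", "Account", "FromDate", "ToDate", "Subtotal"]


def records_to_rows(text: str) -> list[dict]:
    """Group lines into records (each ending with a Subtotal line) and map them to columns."""
    rows, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        current.append(line)
        if line.startswith("Subtotal"):  # assumption: every record ends here
            rows.append({
                "RecordId": current[0],
                "Code": current[1],
                "Name": current[2].rstrip("/"),
                "Account": current[3],
                "FromDate": current[5].rstrip("-"),
                "ToDate": current[6],
                "Subtotal": current[-1].split()[1],  # first amount on the Subtotal line
            })
            current = []
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Write the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = """01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
013004744524 16D293577700
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
01166419A
CP98081
YASMINE ABDUSSAMAD/
233029641221
233029641221 16D287324000
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
"""
print(rows_to_csv(records_to_rows(sample)))
```

Fixed line positions are fragile: if a record has an extra service line or a missing date, the mapping shifts, so check the layout of your real PDFs before relying on it.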


#11

No solution on this one? I have the same problem.


#12

Does this work for scanned PDFs? If so, how can I capture data from a PDF by its position (margins), or by what follows a field label, such as the value after Name : xxxxx? Please suggest. Thanks in advance.
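For a scanned PDF the text has to come from OCR first; once you have that text, the value after a label can be captured with a pattern anchored on the label. A minimal Python sketch, assuming the label and its value are on the same line and separated by a colon. The function name value_after_label and the sample OCR text are hypothetical. This approach does not use position (margins); that would need coordinates from the OCR output.

```python
import re
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the rest of the line after "label :" (case-insensitive), or None if absent."""
    pattern = re.compile(
        rf"\b{re.escape(label)}[ \t]*:[ \t]*(.*?)[ \t\r]*$",
        re.MULTILINE | re.IGNORECASE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical OCR output, for illustration only.
ocr_text = "Claim No : 16D293577700\nName : KALILA ABDURRAHMANN\nDate: 05/14/2016\n"
print(value_after_label(ocr_text, "Name"))  # KALILA ABDURRAHMANN
print(value_after_label(ocr_text, "Date"))  # 05/14/2016
```

Because the capture stops at the end of the line, a value that OCR splits across two lines would be cut off; in that case the pattern needs to be widened.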


#13