Data extraction from PDF

pdf
activities

#1

Hi,

I am trying to extract data from a PDF. I used the Read PDF activity to get the entire PDF content into a string. The extracted data has the following format:
01166355A
CP98133
KALILA ABDURRAHMANN/
013004744524
013004744524 16D293577700
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
01166419A
CP98081
YASMINE ABDUSSAMAD/
233029641221
233029641221 16D287324000
05/14/2016-
05/14/2016
HC:99283 $347.00 – -$294.62 $52.38 – $52.38 45 –
Subtotal $347.00 $0.00 -$294.62 $52.38 $0.00 $52.38 $0.00
From the text above, I want to extract the names KALILA ABDURRAHMANN and YASMINE ABDUSSAMAD. Please help!


#2

Hi,
To extract a specific value, find the start index and the end index of that value in the string, then pass them to Substring to get the value. Note that Substring takes a start index and a length, so the length is the end index minus the start index.
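
The index-based approach described above can be sketched as follows. This is only an illustration in Python, not a UiPath workflow: the helper name `first_value_before` is hypothetical, and it assumes the extracted text separates fields with newlines and that the first "/" in the text is the one that follows the name. The same start/end logic should carry over to IndexOf and Substring.

```python
def first_value_before(text: str, end_marker: str = "/") -> str:
    """Return the text on the same line that precedes the first end_marker."""
    end = text.find(end_marker)  # end index: position of the first "/"
    if end == -1:
        return ""  # marker not present, nothing to extract
    # Start index: the character after the previous newline (0 if there is none).
    start = text.rfind("\n", 0, end) + 1
    return text[start:end]


# Assumed shape of the extracted string: one field per line.
sample = "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n"
first_name = first_value_before(sample)
print(first_name)  # KALILA ABDURRAHMANN
```

This only handles the first record, because later "/" characters also appear inside the dates (05/14/2016), so a plain search for "/" would stop there for the following records.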


#3

Hi,

Thank you for your suggestion.
I tried it this way:
stringToExtract.Substring(stringToExtract.IndexOf("0"),stringToExtract.IndexOf("/")-stringToExtract.IndexOf("/")+19)
But the output I am getting is 01166355A CP98133 K


#5

Thanks for your quick help! I will try this.
However, your readPDF file has no workflow in it! I guess you attached the wrong one.

Regards,
Pooja P


#6

Oops, sorry about that. Anyway, you can just follow the step above.
It works. :)

Here it is:
readPDF.zip (23.5 KB)


#7

How can I extract a dynamic value from a PDF? I need to extract the "name" value from all the available PDFs, and the length and the start or end index will not be known. Please provide a solution.


#8

Yeah, @Sravenco, I am facing the same issue. Could someone kindly help with the above?
Is it possible to extract dynamic content from a PDF?
@badita @ddpadil, I need some help as soon as possible.


#9

You can use OCR to read the complete file, write the output to a text file, and then use the "find" option to get the values you want.
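
Once the text is available as a string, one way to "find" values whose position and length are not known in advance is a pattern search. The sketch below is in Python and rests on assumptions taken from the sample in the first post: each name sits on its own line, consists of letters and spaces, and is followed by a "/". The names `NAME_LINE` and `extract_names` are hypothetical.

```python
import re

# A line that starts with a letter, holds only letters, spaces, dots,
# apostrophes or hyphens, and ends with "/" (optionally followed by blanks).
NAME_LINE = re.compile(r"^([A-Za-z][A-Za-z .'-]*)/[ \t]*$", re.MULTILINE)


def extract_names(text: str) -> list:
    """Return every name-like line that ends with '/', without the slash."""
    return [match.group(1).strip() for match in NAME_LINE.finditer(text)]


# Shortened copy of the extracted text from the first post.
sample = (
    "01166355A\nCP98133\nKALILA ABDURRAHMANN/\n013004744524\n"
    "05/14/2016-\n05/14/2016\n"
    "01166419A\nCP98081\nYASMINE ABDUSSAMAD/\n233029641221\n"
    "05/14/2016-\n05/14/2016\n"
)
names = extract_names(sample)
print(names)  # ['KALILA ABDURRAHMANN', 'YASMINE ABDUSSAMAD']
```

Because the pattern requires the line to start with a letter, the date lines are skipped even though they contain "/". Text produced by OCR may contain misread characters, so the pattern would likely need loosening for scanned documents.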


#10

@ddpadil I have extracted the PDF data, and now I want to put the extracted data under different columns. How can I do that? Please help.


#11

No solution on this one? I have the same problem.


#12

Does this work for scanned PDFs? If yes, how can I capture data from the PDF by its margin, or the value that follows a field label, such as "Name : xxxxx"? Please suggest. Thanks in advance.