Extracting PDF data into Excel

excel
pdf
activities

#1

Hi,
I have a PDF file that contains data in the form of a table. I want to extract that data and save it in Excel. Can you please help me?


#2

Use regular expressions to convert the extracted text into a delimited CSV format, save it as a file, and use the Read CSV activity to get a DataTable. Do not try to use ‘Generate Data Table’, since it is buggy, in 2017.1 at least.
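
To illustrate the idea outside UiPath, here is a minimal Python sketch. It assumes (the thread does not say this) that the extracted PDF text separates columns with runs of two or more spaces. The sample text and the function name `text_to_csv` are made up for the example.

```python
import csv
import io
import re

# Hypothetical sample of text extracted from a PDF table. Assumption: columns
# are separated by runs of two or more spaces.
EXTRACTED_TEXT = """Pos  Article        Qty
1    Steel bolt M8  100
2    Washer 8mm     250
"""

def text_to_csv(text: str) -> str:
    """Convert space-aligned table text into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on 2+ whitespace characters so single spaces inside a cell survive.
        writer.writerow(re.split(r"\s{2,}", line.strip()))
    return out.getvalue()

csv_text = text_to_csv(EXTRACTED_TEXT)
# Reading the CSV back gives a list of rows, comparable to a DataTable.
rows = list(csv.reader(io.StringIO(csv_text)))
```

Real PDF text is rarely this regular, so the split pattern usually has to be adapted per document.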


#3

Can you explain briefly?


#4

Check this example I set up for a POC. It extracts a PDF table into a DataTable. There is a ‘tableExtractRegex’ regex, which should extract the table part, and a ‘rowSplitRegex’ to extract the rows.
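
The attached workflows are not reproduced in the thread, so the following is only a rough Python sketch of the two-regex idea. Only the names tableExtractRegex and rowSplitRegex come from the post. The patterns, the sample text, and the function `extract_rows` are assumptions made for illustration.

```python
import re

# Hypothetical extracted text: a header line, a table, and a footer line.
# Real PDFs will need different patterns.
PDF_TEXT = """Order confirmation 4711
Pos Article Qty
1 Bolt 100
2 Washer 250
3 Nut 75
Total 425
"""

# Stage 1: capture only the table part, i.e. everything between the column
# header line and the "Total" line. The pattern is an assumption.
TABLE_EXTRACT_REGEX = re.compile(r"Pos Article Qty\n(.*?)\nTotal", re.DOTALL)

# Stage 2: match one table row and capture its cells. The pattern is an assumption.
ROW_SPLIT_REGEX = re.compile(r"^(\d+) (\S+) (\d+)$", re.MULTILINE)

def extract_rows(text: str) -> list:
    """Return the table rows as tuples of cell strings, or [] if no table is found."""
    table = TABLE_EXTRACT_REGEX.search(text)
    if table is None:
        return []
    return ROW_SPLIT_REGEX.findall(table.group(1))

rows = extract_rows(PDF_TEXT)
```

Isolating the table first keeps the row pattern simple, because it no longer has to avoid matching header or footer lines.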

ReadConfirmationPDF.xaml (7.1 KB)
TableExtractor.xaml (14.6 KB)
CSVToDataTable.xaml (5.2 KB)
Auftragsbestaetigung-Kramer-3pos.pdf (195.2 KB)


#5

Hey @Marcel

I have extracted the PDF data, and now I want to put the extracted data under different columns. How could I do that? Please help.


#6

@SHAISTA Build a DataTable with the column names to serve as the output DataTable.
After each extraction from the PDF, use the Add Data Row activity and pass your variables in ArrayRow in this format: {strOrderNumber, strOrder}. You will then get a DataTable as output. Use Append Range to write that DataTable to the Excel file.
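
The same flow can be sketched in Python, purely as an illustration. The column names, the sample values, and the helper `add_data_row` are hypothetical, and writing the result to Excel is left out.

```python
# Column names of the output table (hypothetical, modeled on the variables
# named in the post).
COLUMNS = ["OrderNumber", "Order"]

def add_data_row(table: list, values: list) -> None:
    """Append one row, rejecting rows whose length does not match the columns."""
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    table.append(dict(zip(COLUMNS, values)))

output_table = []
# Each tuple stands in for the values extracted from one PDF.
for str_order_number, str_order in [("A-1001", "Bolts"), ("A-1002", "Washers")]:
    add_data_row(output_table, [str_order_number, str_order])
```

Defining the columns once, before the loop, is what keeps every extracted value under the right heading.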

Thanks,
Sreekanth.K


#7

I am using screen scraping, so the whole data gets extracted at once. How can I extract each word one by one? Which loop should I use?


#8

@SHAISTA In screen scraping, the output is already a datatable. Just write the datatable to the Excel file using Append Range.
Thanks,
Sreekanth.k


#9

Hi shaista,

You can also split the output using a space delimiter, store all the values in an array, and then loop through them one by one.
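
As a small sketch of that split-and-loop idea in Python (the sample string is made up):

```python
# Hypothetical scraped output: one string containing every word.
scraped_output = "Invoice  1024 Bolt   100"

# split() with no argument splits on any run of whitespace and drops empty
# entries, so repeated spaces do not produce empty words.
words = scraped_output.split()

# Visit the words one by one.
for index, word in enumerate(words):
    print(index, word)
```

Splitting on a single space character instead would leave empty strings wherever the text contains two spaces in a row, which is common in scraped output.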


#10

Hi @aamir @sreekanth

My extracted data comes out in this format in Excel. How should I loop through it now?
Please take a look.

Thanks in advance.


#11

Hi shaista,

First calculate the number of rows in the Excel sheet using datatablevariable.rows.count.tostring. Then start the loop at A5, as in your Excel sheet, and iterate until the last row.


#12

Hi @aamir ,

Yes, I calculated it already. It came to 31.
What I want to know is whether I should now use the Select Range activity and then a For Each loop. Would that work?


#13

You have all the data in one column, right?


#14

Yes, that's the issue. I don't know why it keeps coming out in one column. :roll_eyes:


#15

Hi @SHAISTA,

@aamir,

Split the string using:
stroutput.Split(Environment.NewLine.ToArray, StringSplitOptions.RemoveEmptyEntries)

You will get an array. Then append the values to a datatable and write it to Excel.
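
For comparison, here is a rough Python sketch of the same split. The sample text is made up, and the ';' cell separator is an assumption, since the thread does not say how the cells inside a line are delimited.

```python
import re

# Hypothetical extracted text with Windows line endings and one blank line.
str_output = "Pos;Article;Qty\r\n1;Bolt;100\r\n\r\n2;Washer;250\r\n"

# Split on either newline character and drop empty entries, which mirrors
# Split(Environment.NewLine.ToArray, StringSplitOptions.RemoveEmptyEntries).
lines = [part for part in re.split(r"[\r\n]", str_output) if part]

# Assumption: cells within a line are separated by ';'. Splitting each line
# again is what moves the values into separate columns.
table = [line.split(";") for line in lines]
```

If each line is written to Excel without that second split, every value ends up in a single column.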

Regards,
Arivu


#16

No problem. All you need to do is read the values from one column one by one.
int x = 5
while x <= datavariable.Rows.Count
{
    Excel Application Scope
    {
        Use the Read Cell activity. As its input, use "A" + x.ToString and store the result in a variable.
    }
    x = x + 1
}
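
The loop above can be sketched in Python as follows. A plain dictionary stands in for the Excel sheet, and the cell values and the function name `read_column` are hypothetical.

```python
# Hypothetical stand-in for an Excel sheet: cell address -> value.
sheet = {"A5": "Bolt", "A6": "Washer", "A7": "Nut"}

def read_column(sheet: dict, column: str, first_row: int, last_row: int) -> list:
    """Read the cells from column+first_row to column+last_row, one at a time."""
    values = []
    x = first_row
    while x <= last_row:
        address = column + str(x)          # same idea as "A" + x.ToString
        values.append(sheet.get(address))  # None if the cell is empty
        x = x + 1
    return values

values = read_column(sheet, "A", 5, 7)
```

Whether the last row should be the row count itself or the row count plus an offset depends on where the data starts in the sheet, so check the boundary against your own file.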


#17

@arivu96 @aamir Thanks for the quick responses. I will check and get back to you if needed.


#18

Please help me.
I want to transfer the data from the PDF file into Excel, but the data is not being split. I followed the approach you described, but it did not work.