Extracting PDF data into Excel

Hi,
I have a PDF file that contains data in the form of a table. I want to extract that data and save it in Excel. Can you please help me?

Use regular expressions to convert the extracted text into a delimited CSV format, save it as a file, and use the Read CSV activity to get a DataTable. Do not try to use ‘Generate Data Table’, since it is buggy, in 2017.1 at least.
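For illustration only, here is a minimal Python sketch of the idea (in UiPath the same steps would be done with activities and VB.NET expressions). The sample text, the rule that columns are separated by two or more spaces, and the helper names `text_to_csv` and `read_csv_text` are assumptions, not something taken from this thread. The CSV is parsed in memory here instead of being saved to a file first.

```python
import csv
import io
import re

# Hypothetical text as it might come out of a PDF text extraction step.
# Assumption: columns are separated by runs of two or more spaces.
extracted_text = """Pos  Article       Qty  Price
1    Steel bolt M8   10   4.50
2    Washer          25   1.20
"""

def text_to_csv(text, delimiter=";"):
    """Replace each run of 2+ spaces/tabs with the delimiter, line by line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(re.sub(r"[ \t]{2,}", delimiter, line) for line in lines)

def read_csv_text(csv_text, delimiter=";"):
    """Parse the CSV text into a list of rows (a stand-in for a DataTable)."""
    return list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))

csv_text = text_to_csv(extracted_text)
rows = read_csv_text(csv_text)
```

Single spaces inside a value such as "Steel bolt M8" are left alone, so the value stays in one column. Real PDF output may need a different separator rule.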

Can you explain briefly?

Check this example I set up for a POC; it extracts a PDF table into a DataTable. There is a ‘tableExtractRegex’ regex, which should extract the table part, and a ‘rowSplitRegex’ to extract the rows.

ReadConfirmationPDF.xaml (7.1 KB)
TableExtractor.xaml (14.6 KB)
CSVToDataTable.xaml (5.2 KB)
Auftragsbestaetigung-Kramer-3pos.pdf (195.2 KB)
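The actual patterns live in the attached .xaml files and are not shown in this thread. As a rough Python sketch of the two-stage idea, assuming a table that starts after a header line beginning with "Pos" and ends before a line beginning with "Total" (both markers, the sample text and the patterns themselves are hypothetical):

```python
import re

# Hypothetical document text; the "Pos" and "Total" markers are assumptions.
pdf_text = (
    "Order confirmation 4711\n"
    "Pos Article Qty\n"
    "1 Bolt 10\n"
    "2 Washer 25\n"
    "3 Nut 40\n"
    "Total 75\n"
    "Thank you for your order\n"
)

# Stage 1: capture everything between the header line and the "Total" line.
table_extract_regex = re.compile(
    r"^Pos[^\n]*\n(.*?)^Total", re.DOTALL | re.MULTILINE
)
# Stage 2: one row per line: position number, article, quantity.
row_split_regex = re.compile(
    r"^(\d+)[ \t]+(\S+)[ \t]+(\d+)[ \t]*$", re.MULTILINE
)

def extract_rows(text):
    """Return the table rows as lists of strings, or [] if no table is found."""
    match = table_extract_regex.search(text)
    if match is None:
        return []
    table_part = match.group(1)
    return [list(m.groups()) for m in row_split_regex.finditer(table_part)]

rows = extract_rows(pdf_text)
```

Isolating the table first keeps the row pattern from matching stray lines elsewhere in the document, such as the order number in the first line.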

Hey @Marcel

I have extracted the PDF data, and now I want to put the extracted data under different columns. How can I do that? Please help.

@SHAISTA Build a DataTable with the column names and create an output DataTable.
After each extraction from the PDF, use the Add Data Row activity and pass your variables in ArrayRow in this format: {strOrderNumber, strOrder}. You will then get a DataTable as output. Use Append Range to write that DataTable to the Excel file.

Thanks,
Sreekanth.K
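The same flow, sketched in Python rather than UiPath activities, might look like this. The column names, the sample values and the helpers `add_data_row` and `to_csv` are hypothetical; a plain list stands in for the DataTable, and CSV text stands in for the Excel write.

```python
import csv
import io

# Hypothetical column names, mirroring strOrderNumber / strOrder from the post.
columns = ["OrderNumber", "Order"]
table = []  # stand-in for the output DataTable

def add_data_row(table, columns, values):
    """Append one row, checking that it has one value per column."""
    if len(values) != len(columns):
        raise ValueError(
            "expected %d values, got %d" % (len(columns), len(values))
        )
    table.append(list(values))

# One call per extraction from the PDF (the values here are made up).
add_data_row(table, columns, ["A-1001", "Bolts"])
add_data_row(table, columns, ["A-1002", "Washers"])

def to_csv(columns, table):
    """Serialize the header and rows to CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(table)
    return buffer.getvalue()

csv_text = to_csv(columns, table)
```

Because each value is passed as a separate array element, each one lands in its own column instead of everything ending up in a single cell.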

I am using screen scraping, so the whole data gets extracted at once. How can I extract each word one by one? What loop should I use?

@SHAISTA In screen scraping, the output is already a DataTable. Just write the DataTable to the Excel file using Append Range.
Thanks,
Sreekanth.k

Hi shaista,

You can also split the output using a space delimiter, store all the values in an array, and then loop over them one by one.
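A small Python sketch of that split-and-loop idea, with a made-up sample string (in UiPath this would be a String.Split call followed by a For Each activity):

```python
# Hypothetical scraped output; note the irregular spacing between words.
output = "Invoice 4711  Bolt   10"

# split() with no argument splits on any run of whitespace and drops
# empty entries, so repeated spaces do not produce empty words.
words = output.split()

# Loop over the words one by one, keeping track of the position.
numbered = []
for index, word in enumerate(words):
    numbered.append((index, word))
```

Splitting on a single space character instead (`output.split(" ")`) would leave empty strings wherever two spaces follow each other.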

Hi @aamir @sreekanth

My extracted data comes out in this format in Excel. How should I loop through it now?
Please take a look.

Thanks in advance.

Hi shaista,

First calculate the number of rows from the Excel data using datatablevariable.Rows.Count.ToString, then start the loop from A5, as in your Excel sheet, and iterate until the last row.

Hi @aamir ,

Yes, I calculated it already. It came to 31.
What I want to know is: should I now use the Select Range activity and then a For Each loop? Would that work?

You have all the data in one column, right?

Yes, that's the issue. I don't know why it keeps coming out in one column. :roll_eyes:

Hi @SHAISTA,

@aamir,

Split the string using
stroutput.Split(Environment.NewLine.ToArray, StringSplitOptions.RemoveEmptyEntries)

You will get an array; then append each value to the DataTable and push it into Excel.

Regards,
Arivu
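As a rough Python equivalent of that split (the sample string is made up, and splitting each line into fields on whitespace is an extra assumption about the data, added so the values do not all end up in one column):

```python
import re

# Hypothetical extracted text with mixed line endings and an empty line.
str_output = "1 Bolt 10\r\n\r\n2 Washer 25\n3 Nut 40\r\n"

# Split on carriage-return and line-feed characters and drop empty entries,
# similar in spirit to Split(...) with StringSplitOptions.RemoveEmptyEntries.
lines = [part for part in re.split(r"[\r\n]", str_output) if part]

# Assumption: fields inside a line are separated by whitespace.
# Each line becomes one row, each field one column.
table = [line.split() for line in lines]
```

If a value can itself contain spaces, a plain whitespace split is not enough and a more specific pattern is needed for the columns.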

No issue. All you need to do is read the values from that one column one by one:
int x = 5
while x <= datavariable.Rows.Count
{
    Excel Application Scope
    {
        Use the Read Cell activity; as the cell input use "A" + x.ToString and store the result in a variable.
    }
    x = x + 1
}
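To make the cell addressing concrete, here is a small Python sketch. The dictionary is a fake sheet with made-up values, and `cell_addresses` is a hypothetical helper. It assumes the data starts in row 5 and that the row count covers only the data rows, in which case the last row read is 5 + count - 1; whether the loop condition above reaches that row depends on how the count was obtained.

```python
def cell_addresses(column, first_row, row_count):
    """Build addresses like A5, A6, ... for row_count consecutive rows."""
    return [f"{column}{row}" for row in range(first_row, first_row + row_count)]

# Fake sheet standing in for the Excel file (values are made up).
sheet = {"A5": "Bolt", "A6": "Washer", "A7": "Nut"}

# Read the cells one by one, as the Read Cell activity would.
values = []
for address in cell_addresses("A", 5, 3):
    values.append(sheet[address])
```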

@arivu96 @aamir Thanks for the quick responses. I will check and get back to you if needed.

Please help me,
I want to pass the data from the PDF file into Excel, but the data is not being split. I followed the approach you all showed, but it did not work.

Same question here, and no answer for the last year.

Hi @Marcel ,

This might be a silly question. In the TableExtractor.xaml file, you used an Invoke Method activity to split the text read by the Read PDF Text activity. I don't see any input string given to this activity, so here are my questions:

  1. What is the input to the Invoke Method activity?
  2. I am getting an error when trying the same activity with the same fields.

Please advise if I am doing something wrong. A screenshot is attached for your reference.

Regards,
Arun