Extract data from PDF and save to CSV

Hello, I’m working on an automation.

I have a PDF and I found the two regexes that extract the data I need from it.

At the end I’ll have 2 columns, one per regex, each holding a specific piece of data (about 200 records).

Could you help me understand how I can save the data that I retrieve from the PDF with regex into a CSV file?

Maybe I need a Data Table? But I don’t know how to proceed, which activities I need, which instructions I need…

Thanks!


In fact, it seems that I already have all the data (2 columns with all records) in a variable of type System.Collections.Generic.IEnumerable.

How can I put it in a CSV file?

Hi @Luca09 ,

We don’t have the full details of your implementation or of the end output you need, but there are already many posts here that would help you, in a general way, figure out what needs to be done when extracting data from a PDF and getting it into a structured format.

One of the posts is below:

It might not be the exact way required for your needs, but it introduces a general method that could also be used for your case.

Do check it and let us know if it was helpful. To help more specifically with your case, we would need to know more about your data format, the regex you are already using, and what has been implemented so far.

@Luca09 ,

For this part you would have to use the Add Data Row activity and add the data captured by the regex to the DataTable.

You might have to create this DataTable with the Build Data Table activity and add the required columns to it.

Then, to write this data, you can use the Write CSV activity for a CSV file or the Write Range activity for an Excel sheet.

Yes, you are right… I think I’m close to the solution, but I can’t find the last step.

My flow is this:
image


image

Here is what I set inside the activities:

first data table:
image

second data table:
image

PDF to text, saved to a variable called strPDF

Result of Matches:

image

At the end, the CSV contains only the first record (first column and second column: vehicle + total).

And if I change this

image

to ienMatch(3).Value, I see the fourth element in the CSV file… but only the fourth and not the others…

So… I have all the elements in ienMatch, but I don’t know how to save all the records to the CSV…

I hope my flow is clearer now…

Thanks

@Luca09

Basically, if your data is there in ienMatch:

Use a For Each loop on ienMatch and change the type argument to Match.

Inside the loop, currentItem.Value will give you one value per iteration. There you can use an Add Data Row activity with a new datatable that has one column of type String (use Build Data Table to create the structure). In Add Data Row, give the datatable and then use {currentItem.Value}.

After the loop, use Write CSV with the datatable, and the data will be written to the CSV.

Hope this helps

cheers
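As a rough illustration of the loop described above (one row added per match, then a single CSV write after the loop), here is a minimal sketch in Python rather than a UiPath workflow. The sample text, the pattern and the helper name `matches_to_csv` are all made up for the example, since the thread never shows the real PDF text or regex.

```python
import csv
import io
import re

# Hypothetical sample text and pattern (assumptions, not the real data).
pdf_text = "Vehicle: AB123CD\nVehicle: EF456GH\nVehicle: IJ789KL\n"
pattern = re.compile(r"Vehicle: (\w+)")


def matches_to_csv(text, pattern):
    """Return CSV text with a header and one row per regex match."""
    rows = [["Value"]]  # header, like the single String column built up front
    for match in pattern.finditer(text):
        # match.group(0) plays the role of currentItem.Value in the loop
        rows.append([match.group(0)])
    buffer = io.StringIO()
    # Write everything once, after the loop, like the Write CSV step
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


csv_text = matches_to_csv(pdf_text, pattern)
print(csv_text)
```

The point to carry over is that the row is added inside the loop and the file is written once after it; adding a single row outside the loop yields only one record.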

Thanks @Anil_G ,

I tried that, but after populating the ienMatch variable, if I loop and Write Line currentItem.Value, I see only the values of the second column…

image

But in the next activity, where I have “Add Data Row”, I add:

{dtData.Rows(0).Item("Value").ToString, dtData.Rows(1).Item("Value").ToString}

to

dtOutput

image

It writes the first and second column of the first object (ienMatch(0), saved at the beginning) to the CSV…

I can’t understand what I have to do…

Thanks!

@Luca09

Can you show what you want and what you are getting?

Are there two IEnumerable variables?

If yes, use the index of the For Each loop as well, and then

{currentItem.Value, ienumMatch2(index).Value}

will add both values, one after the other, in two columns.

Cheers

Hello,

Sorry, but I think there is a basic error on my side.

In ienMatch I have only an array of values… not both columns…

So my question should be: how can I save an IEnumerable variable (array of strings) in a column of a DataTable?

Because if I can fix that, I can fix my task… I would create 2 loops…

The first one searches the PDF file with the first regex, saves the result to ienMatch and then saves it in the first column of a datatable… then another loop with the second regex: search for the second regex, save to ienMatch and then save ienMatch to the second column…

Is it possible?

Thanks again for the support!

@Luca09, the regex probably needs a different approach. You have two separate expressions and are trying to get each match and correlate the matches with each other (Vehicle and Total).

Maybe we could instead combine them into a single regex, where we would be able to capture the different values using groups.

But we would need to know the details of your input data (sample data) and how you want the output to look.
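To make the single-regex idea concrete, here is a minimal Python sketch (not UiPath) in which one pattern with two named groups captures both values from each record. The sample lines and the pattern are assumptions invented for the example; the real PDF layout is not shown in the thread, so the actual expression would differ.

```python
import re

# Hypothetical sample lines (assumption: vehicle and total sit on one line).
pdf_text = (
    "Vehicle: AB123CD Total: 150.00\n"
    "Vehicle: EF456GH Total: 99.50\n"
)

# One pattern with two named groups, so each match carries both values.
record = re.compile(r"Vehicle:\s*(?P<vehicle>\w+)\s+Total:\s*(?P<total>[\d.]+)")

# Each match yields one (vehicle, total) pair, already correlated.
rows = [(m.group("vehicle"), m.group("total")) for m in record.finditer(pdf_text)]
print(rows)
```

With groups, the two values come from the same match, so there is no need to line up two separate result lists afterwards.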

No, only 1 IEnumerable variable…

And when I said that I see only the first element in the CSV, it’s because I save only the first element here:

image

Because I loop over the two columns of dtData, which contain 2 records, one for each regex… so I save the value of the first column in the first round of the loop and the first value of the second column in the second and last round of the loop.

So after the loop, ienMatch contains only the results of the second regex, because the results of the first regex have been overwritten…

Thanks @supermanPunch for support!

Yes, we can modify dtData to hold only one regex, then match strPDF (the whole PDF) against that regex and save the result to ienMatch… now ienMatch has all the results for the first regex… then do the same for the second regex and save its results to ienMatch2…

Now I have ienMatch and ienMatch2… the first with the results of the first regex and the second with the results of the second regex…

How can I save all this data (2 different IEnumerable variables) in the same Data Table (dtOutput) so I can save it to a CSV?

@Luca09

Then you have to save to different columns.

For each regex, are you getting only one match or multiple matches?

If only one match, then what you are doing will save the first value in the first row and the second value in the second row… and yes, the IEnumerable will not contain the first…

If you want both… then do two separate regex matches… then you get two variables…

Now use a new datatable… build one with two columns… the first column for the first regex’s values and the second column for the second regex’s values.

Now loop over one IEnumerable, assign a variable to the index, and then use

Add Data Row with {currentItem.Value, ienum2(index).Value}

which will save all the matched values from both IEnumerables in two different columns of the data table.

Cheers
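The index-pairing approach above can be sketched as follows, in Python rather than as a UiPath workflow. The sample text, both patterns and the helper name `pair_to_csv` are hypothetical. The sketch also assumes that both regexes produce the same number of matches in the same order, which the thread implies but does not verify.

```python
import csv
import io
import re

# Hypothetical sample text (assumption: values alternate in the PDF text).
pdf_text = (
    "Vehicle: AB123CD\nTotal: 150.00\n"
    "Vehicle: EF456GH\nTotal: 99.50\n"
)

# Two separate regex runs, like ienMatch and ienMatch2 in the thread.
vehicles = re.findall(r"Vehicle: (\w+)", pdf_text)
totals = re.findall(r"Total: ([\d.]+)", pdf_text)


def pair_to_csv(first, second):
    """Pair two match lists by index and return two-column CSV text."""
    if len(first) != len(second):
        # Index pairing is only meaningful when the lists line up.
        raise ValueError("the two match lists have different lengths")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Vehicle", "Total"])  # the two columns of the output table
    for index, value in enumerate(first):
        # Same shape as {currentItem.Value, ienum2(index).Value}
        writer.writerow([value, second[index]])
    return buffer.getvalue()


csv_text = pair_to_csv(vehicles, totals)
print(csv_text)
```

The length check is a design choice for the sketch: if one regex misses a record, pairing by index would silently shift the columns against each other, so failing early seems safer.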


Thanks @Anil_G

Now I have ienMatch (with all the results of the first regex) and ienMatch2 (with all the results of the second regex).

Now I have to put them in a single data table (dtOutput), ienMatch in the first column and ienMatch2 in the second column.

I tried to do as you said, but maybe I made a mistake:

OK, fixed! Now I have all the elements in the CSV!

I created an index, and with a For Each I insert both elements into the data row and then write the rows to the CSV.

Thanks a lot for the support!

