Convert PDF to Excel (using regex)

I have a PDF file. I read it with the Read PDF Text activity, with PreserveFormatting set to True. The extracted text is below.
data.txt (1.5 KB)

I want the output as an Excel file like the one below.

output.xlsx (8.7 KB)

Please guide me on how to do this.

Read the text file using the Read Text File activity.
Create a datatable using Build Data Table, with all the columns you want (from your final Excel file).

First split your data on the newline characters. You will get an array in which each element is one line.
strTextFileContent.Split(Environment.NewLine.ToCharArray)

Then use a For Each to loop through that array.
Split each line on runs of two or more whitespace characters. You will get an array holding the data of one whole row.
System.Text.RegularExpressions.Regex.Split(row, "\s{2,}")

Add that array to the datatable you created, using Add Data Row.

At the end of this loop you will have a datatable with your desired output. You can use Write Range to write it to an Excel file.
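
The steps above can be sketched outside UiPath as follows. This is only an illustration of the logic in Python, not the workflow itself: the sample text, the column layout, and the function name `text_to_rows` are assumptions, since the real data.txt is not reproduced here.

```python
import re

# Hypothetical sample of text extracted with formatting preserved;
# the actual data.txt from this thread is not reproduced here.
sample_text = (
    "Date        Description      Amount\r\n"
    "01/02/2020  Office rent      1200.00\r\n"
    "15/02/2020  Printer paper    35.50\r\n"
)

def text_to_rows(text):
    """Split text into lines, then split each line on runs of 2+ whitespace characters."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(re.split(r"\s{2,}", line))
    return rows

rows = text_to_rows(sample_text)
for row in rows:
    print(row)
```

Because the split needs at least two whitespace characters, a value containing a single space (such as "Office rent" in the assumed sample) stays in one column.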

Hope this helps!


@rahulsharma I used the activities as you suggested,
but I get the error below.

Please help me solve it.

The Split operation works on String variables.

Your text variable is not a String.

You can change its data type to String in the Variables panel at the bottom of Studio.

Alternatively, you can just write text.ToString.Split

@rahulsharma The text variable is now a String,
but I get the same error.

Actually, this expression returns an array. In the screenshot I see that the data variable is of type String. Make it an array of String and it will work.

Appreciate you are trying! :+1:

@rahulsharma I now use text.Split(Environment.NewLine.ToCharArray) with the data variable as a String array, as below. There is no error.
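
One thing to keep in mind with this kind of split: when the text uses Windows line endings, splitting on each newline character separately leaves an empty entry between the carriage return and the line feed. The Python sketch below only illustrates that effect on an assumed three-line sample. In .NET, empty entries can be dropped with the StringSplitOptions.RemoveEmptyEntries option of String.Split.

```python
import re

# Hypothetical sample with Windows line endings ("\r\n").
text = "line one\r\nline two\r\nline three"

# Splitting on each newline character separately, as a character-array
# split does, leaves an empty string between every "\r" and "\n".
per_char = re.split(r"[\r\n]", text)

# Dropping the empty entries gives exactly one element per line.
lines = [part for part in per_char if part]

print(per_char)
print(lines)
```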

I also have a question about your suggestion:

Split using a space character, you will get an array which will have one whole row data
System.Text.RegularExpressions.Regex.Split(row, "\s{2,}")

Should I use an Assign activity for this?

This expression splits your data wherever two or more whitespace characters occur in a row.

Use it inside the For Each over the array you created earlier. On every iteration the split gives you an array, and that array can be used to add a row to the datatable with Add Data Row.

@rahulsharma I get an error in System.Text.RegularExpressions.Regex.Split(row, "\s{2,}")

What should I enter in ArrayRow and DataRow (highlighted in yellow)?

Please help me solve it.

row is simply the loop variable of the For Each.

Check which variable is on the left side of the For Each you used, and replace row with that variable.

In Add Data Row, set ArrayRow to the array you created and DataTable to DT. There is no need to fill in DataRow.

In your case that variable is probably item.

@rahulsharma Sorry, I don't understand.
My flow now looks like this.

This file should help you.

Make sure the namespace below is imported:
System.Text.RegularExpressions
You can import it from the Imports tab in Studio, next to the Variables panel.

The output will be the datatable. If you wish, you can tidy it up a bit.

Hope this helps

TextToDT.zip (9.0 KB)

@rahulsharma The output is not correct.
test.xlsx (10.5 KB)

The output from the bot is below.

But the correct data is as below.

Please help me solve it.

Then, in the inner loop where you split on spaces, change the 2 to 1, so the pattern becomes "\s{1,}".

Also, before adding the row to the datatable with Add Data Row, add an If condition:
row.ToString.Contains("/")
This adds only the rows that contain a date, as you want.
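
A small sketch of these two changes, again in Python and only as an illustration of the logic. The sample lines are assumptions and not the real PDF content. Note that splitting on every single whitespace character also breaks values that contain a space, so this only fits data whose columns have no spaces inside them.

```python
import re

# Hypothetical extracted lines; only some of them contain a date.
lines = [
    "Date Description Amount",
    "01/02/2020 Rent 1200.00",
    "Page 1",
    "15/02/2020 Paper 35.50",
]

# Keep only lines containing "/" (used here as a marker for a date),
# then split each kept line on one or more whitespace characters.
rows = [re.split(r"\s+", line.strip()) for line in lines if "/" in line]

for row in rows:
    print(row)
```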

I appreciate you trying. Do try the steps above; they should help!
