String manipulation issues to get table data

Hi,
How can I export this data to Excel using string manipulation or regex?

Can anyone help?
pdf.txt (1.1 KB)

Ref. # Date Transaction Details Transaction Amount
111 12/08/2017 AutoDeposit $1,000.40
112 12/10/2017 Cash Deposit $900.50
1136 12/15/2017 Check-489 $1,080.00
3455 01/20/2018 TyhWithdrawal $120.99
0000 01/24/2018 Ending Balance $3,700.41

Hi @Anand_Designer

I think the Split function is what you are looking for.
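The Split idea can be sketched in Python. In UiPath the equivalent would be a .NET String.Split call on each line; the Python version here is only an illustration. Because "Transaction Details" can contain spaces ("Cash Deposit", "Ending Balance"), a plain split on whitespace returns a different number of pieces per line. This sketch therefore assumes that the first two pieces are the Ref # and the date, that the last piece is the amount, and that everything in between belongs to the details. The helper name `split_rows` is hypothetical.

```python
# Sketch of the Split approach: split each line on whitespace, then rebuild
# the multi-word "Transaction Details" column from the middle pieces.
SAMPLE = """Ref. # Date Transaction Details Transaction Amount
111 12/08/2017 AutoDeposit $1,000.40
112 12/10/2017 Cash Deposit $900.50
1136 12/15/2017 Check-489 $1,080.00
3455 01/20/2018 TyhWithdrawal $120.99
0000 01/24/2018 Ending Balance $3,700.41"""


def split_rows(text):
    rows = []
    # Skip the header line: its column names also contain spaces.
    for line in text.splitlines()[1:]:
        parts = line.split()
        if len(parts) < 4:
            continue  # too few pieces to be a data row
        ref, date, amount = parts[0], parts[1], parts[-1]
        details = " ".join(parts[2:-1])  # may span several words
        rows.append([ref, date, details, amount])
    return rows


for row in split_rows(SAMPLE):
    print(row)
```

This works only while the Ref #, the date, and the amount never contain spaces; if they might, a regex that describes each column is more robust.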

@Anand_Designer
In general, it can be done as follows:

  • Cleanse / mark the data: insert a column separator after the digits and the dates, and before the currency, e.g. with the help of Regex
  • Feed the cleansed data to a Generate Data Table activity, which returns a DataTable
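The two steps above can be sketched in Python. This is a swapped-in language: in UiPath, the marking would be a Regex replace in an Assign, and the table would come from Generate Data Table. The row pattern is an assumption based only on the sample rows: a numeric Ref #, a date written as two digits/two digits/four digits, free-text details, and a dollar amount at the end of the line. Lines that do not match, such as the header, are skipped, and the header is rebuilt with the same separator. The names `ROW` and `mark_columns` are hypothetical.

```python
import re

SAMPLE = """Ref. # Date Transaction Details Transaction Amount
111 12/08/2017 AutoDeposit $1,000.40
112 12/10/2017 Cash Deposit $900.50
1136 12/15/2017 Check-489 $1,080.00
3455 01/20/2018 TyhWithdrawal $120.99
0000 01/24/2018 Ending Balance $3,700.41"""

HEADERS = ["Ref. #", "Date", "Transaction Details", "Transaction Amount"]

# Assumed row layout: Ref digits, date, details (may contain spaces), $ amount at line end.
ROW = re.compile(
    r"^(\d+)\s+(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(\$[\d,]+\.\d{2})$", re.MULTILINE
)


def mark_columns(text, sep=";"):
    """Return the text with `sep` between the columns, one line per data row."""
    lines = [sep.join(HEADERS)]
    # Each match captures the four columns of one data row.
    lines += [sep.join(m.groups()) for m in ROW.finditer(text)]
    return "\n".join(lines)


print(mark_columns(SAMPLE))
```

The `;` separator assumes the details never contain a `;` themselves. The comma cannot be used here because the amounts contain commas.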

I copy the txt file to an XLS file,
open the XLS just to get the sheet name (check the Properties section: the workbook sheet name is assigned as wb),
then read the XLS into a DataTable using Read Range, specifying the sheet name,
and copy the DataTable to an XLSX using Write Range.
Later I close the XLS workbook
and delete the XLS.
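For comparison, a standard-library-only way to get tabular data into a file Excel can open is to write a CSV file. This swaps a CSV file in for the XLS/XLSX round trip described above. It is not the same workflow, and the row values are simply copied from the sample. The names `to_csv` and `save_csv` are hypothetical.

```python
import csv
import io

HEADERS = ["Ref. #", "Date", "Transaction Details", "Transaction Amount"]
ROWS = [
    ["111", "12/08/2017", "AutoDeposit", "$1,000.40"],
    ["112", "12/10/2017", "Cash Deposit", "$900.50"],
    ["1136", "12/15/2017", "Check-489", "$1,080.00"],
    ["3455", "01/20/2018", "TyhWithdrawal", "$120.99"],
    ["0000", "01/24/2018", "Ending Balance", "$3,700.41"],
]


def to_csv(rows, headers=HEADERS):
    """Serialize the header and rows as CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()


def save_csv(path, rows, headers=HEADERS):
    # newline="" stops Python from translating the writer's "\r\n" line endings.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(to_csv(rows, headers))


print(to_csv(ROWS))
```

Calling `save_csv("transactions.csv", ROWS)` (a hypothetical file name) writes the file to disk. When it is opened, Excel may reinterpret the dates and dollar amounts according to the machine's regional settings.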

dateText = System.Text.RegularExpressions.Regex.Match(pdftext, "(?<=Date\s+)[^\n\r]+").Value

transactionDetails = System.Text.RegularExpressions.Regex.Match(pdftext, "(?<=Transaction Details\s+)[^\n\r]+").Value

I tried the above, but I am not getting the data.
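One likely reason the attempt above does not return the row values is this. In the sample text, the first occurrence of "Date" is in the header line, so the lookbehind pattern captures the rest of the header rather than any data row, and Regex.Match returns only that first match. The sketch below reproduces this in Python, assuming `pdftext` holds the sample text. Python's `re` module supports only fixed-width lookbehind, so `(?<=Date\s+)` is rewritten as an equivalent capture group. The helper name `first_after` is hypothetical.

```python
import re

SAMPLE = """Ref. # Date Transaction Details Transaction Amount
111 12/08/2017 AutoDeposit $1,000.40
112 12/10/2017 Cash Deposit $900.50
1136 12/15/2017 Check-489 $1,080.00
3455 01/20/2018 TyhWithdrawal $120.99
0000 01/24/2018 Ending Balance $3,700.41"""


def first_after(label, text):
    # Same idea as (?<=label\s+)[^\n\r]+ : the rest of the line after the first label.
    m = re.search(re.escape(label) + r"\s+([^\n\r]+)", text)
    return m.group(1) if m else ""


print(first_after("Date", SAMPLE))                 # Transaction Details Transaction Amount
print(first_after("Transaction Details", SAMPLE))  # Transaction Amount

# Matching the data itself instead of the header label returns every date.
dates = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", SAMPLE)
print(dates)
```

In .NET, Regex.Matches (rather than Regex.Match) returns every match, so a data pattern such as `\d{2}/\d{2}/\d{4}` used with Matches would list all the dates.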

@Anand_Designer
Have a look here:
[Screenshot]

So the pattern finds the separator positions, and we can use it in a Replace to insert a separator, e.g. ;
[Screenshot]

Maybe it is better to run each regex pattern separately.

Afterwards, we feed the result into the Generate Data Table activity.
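Running the patterns separately and then building the table might look like this in Python. This is a sketch: the three patterns are assumptions derived from the sample rows, and `csv.reader` with `delimiter=";"` only stands in for the Generate Data Table activity. Each pass inserts one separator, so a pass that misbehaves can be tested on its own. The names `PASSES` and `to_table` are hypothetical.

```python
import csv
import io
import re

SAMPLE = """Ref. # Date Transaction Details Transaction Amount
111 12/08/2017 AutoDeposit $1,000.40
112 12/10/2017 Cash Deposit $900.50
1136 12/15/2017 Check-489 $1,080.00
3455 01/20/2018 TyhWithdrawal $120.99
0000 01/24/2018 Ending Balance $3,700.41"""

HEADERS = ["Ref. #", "Date", "Transaction Details", "Transaction Amount"]

# Assumed passes, applied one after another.
PASSES = [
    (r"^(\d+)\s+", r"\1;"),               # after the Ref number at the line start
    (r"(\d{2}/\d{2}/\d{4})\s+", r"\1;"),  # after the date
    (r"\s+(\$[\d,]+\.\d{2})$", r";\1"),   # before the currency amount at the line end
]


def to_table(text):
    # Drop the header line; the columns are named explicitly below.
    body = "\n".join(text.splitlines()[1:])
    for pattern, repl in PASSES:
        body = re.sub(pattern, repl, body, flags=re.MULTILINE)
    # Stand-in for Generate Data Table: split each line on ";" into columns.
    rows = list(csv.reader(io.StringIO(body), delimiter=";"))
    return [HEADERS] + rows


for row in to_table(SAMPLE):
    print(row)
```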

Hello Anand,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel:

2:00 GitHub free code for all the files
2:20 Logic of general workflow
4:40 File 1 simple PDF
9:50 File 2 PDF with a column with multiple lines
20:10 File 3 PDF with multiple words in the last column
27:00 File 5 PDF with multiple words inside inner columns (2 columns)
31:40 File 6 PDF with a column with multiple lines
39:10 File 8 simple PDF
42:15 File 9 PDF with multiple spaces that need to be corrected
45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
55:50 File 11 simple PDF with protection and empty cells
58:35 File 12 big PDF with an empty line, empty columns, and a partial total
1:02:25 File 13 PDF with multiple columns that have multiple words and hard to define a rule
1:10:15 File 15 PDF with multiple columns that have multiple lines
1:12:50 File 17 simple PDF: remove spaces from the headers and from the data
1:16:05 File 18 simple PDF
1:17:10 File 19 PDF with multiple pages and columns with multiple lines
1:22:10 File 20 PDF with multiple columns that have multiple lines
1:25:00 File 21 PDF with empty columns and subtotal

Code:

Thanks,
Cristian Negulescu
