Extract table data from text got from pdf - please help

text_01.txt (166.4 KB)
Hi team , I have attached text , I want to extract the data with headers : DCRD DCRT ITEM NO CO ID SP CODE AMOUNT TRANS CODE OFFICE POLICY NO SURNAME CLEAR UNIT DESCRIPTION FEED CODE with their appropriate data under them in table format (datatable).I tried various ways but could not find out a solution.Can you please help here.

Code I tried -
’ Read the text file into a string variable
Dim filePath As String = “C:\path\to\file.txt”
Dim fileContent As String = File.ReadAllText(filePath)

’ Split the file content by line
Dim lines() As String = fileContent.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)

’ Create the datatable with headers
Dim dt As New DataTable()
dt.Columns.Add(“DCRD”)
dt.Columns.Add(“DCRT”)
dt.Columns.Add(“ITEM NO”)
dt.Columns.Add(“CO ID”)
dt.Columns.Add(“SP CODE”)
dt.Columns.Add(“AMOUNT”)
dt.Columns.Add(“TRANS CODE”)
dt.Columns.Add(“OFFICE POLICY NO”)
dt.Columns.Add(“SURNAME”)
dt.Columns.Add(“CLEAR UNIT”)
dt.Columns.Add(“DESCRIPTION”)
dt.Columns.Add(“FEED CODE”)

’ Loop through the lines and add each row to the datatable
For Each line As String In lines
Dim values() As String = line.Split(vbTab)
If values.Length = 12 Then
Dim row As DataRow = dt.NewRow()
row(“DCRD”) = values(0)
row(“DCRT”) = values(1)
row(“ITEM NO”) = values(2)
row(“CO ID”) = values(3)
row(“SP CODE”) = values(4)
row(“AMOUNT”) = values(5)
row(“TRANS CODE”) = values(6)
row(“OFFICE POLICY NO”) = values(7)
row(“SURNAME”) = values(8)
row(“CLEAR UNIT”) = values(9)
row(“DESCRIPTION”) = values(10)
row(“FEED CODE”) = values(11)
dt.Rows.Add(row)
End If
Next

Regards,
Gokul

Hi @gokul1904

Your code looks good, but you have spaces in the column names, which can cause issues. Instead, you can replace the spaces with underscores.

Please try the following code & let us know if it helps:

Dim filePath As String = "C:\path\to\file.txt"
Dim fileContent As String = File.ReadAllText(filePath)

Dim lines() As String = fileContent.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)

Dim dt As New DataTable()
dt.Columns.Add("DCRD")
dt.Columns.Add("DCRT")
dt.Columns.Add("ITEM_NO")
dt.Columns.Add("CO_ID")
dt.Columns.Add("SP_CODE")
dt.Columns.Add("AMOUNT")
dt.Columns.Add("TRANS_CODE")
dt.Columns.Add("OFFICE_POLICY_NO")
dt.Columns.Add("SURNAME")
dt.Columns.Add("CLEAR_UNIT")
dt.Columns.Add("DESCRIPTION")
dt.Columns.Add("FEED_CODE")

For Each line As String In lines
    Dim values() As String = line.Split(vbTab)
    If values.Length = 12 Then
        Dim row As DataRow = dt.NewRow()
        row("DCRD") = values(0)
        row("DCRT") = values(1)
        row("ITEM_NO") = values(2)
        row("CO_ID") = values(3)
        row("SP_CODE") = values(4)
        row("AMOUNT") = values(5)
        row("TRANS_CODE") = values(6)
        row("OFFICE_POLICY_NO") = values(7)
        row("SURNAME") = values(8)
        row("CLEAR_UNIT") = values(9)
        row("DESCRIPTION") = values(10)
        row("FEED_CODE") = values(11)
        dt.Rows.Add(row)
    End If
Next

Hope this helps,
Best Regards.

I am getting empty values -

i guess it is not splitting based on Tab.

instead you can use regex “system.text.RegularExpressions.Regex.Split(line,”\s+“)”
but here the issue is where the field is empty it excludes so you get only values present

1 Like

how should i modify the code if i only have to extract the ITEM_NO and CLEAR_UNIT values from the text file?
Regards,
Gokul

can you confirm that the fileds ItemNo and ClearUnit you are looking only for this data
DCRD DCRT ITEM NO CO ID SP CODE AMOUNT TRANS CODE OFFICE POLICY NO SURNAME CLEAR UNIT DESCRIPTION FEED CODE


03/31/2023 01 26 8,200.00 418 0231 COMB PRO OFF XR01 F
03/31/2023 01 26 72,436.67 418 0231 COMB PRO OFF XR01 F
03/31/2023 01 26 O 8,304.23 418 0231 POL VAN PRCK 2 XR01 R
03/31/2023 01 26 O 13,716.19 418 0231 POL VAN PRCK 2 XR01 R
03/31/2023 01 26 O 35,233.95 418 0231 POL VAN PRCK 2 XR01 R
03/31/2023 02 70 V 1,287,503.72 174 -I/O 0219 VAN PLUS NO15 R
03/31/2023 03 70 V 89,083.02 174 -I/O 0219 VAN PLUS NO15 R

or from entire file as there are many places these fields are present

can you also confirm are you getting this data through text file and that too in single file

Hi.
Can you attach the sample pdf file ? Because you have an option that to extract the pdf table data to excel.

Regards
Balamurugan.S

assuming the data required as above created file

Strings.xaml (42.1 KB)


hope you are expecting the same data
there are several disabled activities please ignore them

Regards

yes I am expecting the same data but the xaml file shared by you is giving error as - Document is Invalid.Was not able to see the code

are you not able to see anything?
it was for version 2023. so may be the issue

will share later today at evening

regards

Hi @gokul1904 ,

This error happens most probably if you are trying to open a workflow created using Windows Project within a Windows-Legacy Project.

You can try opening the file individually/separately, not opening it within a Windows-Legacy project. It should open, or the project.json file would need to be sent as well in order to open it as a Windows Project.

Hi

Please try this and let me know
TablefromTextFile.zip (27.8 KB)

(?<DCRD>\d{2}\/\d{2}\/\d{4})\s+(?<DCRT>\d{2})\s+(?<ITEM_NO>\d{2})\s+(?<CO_ID>\d{2}|)\s+(?<SP_CODE>[A-Za-z]|)\s+(?<AMOUNT>[\d\.\,]+)\s+(?<TRAN_CODE>\d+)\s+(?<OFFICE>\w+|)\s+(?<POLICY_NO>\w+|-I\/O|)\s+(?<SURNAME>[A-Za-z]+|)\s+(?<CLEAR_UNIT>\d{4})\s+(?<DES>[\w\s]+)\s+(?<FEED>\w{4})\s+(?<CODE>[a-zA-Z])(\r?\n|$)

if still face issue then use above regex in matchs activity

new Object() {currentItem.Groups("DCRD").value,currentItem.Groups("DCRT").value,currentItem.Groups("ITEM_NO").value,currentItem.Groups("CO_ID").value,currentItem.Groups("SP_CODE").value,currentItem.Groups("AMOUNT").value,currentItem.Groups("TRAN_CODE").value,currentItem.Groups("OFFICE").value,currentItem.Groups("POLICY_NO").value,currentItem.Groups("SURNAME").value,currentItem.Groups("CLEAR_UNIT").value,currentItem.Groups("DES").value,currentItem.Groups("FEED").value,currentItem.Groups("CODE").value}

in add datarow activity

finally the flow should like this



finally write to excel
if you only require specific fields you can amend the regex , build data table and add datarow

Imp Note : Test as much as possible
Regards