Extract table from raw text

Hi All,

I want to extract the table that follows a keyword in a text file. Is there any way I can do that? I am attaching a sample text. I want to search for the keyword “By Project Category” and extract the table after it, which is marked in bold.

20Return to Contents

Future Years Operating Cost Summary

By Project Category

Operation and Maintenance Impact (1,000s): FY 2021 FY 2022 FY 2023 FY 2024 FY 2025 Total

Streets Improvements - 185 229 289 289 992
Traffic Control Improvements - 11 31 32 37 111
Municipal Facilities Improvements - 442 2,088 2,204 2,204 6,938
Redevelopment Improvements - - 38 161 173 372
Storm Water Improvements - - - - - -
Water Improvements - 242 297 297 322 1,158
Wastew ater Improvements - 691 691 691 691 2,764
Parks and Recreation Improvements - 3 17 17 66 103

Net Additional Operating Cost $ - $ 1,574 $ 3,391 $ 3,691 $ 3,782 $ 12,438

The operating impacts for all project types, shown by expense type is shown below. Anticipated
revenues are also shown below. Detail by project type is shown on the following pages.

By Expense Category

Operation and Maintenance Impact (1,000s): FY 2021 FY 2022 FY 2023 FY 2024 FY 2025 Total

Personnel - 178 1,584 1,614 1,614 4,991
Contractual Services - 458 554 645 726 2,383
Supplies - 579 715 788 796 2,877
Utilities - 359 504 609 611 2,083
Insurance - - 34 35 35 104

Total O&M Impact $ - $ 1,574 $ 3,391 $ 3,691 $ 3,782 $ 12,438

Total Revenue $ - $ - $ - $ - $ - $ -

21Return to Contents

Future Years Operating Cost Summary

Streets Project Summary

@rameezimtiaz - Please check the values in columns FY1 to FY5. Is this the output you are looking for?

Regex Pattern Link

I want this entire table:

Streets Improvements - 185 229 289 289 992
Traffic Control Improvements - 11 31 32 37 111
Municipal Facilities Improvements - 442 2,088 2,204 2,204 6,938
Redevelopment Improvements - - 38 161 173 372
Storm Water Improvements - - - - - -
Water Improvements - 242 297 297 322 1,158
Wastew ater Improvements - 691 691 691 691 2,764
Parks and Recreation Improvements - 3 17 17 66 103

Do you mean you want to capture the categories as well? Like below…

Yes, I want it like this.

The problem with the regex you shared is that it only returns rows that contain the word “Improvements”. I want the table that comes after “by project category”, and that table can hold any data, which might not contain “Improvements” at all.

@rameezimtiaz - If you don’t put in any anchor, the pattern becomes generalized. I have made it generic below; you can see it now also captures the table underneath the first one…

@prasath17 Can you post the regex here and tell me which activity you used for it?

@rameezimtiaz - Here is the regex link

I haven’t used anything in UiPath yet (waiting for your confirmation). I can build a DataTable, assign the group matches to it, and finally write the results to Excel…

@prasath17 When I use the current regex, it fetches all the tables in the text. There are around 200 tables in my text file, and I want only the table that comes after the string “by project category”.
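
For anyone reading later, here is a minimal sketch of the "anchor on the keyword" idea in Python rather than UiPath. It is not the regex from the link above. It assumes the row shape seen in the sample (a label followed by six cells, each either "-" or a number) and that the table ends at the first non-blank line that is not a row. The names `ROW`, `SAMPLE`, and `extract_table_after` are made up for illustration.

```python
import re

# Assumed row shape (taken from the sample): a free-text label followed by
# exactly six cells, each "-" or a number with optional thousands separators.
ROW = re.compile(
    r"^(?P<label>.+?)\s+(?P<cells>(?:(?:-|\d[\d,]*)\s+){5}(?:-|\d[\d,]*))\s*$"
)

def extract_table_after(text, keyword):
    """Return the rows of the first table that follows `keyword`."""
    lines = text.splitlines()
    # Locate the line that consists of the keyword (case-insensitive).
    start = next(
        (i for i, ln in enumerate(lines) if ln.strip().lower() == keyword.lower()),
        None,
    )
    if start is None:
        return []
    rows = []
    for ln in lines[start + 1:]:
        m = ROW.match(ln.strip())
        if m:
            rows.append([m.group("label")] + m.group("cells").split())
        elif rows and ln.strip():
            # First non-blank, non-row line after the rows ends the table.
            break
    return rows

# Shortened copy of the sample text from the first post.
SAMPLE = """
By Project Category

Operation and Maintenance Impact (1,000s): FY 2021 FY 2022 FY 2023 FY 2024 FY 2025 Total

Streets Improvements - 185 229 289 289 992
Municipal Facilities Improvements - 442 2,088 2,204 2,204 6,938
Storm Water Improvements - - - - - -

Net Additional Operating Cost $ - $ 1,574 $ 3,391 $ 3,691 $ 3,782 $ 12,438

By Expense Category

Operation and Maintenance Impact (1,000s): FY 2021 FY 2022 FY 2023 FY 2024 FY 2025 Total

Personnel - 178 1,584 1,614 1,614 4,991
"""

for row in extract_table_after(SAMPLE, "By Project Category"):
    print(row)
```

Because the scan starts at the keyword line and stops at the "Net Additional Operating Cost" line, the "By Expense Category" table is never reached, and the row pattern does not depend on the word "Improvements". If the keyword appears more than once in the file, this sketch only returns the table after the first occurrence.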

check this

I am not positive that I can get the tables based on that criterion. You can try the approach below…

Just a thought: if you extracted this text file from a PDF, we can loop through the PDF pages and pick out the page that contains “by project category”. If the keyword is found on only one page, we can then use the regex shared above to extract the details…

Yes, this is from a PDF. I have written code that gives me the “by project category” text and its line number.

@prasath17 How do I loop through the pages and get the specific page that contains the “by project category” string?

@rameezimtiaz

  1. Get the PDF page count.
  2. Loop through the pages using either Do While or For Each.
  3. Inside the loop, use the Read PDF activity and set the page range from the loop index.
  4. Take the string output (strinput) from the previous step and use Contains or a regex match to check whether that page has the keyword. If it does, go ahead and apply the logic to extract the table.
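
The steps above can be sketched outside UiPath as well. The Python below is only an illustration: it assumes the text of each page has already been read into a list of strings (in the workflow that would be the per-page output of the Read PDF activity), and `PAGES` and `pages_with_keyword` are hypothetical names.

```python
def pages_with_keyword(pages, keyword):
    """Return the 1-based numbers of the pages whose text contains `keyword`."""
    needle = keyword.lower()
    # Case-insensitive "contains" check on each page, as in step 4.
    return [n for n, text in enumerate(pages, start=1) if needle in text.lower()]

# Hypothetical per-page text, standing in for the output of a PDF reader.
PAGES = [
    "Capital Improvement Plan\nIntroduction",
    "Future Years Operating Cost Summary\nBy Project Category\nStreets Improvements - 185 229 289 289 992",
    "Future Years Operating Cost Summary\nStreets Project Summary",
]

print(pages_with_keyword(PAGES, "by project category"))
```

Once the matching page number is known, the table extraction only has to run on that page's text instead of the whole file.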

Update :

@prasath17 Thanks for the update. Can you share this code file as well if possible?

@rameezimtiaz -
Delete_PDFPage.zip (5.3 MB)

This is a sample workflow I developed to delete the page that contains the keyword and merge the rest of the pages.

In this workflow, you can remove the PDF splitter (and the Join PDF activity) inside the If activity and add Regex Matches there to extract the table.

Hope this helps.

@prasath17 I have created a workflow that does what I want. Thanks for your help.
I will need some more help from you with getting other data from another table.

@rameezimtiaz - Glad to know. If this resolved your query, please mark my post as the solution; that will close this thread.