Regex to fetch table data from text file

Hi,

I have the following pdf page extracted and I want certain column from the table extracted from pdf. What Regex works for it? @prasath17

Return to Contents

Streets Improvement Summary

Capital Expenses (1,000s) Total Prior Years FY 2021 FY 2022 FY 2023 FY 2024 FY 2025 Years 6-10 Beyond 10
Yrs

Ocotillo Road - Greenfield to Higley ST0540 66,519 332 12,413 53,774 - - - - -

Baseline Road - Burk to Consolidated Canal ST0710 1,340 87 1,253 - - - - - -

Baseline Road - Greenfield to Power ST0780 8,801 112 - - - - 8,689 - -

Recker Road - Ocotillo to Chandler Heights ST0800 11,611 - - - - - - 11,611 -

Hunt Highway - Val Vista to 164th ST0840 86,241 40 - - - - - - 86,201

Recker Road - Ray to SR 202 ST0960 3,940 455 3,485 - - - - - -

Higley Road - Riggs to Stacey ST0980 20,403 1,973 - - - - - 18,430 -

Ocotillo Road - 148th to Greenfield ST0990 20,317 - 5,568 14,749 - - - - -

Elliot Road - Neely to Burk ST1050 4,060 - - - - - - - 4,060

Val Vista Drive - Appleby to Riggs ST1120 34,277 7,513 26,764 - - - - - -

Lindsay Road - Queen Creek to Ocotillo ST1140 3,881 171 1,097 2,613 - - - - -

Hunt Highway - Higley to Recker ST1150 9,438 - - - - - - 9,438 -

Recker Road - Riggs to Hunt Highway ST1160 11,395 104 - - - - - 11,291 -

Lindsay Road - Pecos to SR 202 ST1170 12,171 6,609 5,562 - - - - - -

Warner Road - Power to East of Recker ST1180 14,752 - - - 14,752 - - - -

Power Road - Guadalupe to Santan Freeway ST1200 15,140 - - - 15,140 - - - -

Val Vista Drive - Riggs to Hunt Highway ST1270 7,779 - - - - - - 7,779 -

Warner and Greenfield Intersection ST1300 12,367 - - - - - - - 12,367

Ray and Gilbert Intersection ST1310 8,536 - - - - - - 8,536 -

Elliot and Gilbert Intersection ST1320 8,626 66 - - - - 8,560 - -

Guadalupe and Val Vista Intersection ST1330 8,880 - - - - - - 8,880 -

Guadalupe and Power Intersection ST1340 8,790 - - 8,790 - - - - -

Elliot and Cooper Intersection ST1380 10,370 670 9,700 - - - - - -

Elliot and Higley Intersection ST1390 7,552 - - - - - - 7,552 -

Germann Road - Gilbert to Val Vista ST1450 27,429 4,385 23,044 - - - - - -

Turn Lane Safety and Congestion Improvements ST1540 9,329 723 1,170 1,104 1,583 1,583 1,583 1,583 -

Accessibility Upgrades in Public ROW ST1550 2,209 1,278 586 345 - - - - -

Constellation Way - South of Coldwater ST1570 3,356 55 - - - - - 3,301 -

Lindsay Road/SR 202 Traffic Interchange ST1580 18,151 7,658 10,493 - - - - - -

Higley and Pecos Intersection ST1590 928 - - 928 - - - - -

Infrastructure Imp at Railroad Crossings ST1600 2,545 57 2,488 - - - - - -

Mesquite Street - Lindsay to Val Vista ST1620 6,107 - - - 6,107 - - - -

Adora Trails Improvements ST1660 874 830 44 - - - - - -

Val Vista Drive Reconstruction ST1690 7,405 491 6,914 - - - - - -

Val Vista Drive - Eastern Canal Bridge Repair ST1710 1,055 150 905 - - - - - -

Val Vista Lakes Improvements ST1720 48,195 480 64 12,147 8,860 10,276 16,368 - -

Higley and Baseline Intersection ST1740 5,028 4,714 314 - - - - - -

Guadalupe Road Improvements ST1770 9,920 - 1,750 8,170 - - - - -

PM10 Paving ST1790 3,283 131 52 1,512 - - - 1,588 -

Cooper Road - Encinas to Baseline ST1800 7,743 - - 7,743 - - - - -

Pecos Road - Allen to Lindsay ST1820 2,962 - 342 2,620 - - - - -

Higley Road - North of Elliot ST1830 1,523 - - 1,523 - - - - -

Cooley Station Transit Center ST1840 10,306 - - - - 10,306 - - -

Greenfield Road Right Turn Lane at Knox ST1850 613 - - 613 - - - - -

Gilbert and Warner Intersection ST1860 11,493 178 53 11,262 - - - - -

McQueen and Elliot Intersection ST1870 11,434 - - 11,434 - - - - -

Lindsay and Guadalupe Intersection ST1880 12,608 - - - - 12,608 - - -

Chandler Heights Improvements ST1890 3,000 - 3,000 - - - - - -

McQueen and Guadalupe Intersection ST1910 12,369 - 2,987 9,382 - - - - -

Val Vista and Ray Intersection ST1920 9,707 - - - - - - 9,707 -

Power and Queen Creek Intersection ST1940 8,537 - - - - - - 8,537 -

Market and Williams Field Interesction ST1960 813 48 39 - 726 - - - -

Val Vista and Mercy Intersection ST1970 1,172 48 39 1,085 - - - - -

Market and San Tan Village Intersection ST1980 1,342 55 35 - 1,252 - - - -

47

I want Column values which contain ST codes. But it will not always start from ST. So need a generic regex @prasath17

Hi @rameezimtiaz
You can use this assign activity to get column values having ST value

suppose u store the string data in a variable let’ say input

now use the below assign activity
outputs = System.Text.RegularExpressions.Regex.Matches(input,“ST\S+”)

@rameezimtiaz - Could you please check this? I have made it generic like instead of ST<4digits> , I am looking 2 alphabets followed by 4 digits as anchor…

https://regex101.com/r/2e37k2/2/

If its good, we can use LINQ query to write it to datatable(finally to excel)

@prasath17 this looks fine to me

@rameezimtiaz - Here is the sample workflow…

image

Build Datatable

Output of this activity , I am calling as Datatable dt.

Matches
image

Assign

(From m In IEnRegex.Cast(Of Match)
Select Dt.Rows.Add(m.Groups(1).toString,m.Groups(2).toString,m.Groups(3).toString,m.Groups(4).toString,m.Groups(6).toString,m.Groups(7).toString,m.Groups(8).toString,m.Groups(9).toString,m.Groups(10).toString)).CopyToDataTable

Output(Showing just few rows for sample here)

You can write the dt results to excel as per you need.

Hope this helps…

@prasath17 Its giving me the ST value concatinated with the name. I only want ST code

@rameezimtiaz - You mean, you don’t want any Amts and Category name…and just want the STCodes??

@prasath17 yes just ST codes

@rameezimtiaz - Hmm…I misread your requirement…you too didn’t see it when I asked to verify :wink:…That’s ok…

Here is the new Regex…

https://regex101.com/r/w3tQBp/1/

(From m In IEnRegex.Cast(Of Match)
 Select Dt.Rows.Add(m.ToString)).CopyToDataTable

Build Datatable - Just create with only Field Called “STCodes”

Output

image

1 Like

@prasath17 getting the error below on assign activity

data table has just one column now

@rameezimtiaz - Did you updated the pattern and the LINQ code?? I didn’t see you clicked the link

 [A-Z]{2}\d{4}(?=\s\d+)

yes i did

@rameezimtiaz - Not sure, where it went wrong…
Regex_Multiple Groups.xaml (22.5 KB)

Please take a look at the xaml and compare with yours.

@prasath17 everything seems fine. Can you post the previous regex as well just for me to double check. I think there could be some problem in matches activity.

@rameezimtiaz - its already there in the given workflow. I have just commented it out.

You got the groups output last time with the same text so you should get this one too. Double check the build datatable activity once.

@rameezimtiaz - is the issue resolved?

Yes I had to change the regex a little. Matches was giving no results