Regex to fetch Project information

Hi @prasath17, can you help with getting the required data from the text below? I have highlighted the required data. Multiple regexes are fine for me.

Facilities Management
Project #: 0450-CARP
Replace Carpet - City Hall

Total Cost: $243,203
City Obligation: $200,000
Delivery Method: TBD
Cooperating Agencies: None

Special Circumstances

None

Project Description Location: One City Plaza - City Hall
Remove and Replace the carpet at City Hall.

Project Justification
The carpet at City Hall is 18 years old, with some areas failing and causing a trip hazard.
The carpet needs replacement in phases over the next 4-5 years, beginning in Fiscal
Year 20 and ending in Fiscal Year 24.

Budget Impact/Other
Statement of Impact: There is no impact to the City’s operating budget.

Expenditures Previous 2021 2022 2023 2024 2025 Out Total
Years Years
Other $43,203 $50,000 $50,000 $50,000 $50,000 $243,203
Total $43,203 $50,000 $50,000 $50,000 $50,000 $243,203

Funding Previous 2021 2022 2023 2024 2025 Out Total
Sources Years Years
General Fund $43,203 $50,000 $50,000 $50,000 $50,000 $243,203

Total $43,203 $50,000 $50,000 $50,000 $50,000 $243,203

FY 2021 - FY 2025 Capital Improvement Program Page 18 Effective July 1, 2020

The Total row has 8 columns in total. Any of them can be empty or filled. Only the field names stay the same; everything else can change. In the current example, 6 fields are filled and only 2 are empty.

@rameezimtiaz - Sure, I will take a look.

Quick question: you want to extract all the bolded text, right?

For the sample below, one regex captures 6 fields. Since it is built using groups, you have to use the individual group numbers to extract the values: item ==> group 1, total cost ==> group 2, and so on.

https://regex101.com/r/3Uij28/1/
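To illustrate the "one regex, several groups" idea from the post above, here is a minimal Python sketch. It is not the pattern behind the regex101 link; the pattern and the group names (`project`, `title`, `total_cost`, and so on) are assumptions made up for this example, and UiPath itself uses the .NET regex engine, where the syntax for this pattern should be similar.

```python
import re

sample = """Facilities Management
Project #: 0450-CARP
Replace Carpet - City Hall

Total Cost: $243,203
City Obligation: $200,000
Delivery Method: TBD
Cooperating Agencies: None
"""

# One pattern, one named group per field (group names are hypothetical).
pattern = re.compile(
    r"Project #:\s*(?P<project>\S+)\s*\n"
    r"(?P<title>.+)\s*\n\s*"
    r"Total Cost:\s*(?P<total_cost>\$[\d,]+)\s*\n"
    r"City Obligation:\s*(?P<city_obligation>\$[\d,]+)\s*\n"
    r"Delivery Method:\s*(?P<delivery_method>.+)\s*\n"
    r"Cooperating Agencies:\s*(?P<agencies>.+)"
)

match = pattern.search(sample)
# Strip each captured value, since free-text groups can carry stray spaces.
fields = {name: value.strip() for name, value in match.groupdict().items()}
print(fields)
```

Each field is read from its own group of the single match, so no separate match per field is needed.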

Yes all bold text

Try this article for regex.

Hope this helps.

I will try this regex as well. In my opinion, I will probably need 10 separate regexes for this, @prasath17.

@rameezimtiaz - You may not need 10, since 1 regex is doing the job for you. This avoids 10 separate Matches or Assign activities in your workflow.

Currently one regex fetches 8 of the 10 values, so I will give you only 3 regexes.

@rameezimtiaz -

https://regex101.com/r/As0y9G/1

For Total = https://regex101.com/r/WNiMbo/1

How to fetch groups…

Note: I have used .trim on some of the groups because they have leading and trailing spaces. We can use it on all of them as well; it won't affect the result.

@prasath17 Thanks for this. I will try it and let you know how it performs.

@rameezimtiaz - I just realized that, since we have only one set of matches, we don't need a For Each loop.

Instead of Item, just use IEnRegex(0). For example: IEnRegex(0).groups(1).tostring
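For readers outside UiPath, the same idea (take the first match directly, read groups by number, trim the result) can be sketched in Python. The pattern and the extra spaces in the input are assumptions made up for this illustration; `matches[0].group(1).strip()` plays the role of `IEnRegex(0).groups(1).tostring` plus `.trim`.

```python
import re

# Extra spaces around the location are added here to show why trimming helps.
text = (
    "Project Description Location:  One City Plaza - City Hall \n"
    "Remove and Replace the carpet at City Hall.\n"
)

# Collect all matches, then index the first one instead of looping.
matches = list(re.finditer(r"Location:(.*)\n(.*)", text))
first = matches[0]

location = first.group(1).strip()     # group 1, trimmed
description = first.group(2).strip()  # group 2, trimmed
print(location)
print(description)
```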

I can try this. I will share the results of all the regexes used for this entire PDF. I have tweaked some of the regexes a little, though.

@prasath17 The regex for Total is not working for the following text:

Parks & Recreation
Project #: 0138-PARKS1
Water Supply - East Wetlands, PAAC

Total Cost: $837,000
City Obligation: $354,000
Delivery Method: TBD
Cooperating Agencies: None

Special Circumstances

Grant / Federal Funding

Project Description Location: PAAC, East Wetland
Construct a pumped water supply to route water to the East Wetland. The source of water is the re-
turn flow from the Yuma Irrigation District. This flow will be collected and routed through a new buried
forcemain located adjacent to the Colorado River Levee. The route of the forcemain will also provide
non-potable water to the Pacific Avenue Athletic Complex for irrigation of turf grass and replenish-
ment of the urban lake.
Project Justification
In 2000, the City of Yuma and the Quechan Tribe partnered to restore 380 acres of wetlands within the
main stem of the Colorado River east of downtown Yuma. The East Wetlands is now part of the Lower
Colorado River Multi-Species Conservation Plan. Water is currently delivered to the East Wetlands by
pumping out of the Colorado River main channel. The pumps are in an area without electric service
and require diesel fuel to be operated. Improving water supply to the East Wetlands will support the
long term vision of restoring habitat along the Lower Colorado River for targeted riparian species, and
eliminating the diesel operated pumping system.
Budget Impact/Other
The bond proceeds denoted are the remaining funds after the construction of the PAAC. The City
of Yuma is working on the federal level to obtain earmark funding for the balance of the antici-
pated costs to construct. It is anticipated that the execution of this project will result in a savings
to the City’s operational budget.

Expenditures Previous 2021 2022 2023 2024 2025 Out Total
Years Years
Design $100,000 $100,000
Construction $737,000 $737,000
Total $837,000 $837,000

Funding Previous 2021 2022 2023 2024 2025 Out Total
Sources Years Years
Bond - 2015 $354,000 $354,000
Issue
Other - $483,000 $483,000
Proposed
Total $837,000 $837,000
FY 2021 - FY 2025 Capital Improvement Program Page 46 Effective July 1, 2020

@prasath17 Never mind. This is fixed by adding .+\r?\n? to the regex.

@rameezimtiaz - The Design and Construction rows were added after Expenditures, so I had to find another anchor. Check this:

https://regex101.com/r/8kFrq3/1/
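One way to cope with a varying number of rows between the section header and its Total row is to anchor on the header and lazily skip whole lines until a line starts with "Total". The Python sketch below shows that approach; it is an assumption-based illustration, not the pattern behind the regex101 link, and `total_row_amounts` is a hypothetical helper name.

```python
import re

text = """Expenditures Previous 2021 2022 2023 2024 2025 Out Total
Years Years
Design $100,000 $100,000
Construction $737,000 $737,000
Total $837,000 $837,000

Funding Previous 2021 2022 2023 2024 2025 Out Total
Sources Years Years
Bond - 2015 $354,000 $354,000
Issue
Other - $483,000 $483,000
Proposed
Total $837,000 $837,000
"""

def total_row_amounts(text, section):
    """Return the dollar amounts on the Total row of the given section."""
    pattern = re.compile(
        r"^" + re.escape(section) + r"\b.*\r?\n"  # section header line
        r"(?:(?!Total\b).*\r?\n)*?"               # skip rows not starting with Total
        r"Total\b(?P<row>.*)",                    # the Total row itself
        re.MULTILINE,
    )
    match = pattern.search(text)
    if match is None:
        return []
    return re.findall(r"\$[\d,]+", match.group("row"))

expenditure_totals = total_row_amounts(text, "Expenditures")
funding_totals = total_row_amounts(text, "Funding")
print(expenditure_totals, funding_totals)
```

Because the skipped lines are matched generically, it does not matter whether the section has one row ("Other") or several ("Design", "Construction").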

@prasath17 This one is even better. Thanks.
But there is one problem with all the texts that I shared: we don't know which value in the Total row belongs to which of the 8 columns. Can you think of any solution?

@rameezimtiaz - Yes, then we need to get 8 different amounts from the Total rows. Something like this:

https://regex101.com/r/49wpI8/1/
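A pattern with up to 8 amount groups could look like the Python sketch below. It is an assumption-based illustration, not the pattern behind the regex101 link. Note what it can and cannot do: the groups fill from left to right in the order the amounts appear, so an empty column in the middle of the row is not detectable from this flattened text.

```python
import re

# The first amount is required so that lines like "Total Cost: ..." do not match;
# up to 7 further amounts are optional.
OPTIONAL_AMOUNT = r"(?:[ \t]+(\$[\d,]+))?"
TOTAL_ROW = re.compile(
    r"^Total[ \t]+(\$[\d,]+)" + OPTIONAL_AMOUNT * 7,
    re.MULTILINE,
)

full_row = "Total $43,203 $50,000 $50,000 $50,000 $50,000 $243,203"
sparse_row = "Total $837,000 $837,000"

full_groups = TOTAL_ROW.search(full_row).groups()
sparse_groups = TOTAL_ROW.search(sparse_row).groups()

print(full_groups)    # 6 amounts, then 2 unfilled groups
print(sparse_groups)  # 2 amounts, then 6 unfilled groups
```

In the sparse row, group 1 holds the first amount regardless of which year column it came from, which is exactly the ambiguity raised above.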

This is one of those situations where I’d ask the question “can we get this data in a better format?”

Always think of improving processes/data rather than automating bad processes/data.

@postwick I agree. But how can we make it better? This PDF needs a better format.

Go to the source. Find out how the file is being generated and see if they can generate a better format.

@prasath17 The regex would not work because of the text I am getting before applying the regex.

There are 2 types of text I have shared. For the 1st one, the value is against the 2023 and Total columns. For the 2nd one, the value is against the 2021 and Total columns. In my opinion, there is no way to determine this from the current text file format.