Regex to fetch Project information

I didn’t get you…

@prasath17 the following is the table in the PDF,

and below is the text that is being extracted:

Expenditures Previous 2021 2022 2023 2024 2025 Out Total
Years Years
Equipment $30,000 $30,000
Total $30,000 $30,000

As I understand it, we need to make the format friendlier.

@prasath17 if you look closely, the image and the text differ. Looking only at the text, how is the bot going to know that the first $30,000 belongs under the 2023 column and the second $30,000 under the Total column?
I can handle the last value under the Total column, but for the other values I cannot think of any way to meet my requirement.
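The ambiguity can be made concrete. In the flattened row, the only value whose column is certain is the last one, which lands under Total provided the Total cell is always filled (an assumption). A minimal Python sketch; `split_flat_row` and `COLUMNS` are illustrative names, not part of any UiPath activity:

```python
import math
import re

# Column order as printed in the PDF table header.
COLUMNS = ["Previous Years", "2021", "2022", "2023", "2024", "2025", "Out Years", "Total"]

# A dollar amount such as $30,000.
AMOUNT = re.compile(r"\$\d{1,3}(?:,\d{3})*")

def split_flat_row(line: str) -> tuple[str, list[str]]:
    """Split a whitespace-flattened row into its label and its amounts."""
    first = AMOUNT.search(line)
    label = line[: first.start()].strip() if first else line.strip()
    return label, AMOUNT.findall(line)

# Row taken from the extracted text (Preserve Format off).
label, amounts = split_flat_row("Total $30,000 $30,000")
total = amounts[-1]     # assumed to sit in the Total column
others = amounts[:-1]   # each could belong to any of the other seven columns

# Amounts keep their left-to-right order, so the number of possible layouts is
# "choose len(others) of the 7 non-Total columns".
layouts = math.comb(len(COLUMNS) - 1, len(others))
print(label, total, others, layouts)  # Total $30,000 ['$30,000'] 7
```

With one non-Total value and seven other columns there are seven equally plausible placements, which is why the text alone cannot say that the first $30,000 belongs to 2023.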

@rameezimtiaz - did you set Preserve Format to True in the Read PDF Text activity?

No. With the format preserved, both regexes fail because the output looks very different, as shown below:

Facilities Management
Project #: 0450-BOIL
Replace Boiler - Yuma Art Center

Total Cost: $30,000
City Obligation: $30,000
Delivery Method: TBD
Cooperating Agencies: None

Special Circumstances

None

Project Description Location: Yuma Art Center
Remove and replace a boiler at the Yuma Art Center.

Project Justification
The current boiler is aging and will be at the end of life by the year 2023.

Budget Impact/Other
Statement of Impact: It is anticipated that this project will create a savings to the City’s
operational budget.

Expenditures Previous 2021 2022 2023 2024 2025 Out Total
Years Years
Equipment $30,000 $30,000
Total $30,000 $30,000

Funding Previous 2021 2022 2023 2024 2025 Out Total
Sources Years Years
Two Percent $30,000 $30,000
Tax
Total $30,000 $30,000

FY 2021 - FY 2025 Capital Improvement Program Page 17 Effective July 1, 2020

@prasath17 I can share the text file if you want. This is how it looks with Preserve Format set to True.
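The two regexes that broke are not shown in the thread. As a hypothetical example of the kind of pattern involved, the header fields in the text above can be read with one line-anchored pattern per label. This assumes each label starts its own line and its value runs to the end of that line, which holds for this sample but may not for other layouts; `extract_fields` and `FIELDS` are illustrative names:

```python
import re

# Header lines copied from the extracted text above.
page = """Facilities Management
Project #: 0450-BOIL
Replace Boiler - Yuma Art Center

Total Cost: $30,000
City Obligation: $30,000
Delivery Method: TBD
Cooperating Agencies: None
"""

# Labels to look for; the value is whatever follows "Label:" on the same line.
FIELDS = ["Project #", "Total Cost", "City Obligation", "Delivery Method", "Cooperating Agencies"]

def extract_fields(text: str) -> dict[str, str]:
    found = {}
    for name in FIELDS:
        # re.escape neutralises "#" and spaces; re.M lets ^ and $ match at line breaks.
        m = re.search(rf"^{re.escape(name)}:[ \t]*(.+?)[ \t]*$", text, re.M)
        if m:
            found[name] = m.group(1)
    return found

print(extract_fields(page))
```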

@rameezimtiaz - yes, the text file would be great. I would also like to know the maximum amount in each column: do values like 8,000,000 stay in the millions, or can they reach billions?

Note: personally, I much prefer Preserve Format for regex, especially with tables…
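That preference has a concrete basis: when the output keeps each amount roughly under its header, a value's column can be recovered from its character position, something the flattened text cannot provide. A minimal sketch under that assumption. Because the attached file isn't reproduced here, the header and row below are synthetic (padded with `ljust`), and `header_centres` and `assign_columns` are hypothetical helpers:

```python
import re

COLUMNS = ["Previous", "2021", "2022", "2023", "2024", "2025", "Out", "Total"]
AMOUNT = re.compile(r"\$\d{1,3}(?:,\d{3})*")

# Synthetic stand-in for Preserve Format output: a 14-character label column,
# then 10-character columns, with amounts printed under their headers.
header = "Expenditures".ljust(14) + "".join(c.ljust(10) for c in COLUMNS)
row = "Equipment".ljust(14) + "".join(
    v.ljust(10) for v in ["", "", "", "$30,000", "", "", "", "$30,000"]
)

def header_centres(header_line: str, names: list[str]) -> list[float]:
    """Character position of the middle of each header word, scanning left to right."""
    centres, pos = [], 0
    for name in names:
        start = header_line.index(name, pos)
        centres.append(start + len(name) / 2)
        pos = start + len(name)
    return centres

def assign_columns(row_line: str, names: list[str], centres: list[float]) -> dict[str, str]:
    """Put each amount under the header whose centre is nearest to the amount's centre."""
    cells = dict.fromkeys(names, "")
    for m in AMOUNT.finditer(row_line):
        mid = (m.start() + m.end()) / 2
        nearest = min(range(len(names)), key=lambda i: abs(centres[i] - mid))
        cells[names[nearest]] = m.group()
    return cells

print(assign_columns(row, COLUMNS, header_centres(header, COLUMNS)))
```

Matching on centres tolerates both left- and right-aligned numbers, as long as each value sits closer to its own header than to a neighbour's; if the real file is not aligned that tightly, this heuristic will misplace values.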

@prasath17 the amount could be anything; there is no limit to it. The text file is attached below.

0450-BOIL.txt (2.1 KB)

@prasath17 I don’t think the amount will be in the billions; I haven’t seen that in any table.
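Since there is no fixed upper limit, the pattern for an amount shouldn't cap the number of digits. One way to write it, assuming whole-dollar amounts written either with comma thousands separators (as in the sample) or with no separators at all; `amounts` is an illustrative helper:

```python
import re

# "$" then either comma-grouped digits (30,000 / 8,000,000 / ...) or plain digits.
# There is no upper bound on the number of thousands groups.
AMOUNT = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)")

def amounts(line: str) -> list[int]:
    """Every dollar amount in the line, as an integer number of dollars."""
    return [int(m.group(1).replace(",", "")) for m in AMOUNT.finditer(line)]

print(amounts("Total $30,000 $30,000"))            # [30000, 30000]
print(amounts("Equipment $8,000,000 $12,500,000"))  # hypothetical row: [8000000, 12500000]
```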

@prasath17 any luck with the regex?

@rameezimtiaz - nope… I’m not sure this can be cracked.

@prasath17 what problem are we running into?

@rameezimtiaz - the empty column spaces…

@prasath17 can we fetch the table data column-wise?

@rameezimtiaz - can we fetch the table data column-wise??

Again, I don’t think this can be done, and I don’t have time right now… maybe you can post this separately on the forum so that someone else can assist.

@prasath17 thanks for helping me so much.
Cheers, mate


@prasath17 I was able to get the text in a format that I can use. Is it possible to write a regex that fetches the ‘Total’ row, which is in bold?

Facilities Management
Project #: 0450-BOIL
Replace Boiler - Yuma Art Center
Total Cost: $30,000
City Obligation: $30,000
Delivery Method: TBD
Cooperating Agencies: None
Special Circumstances
None
Project Description Location: Yuma Art Center
Remove and replace a boiler at the Yuma Art Center.
Project Justification
The current boiler is aging and will be at the end of life by the year 2023.
Budget Impact/Other
Statement of Impact: It is anticipated that this project will create a savings to the City’s
operational budget.
Expenditures Previous 2021 2022 2023 2024 2025 Out Total
Years Years
I Equipment I I I I $30,000 I I I I $30,000 I
I Total I I I I $30,000 I I I I $30,000 I
Funding Previous 2021 2022 2023 2024 2025 Out Total
Sources Years Years
Two Percent $30,000 $30,000
Tax
I Total | | | | $30,000 | | | | $30,000 I

@prasath17 I can use the Split function afterwards if I can get this value.

@rameezimtiaz - are you looking for something like this?
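A regex can't see bold text, but in the OCR output above the bordered Total rows stand out: the label is followed directly by a cell border, which the OCR writes as either `|` or a capital `I`. A minimal Python sketch of that idea, assuming the borders survive OCR as they do in this sample and that every bordered row carries the eight value columns of the header; `total_rows` and `COLUMNS` are illustrative names:

```python
import re

# Rows copied from the OCR output above. The OCR reads the table borders
# sometimes as "|" and sometimes as a capital "I".
ocr_text = """I Equipment I I I I $30,000 I I I I $30,000 I
I Total I I I I $30,000 I I I I $30,000 I
Two Percent $30,000 $30,000
I Total | | | | $30,000 | | | | $30,000 I
"""

COLUMNS = ["Previous Years", "2021", "2022", "2023", "2024", "2025", "Out Years", "Total"]

# "Total" (optionally preceded by a border) followed directly by a border
# character. Lines like "Total Cost: $30,000" or the flattened
# "Total $30,000 $30,000" have no border after the label, so they are skipped.
TOTAL_ROW = re.compile(r"^[ \t]*[I|]?[ \t]*Total[ \t]+([I|].*?)[ \t]*$", re.M)

def total_rows(text: str) -> list[dict[str, str]]:
    rows = []
    for m in TOTAL_ROW.finditer(text):
        # Split on border characters; the first and last pieces lie outside
        # the first and last value cells, so drop them.
        pieces = re.split(r"[I|]", m.group(1))[1:-1]
        rows.append(dict(zip(COLUMNS, (p.strip() for p in pieces))))
    return rows

for row in total_rows(ocr_text):
    print(row["2023"], row["Total"])
```

Splitting on the border characters plays the role of the Split step mentioned above, and unlike the flattened text it keeps the empty cells, so the first $30,000 lands under 2023. Splitting on capital `I` is only safe because the value cells hold nothing but amounts; a cell containing an uppercase I would break it. The pattern uses only basic constructs, so it should carry over to .NET's `Regex` with `RegexOptions.Multiline`, though with Windows line endings an optional `\r` is needed before `$`.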