Regex to find a value in a block of text

Hello! I am reading in PDF data and need to pick out a specific value from a block of text. The value always sits in the same position within the block, no matter which page I read in; however, its number of characters varies. The block of text I am working with is as follows:

When written to a text file:

Residue mmbtu 3,787.11 100.00% 3,787.11 $1.623163 $6,147.10 
Total 3,787.11 3,787.11 $6,147.10

As it appears in a UiPath message:

Residue mmbtu 3,787.11 100.00% 3,787.11 $1.623163 $6,147.10 \r\nTotal 3,787.11 3,787.11 $6,147.10 \r\n

I am trying to pull out the second dollar amount, so in this example I should get 6,147.10.

Using an online regex tester (https://regexr.com/), I am able to pull the value out successfully with the expression: (?<=$)\S+|(?<=\$)\S+
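For reference, the two alternatives in that expression do different things. In `(?<=$)` the `$` is unescaped, so it is an end-of-line anchor rather than a dollar sign; since only a line break (or nothing) can follow a line end, `\S+` never matches after it. `(?<=\$)\S+` does all the work, matching the non-whitespace run after every literal `$`. A minimal sketch in TypeScript (its lookbehind syntax matches .NET's for these patterns), using the text from the UiPath message; the variable names are illustrative:

```typescript
// Sample text as shown in the UiPath message, with its \r\n line breaks.
const text =
  "Residue mmbtu 3,787.11 100.00% 3,787.11 $1.623163 $6,147.10 \r\n" +
  "Total 3,787.11 3,787.11 $6,147.10 \r\n";

// Unescaped "$" is an end-of-line anchor ("m" lets it match at every line end),
// so the only thing left for \S+ after it is a line break or the end of input.
const anchorBranch = new RegExp("(?<=$)\\S+", "gm");

// Escaped "\$" is a literal dollar sign: take the non-whitespace run after each one.
const dollarBranch = new RegExp("(?<=\\$)\\S+", "g");

// With the "g" flag, match() returns every match, or null when there are none.
const anchorMatches = text.match(anchorBranch) || [];
const dollarMatches = text.match(dollarBranch) || [];

console.log(anchorMatches); // []
console.log(dollarMatches); // [ '1.623163', '6,147.10', '6,147.10' ]
```

A tester that highlights all matches therefore shows all three amounts, including the 6,147.10 you are after.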

However, when I try it in UiPath by assigning the following expression to a variable, no value is pulled out: system.Text.RegularExpressions.Regex.Match(str_ResVolAmt,"(?<=$)\S+|(?<=\$)\S+ ").ToString
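Two details of that call are worth checking. `Regex.Match` returns only the first match, and the space before the closing quote becomes part of the pattern, so the second alternative then requires a space after the amount. The sketch below uses TypeScript's non-global `match`, which, like `Regex.Match`, stops at the first hit; it illustrates the regex behavior only and is not a reproduction of UiPath's runtime:

```typescript
const text =
  "Residue mmbtu 3,787.11 100.00% 3,787.11 $1.623163 $6,147.10 \r\n" +
  "Total 3,787.11 3,787.11 $6,147.10 \r\n";

// Pattern as typed in the Assign, including the trailing space before the closing quote.
const withTrailingSpace = new RegExp("(?<=$)\\S+|(?<=\\$)\\S+ ");
// The same pattern without the stray space.
const withoutTrailingSpace = new RegExp("(?<=$)\\S+|(?<=\\$)\\S+");

// Without the "g" flag, match() returns only the first occurrence, like Regex.Match.
const first = text.match(withTrailingSpace);
const firstTrimmed = text.match(withoutTrailingSpace);

console.log(first ? JSON.stringify(first[0]) : "no match"); // "1.623163 "
console.log(firstTrimmed ? firstTrimmed[0] : "no match");   // 1.623163
```

Either way, a single match yields the first dollar amount (1.623163, with or without a trailing space), not the 6,147.10 wanted.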

@Robert_Schauer
Welcome to the community.

Hope this helps.
RegEx.xaml (6.0 KB)
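The attached workflow isn't reproduced here, but the reply below describes it as extracting the value after every `$` and assigning each one to an index. A sketch of that idea in TypeScript, which is an assumption about the approach rather than the workflow itself, collects all matches of `(?<=\$)\S+` and reads the second one:

```typescript
const text =
  "Residue mmbtu 3,787.11 100.00% 3,787.11 $1.623163 $6,147.10 \r\n" +
  "Total 3,787.11 3,787.11 $6,147.10 \r\n";

// Collect every non-whitespace run that follows a literal "$",
// much as a Matches activity returns all matches rather than only the first.
const amounts: string[] = text.match(new RegExp("(?<=\\$)\\S+", "g")) || [];

// Indexes are zero-based, so the second dollar amount sits at index 1.
const secondAmount: string | undefined = amounts.length > 1 ? amounts[1] : undefined;

// The captured text keeps its thousands separator; drop commas before converting.
const secondValue: number =
  secondAmount !== undefined ? parseFloat(secondAmount.replace(/,/g, "")) : NaN;

console.log(secondAmount); // 6,147.10
console.log(secondValue);  // 6147.1
```

Because the index is positional, this only holds while the target stays the second `$` amount in the text.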


Thank you for jumping on this 🙂 A couple of follow-up questions:

  1. I need to build this process in the Enterprise edition of UiPath. Is the UiPath.Core.Activities.Matches activity available in that edition?

  2. The expression successfully parsed out the values after every $ and assigned them to indexes. I have a more advanced problem where the position, or index, of the value might change. Here is the example:

Fees 
Description Fee Unit Fee Quantity Fee Rate Fee Value 
Electric gal 15,524.01 0.011391 $176.83 
Low Volume USD 0.00 400.000000 $0.00 
Marketing Fee gal 15,524.01 0.057579 $893.85 
Marketing Fee mmbtu 3,787.11 0.046063 $174.45 
Processing mmbtu 5,690.48 1.957675 $11,140.11 
Transportation gal 15,524.01 0.000000 $0.00 
Total $12,385.24 

I need to pull 174.45 from the line that starts with “Marketing Fee mmbtu”, but the position of that line will change occasionally. Any thoughts?

Hi,

Answer to Question 1: Matches is available in the Enterprise edition.
Question 2: Is it data from the Fee Value column that you need, from all rows?

@SowmyaLeo,

Yes, technically I do need to pull values from the “Fee Value” column. I want to be careful saying that, however, because I need the values from that column only for certain lines, such as “Marketing Fee mmbtu”.
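One way to cover both cases is to parse each fee row once into a lookup keyed by description and unit, then read only the rows needed. This is a sketch, not something proposed in the thread: the row pattern and the `feeValues` name are hypothetical, and it assumes every fee row ends with a unit, quantity, rate, and `$` value separated by single spaces, as in the sample above.

```typescript
// The fee table as extracted from the PDF (lines joined with \r\n).
const fees = [
  "Fees ",
  "Description Fee Unit Fee Quantity Fee Rate Fee Value ",
  "Electric gal 15,524.01 0.011391 $176.83 ",
  "Low Volume USD 0.00 400.000000 $0.00 ",
  "Marketing Fee gal 15,524.01 0.057579 $893.85 ",
  "Marketing Fee mmbtu 3,787.11 0.046063 $174.45 ",
  "Processing mmbtu 5,690.48 1.957675 $11,140.11 ",
  "Transportation gal 15,524.01 0.000000 $0.00 ",
  "Total $12,385.24 ",
].join("\r\n");

// Assumed row layout: <description> <unit> <quantity> <rate> $<fee value>.
// The lazy (.+?) lets multi-word descriptions such as "Marketing Fee" through,
// while the numeric groups keep the "Fees", header, and "Total" lines from matching.
const rowPattern = new RegExp("^(.+?) (\\S+) ([\\d,.]+) ([\\d.]+) \\$(\\S+)");

const feeValues: { [key: string]: string } = {};
fees.split(/\r?\n/).forEach((line) => {
  const m = rowPattern.exec(line);
  if (m) {
    // Key on description plus unit, e.g. "Marketing Fee mmbtu".
    feeValues[m[1] + " " + m[2]] = m[5];
  }
});

console.log(feeValues["Marketing Fee mmbtu"]); // 174.45
console.log(feeValues["Marketing Fee gal"]);   // 893.85
```

Rows can then be read by name regardless of their order, at the cost of depending on the column layout staying the same.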

Could you please share a sample PDF?

Apparently new users don’t have the ability to post files, so hopefully the code below works!

<object data="/pdf/Targa Badlands Gas Stmts Final 1 0819 9-26-19 (002).pdf" type="application/pdf" width="100%" height="100%">

@SowmyaLeo,
The code did not work, so here is an image for now at least:

@Robert_Schauer The following expression pulls out the run of non-whitespace characters that follows the “$” on the line containing “Marketing Fee mmbtu”: (?<=Marketing Fee mmbtu.*\$)\S+

Assign YourNumber = Regex.Match(YourInputString,"(?<=Marketing Fee mmbtu.*\$)\S+",RegexOptions.IgnoreCase).Value

Add System.Text.RegularExpressions to your Imports tab so you don’t have to type the full namespace each time.
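The lookbehind in this expression is variable-length, since `.*` sits between the label and the `$`; .NET accepts that, but not every engine does (Python's `re`, for example, only allows fixed-width lookbehind). JavaScript engines with ES2018 lookbehind accept it, so a quick TypeScript check against the fee table from earlier in the thread (variable names illustrative) shows that the match does not depend on where the line sits:

```typescript
const feeLines = [
  "Fees ",
  "Description Fee Unit Fee Quantity Fee Rate Fee Value ",
  "Electric gal 15,524.01 0.011391 $176.83 ",
  "Low Volume USD 0.00 400.000000 $0.00 ",
  "Marketing Fee gal 15,524.01 0.057579 $893.85 ",
  "Marketing Fee mmbtu 3,787.11 0.046063 $174.45 ",
  "Processing mmbtu 5,690.48 1.957675 $11,140.11 ",
  "Transportation gal 15,524.01 0.000000 $0.00 ",
  "Total $12,385.24 ",
];

// "Marketing Fee mmbtu", then anything on the same line, then a literal "$".
// "." cannot cross a newline, so only the labelled line can satisfy the lookbehind.
// The "i" flag plays the role of RegexOptions.IgnoreCase.
const marketingMmbtu = new RegExp("(?<=Marketing Fee mmbtu.*\\$)\\S+", "i");

const inOrder = feeLines.join("\r\n").match(marketingMmbtu);
// Reverse the rows to mimic the line moving to a different position on the page.
const reordered = feeLines.slice().reverse().join("\r\n").match(marketingMmbtu);

console.log(inOrder ? inOrder[0] : "no match");     // 174.45
console.log(reordered ? reordered[0] : "no match"); // 174.45
```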


@Dave,

Brilliant! That worked. Additionally, I should be able to change the string used in the positive lookbehind to find my other values. @SowmyaLeo, thank you for your help as well!
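When the label in the lookbehind is swapped, characters such as `(`, `.`, or `$` inside a label would be read as regex syntax; in .NET, `Regex.Escape` handles that. A sketch of a reusable lookup in TypeScript, with a hypothetical `feeValueFor` helper and a hand-written `escapeRegExp` (since `RegExp.escape` is not available in every JavaScript runtime):

```typescript
// Escape regex metacharacters so a label is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the value after the "$" on the first line containing `label`,
// built the same way as (?<=Marketing Fee mmbtu.*\$)\S+ with IgnoreCase.
function feeValueFor(text: string, label: string): string | null {
  const pattern = new RegExp("(?<=" + escapeRegExp(label) + ".*\\$)\\S+", "i");
  const m = text.match(pattern);
  return m ? m[0] : null;
}

const fees = [
  "Electric gal 15,524.01 0.011391 $176.83 ",
  "Low Volume USD 0.00 400.000000 $0.00 ",
  "Marketing Fee gal 15,524.01 0.057579 $893.85 ",
  "Marketing Fee mmbtu 3,787.11 0.046063 $174.45 ",
  "Processing mmbtu 5,690.48 1.957675 $11,140.11 ",
  "Transportation gal 15,524.01 0.000000 $0.00 ",
  "Total $12,385.24 ",
].join("\r\n");

console.log(feeValueFor(fees, "Marketing Fee mmbtu")); // 174.45
console.log(feeValueFor(fees, "Processing mmbtu"));    // 11,140.11
console.log(feeValueFor(fees, "Gathering"));           // null
```

Since the label is not anchored to the start of the line, a shorter label such as "Marketing Fee" would return the first matching row (the gal line), so labels need to be specific enough to be unique.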

