Read specific PDF text using regular expressions

Try this:

Assign (String)
pattern = "Purchase\sDate.+?:\s+(?<purchase>[\d\/]+)\s+In-service\sDate.+?:\s+(?<service>[\d\/]+)"

Assign (Match)
match = System.Text.RegularExpressions.Regex.Match(MyVar, pattern)

Assign (String)
purchase = match.Groups("purchase").ToString

Assign (String)
service = match.Groups("service").ToString
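
If you want to sanity-check the pattern outside the workflow, here is a minimal Python sketch. The sample text is made up, since the real PDF text is not shown in this thread, so adjust it to your document. Note that Python writes named groups as `(?P<name>...)` where .NET uses `(?<name>...)`. The last group uses a greedy `+`, because a lazy `+?` at the very end of a pattern stops after a single character.

```python
import re

# Hypothetical sample text; the real PDF content is not shown in the thread.
sample = "Purchase Date (mm/dd/yyyy): 01/15/2019   In-service Date (mm/dd/yyyy): 03/10/2020"

# Same structure as the .NET pattern, with Python's (?P<name>...) group syntax.
pattern = (
    r"Purchase\sDate.+?:\s+(?P<purchase>[\d/]+)"
    r"\s+In-service\sDate.+?:\s+(?P<service>[\d/]+)"
)

match = re.search(pattern, sample)
if match:
    purchase = match.group("purchase")
    service = match.group("service")
    print(purchase, service)
```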

This worked!! Thank you so much.
You have been so helpful! Do you have any tips on how to automate drop-down menus? I need the database to choose the option in the drop-down menu that matches the extracted text.

Have you tried the Select Item activity?

I have tried this activity, but I am unsure whether I can use it without quoting the exact text. Each PDF document will have a different extracted value, so how do I tell it to choose whatever value is stored in the variable?

Well, you have to find a way to map the value to the dropdown option. How you do that depends on both the values and the options.

I have figured out all the String drop-downs using the Type Into activity, but I am having trouble with the numeric values and am unsure how to select these in the drop-down using my stored variable. Any ideas?

Sorry, I can’t figure out what kind of data you have to map, so I can’t help you.

Hi there,
Going back to extracting these dates, how would I extract each individual part of the date? For example, extracting just the “03” from the in-service date “03/10/2020”, and then the same for the day and year.
Anything helps!! Thank you!!

Hello,

You can get a DateTime from the string, or you can continue processing your strings with regex as shown below.

Assign (String)
pattern = "\b(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})\b"

Assign (Match)
match = Regex.Match(text, pattern)

Assign (String)
day = match.Groups("day").ToString

Assign (String)
month = match.Groups("month").ToString

Assign (String)
year = match.Groups("year").ToString
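
For the DateTime route mentioned above, here is a rough Python sketch of the same idea: parse the string into a date object, then read the parts off it. It assumes the date is month-first (mm/dd/yyyy), which matches the “03” example in the question. In .NET the counterpart would be something like `DateTime.ParseExact` with a "MM/dd/yyyy" format, but treat that as a pointer to check rather than tested workflow code.

```python
from datetime import datetime

# Value taken from the question; the month-first order is an assumption.
service = "03/10/2020"

parsed = datetime.strptime(service, "%m/%d/%Y")

# Format the parts back to zero-padded strings, like the regex groups return.
month = f"{parsed.month:02d}"
day = f"{parsed.day:02d}"
year = str(parsed.year)
print(month, day, year)
```

Parsing to a date also rejects impossible values such as month 13, which the plain `\d{2}` pattern would accept.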

That worked! Thanks so much! So how would the code change for extracting the individual parts from the Purchase Date?

It might be good practice for you.

Some hints:

  • you know how to extract a date’s parts
  • you know how to extract this date
  • you can substitute one pattern into the other
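
In case it helps to see the substitution spelled out, here is one possible Python sketch: the month/day/year groups take the place of the single date group that follows the “Purchase Date” label. The sample text is hypothetical, and Python uses `(?P<name>...)` where .NET uses `(?<name>...)`.

```python
import re

# Hypothetical sample text; the real PDF content is not shown in the thread.
sample = "Purchase Date (mm/dd/yyyy): 01/15/2019   In-service Date (mm/dd/yyyy): 03/10/2020"

# The date-part groups, reused from the month/day/year pattern.
date_parts = r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})"

# Substitute them where the single date group used to be.
pattern = r"Purchase\sDate.+?:\s+" + date_parts

match = re.search(pattern, sample)
if match:
    month = match.group("month")
    day = match.group("day")
    year = match.group("year")
    print(month, day, year)
```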

Good afternoon,

Sorry to interrupt, but I also have a question about regex.
“05/02/2019 91640.00 52974.00 38666.00 2342299.00”
My text follows this pattern, with multiple rows in the same format, and I want to scrape the specific value “38666.00”.
I applied the regex (?<=\d{2}/\d{2}/\d{4}\s\d*.\d*\s\d*.\d*\s)(.*)(?=\s\d*.\d*) and it worked.
Is there a simpler alternative pattern?

Thanks in advance

I am new to regex and want to learn, so can you suggest some sites where I can learn to write patterns?

Imports System.Text.RegularExpressions

Assign (String)
pattern = "(?<date>\d{2}\/\d{2}\/\d{4})\s+(?<first>[\d\.]+)\s+(?<second>[\d\.]+)\s+(?<third>[\d\.]+)\s+(?<fourth>[\d\.]+)"

Assign (Decimal)
result = CDec(Regex.Match(myText, pattern).Groups("third").Value)
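
Here is a small Python sketch of the same extraction, using the line from the question, in case you want to try it outside the workflow. Python writes named groups as `(?P<name>...)`. Since the fields are whitespace-separated, it also shows a simpler alternative that skips regex entirely and picks the value by position; that only holds if every row has the same column order, which is an assumption here.

```python
import re
from decimal import Decimal

# Sample line taken from the question.
line = "05/02/2019 91640.00 52974.00 38666.00 2342299.00"

# Named-group pattern: one date followed by four amounts.
pattern = (
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<first>[\d.]+)\s+(?P<second>[\d.]+)"
    r"\s+(?P<third>[\d.]+)\s+(?P<fourth>[\d.]+)"
)
result = Decimal(re.search(pattern, line).group("third"))

# Simpler alternative: split on whitespace and take the fourth field
# (index 0 is the date, so the third amount is index 3).
fields = line.split()
third_by_split = Decimal(fields[3])

print(result, third_by_split)
```

For text with many such rows, `re.finditer` with the same pattern would yield one match per row.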

I saved it on regex101 for you to examine. The right pane will help you on your journey. I mostly learned through the Perl and Python documentation and by practice (try parsing URLs, it’s quite easy). Here is a link, though I haven’t used it myself (Regular Expression Tutorial - Learn How to Use Regular Expressions).

Thank you so much 🙂