Need help urgently: Trying to extract text from a webpage in relation to another piece of text

Dear all,

I am trying to extract the date underneath “Period Ended” from a series of similar (but not entirely identically structured) webpages that look like this (please click on link to see full webpage):

I have tried Get Text, Find Text, Find Element and various combinations involving Anchor Base. I have tried to define the correct Selector using UI Explorer - all without success so far. Could somebody please point me in the right direction how I can extract the date (in this case 30/09/2017) by using “Period Ended” as the anchor?

I am quite desperate by now and believe that it cannot possibly be this difficult. Thank you very much in advance for your help!

At a quick glance, it looks like there are two approaches that would work. The first is Anchor Base, though you may need to send an “End” keystroke first to make sure the text is visible in the window. I don’t have UiPath accessible right now, so I can’t test it.

The second option is to use Get Text on the entire form, then use Regex to extract the value next to “Period Ended”. The Regex might look like this, but forgive me if the pattern has a mistake:
Regex.Match( textVariable, "(?<=(Period Ended\r?\n))(.*)" ).Value.Trim
I used “\r?\n” to represent the newline (an optional carriage return followed by a line feed), but that may be wrong for your text, so check what actually separates the lines.
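
For anyone who wants to check the newline handling outside UiPath, here is a minimal sketch in Python rather than VB.NET. The sample text and the function name value_after_label are made up for illustration, and Python's re module is not identical to .NET's Regex class, so treat it as a way to reason about the pattern rather than a drop-in expression. It assumes the value sits on the line directly below the label and that lines end in either \r\n (Windows) or a bare \n.

```python
import re

def value_after_label(text, label="Period Ended"):
    """Return the line that follows `label`, or None if the label is absent."""
    # \r?\n accepts both Windows (\r\n) and Unix (\n) line endings.
    # [^\r\n]* captures the rest of the next line without its line ending.
    pattern = re.escape(label) + r"[ \t]*\r?\n([^\r\n]*)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None

# Hypothetical text, roughly as Get Text might return it for a whole form.
sample = "Company\r\nExample Ltd\r\nPeriod Ended\r\n30/09/2017\r\nStatus\r\nFiled"
print(value_after_label(sample))  # 30/09/2017
```

Returning None when the label is missing makes the "nothing found" case explicit instead of silently producing an empty value.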

Regards.


Hi Clayton, thanks a lot for this quick reply! Thanks to you, I am now definitely headed in the right direction, but I have one remaining issue: assigning the result of the Regex to my string variable.

Not sure if this is the most elegant way of doing it, but this regex identifies the date after “Period Ended” correctly:
(?<=(Period Ended))(.[0-9]/[0-9]/[0-9]*)

In the attached screenshot you can see the correct result - but when I then try to assign that value to my string variable using regMatches.ToString, I get an error message “Object reference not set to an instance of an object.” Can I trouble you one more time to help me with this? Thanks again!

Hi.

Matches should only be used if you are trying to extract multiple strings that match the pattern. It returns an enumerable, so you need an index to pick out an item (as with an array). In your case, you really only need the one match.
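
The difference between one match and many is easy to see in a small Python sketch. Here Python's re.search and re.finditer stand in for .NET's Regex.Match and Regex.Matches; the helper names, the date pattern and the sample text are all invented for illustration. One difference worth knowing: a failed re.search returns None, whereas .NET's Regex.Match returns an unsuccessful Match whose Value is an empty string.

```python
import re

# Illustrative d/m/y pattern, not the one from the thread.
DATE = r"[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"

def first_date(text):
    """Single result: return the first date as a string, or "" if none."""
    match = re.search(DATE, text)
    return match.group(0) if match else ""

def all_dates(text):
    """Multiple results: return every date as a list of strings."""
    return [m.group(0) for m in re.finditer(DATE, text)]

sample = "Period Ended 30/09/2017, filed 12/01/2018"
print(first_date(sample))  # 30/09/2017

dates = all_dates(sample)
# Index into the collection only after checking that it is not empty.
print(dates[0] if dates else "")  # 30/09/2017
```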

Let me skip all this talking, though, and teach you how to fish, so to speak.

Using a Write Line or Message Box, somewhere by itself like at the very beginning of the workflow or in a blank workflow, just for testing, place this:
Regex.Match("Period Ended"+System.Environment.Newline+"19/01/2019", "(?<=(Period Ended))(.[0-9]/[0-9]/[0-9]*)").Value

Basically, you want to test your pattern to make sure it works. If it extracts the date, then all you need to do is use that exact line of code in your Assign activity, replacing the literal text with your variable name.
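
The same "test the pattern on a literal string first" idea can be sketched in Python for anyone without Studio at hand. Everything here is illustrative: try_pattern is a made-up helper, the pattern is a stand-in rather than the one from the thread, and Python's re differs from .NET in places. The point is to run one pattern against each layout the page text might have before wiring it into an Assign.

```python
import re

def try_pattern(pattern, samples):
    """Return a list of (sample, matched text or None) pairs."""
    results = []
    for sample in samples:
        match = re.search(pattern, sample)
        results.append((sample, match.group(0).strip() if match else None))
    return results

# Hypothetical layouts the extracted text might have.
samples = [
    "Period Ended 19/01/2019",     # date on the same line
    "Period Ended\n19/01/2019",    # date on the next line (\n)
    "Period Ended\r\n19/01/2019",  # date on the next line (\r\n)
]

# Illustrative pattern: label, one arbitrary character, then a date.
pattern = r"(?<=Period Ended).[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"

for sample, found in try_pattern(pattern, samples):
    print(repr(sample), "->", found)
```

With this stand-in pattern only the first layout matches, because "." does not match a line feed by default. A result like that tells you it is the pattern, not the Assign, that needs changing.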

I hope this helps, Stu. Unfortunately, I’m too lazy to fire up my work laptop to confirm your Regex solution is working as intended (it’s the weekend lol). It’s possible that Matches did not find anything, so the result came out empty.

Regards.

Hi Clayton, you’re a star! (Is there any way that I can upvote or otherwise recognise your help in this forum?) I realise that I may not have described my remaining issue very clearly: my regex definitely works, but I am having a problem accessing the result. It seems to be sitting in an ‘array’ of only one element, but when I try to assign that value to my designated (and properly declared) string variable, I get the error message “Object reference not set to an instance of an object.” Intriguingly, the phrase “Period Ended” is followed immediately (without line breaks or other non-printable characters) by the date that I am looking for. So that’s why my regex (somewhat counterintuitively) works.

I realize that the expression you sent me (“Regex.Match([…]).Value”) would get me nicely around having to use the UiPath ‘Matches’ activity - but when I try to cut and paste this into a Message Box, I get a validation error saying that “Regex” is not declared.

Is there a particular package that I need to install? So far, I have the following:
(screenshot of installed packages)

Thanks again and greetings from Singapore!

Hi Clayton, not to take up any more time of yours unnecessarily, I think I cracked it. Key were the following two forum threads.

This showed me how to solve the ‘regex is not declared’ error:

I had to go for the clunky solution of adding the entire namespace to the expression, as I keep getting a package installation failure for System.Text.RegularExpressions:
(screenshot of the package installation failure)
No idea why, but as long as I have a functioning workaround, this won’t bother me too much.

And this thread gave me the clue to assigning the result of Matches to my string variable:

So at long last, it seems I can finally collect those dates.

Thanks again for your help with this! Would never have found the regex approach on my own. Have a great rest of the weekend!

P.S.: Using ‘Matches’ on the content of the entire table with my regular expression (“(?<=(Period Ended))(.[0-9]/[0-9]/[0-9]*)”) works. Using Regex.Match with the same expression to straight away assign the extracted value to my variable does not and only produces an empty string.

Yeah, that’s weird. I normally use the full namespace anyway (System.Text.RegularExpressions.Regex.Match), so that when a project is migrated to other users and machines there is no worry that a namespace needs to be imported first.

You might need to adjust the pattern, because usually, the (.*) will only match everything on the same line so it probably couldn’t see the date on the next line down. So to resolve that, you need to look for the newline character in the pattern.

Here is the adjustment I would make or something similar:
"(?<=(Period Ended(.*)\n(.*)))[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"

So I added \n inside the lookbehind so that it looks for the newline and maybe anything around it, then put in a pattern to match a date, which basically says to take 1-2 digits each for day and month and 2-4 digits for year.

I also tested this with a snippet of text but not the entire text. But, like I said the wildcard (.*) will usually stop at a newline.
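
Here is how the "allow for the newline" idea might look in Python, as a rough sketch rather than a UiPath expression. Two assumptions to flag: Python's re module only accepts fixed-width lookbehinds, so this version swaps the lookbehind for a capture group, and it allows only whitespace (\s*, which includes \r and \n) between the label and the date, which is stricter than (.*)\n(.*). The function name period_ended_date and the sample strings are invented.

```python
import re

# Label, any whitespace (spaces, \r, \n), then a d/m/y date in a capture group.
PERIOD_ENDED = re.compile(r"Period Ended\s*([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4})")

def period_ended_date(text):
    """Return the date that follows 'Period Ended', or "" if none is found."""
    match = PERIOD_ENDED.search(text)
    return match.group(1) if match else ""

print(period_ended_date("Period Ended\r\n30/09/2017"))  # 30/09/2017
print(period_ended_date("Period Ended30/09/2017"))      # 30/09/2017
```

Because \s* also matches nothing at all, the same pattern covers the case where the date follows the label directly and the case where it sits on the next line.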

No worries, I’m happy to help, and you don’t need to do that, lol.

You can configure your Anchor Base like so to make it work.

  • Anchor: Find Element, selector: <webctrl aaname='Period Ended' />
  • Activity: Get Text, selector: <webctrl tag='DD' />
  • Optionally, you can also set the AnchorPosition on the Anchor Base to Top.

Tested with IE and Chrome.
