I am developing an RPA tool that scrapes data from a site, manipulates the scraped text using regex, and then writes the result to an Excel worksheet.
The above text is an example of the text I will be scraping. I used a Get Full Text activity to select the div that contains this text, but the result includes a lot of whitespace and special characters.
The regex I have tried does not return the result I am after.
Using the Replace activity I tried the following:
- I want to remove all text after ‘- Other’
- I want to remove all text up to the first four-digit number (e.g. 2008 in the above example)
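
To make the two goals above concrete, here is a minimal sketch in Python's `re` module rather than the Replace activity itself. The sample string, the `trim_scraped` helper, and the choice to cut at the first "- Other" (keeping the marker) are all assumptions for illustration; the real scraped text and patterns are not shown here. The whitespace clean-up at the end is an extra step aimed at the stray whitespace mentioned earlier.

```python
import re

# Hypothetical sample standing in for the scraped text (not the real data).
raw = "Header text \r\n  \t 2008 - Item A \n 2009 - Item B - Other trailing notes\n footer"

def trim_scraped(text: str) -> str:
    """Hypothetical helper: trim the text to the span between the first
    four-digit number and the first '- Other' marker."""
    # Remove everything before the first standalone four-digit number.
    # DOTALL lets '.' cross line breaks; the lookahead keeps the number itself.
    text = re.sub(r"^.*?(?=\b\d{4}\b)", "", text, count=1, flags=re.DOTALL)
    # Remove everything after the first '- Other', keeping the marker.
    text = re.sub(r"(?<=- Other).*", "", text, count=1, flags=re.DOTALL)
    # Collapse runs of whitespace (tabs, CR/LF, repeated spaces) into one space.
    return re.sub(r"\s+", " ", text).strip()

trimmed = trim_scraped(raw)
print(trimmed)
```

If the text contains no four-digit number or no "- Other" marker, the corresponding substitution simply leaves the text unchanged. .NET regex supports the same lookahead and lookbehind constructs, with single-line mode playing the role of `DOTALL`, so the patterns should carry over with little change, though that is worth verifying in the activity itself.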
I then used a Matches activity which has the following regex:
- I used this to break lines with a dash character (-) into separate lines.
I then have an Excel Application Scope in which I iterate through each item of the previous output and write it to the Excel worksheet. However, the resulting text is not what I expected the regex to return.
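
One way to read the dash-splitting step is sketched below in Python. It uses a split rather than a Matches-style pattern, and it assumes each dash separator has whitespace on both sides; the sample string and the `split_on_dashes` name are hypothetical.

```python
import re

# Hypothetical, already-trimmed input (not the real scraped text).
trimmed = "2008 - Item A - Co-op item - Other"

def split_on_dashes(text: str) -> list[str]:
    """Hypothetical helper: split on dashes surrounded by whitespace,
    so hyphenated words such as 'Co-op' stay intact."""
    parts = re.split(r"\s+-\s+", text)
    # Drop empty pieces that can appear if the text starts or ends with a separator.
    return [p.strip() for p in parts if p.strip()]

items = split_on_dashes(trimmed)
for item in items:
    print(item)
```

Each element of `items` would then correspond to one row to write in the worksheet loop.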