Extracting text from webpage via regex, but the text is dynamic

Hello guys. I've run into a problem. I need to get some data from a website, but I don't know how to solve one issue. On the page my code visits, there is a description section. Somewhere in that long text there is a line that is sometimes "EUROKOD:" and sometimes "eurokod:". How do I write a regex that always finds the word "Eurokod" regardless of casing (all caps, lowercase, first letter capitalized, and so on) and extracts the numbers next to it? For example:
1.


2.

**NOTE: Sometimes the text that I need to extract ("Eurokod:" followed by the code) is not present.**

@Povilas_Jonikas

Input = System.Text.RegularExpressions.Regex.Match(Input, "(?i)eurokod\:\d+.*").Value

If Eurokod is not present:


(?i:eurokod)\:\d+.*|\d+[A-Z]{3,}\d+[A-Z]+

This will then match both cases, with or without the Eurokod label.
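For anyone who wants to try the fallback idea outside UiPath, here is a minimal Python sketch. It assumes a bare code looks like digits, three or more capital letters, digits, then capital letters; that shape is read off the second half of the pattern above, not taken from real data. The function name and any sample strings are made up for illustration, and Python's `re` is close to, but not identical to, the .NET regex engine.

```python
import re

# Label branch: "eurokod:" in any casing, then digits and the rest of the line.
# Fallback branch (case-sensitive): digits + 3 or more capitals + digits + capitals.
PATTERN = re.compile(r"(?i:eurokod):\d+.*|\d+[A-Z]{3,}\d+[A-Z]+")

def find_eurokod(text):
    """Return the first matched text, or an empty string when nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else ""
```

Returning an empty string on failure mirrors what `.Value` gives for an unsuccessful match in .NET, so a caller can test for an empty result before using it.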


Hi @Povilas_Jonikas

Check the regular expression below. It will extract the data after Eurokod whether the word is in capitals, lowercase, or capitalized, no problem:

(?<=(eurokod|EUROKOD|Eurokod).*)[A-Z\d]+


Hope it helps!!


Thanks guys, I'll check in a bit and let you know 🙂

Hey @Povilas_Jonikas
try using this:
(?i)eurokod:\s*(\d+)

System.Text.RegularExpressions.Regex.Match(yourInputStringVariable, "(?i)eurokod:\s*(\d+)").Groups(1).Value
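The same idea as a small Python sketch, in case it helps to test the pattern outside UiPath. `re.IGNORECASE` stands in for the inline `(?i)` flag, and group 1 holds only the digits. The function name is hypothetical, and returning `None` when the label is missing is just one way to handle the "not present" case from the question.

```python
import re

# Case-insensitive "eurokod:", optional whitespace, then the digits we want (group 1).
EUROKOD_RE = re.compile(r"eurokod:\s*(\d+)", re.IGNORECASE)

def extract_eurokod_digits(description):
    """Return the digits after "eurokod:", or None if the label is absent."""
    m = EUROKOD_RE.search(description)
    return m.group(1) if m else None
```

In the .NET expression above, a failed match does not throw; `Groups(1).Value` is simply an empty string, so checking for an empty result covers the missing-label case.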


You can also do it another way, @Povilas_Jonikas, by using the Find Matching Patterns activity.

→ Drag and drop the Find Matching Patterns activity.

→ Open the Properties panel, go to the Pattern options field, and check the Ignore Case option.

→ Use the regular expression below in the Pattern field:

(?<=(eurokod).*)[A-Z\d]+

→ Create a variable in the Result field. The activity stores its output in that variable, which holds a collection of matches, and you can read the extracted value from it.

It will return the required output regardless of the casing of the input.
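To see what "a collection of matches" with Ignore Case looks like in practice, here is a rough Python sketch. Python's `re` does not accept the variable-length lookbehind used above, so this version swaps in a capture group after the label instead; that substitution, the function name, and the sample text are assumptions for illustration only.

```python
import re

def all_eurokod_codes(text):
    """Return every code found after an "eurokod:" label, in order of appearance."""
    # re.IGNORECASE plays the role of the Ignore Case option; note that it also
    # lets [A-Z] match lowercase letters, so "8572agn" is captured in full.
    pattern = r"eurokod:\s*([A-Z\d]+)"
    return [m.group(1) for m in re.finditer(pattern, text, re.IGNORECASE)]

sample = "EUROKOD:2448AGN\nother text\neurokod: 8572agn"
codes = all_eurokod_codes(sample)
```

An empty list means the label was not found, which is the situation to check for before reading the first item.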

Hope you understand!!

