Most efficient way to do web data extraction?

Hello, I would like to extract some data from a website, but I am hesitating between two methods. I have to extract several pieces of information from a block, and they are arranged like this:

info1: Text
info2: Text info3: Text
info4: Text

But the layout of the information in the block can change depending on the web page, so I don't know whether it's better to use a Get Full Text activity for each piece of information, or to use one for the whole block and then extract the information from the resulting String.

Hi @Rems ,

It is better to extract the whole block and then use string manipulation or a regex to get each value.
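To illustrate the "extract the whole block, then parse it" idea, here is a minimal sketch in Python (the thread itself uses .NET regex inside UiPath; the lazy quantifier and lookahead used here are also available in .NET). The sample text, the `parse_block` helper, and the assumption that every label has the form `infoN:` are hypothetical and would need adapting to the real page.

```python
import re

# Hypothetical text, standing in for what one Get Full Text on the whole block returns.
block = "info1: 12 Example Street\ninfo2: Text A info3: Text B\ninfo4: Text C"

# Assumption: every label looks like "info<number>:".
# A value runs lazily until the next label or the end of the line, whichever comes first.
FIELD = re.compile(
    r"(info\d+):[ \t]*(.*?)[ \t]*(?=info\d+:|\r?$)",
    re.MULTILINE,
)

def parse_block(text):
    """Return a dict mapping each label to its value, whatever the line layout."""
    return {m.group(1): m.group(2) for m in FIELD.finditer(text)}

fields = parse_block(block)
print(fields["info1"])  # value containing spaces
print(fields["info2"])  # value that shares a line with info3
```

Because the values are located by label rather than by position, the same pattern keeps working if the page puts two fields on one line or one field per line. It would break if a value itself contained something that looks like a label.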

I see, thank you. I guess that limits the number of errors. And how would you extract this data? What String functions would you use?

Use the regex below:

System.Text.RegularExpressions.Regex.Match(strTest, "(?<=info1: )\S+").Value

Thanks, it looks pretty good. How do you handle spaces? This expression stops at the first space it encounters.

Do you mean the case where the value is only a space and how to handle that error, or that the value itself contains spaces? Can you be more specific?

I mean the string has a space in its value, for example an address.

info1: 5 street of…

Is there a way to extract this?

Use this:
System.Text.RegularExpressions.Regex.Match(strTest, "(?<=info1: )\s*([\S\s]+)").Value

Thank you. However, this expression returns all the rest of the text after the match; it does not stop at the line break.
So I used this one, which seems to work well:

Regex.Match(strTest, "(?<=info1: )[^\n]+").Value
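For anyone comparing the three patterns posted so far, here is a small side-by-side sketch. It is written in Python rather than VB.NET, on the assumption that these particular constructs (`\S`, `\s`, `[^\n]` and the fixed-width lookbehind) behave the same in both engines. The sample text is hypothetical.

```python
import re

# Hypothetical sample: the value of info1 contains spaces and more lines follow it.
text = "info1: 12 Example Street\ninfo4: Text"

# \S+ stops at the first whitespace character.
first_word = re.search(r"(?<=info1: )\S+", text).group()

# [\S\s]+ matches any character, including line breaks, so it runs to the end of the text.
rest_of_text = re.search(r"(?<=info1: )\s*([\S\s]+)", text).group()

# [^\n]+ matches everything up to, but not including, the next line feed.
to_line_end = re.search(r"(?<=info1: )[^\n]+", text).group()

print(repr(first_word))    # '12'
print(repr(rest_of_text))  # '12 Example Street\ninfo4: Text'
print(repr(to_line_end))   # '12 Example Street'
```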

Try this: System.Text.RegularExpressions.Regex.Match(strTest, "(?<=info1: )\S+.*").Value

Hope this will work.

Yes, this one works too, thank you
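Two edge cases may be worth checking with either of the line-based patterns: text with Windows line endings (`\r\n`), and a line that holds two labels, as in the `info2: Text info3: Text` layout from the question. The Python sketch below is only an illustration with hypothetical sample text; it assumes labels of the form `infoN:`, and `.` excludes only `\n` in both Python and .NET by default.

```python
import re

# Hypothetical sample with Windows line endings and two labels on one line.
text = "info1: 12 Example Street\r\ninfo2: Text A info3: Text B\r\n"

# Both line-based patterns keep the trailing carriage return.
with_cr_1 = re.search(r"(?<=info1: )[^\n]+", text).group()
with_cr_2 = re.search(r"(?<=info1: )\S+.*", text).group()

# Excluding \r as well gives a clean value.
clean = re.search(r"(?<=info1: )[^\r\n]+", text).group()

# On a shared line, the match still swallows the next label and its value.
too_much = re.search(r"(?<=info2: )[^\r\n]+", text).group()

# Assumption: labels look like "info<number>:". Stop lazily at the next label
# or at the end of the line, whichever comes first.
bounded = re.search(
    r"(?<=info2: )[^\r\n]+?(?=[ \t]+info\d+:|[ \t]*\r?$)",
    text,
    re.MULTILINE,
).group()

print(repr(with_cr_1), repr(clean), repr(too_much), repr(bounded))
```

If the text never contains `\r` and every label sits on its own line, the simpler patterns from the thread are enough.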
