Most efficient way to do web data extraction?

Rems · July 8, 2021, 8:24am

Hello, I would like to extract some data from a website, but I am hesitating between two methods. I need to extract several pieces of information from a block, arranged like this:

info1: Text
info2: Text info3: Text
info4: Text

However, the layout of the information in the block can change from page to page. So I don't know whether it is better to use a Get Full Text activity for each piece of information, or to use one for the whole block and then extract each value from the resulting string.

ermanoj3101 · July 8, 2021, 8:45am

Hi @Rems,

It's better to extract the whole block and then use string manipulation or a regex to get each value.
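A minimal Python sketch of the "extract the whole block, then regex" approach, assuming the block's full text is already in one string and that the label names are known in advance. The sample text and the `LABELS` list are hypothetical placeholders. Each value runs until the next known label or the end of its line, so the same pattern handles two labels sharing a line as well as one label per line.

```python
import re

# Hypothetical full text of the block, as a single extraction of the whole
# block might return it. Note that info2 and info3 share a line.
block = "info1: 5 street of Paris\ninfo2: Text info3: Text\ninfo4: Text"

# Assumption: the labels are known in advance (placeholders from the thread).
LABELS = ["info1", "info2", "info3", "info4"]
label_alt = "|".join(re.escape(label) for label in LABELS)

# Label, colon, then a lazy value that stops just before the next known label
# or at the end of the line (re.MULTILINE makes $ match before each "\n").
pattern = re.compile(
    rf"\b({label_alt}):\s*(.*?)\s*(?=\b(?:{label_alt}):|$)",
    re.MULTILINE,
)


def parse_block(text: str) -> dict[str, str]:
    """Return a {label: value} dict for every known label found in text."""
    return {m.group(1): m.group(2) for m in pattern.finditer(text)}


print(parse_block(block))
# {'info1': '5 street of Paris', 'info2': 'Text', 'info3': 'Text', 'info4': 'Text'}
```

Because the labels are matched wherever they appear, the order and line grouping of the fields on a given page do not matter.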

Rems · July 8, 2021, 8:51am

I see, thank you. I guess that limits the number of errors. And how would you extract this data? What String functions would you use?

ermanoj3101 · July 8, 2021, 8:52am

Use the following regex:

System.Text.RegularExpressions.Regex.Match(strTest, "(?<=info1: )\S+").Value
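The same pattern in Python's `re` module, which behaves like .NET for this expression, shows what the lookbehind does and why the result stops at the first space (the issue raised in the next reply). The sample string is a hypothetical placeholder.

```python
import re

# Hypothetical sample text; "info1" is the placeholder label from the thread.
str_test = "info1: 5 street of Paris\ninfo2: Text"

# (?<=info1: ) requires "info1: " right before the match without including it;
# \S+ then takes non-whitespace characters only, so the match ends at a space.
m = re.search(r"(?<=info1: )\S+", str_test)

# .NET's Regex.Match returns an empty Match (Value "") when nothing matches;
# Python's re.search returns None, so fall back to "" explicitly.
value = m.group(0) if m else ""
print(value)  # 5
```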

Rems · July 8, 2021, 9:00am

Thanks, it looks pretty good, but how do you handle spaces? This expression stops at the first space it encounters.

ermanoj3101 · July 8, 2021, 9:38am

Do you mean how to handle the case where only a space follows the label, or the case where the value itself contains spaces? Can you be more specific?

Rems · July 8, 2021, 9:45am

I mean the value itself contains spaces, for example an address:

info1: 5 street of…

Is there a way to extract this?

ermanoj3101 · July 8, 2021, 10:08am

Use this:
System.Text.RegularExpressions.Regex.Match(strTest, "(?<=info1: )\s*([\S\s]+)").Value
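A Python sketch of this pattern on a hypothetical two-line sample shows both sides of it: the spaces in the address are kept, but `[\S\s]` matches every character, including the newline, so the match runs to the end of the whole string, as the next reply observes.

```python
import re

# Hypothetical sample: an address-like value followed by another field.
str_test = "info1: 5 street of Paris\ninfo2: Text"

# [\S\s] matches any character, including "\n", so [\S\s]+ runs greedily to
# the end of the whole string rather than stopping at the end of the line.
m = re.search(r"(?<=info1: )\s*([\S\s]+)", str_test)

# .Value in .NET corresponds to group(0), the whole match. The capture group
# holds the same text here because \s* matched nothing.
value = m.group(0) if m else ""
print(repr(value))  # '5 street of Paris\ninfo2: Text'
```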

Rems · July 8, 2021, 10:21am

Thank you; however, this expression returns all of the remaining text after the match. It does not stop at the line break.
So I used this one instead, which seems to work well:

Regex.Match(strTest, "(?<=info1: )[^\n]+").Value
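A Python sketch of this pattern on a hypothetical sample confirms that it keeps the internal spaces and stops at the line break. It also shows one caveat that applies equally to .NET: if the extracted text happens to use Windows line endings (`\r\n`), `[^\n]` does not exclude the `\r`, so it is left at the end of the value; `[^\r\n]+` avoids that.

```python
import re

# Hypothetical sample with Unix line endings.
str_test = "info1: 5 street of Paris\ninfo2: Text"

# [^\n]+ matches any run of characters other than "\n", so the value keeps its
# internal spaces and stops at the line break.
m = re.search(r"(?<=info1: )[^\n]+", str_test)
value = m.group(0) if m else ""
print(value)  # 5 street of Paris

# With Windows line endings, "\r" is not excluded by [^\n] and ends up at the
# end of the value; excluding both characters with [^\r\n]+ avoids that.
crlf = "info1: 5 street of Paris\r\ninfo2: Text"
with_cr = re.search(r"(?<=info1: )[^\n]+", crlf).group(0)
without_cr = re.search(r"(?<=info1: )[^\r\n]+", crlf).group(0)
print(repr(with_cr), repr(without_cr))  # '5 street of Paris\r' '5 street of Paris'
```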

ermanoj3101 · July 8, 2021, 11:00am

Try this:
System.Text.RegularExpressions.Regex.Match(strTest, "(?<=info1: )\S+.*").Value

I hope this works.
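A Python sketch of this last pattern on a hypothetical sample: `\S+` forces the value to start with a non-space character, and `.*` takes the rest of the line because `.` does not match `\n` unless DOTALL (`RegexOptions.Singleline` in .NET) is turned on. One practical difference from `[^\n]+` is that this pattern finds nothing when only spaces follow the label.

```python
import re

# Hypothetical sample text.
str_test = "info1: 5 street of Paris\ninfo2: Text"

# \S+ requires a non-space first character; .* then takes the rest of the
# line, since "." does not match "\n" without DOTALL / RegexOptions.Singleline.
m = re.search(r"(?<=info1: )\S+.*", str_test)
value = m.group(0) if m else ""
print(value)  # 5 street of Paris

# When only spaces follow the label, there is no match at all
# (whereas [^\n]+ would return the spaces).
blank = re.search(r"(?<=info1: )\S+.*", "info1:   \ninfo2: Text")
print(blank)  # None
```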

Rems · July 8, 2021, 11:41am

Yes, this one works too. Thank you!

system · July 11, 2021, 11:44am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
