For Each Loop, Web Scraping per link in a list shows same data each loop

Hi UiPath Forum,

I am having an issue where I perform a “Get full text” on separate web pages and save an individual file after each iteration of the loop, but the content returned by “Get full text” is the same each time. I can’t seem to figure it out.

Workflow structure:

  1. Go to defined web page
  2. Scrape list of links (links change daily) into a DataTable
  3. Enter For Each loop by row
    A. Set all variables to null (I have tried this both ways and neither works)
    B. Assign link to variable “URL”
    C. Open browser with variable “URL” (I confirmed via debug that it changes in each iteration, and so does the page that is displayed)
    D. Scrape a set of text from the browser (this is where I am having trouble: the text is the same no matter the screen or link)
    E. Create .txt file and add text from scrape
    Loop back

End
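For readers who think in code, the loop above can be sketched in Python. This is only an illustration of the intended behavior, not the UiPath workflow itself: `scrape_links`, `fetch_text`, and the `page_001.txt` file naming are hypothetical names chosen for this sketch. The point it shows is that the text should be fetched from the URL of the current iteration and written before the loop moves on.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def scrape_links(links: Iterable[str],
                 fetch_text: Callable[[str], str],
                 out_dir: Path) -> List[Path]:
    """Fetch each link and save its text to a separate file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, url in enumerate(links, start=1):
        # Step D: scrape using the URL of *this* iteration, so the text
        # cannot come from a page opened in an earlier pass.
        text = fetch_text(url)
        # Step E: one .txt file per link.
        target = out_dir / f"page_{index:03d}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # A stand-in fetcher (assumption): a real one would read the opened page.
    def fake_fetch(url: str) -> str:
        return f"text scraped from {url}"

    with tempfile.TemporaryDirectory() as folder:
        for path in scrape_links(["link-1", "link-2"], fake_fetch, Path(folder)):
            print(path.name, "->", path.read_text(encoding="utf-8"))
```

If every output file holds the same text even though `url` changes, the fetch step is reading from something other than the page opened in that iteration.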

Any help would be fantastic, I was really confident with what has been happening so far but I seem to have run into a brick wall.

Thanks!

Sean

Edit:
Below is the file that I am working with, I hope it’s helpful!

Main.xaml (24.1 KB)

Second Edit:
Below are files that I would like to scrape from
SampleHTMLFiles.zip (5.1 KB)

Can you send an example of the HTML that you want to scrape? That would make it easier to try some solutions.

Will do - adding now, thanks for asking!

BrowserTest.xaml (13.8 KB)

Hi Smith, I created a JavaScript snippet to get the text of the body. Please take a look and see if that is what you wanted.

Hello @smithseanp16 !

I noticed that the selector for the target text differs between the two HTML samples.

Sample 1:

<webctrl css-selector='body&gt;form&gt;div&gt;div&gt;table' tag='TABLE' />
<webctrl tableRow='2' tag='TD' />

Sample 2:

<webctrl css-selector='body&gt;form&gt;div&gt;div&gt;table' tag='TABLE' />
<webctrl idx='1' tableRow='2' tag='TD' />

Maybe this is what is getting in your way?
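To make the difference between the two selectors easy to spot, here is a small Python sketch that parses both and lists the attributes that differ. The helper names `parse_selector` and `diff_selectors` are made up for this example; the selector strings are copied from the samples above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_selector(selector: str) -> List[Dict[str, str]]:
    """Return the attributes of each <webctrl /> node, in order."""
    # Wrap the fragment in a root element so it parses as one XML document.
    root = ET.fromstring(f"<selector>{selector}</selector>")
    return [dict(node.attrib) for node in root]


def diff_selectors(first: str, second: str) -> List[str]:
    """List attribute differences between two selectors, node by node."""
    differences = []
    nodes_a, nodes_b = parse_selector(first), parse_selector(second)
    for position, (a, b) in enumerate(zip(nodes_a, nodes_b)):
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                differences.append(
                    f"node {position}: {key} = {a.get(key)!r} vs {b.get(key)!r}"
                )
    if len(nodes_a) != len(nodes_b):
        differences.append(f"node count: {len(nodes_a)} vs {len(nodes_b)}")
    return differences


SAMPLE_1 = (
    "<webctrl css-selector='body&gt;form&gt;div&gt;div&gt;table' tag='TABLE' />"
    "<webctrl tableRow='2' tag='TD' />"
)
SAMPLE_2 = (
    "<webctrl css-selector='body&gt;form&gt;div&gt;div&gt;table' tag='TABLE' />"
    "<webctrl idx='1' tableRow='2' tag='TD' />"
)

if __name__ == "__main__":
    for line in diff_selectors(SAMPLE_1, SAMPLE_2):
        print(line)  # prints: node 1: idx = None vs '1'
```

The only difference is the `idx='1'` attribute on the TD node in Sample 2.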

Thanks Thiago.
The code below is the selector that I am using. Wouldn’t this apply to both files?

<webctrl css-selector='body&gt;form&gt;div&gt;div&gt;table' tag='TABLE' />
<webctrl tableRow='2' tag='TD' />

Also, could a selector cause the same content to repeat on each iteration of the loop? I forgot to add above that I close each tab at the end of each loop, which makes it a little more confusing.

Hi Smith, how are you?

I don’t know if I understood your problem, but it seems that you are trying to get all the text after the “Keywords:” part, right? You are looping over your URLs and you always get the same text, maybe the first one you scraped, is that right? As if the first tab wasn’t closed yet…?

If you still don’t have a solution, you could add a delay after the “Close tab” activity and see whether that solves the problem. Another thing I’d like you to try is moving the “Write text file” and “Close tab” activities inside the “Attach Browser ‘Strategic Page’” container. Actually, try this before anything else.

Best regards,

Bruno Costa.
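As a side note on the delay idea: the suggestion above is a fixed delay. A related but different technique is to poll until a condition holds, which avoids guessing how long to wait. The Python sketch below is a generic illustration only. `wait_until` is a hypothetical helper, and the condition passed to it (for example "the old tab is gone" or "the page address matches the current link") is an assumption that would have to be written against the real browser.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 10.0,
               interval: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            # Give up; the caller decides whether to retry or fail.
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in condition (assumption): becomes true about half a second in.
    start = time.monotonic()
    ready = wait_until(lambda: time.monotonic() - start > 0.5, timeout=2.0)
    print("ready:", ready)
```

The scrape step would then run only when `wait_until` returns True, instead of after a fixed pause.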

Thanks Bruno, I will give that a shot now - great suggestion on the delay. I do have the close tab included within the same “Do.”

Thanks Roboson - I will give that a try now!

Bruno - The delay and moving the activities into the container were exactly what I needed, thank you so much!!!

You guys are awesome, I am so appreciative.

Dear all,

I have the same problem as Smith. I loop over URLs and I always get the same text, the first one scraped.
I added “Close application” to close the web page, and also a delay before and after opening a new page, but it is still not working.
Any suggestions? I cannot attach the sample because I’m a new user.

thanks,
Vale