Data scraping from a single page

How do I scrape data from a single page whose URL changes dynamically? That is, the page doesn’t offer another field to select after clicking Next in the Select Element box.

Can you provide screenshots?

I want to scrape the data above. My code runs inside a For Each Row loop, so every page has different values.

Is it possible?

What do you mean by each page having different values? Can you show a screenshot of another page as well?

Are you saying that each page can be scraped, but you want to know how to dynamically move from one page to the next?


Each of those pages has the same structure and the same fields, so screen scraping should act on each of them in the same way.

It is difficult to understand what your actual issue is without further information.

Main.xaml (9.3 KB)

This is my project. It opens a web page on every iteration, and I want to capture the selected data from each page.
It’s example code only.

itspecialists1.xlsx (50.2 KB)
This is my Excel sheet.

Step 1:
In your Open Browser activity, output the browser to a Browser variable for later use.


Step 2:
Still within your For Each loop, add an Attach Browser activity. Its input will be the Browser variable created in step 1. Then add an Extract Structured Data activity inside it. You can use the wizard to configure this activity so that it grabs all the data you need.


Adjust the data definition settings and check the data preview to do this.

Step 3:
Export each grabbed page of data to a datatable. You can then use it as you require. I ran a test, and you will need to do some manipulation of the output to clean your data and make it usable. The data is structured, so this should be achievable, although it can sometimes require fairly complex regular expressions to parse the data.

In my test all the data was scraped, but it was placed in a single column and many fields share cells. So this may take a bit of work, but as a first step the data is there.
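As a rough illustration of that regex clean-up step, here is a small Python sketch (not UiPath code). It assumes, hypothetically, that each scraped cell holds several fields written as `Label: value` on separate lines; the labels, the sample text and the helper names `split_cell` and `cells_to_rows` are invented for illustration, since the real page layout isn't shown in this thread.

```python
import re

# Hypothetical pattern: a label (letters and spaces), a colon, then a value
# running to the end of the line. Adjust this to the real page's layout.
FIELD_PATTERN = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$", re.MULTILINE)


def split_cell(cell: str) -> dict[str, str]:
    """Split one single-column cell into a {label: value} dict."""
    return {label: value for label, value in FIELD_PATTERN.findall(cell)}


def cells_to_rows(cells: list[str], columns: list[str]) -> list[list[str]]:
    """Turn each cell into a row with a fixed column order; missing fields become ''."""
    rows = []
    for cell in cells:
        fields = split_cell(cell)
        rows.append([fields.get(col, "") for col in columns])
    return rows


if __name__ == "__main__":
    # Invented sample standing in for one scraped page.
    sample = "Job Title: Data Analyst\nLocation: Remote\nSalary: Negotiable"
    print(cells_to_rows([sample], ["Job Title", "Location", "Salary"]))
```

Fixing the column order up front means every page produces a row of the same shape, even when a field is missing on some pages.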

Note: one alternative is to grab individual fields from each page using the Find Children activity, which makes it easier to add each field to a structured datatable, column by column, in your loop. There are many examples of how to do this on the forum.

It shows all the data in a single column, and I then spent a lot of time rearranging it. Is there no other way to capture the data?

You should look at the Find Children activity, for which there are numerous examples on the forum on how to apply it.

It will allow you to grab individual fields from the page and add them to a predefined datatable as you go along. This is how I would be most comfortable doing it; however, I will leave it to you to look over the details of how it works. It takes some up-front analysis of the web page and of the selectors that correspond to the data you want to grab. Once you understand this, you can use the Filter property on Find Children to target exactly what you need.

I have put together this workflow, which works for your needs and produces the expected output:

FindChildren_Jobs2.zip (10.8 KB)

Thanks very much for your support. This works fine, but I can’t create my own version; I always get an error. Can you suggest a video tutorial on this, if you know of any?

Thank you…

I have tried many times but can’t create a new project. Can you please tell me how to get this data from that site? I created all the variables as in your example. Can you please help me?

I have already provided a workflow which returns the data from the site.

What error message are you getting?


I can’t understand how Find Children collects that data from the web page, or how to save it to our datatable.

  1. It takes the web address you have provided and opens a browser session.
  2. Find Children returns all descendant elements of the web page that match the value in the activity’s Filter property ("<webctrl idx='1' parentid='overview' tag='P' />"). The result is output as a list.
  3. Each item in this list, which is a UiPath.Core.UiElement object, is read, and its inner text attribute is added to a collection (JobList). This is your data: given how that web page has been built, the inner text of every element matching "<webctrl idx='1' parentid='overview' tag='P' />" is a valid data point.
  4. To add this collection as a DataRow, convert JobList to an array (JobList.ToArray) and add that array to your datatable.
  5. The flow then moves on to the next web page/job as part of the outer For Each loop.
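To make steps 2 to 5 concrete, here is a plain-Python model of the same logic. It is not UiPath code and does not use any UiPath API: web elements are represented by a hypothetical `Element` class holding an attribute dict and inner text, the filter is simplified to an exact match on `parentid` and `tag` (the `idx` attribute from the real filter is not modelled), and the datatable is just a list of rows. The page contents are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a UiElement: selector attributes plus inner text."""
    attributes: dict[str, str]
    inner_text: str
    children: list["Element"] = field(default_factory=list)


def descendants(root: Element):
    """Yield every descendant of root, depth-first (step 2 looks at all descendants)."""
    for child in root.children:
        yield child
        yield from descendants(child)


def find_children(root: Element, filter_attrs: dict[str, str]) -> list[Element]:
    """Simplified filter: keep descendants whose attributes contain every key/value pair."""
    return [el for el in descendants(root)
            if all(el.attributes.get(k) == v for k, v in filter_attrs.items())]


def scrape_pages(pages: list[Element], filter_attrs: dict[str, str]) -> list[list[str]]:
    table = []
    for page in pages:                                   # step 5: outer loop over pages/jobs
        matches = find_children(page, filter_attrs)      # step 2: filtered descendants
        job_list = [el.inner_text for el in matches]     # step 3: collect inner text
        table.append(job_list)                           # step 4: one row per page
    return table


def make_page(texts: list[str]) -> Element:
    """Build an invented page: paragraphs under 'overview' plus one unrelated paragraph."""
    overview = Element({"id": "overview", "tag": "DIV"}, "",
                       [Element({"parentid": "overview", "tag": "P"}, t) for t in texts])
    footer_p = Element({"parentid": "footer", "tag": "P"}, "Contact us")
    return Element({"tag": "BODY"}, "", [overview, footer_p])


if __name__ == "__main__":
    pages = [make_page(["Data Analyst", "Remote"]), make_page(["Web Developer", "London"])]
    print(scrape_pages(pages, {"parentid": "overview", "tag": "P"}))
```

Because each page contributes one row built from whatever the filter matched, a page with a different number of matching paragraphs yields a row of a different length; in a real workflow you would check that before adding the row to a datatable with fixed columns.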