Duplicates with data scraping wizard

I’m trying to use the data scraping wizard to get videos from Google using the video button, and it seems to work fine with titles and URLs. But when I add the date or description fields, I get duplicate rows in which the “Date” field is empty.

Hi @tfs

This post might help:

I’ll look at that.

But what I did find was that even though the preview showed duplicates (where the date was blank in every other row), the actual data table had only the rows that actually had dates in them. So the correct rows were being returned.

Not sure why the extra rows were in the preview. It turns out this happened with either the date element or the detail element of the Google page.

Could you summarize whether the issue is fixed now, maybe with the new version of the UIAutomation activity package?

If I understood correctly, the DataTable does not have extra rows once extraction is done, and this is just a visual glitch?

I too am seeing duplicates in the preview only (and not in the DataTable) as of 29 Jan 2022. This is the version I’m running.
image

Just to clarify: do you mean the preview version of the UIAutomation activity package?

Also, could you maybe share the xaml file that is causing this behaviour? (either here or via a private message to me at @loginerror)

No, Maciej, I mean the preview data section of the Extract Wizard window (Screen Scraping).

Steps followed:

  1. Search “data scraping” videos on Google in the Edge browser
  2. Click Screen Scraping
  3. On Extract Wizard - select first element (video URLs on page)
  4. Select second element (video URLs on page)
  5. Select “Extract URL” check box
  6. Extract Wizard data preview shows 2 columns - 1st with video title, 2nd with URLs (as expected)
  7. On Extract Wizard - Click “Extract correlated data” button and repeat steps 4 & 5 for date on the web page
  8. Extract Wizard data preview shows 3 columns with data for column 1 & 2 being duplicated

Main.xaml (11.1 KB)

We have also observed this behaviour on older versions when a correlated column has an inconsistent structure.

  • Configure the columns
  • Continue until the first duplicate is observed (we also see the changed positions)

Then analyse in more detail a row where the column causing the issue differs from the others, and try to find a more general selector for this column.
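
If it helps to pin down where the first duplicate appears, one option is to copy the extracted rows out of the wizard and scan them with a short script. Below is a minimal sketch in Python, not anything built into UiPath. It assumes the rows are (title, url, date) tuples in that order; the function names and the sample data are made up for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed column order: (title, url, date). This is not a UiPath type.
Row = Tuple[str, str, str]


def first_duplicate_index(rows: Iterable[Row]) -> Optional[int]:
    """Return the index of the first row whose (title, url) pair was already seen."""
    seen = set()
    for index, (title, url, _date) in enumerate(rows):
        key = (title, url)
        if key in seen:
            return index
        seen.add(key)
    return None


def drop_blank_date_rows(rows: Iterable[Row]) -> List[Row]:
    """Keep only the rows whose date column is not empty or whitespace."""
    return [row for row in rows if row[2].strip()]


# Made-up sample that mimics the preview: every other row has a blank date.
sample: List[Row] = [
    ("Video A", "https://example.com/a", "1 Jan 2022"),
    ("Video A", "https://example.com/a", ""),
    ("Video B", "https://example.com/b", "2 Jan 2022"),
    ("Video B", "https://example.com/b", ""),
]

if __name__ == "__main__":
    print("First duplicate at row index:", first_duplicate_index(sample))
    print("Rows with a date:", drop_blank_date_rows(sample))
```

The row at the reported index is a reasonable starting point for comparing selectors. Dropping the blank-date rows only hides the symptom, so a more general selector for the column is still the actual fix.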