How to extract the complete URL using the Data Scraping wizard

datascraping

#1

I have been struggling to get the complete URL using the Data Scraping wizard. The URL I expect is “https://www.walgreens.com/article714”, but the wizard returns only “/article714”, which doesn’t serve my purpose.

In fact, the UiPath demo videos on Data Scraping show the complete URL, but when I tried it, I got only the partial URL mentioned above. Any thoughts?

Thanks!


#2

Hi Prashanth,

Would you mind providing the website where the table is located?

Using the Data Scraper, you can either accept the prompt
“You selected a table cell, would you like to extract the data from the whole table?”
or use the Extract Wizard to define one column at a time. In the Configure Columns step of the Extract Wizard, you can also tick the “Extract URL” checkbox for any column that contains a URL. This should help you extract the entire URL.

Let me know if you need any more help,
Long


#3

I am using the Extract Wizard and have chosen “Extract URL”, but every extracted URL is still missing this part: “https://www.discountcontactlenses.com”

I am trying to extract all 21 product URLs from the website mentioned below:

https://www.discountcontactlenses.com/Search/GetSearchResults?searchType=All&sortType=Relevance&searchText=Acuvue

Examples of my expected result:
https://www.discountcontactlenses.com/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714
https://www.discountcontactlenses.com/discount-contacts/acuvue-vita-contact-lenses/687

UiPath is pulling the URLs as shown below, which is incorrect:
/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714
/discount-contacts/acuvue-vita-contact-lenses/687


#4

Hi Prashanth,

The problem here is that the website (like many similar websites) uses relative paths in its links. Here’s an excerpt from the HTML of the webpage:
<a href="/discount-contacts/acuvue-vita-for-astigmatism-contact-lenses/718"> ACUVUE VITA for Astigmatism contact lenses
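
To illustrate why the scraped value comes out short: a browser resolves a relative href against the address of the page, while the scraper returns the attribute as written. Here is a minimal sketch of that resolution in Python, used purely for illustration (it is not part of UiPath), relying on the standard library's `urljoin`:

```python
from urllib.parse import urljoin

# Address of the search page the link was scraped from (from the post above).
page_url = (
    "https://www.discountcontactlenses.com/Search/GetSearchResults"
    "?searchType=All&sortType=Relevance&searchText=Acuvue"
)

# The href exactly as it appears in the HTML: a root-relative path.
href = "/discount-contacts/acuvue-vita-for-astigmatism-contact-lenses/718"

# urljoin keeps the scheme and host of the page and swaps in the new path.
absolute_url = urljoin(page_url, href)
print(absolute_url)
```

Because the href starts with "/", only the scheme and host of the page survive, which is why adding the site root to the scraped value is enough here.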

The workaround is a bit tricky, but definitely doable.
The Extract Table activity should output to an ExtractDT variable. Using a for-each-row loop over this DataTable, you can prepend the necessary “https://www.discountcontactlenses.com” to each value in the URL column to get a complete URL you can use.
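
The loop itself can be sketched outside UiPath like this. It is only an illustration in Python, not the actual workflow: a list of dicts stands in for ExtractDT, and the column name "URL" and the helper `prepend_base` are hypothetical names chosen for this example.

```python
BASE_URL = "https://www.discountcontactlenses.com"

# Stand-in for the ExtractDT DataTable: one dict per row.
# The column name "URL" is an assumption made for this sketch.
extract_dt = [
    {"URL": "/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714"},
    {"URL": "/discount-contacts/acuvue-vita-contact-lenses/687"},
]


def prepend_base(rows, column, base):
    """Add the site root to every root-relative value in one column, in place."""
    for row in rows:
        value = row[column]
        # Leave values alone if they are already absolute URLs.
        if value.startswith("/"):
            row[column] = base + value
    return rows


prepend_base(extract_dt, "URL", BASE_URL)
for row in extract_dt:
    print(row["URL"])
```

The check for a leading "/" keeps the loop safe to run twice and leaves any already-complete URLs untouched.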

Hope that helps!
-Long


#5

Thanks Chen… I tried your suggestion and was able to build the full URL for a single value. I would appreciate your help implementing the for-each loop so that every value in column 2 gets the full URL. I have attached the test file: TestURL.xaml (12.5 KB)

Thank you very much!


#6

Something like this should work. You don’t necessarily need to export to Excel, though; you can work entirely within the DataTable variable. TestURL.xaml (13.4 KB)
Thanks, Long


#7

Thanks Chen… This really helps in understanding the concept. Maybe I need to do a little bit of tweaking.