How to extract the complete URL using the Data Scraping wizard

I have been struggling to get the complete URL using the Data Scraping wizard. The URL I expect is “https://www.walgreens.com/article714”. However, the wizard returns only “/article714”, which doesn’t serve my purpose.

In fact, the UiPath demo videos on Data Scraping show the complete URL, but when I try it, only the partial URL is extracted, as mentioned above. Any thoughts?

Thanks!

Hi Prashanth,

Would you mind providing the website where the table is located?

Using the Data Scraper, you can either accept the prompt
“You selected a table cell, would you like to extract the data from the whole table?”
or use the Extract Wizard to define one column at a time. At the Configure Columns step of the Extract Wizard, you can also tick the “Extract URL” checkbox for any column that contains a URL. This should help you extract the entire URL.

Let me know if you need any more help.
Long

I am using the Extract Wizard and have chosen “Extract URL”. But all the URLs that are extracted are still missing this part: “https://www.discountcontactlenses.com”

I am trying to extract all 21 product URLs from the website mentioned below:

https://www.discountcontactlenses.com/Search/GetSearchResults?searchType=All&sortType=Relevance&searchText=Acuvue

Examples of my expected result:
https://www.discountcontactlenses.com/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714
https://www.discountcontactlenses.com/discount-contacts/acuvue-vita-contact-lenses/687

UiPath is extracting the URLs as shown below, which is incorrect:
/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714
/discount-contacts/acuvue-vita-contact-lenses/687

Hi Prashanth,

The problem here is that the website (like many similar websites) uses relative paths in its links. Here’s an excerpt from the HTML of the webpage:
<a href="/discount-contacts/acuvue-vita-for-astigmatism-contact-lenses/718"> ACUVUE VITA for Astigmatism contact lenses

A workaround for this is a bit tricky, but definitely doable.
The Extract Table activity should output to an ExtractDT variable. Using a For Each Row over this DataTable, you can prepend the necessary “https://www.discountcontactlenses.com” to the value in the URL column of each row, giving you a complete URL path you can use.
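To illustrate the idea outside of UiPath, here is a minimal Python sketch of the same loop. It is not the workflow itself: the list of dicts `extract_dt` is only a stand-in for the ExtractDT DataTable, and the column names, product names, and the helper `to_absolute` are assumptions made up for the example.

```python
# Sketch only: extract_dt stands in for the ExtractDT DataTable (one dict per row).
# Column names, product names, and to_absolute are hypothetical.
BASE_URL = "https://www.discountcontactlenses.com"


def to_absolute(href: str, base: str = BASE_URL) -> str:
    """Prefix the site root onto a relative href; leave absolute URLs untouched."""
    if href.startswith(("http://", "https://")):
        return href
    # Normalise slashes so the result has exactly one "/" between base and path.
    return base.rstrip("/") + "/" + href.lstrip("/")


extract_dt = [
    {"Product": "Acuvue Oasys 1-Day for Astigmatism",
     "URL": "/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714"},
    {"Product": "Acuvue Vita",
     "URL": "/discount-contacts/acuvue-vita-contact-lenses/687"},
]

# Equivalent of "For Each Row": rewrite the URL column in place.
for row in extract_dt:
    row["URL"] = to_absolute(row["URL"])

for row in extract_dt:
    print(row["URL"])
```

The check for an existing scheme means the loop can be run twice on the same table without doubling the prefix.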

Hope that helps!
-Long

Thanks Chen… I tried your suggestion and was able to prepend the URL for a single row. I would appreciate your help implementing the For Each loop so that every value in column 2 gets the full URL. I have attached the test file: TestURL.xaml (12.5 KB)

Thank you very much!

Something like this should work, although you don’t necessarily need to export to Excel; you can just work within the DataTable variable. TestURL.xaml (13.4 KB)
Thanks, Long

Thanks Chen… This really helps in understanding the concept. I may need to do a little bit of tweaking.

Hi Chen,

I have tried to scrape data from a journal website.

The attributes I need are paper title, paper author, year of publication, and the paper’s keywords, meaning the kind of technology used in the paper.

I have already done the first three columns. Now I am trying to get the keywords. I have the URL of every paper’s PDF. Can you help me with how to do this?

Hello, can you explain with an example? I can’t understand.

Hi @Acash

Like this :)
[image]

This is just prepending a string to each item in each row :)

Hello, thanks a lot for replying. I am web scraping this website: “https://indiankanoon.org/search/?formInput=fromdate%3A%2001-01-2019%20todate%3A%2011-01-2019%20doctypes%3Alaws%2Cjudgments%2Ctribunals%2Cothers%2C”. When I scrape the title and write the URL to a CSV file, the base URL is not included, because the href looks like this: /docfragment/1387015/?formInput=fromdate%3A%201-1-2019%20todate%3A%2011-1-2019. How can I add the base URL? Main.xaml (19.1 KB)
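For an href that carries a query string, as in the indiankanoon.org case above, one option is to resolve it against the page address rather than concatenating strings by hand. The sketch below uses Python’s standard `urllib.parse.urljoin` and writes the result as CSV. It is an illustration under assumptions, not the attached workflow: the row title is a placeholder, and `PAGE_URL`, `rows`, and `absolutize` are names invented for the example.

```python
import csv
import io
from urllib.parse import urljoin

# Assumed address of the page the links were scraped from.
PAGE_URL = "https://indiankanoon.org/search/"

# Placeholder scraped rows: (title, href). The title is made up for the example.
rows = [
    ("Example title",
     "/docfragment/1387015/?formInput=fromdate%3A%201-1-2019%20todate%3A%2011-1-2019"),
]


def absolutize(scraped, page_url):
    """Resolve each href against the page URL, keeping the query string as-is."""
    return [(title, urljoin(page_url, href)) for title, href in scraped]


resolved = absolutize(rows, PAGE_URL)

# Write to an in-memory buffer here; a real run would open a .csv file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Title", "URL"])
writer.writerows(resolved)
print(buffer.getvalue())
```

Because the href starts with “/”, it is resolved against the site root, so the “/search/” part of the page address is dropped and the percent-encoded query string is carried over unchanged.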

Hi,

I am having some issues with the Data Scraping wizard and hope you can help.
The issue is that, during execution, the wizard does not fetch all of the results it should.

In my case, I need data from an e-commerce website, Flipkart.com. I need the product name, URL, and price for a product, say refrigerators. The website displays 572 results, while the CSV file I write contains fewer rows. It was 526 once, then 552 another time. It does scrape through to the last page, so it is not skipping whole pages; rows are missing in between.

I tried this with multiple products, but I never get the correct number of results.

Please let me know if anyone has faced the same issue and knows the solution or why it is happening.

Thanks
Shubham

It is not working, bro.

It helped me :) Thanks