How to extract the complete URL using the Data Scraping wizard

Prashanth · April 19, 2018, 2:46pm

I have been struggling to get the complete URL using the Data Scraping wizard. My expected URL should be displayed as “https://www.walgreens.com/article714”. However, the wizard returns only “/article714”, which doesn’t serve my purpose.

In fact, the UiPath demo videos on Data Scraping show the complete URL, but when I tried it, the wizard pulled only a partial URL as mentioned above. Any thoughts?

Thanks!

Long_Chen · April 19, 2018, 4:44pm

Hi Prashanth,

Would you mind providing the website where the table is located?

Using the Data Scraper, you can either choose the option
“You selected a table cell, would you like to extract the data from the whole table?”
or use the Extract Wizard to define one column at a time. At the Configure Columns option of the Extract Wizard, you can also tick the “Extract URL” checkbox for any column that contains a URL. This should help you extract the entire URL.

Let me know if you need any more help,
Long

Prashanth · April 20, 2018, 10:15am

I am using the Extract Wizard and have chosen “Extract URL”. But every extracted URL is still missing this part: “https://www.discountcontactlenses.com”

I am trying to extract all 21 product URLs from the website mentioned below:

Discount Acuvue Contacts | DiscountContactLenses.com

Example of my Expected Result should be:
“Discount ACUVUE OASYS 1-Day for Astigmatism Contacts | DiscountContactLenses.com”
“Discount Acuvue Vita Contacts | DiscountContactLenses.com”

UiPath is pulling the URLs as shown below, which is incorrect:
/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714
/discount-contacts/acuvue-vita-contact-lenses/687

Long_Chen · April 20, 2018, 8:15pm

Hi Prashanth,

The problem here is that the website (like many similar websites) uses relative paths in its links. Here’s an excerpt from the HTML of the webpage:
<a href="/discount-contacts/acuvue-vita-for-astigmatism-contact-lenses/718"> ACUVUE VITA for Astigmatism contact lenses …

A workaround for this is a bit tricky, but definitely doable.
The Extract Table activity should output to an ExtractDT variable. Using a For Each Row loop over this DataTable, you can prepend the necessary “https://www.discountcontactlenses.com” to the value in the URL column of each row, which gives you a complete URL you can use.
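
As a rough illustration of that loop outside of UiPath, here is a minimal Python sketch. It is not the UiPath workflow itself: the list of dictionaries stands in for the DataTable, and the column names “Product” and “URL”, the sample product names, and the function name prepend_base_url are all assumptions made for the example.

```python
BASE_URL = "https://www.discountcontactlenses.com"


def prepend_base_url(rows, url_column="URL", base_url=BASE_URL):
    """Return copies of the rows with base_url prepended to relative URLs.

    `rows` is a list of dicts standing in for the extracted DataTable;
    the column name "URL" is an assumption for this sketch.
    """
    fixed = []
    for row in rows:  # the equivalent of a For Each Row loop
        new_row = dict(row)  # copy so the input rows are left untouched
        url = new_row[url_column]
        # Only site-relative paths (starting with "/") need the base URL.
        if url.startswith("/"):
            new_row[url_column] = base_url + url
        fixed.append(new_row)
    return fixed


# Sample rows using the relative paths quoted earlier in this thread.
rows = [
    {"Product": "Acuvue Oasys 1-Day for Astigmatism",
     "URL": "/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714"},
    {"Product": "Acuvue Vita",
     "URL": "/discount-contacts/acuvue-vita-contact-lenses/687"},
]

for row in prepend_base_url(rows):
    print(row["URL"])
```

The check for a leading “/” keeps the sketch from doubling the domain if a row already holds a full URL.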

Hope that helps!
-Long

Prashanth · April 23, 2018, 7:59am

Thanks Chen… I tried your suggestion and was able to append the URL for a single value. I would appreciate your help in implementing a For Each loop so that every value in column 2 gets the full URL. I have attached the test file: TestURL.xaml (12.5 KB)

Thank you very much!

Long_Chen · April 23, 2018, 4:15pm

Something like this should work, although you don’t necessarily need to export to Excel; you can just work within the DataTable variable. TestURL.xaml (13.4 KB)
Thanks, Long

Prashanth · April 26, 2018, 7:09am

Thanks Chen… This really helps in understanding the concept. Maybe I need to do a little bit of tweaking.

ganesh_rajan · October 12, 2018, 6:55am

Hi Chen,

I have tried to scrape data from a journal website.

The attributes I need are the paper title, the paper author, the year of publication, and the paper’s keywords (that is, what kind of technology is used in the paper).

I have already done the first three columns, and now I am trying to get the keywords. I have the URL of every paper’s PDF. Can you help me with how to do this?

Acash · January 10, 2019, 9:56am

Hello, can you explain with an example? I can’t understand.

loginerror · January 11, 2019, 9:08am

Hi @Acash

Like that

This is just appending a string to the beginning of each item in each row
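
To show the same idea in plain code, here is a small Python sketch (an illustration only, not UiPath code). It uses the standard library’s urllib.parse.urljoin rather than raw string concatenation; the function name to_absolute and the sample list are assumptions made for the example, and the paths are the ones quoted earlier in the thread.

```python
from urllib.parse import urljoin


def to_absolute(hrefs, base_url):
    """Resolve each scraped href against base_url.

    urljoin leaves already-absolute URLs unchanged and keeps any
    query string that follows the path.
    """
    return [urljoin(base_url, href) for href in hrefs]


# Relative paths as scraped; the base URL is the site they came from.
scraped = [
    "/discount-contacts/acuvue-oasys-1-day-for-astigmatism-30-pack-contact-lenses/714",
    "/discount-contacts/acuvue-vita-contact-lenses/687",
]

for url in to_absolute(scraped, "https://www.discountcontactlenses.com"):
    print(url)
```

Compared with simply gluing two strings together, urljoin does not double the domain when a value is already a full URL.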

Acash · January 11, 2019, 10:01am

Hello, thanks a lot for replying. I am web scraping this website: “https://indiankanoon.org/search/?formInput=fromdate%3A%2001-01-2019%20todate%3A%2011-01-2019%20doctypes%3Alaws%2Cjudgments%2Ctribunals%2Cothers%2C”. I scrape the title and add the URL to a CSV file, but the base URL is not included because the href looks like this: /docfragment/1387015/?formInput=fromdate%3A%201-1-2019%20todate%3A%2011-1-2019. How can I add the base URL? Main.xaml (19.1 KB)

shubhmjain2112 · January 26, 2019, 3:10am

Hi,

I am having some issues with the Data Scraping wizard and hope you can help.
The issue is that, during execution, the wizard does not fetch all the results it should.

In my case, I need data from an e-commerce website, Flipkart.com. I need the product name, URL, and price for a product, say a refrigerator. The website displays 572 results, while the CSV file I am writing contains fewer rows. It was 526 once, then 552 another time. The scraping does reach the last page, so it is not skipping a whole page; instead, it is missing rows in between.

I tried this with multiple products, but I am not getting an accurate number of results.

Please let me know if anyone has faced the same issue and knows the solution or why it is happening.

Thanks
Shubham

nageshkumar · September 5, 2019, 11:45am

it is not working bro

nageshkumar · September 5, 2019, 12:16pm

It’s not working, bro.

Devarajan_Sundaresan · January 21, 2020, 5:25pm

It helped me … thanks
