Data Scraping Challenge: Extracting Information from 'View Hours' Card

I’m attempting to scrape data from a website that features a card labeled “View Hours.” The card contains information that cannot be selected individually. I first tried the “Extract Table Data” activity, but it did not select the target as expected. I then used the “Get Text” activity, which does retrieve the necessary information. Once extracted, the information should be stored in an ordered manner.

I have attached the files and provided sample screenshots for reference.
Sequence4.xaml (19.1 KB)

This is a screenshot and a link to the website from which I attempted to scrape data.
https://findaprovider.wellcare.com/search-results

This is a screenshot of the ‘View Hours’ card.

This is a sample screenshot of the desired output.

Hi @NikhilR

The workflow file you attached fails to open with a “document is invalid” error. Could you please share the file again?

Regards,

Here is the valid workflow file.
View Hours Sequence.xaml (20.4 KB)
This is the complete zip file.
View Hours Srcaping Data.zip (6.8 KB)

Hi @NikhilR

The workflow file you attached shows the “document is invalid” error again.

Regards

This is the complete zip file.
View Hours Srcaping Data.zip (6.8 KB)

Hi @NikhilR

The page below opens with the URL you provided. Is it the correct one?
https://findaprovider.wellcare.com/search-results

Regards

@NikhilR
Looking at the webpage source, the id attribute is stable, so you can use seven Get Text activities to retrieve the office hours and then write them into Excel.

Thanks for the guidance. I already tried using the ‘Get Text’ activity, but it didn’t work. Here is the workflow file using the ‘Get Text’ activity.
View Hours Sequence.xaml (28.9 KB)

Yes, this is the correct website.

Would you attach the project.json file as well? I can’t open your xaml file due to a dependency issue.

View Hours.zip (7.7 KB)

Hi NikhilR,

  1. I’m not sure what the purpose of the existing For Each is. I added another For Each to loop over the 1st to 10th ‘View hours’ cards.
  2. For the Get Text activities, using ‘Strict selector’ is more stable and secure.
  3. The index of the Office hours page increases each time it is opened, so I use a wildcard in the selector.

Attached is the zip file for your reference. The result will be appended to file result.txt. You may replace it by writing to Excel.
View Hours_v2.zip (7.0 KB)
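
The wildcard idea in point 3 can be illustrated outside UiPath. This is only a sketch in Python, not taken from the attached workflow: the ids below are hypothetical, and the standard-library `fnmatchcase` stands in for the selector matching. As in UiPath selectors, `*` matches any run of characters, so a single pattern keeps matching while the numeric part changes.

```python
from fnmatch import fnmatchcase

# Hypothetical element ids: the numeric suffix changes each time the card is opened.
observed_ids = ["office-hours-1", "office-hours-2", "office-hours-17"]

# "*" stands for any run of characters, so one pattern covers every suffix.
pattern = "office-hours-*"

matches = [element_id for element_id in observed_ids
           if fnmatchcase(element_id, pattern)]
print(matches)
```

A hard-coded index would match only the first opening, while the wildcard pattern matches all of them.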

Thank you for the solution; it’s working.
However, when I tried to store the scraped data in the Google Sheets web version, it only saves the last piece of data. Additionally, when using an enumerated range from 1 to 500 or 1000, it’s only retrieving 25 or 40 records. I need to extract more than 1000 records. Could you please assist in resolving this issue?

After further study of the web page, I made the changes below.

  1. First check whether ‘View hours’ exists in the md card (beware of the index). If it exists, get the office hours. If not, write a placeholder (such as ‘Contact Provider for office hour’).
  2. Send the ‘Page down’ hotkey to refresh the webpage so that the next record is shown.

View Hours_v3.zip (7.8 KB)
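
The check-then-fallback logic in point 1 could be sketched as follows. This is a Python illustration rather than the workflow itself: the card dictionaries and the `office_hours_for` helper are hypothetical, and only the placeholder text comes from the post above.

```python
PLACEHOLDER = "Contact Provider for office hour"

def office_hours_for(card):
    """Return the card's hours, or the placeholder when the card has no hours."""
    hours = card.get("hours")  # None when the card has no 'View hours' section
    return hours if hours else PLACEHOLDER

# Hypothetical scraped cards: the second one has no 'View hours' section.
cards = [
    {"name": "Provider A", "hours": "Mon-Fri 9:00 AM - 5:00 PM"},
    {"name": "Provider B"},
]

results = [(card["name"], office_hours_for(card)) for card in cards]
print(results)
```

Writing a placeholder for every card without hours keeps one output row per provider, so the rows stay aligned with the records on the page.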

For storing data in Google Sheets, I don’t have much experience with it. You may search for some related topics in the forum.
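
On the “only saves the last piece of data” symptom: one common cause (an assumption here, not confirmed from the workflow) is writing each record to the same cell or range inside the loop, so every iteration overwrites the previous one. A usual remedy is to collect the rows while looping and write them once at the end, or to append to the next free row. The Python sketch below shows the collect-then-write pattern with the standard `csv` module. The provider names and hours are made-up sample data, and `write_rows` is a hypothetical helper.

```python
import csv
import io

def write_rows(rows):
    """Write a header plus all collected rows in one pass and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Provider", "Office hours"])
    writer.writerows(rows)
    return buffer.getvalue()

rows = []
for index in range(1, 4):
    # In the real workflow this would be the text scraped for record `index`.
    rows.append([f"Provider {index}", "Mon-Fri 9:00 AM - 5:00 PM"])

output = write_rows(rows)
print(output)
```

In UiPath terms this corresponds roughly to adding each record to a DataTable inside the loop and writing the table after the loop ends.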

@healsko_ho Thanks a lot for the guidance.
