I’m a beginner in RPA (I have almost finished Foundation Training Level 1) and I’m trying to scrape data (price, address and URL) from a real estate website, realestate.com.au, and paste it into an Excel file.
Below are the steps I undertook:
- I used the Data Scraping wizard to get the price, address and URL of each ad.
- The wizard is not able to retrieve the address, so I focused on price and URL only and will try to get the address another way later.
- I used the “going to next page” option but noticed that the URL was only captured for the first page. (I read on another thread that this could be because the images do not load at the same pace as the web page, but I tried with a delay of 50 seconds and it still does not work.)
- So I built a loop to avoid using the “going to next page” option of the wizard. It works well to a certain extent:
  - I have quite a lot of duplicates in my Excel file, even though I’m using a clear database activity.
  - Once the duplicates are removed, I notice that the first 3 pages (out of 8) are perfectly retrieved by the robot (results 1 to 75).
  - But then come the issues: there is no data at all from page 4, and only partial data from pages 5 to 8. The format of the data changes at the end of page 5, which I guess confuses the robot. But in that case I would have expected the robot to retrieve nothing from the last pages, not partial data.
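For anyone checking the exported data, the duplicate removal and the per-page gaps described above could be verified with a small script. This is only a minimal Python sketch and not part of the UiPath workflow. The row layout (`page`, `price` and `url` keys), the sample values, the example.com URLs and the function names are all hypothetical; a real run would presumably use 8 pages and 25 results per page (inferred from results 1 to 75 covering the first 3 pages).

```python
from collections import Counter


def dedupe_by_url(rows):
    """Keep the first occurrence of each URL, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row["url"] not in seen:
            seen.add(row["url"])
            unique.append(row)
    return unique


def report_gaps(rows, total_pages, expected_per_page):
    """Return {page: unique_row_count} for pages with fewer rows than expected."""
    counts = Counter(row["page"] for row in dedupe_by_url(rows))
    return {
        page: counts.get(page, 0)
        for page in range(1, total_pages + 1)
        if counts.get(page, 0) < expected_per_page
    }


if __name__ == "__main__":
    # Hypothetical sample: page 1 has a duplicate, page 2 is missing,
    # page 3 is only partially captured.
    sample = [
        {"page": 1, "price": "$450", "url": "https://example.com/ad/a"},
        {"page": 1, "price": "$500", "url": "https://example.com/ad/b"},
        {"page": 1, "price": "$450", "url": "https://example.com/ad/a"},
        {"page": 3, "price": "$620", "url": "https://example.com/ad/c"},
    ]
    print(report_gaps(sample, total_pages=3, expected_per_page=2))
```

Pages that come back with a count of 0 were never captured by the loop, while pages with a count between 0 and the expected number were only partially captured, which may help separate a navigation problem from a selector problem.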
If someone could have a look at my workflow to help me understand what makes the process go wrong after the first three pages, it would be greatly appreciated!
In order to reproduce my search, you can input the following locations under Rent: richmond, vic 3121; balaclava, vic 3183; brighton, vic 3186; port melbourne, vic 3207; elsternwick, vic 3185; prahran, vic 3181; prahran east, vic 3181; st kilda, vic 3182; st kilda east, vic 3183; st kilda west, vic 3182; east melbourne, vic 3002; south melbourne, vic 3205; elwood, vic 3184; cremorne, vic 3121.
Sorry, but I did not automate the robot to scroll down, so you will have to scroll down each page in order to see the “Next” button, on which I based my “Image exists” and “Click” activities to continue the loop.
Attachment: Real Estate project.xaml (83.7 KB)
Thanks in advance for your advice, and let me know if I can clarify any point further.