I am trying to extract all the reviews for a product from Amazon, along with the user name, star rating, date and location, color and specifications, review topic, review text, and helpfulness votes. I am using the Data Scraping activity and collecting all the fields by selecting “Extract Correlated Data”. Amazon reports a total review count for each product, but my automation collects fewer reviews than that. For example, for this product:
Amazon says there should be 17,878 reviews, but my automation collects only 4,518. I have also set the maximum number of results for the data scraping to 1 million, so that limit should not be the cause.
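To narrow down where collection stops, one option is to page through the reviews explicitly and count the rows, instead of relying on a single scrape with a large maximum. This Python sketch is only an illustration: `fetch_page` is a hypothetical callable standing in for whatever loads and extracts one page of reviews, and the stop rule (an empty page, or a page identical to the previous one) is an assumption, not something the Data Scraping activity is known to do.

```python
from typing import Callable, List, Tuple

Row = List[str]


def collect_all_pages(
    fetch_page: Callable[[int], List[Row]], max_pages: int = 10_000
) -> Tuple[List[Row], int]:
    """Fetch pages 1, 2, 3, ... and return (all rows, page where collection stopped).

    fetch_page is a hypothetical callable returning the review rows on one page.
    The returned page number is the first page that yielded no new rows,
    or max_pages if that limit was reached first.
    """
    collected: List[Row] = []
    previous: List[Row] = []
    stop_page = max_pages
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        # Stop when no new rows arrive: an empty page or a repeat of the last page.
        if not rows or rows == previous:
            stop_page = page
            break
        collected.extend(rows)
        previous = rows
    return collected, stop_page


if __name__ == "__main__":
    # Placeholder pages standing in for real scraped results.
    fake_pages = {1: [["user1"], ["user2"]], 2: [["user3"]]}
    rows, stopped_at = collect_all_pages(lambda p: fake_pages.get(p, []))
    print(len(rows), stopped_at)
```

Comparing `len(rows)` and the stop page against the total Amazon displays would show whether the shortfall comes from pagination ending early or from rows being dropped within pages.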
The second problem is that toward the end of the results, the data is not scraped properly: those rows are missing the first three columns and the last column.
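A quick way to measure the second problem is to count the affected rows in the exported data. This minimal Python sketch assumes each row is a list of cell values in the column order listed above (user name, star rating, date and location, color and specifications, review topic, review text, helpfulness votes); the column keys and the sample rows are placeholders, not the real sheet headers or data.

```python
from typing import List, Optional

# Column order as listed in the post; the key names themselves are hypothetical.
COLUMNS = [
    "user_name",
    "star_rating",
    "date_and_location",
    "color_and_specifications",
    "review_topic",
    "review_text",
    "helpfulness_votes",
]


def is_blank(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def find_incomplete_rows(rows: List[List[Optional[str]]]) -> List[int]:
    """Return indices of rows missing the first three columns and the last column."""
    bad = []
    for index, row in enumerate(rows):
        # Pad short rows so absent trailing cells count as blank.
        cells = list(row) + [None] * (len(COLUMNS) - len(row))
        leading_missing = all(is_blank(c) for c in cells[:3])
        trailing_missing = is_blank(cells[len(COLUMNS) - 1])
        if leading_missing and trailing_missing:
            bad.append(index)
    return bad


if __name__ == "__main__":
    sample = [
        ["user1", "5", "place/date", "spec", "topic", "text", "3"],
        ["", "", "", "spec", "topic", "text", ""],
    ]
    print(find_incomplete_rows(sample))  # [1]
```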
I have attached the resulting Excel file and my code.
What my code does:
Opens an Excel file that contains the ASIN (product ID) and the link to each product
Goes to the review page for each product
Extracts the review data
Writes the data to the Excel file, creating a sheet for each product
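The four steps above can be sketched in Python to show the control flow, as an illustration rather than a replacement for the existing automation. The review-page URL pattern `https://www.amazon.com/product-reviews/{asin}` is an assumption, `scrape_reviews` is a hypothetical stand-in for the Data Scraping step, and the result is a dictionary of rows per sheet instead of an actual Excel file.

```python
from typing import Callable, Dict, List, Tuple

Row = List[str]

# Assumed review-page URL pattern; the real URL may differ by region or over time.
REVIEW_URL_TEMPLATE = "https://www.amazon.com/product-reviews/{asin}"


def review_page_url(asin: str) -> str:
    """Build the review-page URL for one ASIN using the assumed pattern."""
    return REVIEW_URL_TEMPLATE.format(asin=asin.strip())


def safe_sheet_name(name: str) -> str:
    """Replace characters Excel forbids in sheet names and cap the length at 31."""
    for ch in '\\/?*[]:':
        name = name.replace(ch, "_")
    return name[:31] or "Sheet"


def run(
    products: List[Tuple[str, str]],
    scrape_reviews: Callable[[str], List[Row]],
) -> Dict[str, List[Row]]:
    """Return one 'sheet' of rows per product.

    products: (ASIN, product link) pairs, as read from the input workbook.
    scrape_reviews: hypothetical callable that takes a review-page URL and
    returns the extracted rows.
    """
    sheets: Dict[str, List[Row]] = {}
    for asin, _link in products:
        url = review_page_url(asin)
        sheets[safe_sheet_name(asin)] = scrape_reviews(url)
    return sheets


if __name__ == "__main__":
    # Placeholder ASIN and link; the fake scraper just echoes the URL it received.
    demo = run([("B000000001", "https://example.com/p1")], lambda url: [[url]])
    print(demo)
```

Keying each sheet on the ASIN keeps the one-sheet-per-product layout; in this sketch a repeated ASIN in the input would overwrite the earlier sheet.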