Hi,
I want to extract comments from the amazon.in website. I built a workflow that extracts all reviews from Amazon and inserts the comments into a database.
I used data scraping to extract title, posted_by, post_date, rating and review, and added one extra data column, product_id.
This is my result output.
But extracting all comments with data scraping every day is a time-consuming process, so I want to extract only the new comments that are not yet present in the database. How can I do this?
Or what would be the best approach to extract new comments?
Thank you.
@arijit1213
There is no workflow, as I only checked a conceptual approach for the retrieval.
But we can help you set up the building blocks, e.g. finding duplicates, in case you need further help.
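As an illustration of the "find duplicates" building block, here is a minimal sketch in Python rather than in the workflow tool itself. It assumes the six columns named in the first post, a SQLite table called `reviews`, and a fingerprint column `review_key`; the table name, the key column and both function names are hypothetical and only serve the example.

```python
import hashlib
import sqlite3

# Columns taken from the first post; together they identify one review.
FIELDS = ("product_id", "title", "posted_by", "post_date", "rating", "review")


def review_key(row):
    """Build a stable fingerprint from the identifying columns of a review."""
    joined = "\x1f".join(str(row[f]).strip() for f in FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def insert_new_reviews(conn, rows):
    """Insert only rows whose fingerprint is not stored yet and return them."""
    # Hypothetical table layout: fingerprint first, then the scraped columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "review_key TEXT PRIMARY KEY, product_id TEXT, title TEXT, "
        "posted_by TEXT, post_date TEXT, rating TEXT, review TEXT)"
    )
    new_rows = []
    for row in rows:
        key = review_key(row)
        exists = conn.execute(
            "SELECT 1 FROM reviews WHERE review_key = ?", (key,)
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO reviews VALUES (?, ?, ?, ?, ?, ?, ?)",
                (key, *(str(row[f]) for f in FIELDS)),
            )
            new_rows.append(row)
    conn.commit()
    return new_rows
```

Because the fingerprint includes the review text, a review that is edited later would show up as a new row; whether that is wanted depends on the use case.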
I have set the reviews to sort by most recent.
After that, what can I do? Can you explain how I can grab the review count
and define a threshold that forks between scraping all reviews and scraping only the difference?
Can you explain these two points?
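One possible reading of the threshold idea, as a minimal Python sketch: compare the review count shown on the product page with the number of rows already stored, and fork on the difference. The function name, the three mode labels and the default threshold of 50 are assumptions made for illustration, and how the review count is read from the page is left out.

```python
def choose_scrape_mode(site_review_count, stored_review_count, threshold=50):
    """Decide how much to scrape from the gap between site and database."""
    missing = site_review_count - stored_review_count
    if missing <= 0:
        # Nothing new according to the counts, so no scraping is needed.
        return "skip"
    if missing > threshold:
        # Too far behind: re-scrape every page.
        return "full"
    # Only a few reviews are missing: scrape just the newest pages.
    return "difference"
```

For example, `choose_scrape_mode(125, 120)` selects the difference scrape. Counts alone cannot detect the case where some reviews were deleted and the same number were added.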
So I want to scrape the first page and then check whether all the scraped rows from that page are already present in the database. If every row from the first page is present in the database, close the tab. If one or more unmatched rows are found on the first page, I want to go to the next page, scrape it, and check whether its rows are present in the database in the same way, and repeat this process page by page. How can I do this?
Thank you.
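The page-by-page loop described above can be sketched in Python as follows. It relies on the reviews being sorted by most recent and stops at the first page that contains no unmatched row. `fetch_page`, `key_of`, `known_keys` and `max_pages` are hypothetical names: `fetch_page(n)` stands for whatever scrapes page `n`, and `known_keys` for the identifiers already in the database.

```python
def scrape_until_known(fetch_page, known_keys, key_of, max_pages=500):
    """Scrape pages in order and stop at the first page with no new rows.

    fetch_page(n) returns the rows of page n (empty when past the last page).
    known_keys    holds the identifiers already stored in the database.
    key_of(row)   returns the identifier of a scraped row.
    """
    seen = set(known_keys)  # copy, so the caller's collection is not modified
    new_rows = []
    for page in range(1, max_pages + 1):
        rows = list(fetch_page(page))
        if not rows:
            break  # ran past the last page
        unseen = [row for row in rows if key_of(row) not in seen]
        if not unseen:
            break  # whole page already in the database: stop here
        for row in unseen:
            seen.add(key_of(row))
        new_rows.extend(unseen)
    return new_rows
```

The returned rows are the ones to insert into the database. The `max_pages` cap is only a safety limit so that a scraping error cannot cause an endless loop.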