Some time ago, with the help of suggestions from the forums, I created a bot that checks a cloud website every day for new documents and for existing documents that have been updated.
The bot connects to the website.
It scrapes a list that contains:
document name | group | last modification date.
Each file is saved on the network using all of these details.
The bot performs a Path Exist check to find out whether the folder and the file with that name already exist.
If they exist, it proceeds to the next row of the list.
If they don’t exist:
- create the corresponding folder (named after the group)
- save the document as “filename + last modification date + .pdf”.
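For reference, the check described above could be sketched in Python roughly like this. Python is used only for illustration; the function names, the underscore between file name and date, and the clean-up of the date string are assumptions, not what the bot actually does.

```python
from pathlib import Path


def target_path(root: Path, name: str, group: str, modified: str) -> Path:
    """Build <root>/<group>/<name>_<modified>.pdf for one scraped row."""
    # Assumption: dates may contain "/" or ":", which are not safe in file names.
    safe_date = modified.replace("/", "-").replace(":", "-")
    return root / group / f"{name}_{safe_date}.pdf"


def needs_download(root: Path, name: str, group: str, modified: str) -> bool:
    """Return True when this version of the document is not on disk yet."""
    path = target_path(root, name, group, modified)
    if path.exists():
        return False  # already backed up, go to the next row
    # Create the group folder (and any missing parents) before saving.
    path.parent.mkdir(parents=True, exist_ok=True)
    return True
```

Note that this approach does one file-system lookup per row on every run, so on a network share the run time grows with the total number of documents, not with the number of changes.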
The bot is getting slow and generating errors, and I don’t know what is causing the problem.
I’m looking for an alternative way to back up only the files that have changed since the last save. The idea is:
- Open the website and scrape the list (NewScraping.xlsx).
- Retrieve yesterday’s list (OldScraping.xlsx).
- Create a table of the new or modified files.
- Save those files on the network.
- Replace OldScraping with NewScraping.
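One detail of the last step may be worth sketching: if OldScraping is replaced even when some downloads failed, those documents would not be detected again the next day. A minimal Python sketch, where `rotate_snapshot` and the `all_saved` flag are hypothetical names for illustration:

```python
import os
from pathlib import Path


def rotate_snapshot(new_file: Path, old_file: Path, all_saved: bool) -> bool:
    """Make today's list the new baseline, but only after a clean run."""
    if not all_saved:
        # Keep yesterday's list so the failed documents show up as changed again.
        return False
    # os.replace overwrites old_file if it already exists.
    os.replace(new_file, old_file)
    return True
```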
Does anyone know how to compare the two files and find the differences?
The list of files can grow, so each row of the new list must be searched for in the whole old sheet, and not compared by position (row 1 with row 1, and so on).
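To make the question concrete, here is a minimal Python sketch of the comparison I have in mind. It assumes both sheets have already been loaded as lists of (document name, group, last modification date) rows, and that name plus group identifies a document uniquely; `find_changed` is a hypothetical name. Reading the .xlsx files themselves is left out.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (document name, group, last modification date)


def find_changed(old_rows: List[Row], new_rows: List[Row]) -> List[Row]:
    """Return the rows of new_rows that are new or have a different date."""
    # Index yesterday's list by (name, group), so each lookup covers the
    # whole old sheet regardless of row order or sheet length.
    old_dates: Dict[Tuple[str, str], str] = {(n, g): d for n, g, d in old_rows}
    changed = []
    for name, group, modified in new_rows:
        # .get returns None for unknown documents, which never equals a date.
        if old_dates.get((name, group)) != modified:
            changed.append((name, group, modified))
    return changed


old = [("A", "G1", "2024-01-01"), ("B", "G1", "2024-01-02")]
new = [("B", "G1", "2024-01-02"), ("A", "G1", "2024-01-05"), ("C", "G2", "2024-01-05")]
print(find_changed(old, new))
# [('A', 'G1', '2024-01-05'), ('C', 'G2', '2024-01-05')]
```

In the example, A is reported because its date changed and C because it is new, while B is skipped even though it moved to a different row.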