Downloading and scraping data from multiple PDFs

Hi everyone,

I want to download all the PDF files from a website into one folder, extract the data from each PDF, and then reorganise the data table.

To do this, I tried downloading the files with Open Browser → Click the file → Click Save As → Click Save. That works, but only for one file; I can’t automate the process for all the files.

For the data table, I use the snippet For Each Files in Folders, which works pretty well, but I’m not able to scrape all the data from each PDF.

As for the third problem (reorganising the data table), I haven’t tried anything yet.

Sorry if it’s too obvious, but I’m new to RPA.
Thank you

Hello there,

I need some help

Thanks

Hi, for the first issue, I had a similar case:
I had to download CSV files from a webpage one by one.

First, if you can open the folder in Explorer, I recommend doing that, since it may be easier to copy the files from there.

In my case I couldn’t do that, because the website doesn’t simply expose a folder.

What I did was use UI Explorer to check the available selectors of the 1st, 2nd and last CSV files, and I found they had an incremental ID: for example, the 1st CSV has “ID=0” in its selector and the 2nd has “ID=1”.

Then I created two new variables: an int “csvId” (default 0) and a string “csvSelector”.

Now you’re almost there.

I copied the Selector of the Click activity that downloads the file, replaced the ID value with csvId, and assigned the result to csvSelector.


e.g.
Selector

  ""

csvSelector

  ""

Then I put csvSelector into the Selector property of the Click activity.

Then I added an Assign to increment csvId (csvId = csvId + 1) and built a loop around the steps from “click download” to “increment csvId”.

This worked in my case.
I hope it gives you some hints.
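
The incrementing-selector idea above can be sketched outside UiPath as follows. This is only an illustration in Python, not the workflow from the post: the selector template is hypothetical (the real one comes from UI Explorer), and `build_selectors` is an assumed helper name.

```python
# Hypothetical selector template: the real one comes from UI Explorer.
# Only the id attribute changes from one file to the next.
SELECTOR_TEMPLATE = "<webctrl tag='A' id='{csv_id}' />"


def build_selectors(file_count, start_id=0):
    """Return one selector string per file, incrementing the id each time."""
    selectors = []
    csv_id = start_id
    for _ in range(file_count):
        csv_selector = SELECTOR_TEMPLATE.format(csv_id=csv_id)
        selectors.append(csv_selector)
        csv_id = csv_id + 1  # same role as the Assign step: csvId = csvId + 1
    return selectors


if __name__ == "__main__":
    for selector in build_selectors(3):
        print(selector)
```

In the actual workflow, each generated string would be passed to the Selector property of the Click activity inside the loop, and the loop would stop once no element matches the next ID.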


Thank you for your help,

To keep it simple, I just used Data Scraping to collect all the URLs.

Then I open each URL in the list and download them one by one with a for each loop.
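
As a rough illustration of this "loop over a list of URLs and save each file" step, here is a minimal Python sketch. It is not the UiPath workflow itself; the function names, the fallback file name and the injectable `fetch` parameter are all assumptions made for the example.

```python
import os
import urllib.request
from urllib.parse import urlparse, unquote


def filename_from_url(url, index):
    """Derive a local file name from the URL path, or fall back to a numbered name."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name if name else f"file_{index}.pdf"


def fetch_bytes(url):
    """Download the raw bytes behind a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, folder, fetch=fetch_bytes):
    """Save every URL in the list into one folder and return the saved paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        path = os.path.join(folder, filename_from_url(url, index))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

Calling `download_all(urls, "pdfs")` with the scraped list would put every file in a single folder. The `fetch` argument is only there so the loop can be tried out without network access.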

Finally, I use a For Each loop to run Data Scraping on each PDF in the folder, but I don’t know how to extract only the data (the PDF is about 150 pages) or how to extract the whole PDF into a DataTable.

Any ideas?
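
One possible approach, sketched here in Python under strong assumptions, is to read the text of every page first and then keep only the lines that look like table rows. The row layout below (a label followed by two numbers) is purely hypothetical, as are the names `ROW_PATTERN` and `rows_from_pages`; the page text is assumed to have been extracted already by whatever PDF text-reading step is available.

```python
import re

# Hypothetical row layout: a label, then two numbers, separated by spaces.
ROW_PATTERN = re.compile(
    r"^(?P<label>.+?)\s+(?P<qty>\d+)\s+(?P<amount>\d+(?:\.\d+)?)$"
)


def rows_from_pages(page_texts):
    """Scan the text of every page and keep only lines that look like table rows."""
    rows = []
    for page_number, text in enumerate(page_texts, start=1):
        for line in text.splitlines():
            match = ROW_PATTERN.match(line.strip())
            if match:
                rows.append({
                    "page": page_number,
                    "label": match.group("label"),
                    "qty": int(match.group("qty")),
                    "amount": float(match.group("amount")),
                })
    return rows
```

The resulting list of dictionaries plays the role of the DataTable: narrative lines are skipped, and each kept row records the page it came from. The pattern would need to be adapted to the real layout of the PDFs.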

Regards
Antoine

Hi,

I’m facing a small problem: I can’t loop over the URLs stored in the Excel file to download my PDFs.

Can someone take a look at my process?
Example.xaml (13.1 KB)

Regards

There is no attachment in your post.

Example.xaml (13.1 KB)

Sorry

Try this
Example.xaml (13.4 KB)

It works really well, thank you.

Best regards,
Antoine