Downloading and scraping data from multiple PDFs


#1

Hi everyone,

I want to download all the PDF files from a website into one folder, extract all the data from each PDF, and then reorganise the resulting data table.

To do this, I tried downloading the files with Open Browser -> Click the file -> Click Save As -> Click Save. That works, but only for one file; I can’t automate the process for all the files.

For the data table, I use the For Each File in Folder snippet, which works pretty well, but I am not able to scrape all the data from each PDF.

As for the third problem, I haven’t tried anything yet.

Sorry if it’s too obvious, but I’m new to RPA.
Thank you


#2

Hello there,

I need some help

Thanks


#3

Hi, for the first issue, I had a similar case.
In my case, I had to download CSV files from a webpage one by one.

First, if you can open the folder in Explorer, I recommend doing that, since it may be easier to copy the files from there.

In my case I couldn’t do that, because the website doesn’t simply show a folder.

What I did was use UI Explorer to check the available selectors of the 1st, 2nd and last CSV files, and I found they had an incremental ID: for example, the 1st CSV has “ID=0” in its selector and the 2nd has “ID=1”.

Then I created two new variables, an int “csvId” (default 0) and a string “csvSelector”.

Now you’re almost there.

I copied the Selector of the Click activity that downloads the file, replaced the ID value with csvId, and assigned the result to csvSelector.


e.g.
Selector

  ""

csvSelector

  “”

Then I put csvSelector into the Selector property of the Click activity.

Then I added an Assign to increment csvId (csvId = csvId + 1) and made a loop of “click download 〜 increment csvId”.

This worked in my case.
I hope it gives you some hints.
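
For anyone who finds the loop easier to read as code, here is a rough Python sketch of the same idea (in UiPath this would be an Assign inside a loop, not Python). The selector template is purely hypothetical, since the real one has to be copied from UI Explorer; only the incrementing ID follows the steps above.

```python
# Hypothetical selector template; the real one comes from UI Explorer.
SELECTOR_TEMPLATE = "<webctrl tag='A' id='ID={csv_id}' />"


def build_selectors(count, start=0):
    """Return one selector string per file, with the ID incremented each time."""
    selectors = []
    csv_id = start  # plays the role of the csvId variable (default 0)
    for _ in range(count):
        # plays the role of assigning the filled-in template to csvSelector
        selectors.append(SELECTOR_TEMPLATE.format(csv_id=csv_id))
        csv_id += 1  # same as the "csvId = csvId + 1" Assign
    return selectors


if __name__ == "__main__":
    for selector in build_selectors(3):
        print(selector)
```

Each generated string would be what goes into the Selector of the Click activity on that pass of the loop.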


#4

see also


#5

Thank you for your help.

To keep it simple, I just use Data Scraping to collect all the URLs.

Then I open each URL in the list and download the files one by one with a For Each loop.
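
In case it helps others, the "loop over the URL list and save each file" step could look roughly like this outside UiPath. This is only a sketch in Python using the standard library; the function names, the fallback file name and the `fetch` parameter are my own assumptions, not anything from the workflow.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url, index):
    """Use the last path segment as the file name, or a numbered fallback."""
    name = os.path.basename(urlparse(url).path)
    return name if name else f"file_{index}.pdf"


def fetch_bytes(url):
    """Download one URL and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def download_all(urls, folder, fetch=fetch_bytes):
    """Save every URL into `folder` and return the list of saved paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        path = os.path.join(folder, filename_from_url(url, index))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

Calling `download_all(list_of_urls, "pdfs")` would put every file in one folder. The `fetch` argument is only there so the loop can be tried with a fake downloader before pointing it at the real site.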

Finally, I use a For Each loop to run Data Scraping on each PDF in the folder, but I don’t know how to extract only the data (the PDF is about 150 pages) or how to extract the whole PDF into a DataTable!

Any ideas?
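
One possible direction, sketched in Python only to show the logic: read the text of each page first, keep only the pages that contain a marker for the table, and then pull the rows out with a pattern. Everything specific here is an assumption (the keyword, the "label followed by a number" row layout, and that the page texts have already been extracted by whatever PDF reader is in use).

```python
import re

# Hypothetical row layout: a label followed by a number, e.g. "Widget A  12.50".
ROW_PATTERN = re.compile(r"^(?P<label>.+?)\s+(?P<value>-?\d+(?:\.\d+)?)$")


def rows_from_pages(pages, keyword):
    """Collect (page_number, label, value) rows from pages that mention `keyword`."""
    rows = []
    for page_number, text in enumerate(pages, start=1):
        if keyword not in text:
            continue  # skip pages that do not hold the table
        for line in text.splitlines():
            match = ROW_PATTERN.match(line.strip())
            if match:
                rows.append((page_number, match["label"], float(match["value"])))
    return rows
```

With 150 pages, filtering by page first keeps the unrelated text out of the table. The pattern would need adjusting to the real layout, and any other line on a kept page that ends in a number would also be picked up.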

Regards
Antoine


#6

Hi,

I’m facing a little problem: I can’t loop through the URLs stored in the Excel file to download my PDFs.

Can someone take a look at my process?
Example.xaml (13.1 KB)

Regards


#7

There is no attachment in your post.


#8

Example.xaml (13.1 KB)

Sorry


#9

Try this
Example.xaml (13.4 KB)


#10

It works really well, thank you.

Best regards,
Antoine