Webscraping from HUDOC website

Dear all,

I have been trying to extract data from https://hudoc.echr.coe.int to put it in a spreadsheet. The data concerns case law of the European Court of Human Rights. I want to do this for every case up to a date chosen by the user.

Unfortunately, I cannot retrieve the HTML source from this website, as the page is mostly rendered by JavaScript.
I have tried Data Scraping, but all I have managed to do is extract the data and write it to a text file. My idea was to use regular expressions / Match & Replace to pull out all the metadata and write it to a spreadsheet, but I think this is rather inefficient.
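
To make the regular-expression idea concrete, here is a minimal Python sketch of splitting the scraped text file into fields. It rests on assumptions that are not confirmed anywhere in this thread: that each field label appears alone on its own line, followed by its value on the next line(s), and that the labels in `LABELS` match the scraped text. Both the label list and the function name `parse_case_details` are hypothetical and would need adjusting to the real output.

```python
import re

# Hypothetical field labels (an assumption); replace with the labels
# that actually appear in the scraped text file.
LABELS = ["Title", "App. No(s).", "Judgment Date", "Conclusion(s)"]


def parse_case_details(text, labels=LABELS):
    """Split scraped text into {label: value}.

    Assumes each label sits alone on a line and its value runs
    until the next label line (or the end of the text).
    """
    # One alternation that matches any known label alone on a line.
    pattern = re.compile(
        r"^(%s)[ \t]*$" % "|".join(re.escape(label) for label in labels),
        flags=re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        # The value ends where the next label starts.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line breaks and repeated spaces inside the value.
        fields[match.group(1)] = " ".join(text[match.end():end].split())
    return fields
```

Labels that are missing from the text are simply absent from the returned dictionary, so a case with incomplete metadata does not break the run.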

I would prefer to use Data Scraping for this purpose, but when I do, not all of the data gets scraped.

An example of an item I would like to extract can be found on this page. (click Case Details)

https://hudoc.echr.coe.int/eng#{"documentcollectionid2":["GRANDCHAMBER","CHAMBER"],"itemid":["001-186048"]}
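
As an aside, the part of this URL after the `#` is plain JSON, so the item ID can be read from it without any scraping. The Python sketch below only decodes the URL shown above; it makes no request to the site, and the function name `parse_hudoc_fragment` is made up for illustration.

```python
import json
from urllib.parse import unquote


def parse_hudoc_fragment(url):
    """Return the JSON object stored after '#' in a HUDOC URL."""
    # Everything after the first '#' is the fragment.
    _, _, fragment = url.partition("#")
    if not fragment:
        return {}
    # unquote() handles the case where the browser percent-encoded the JSON.
    return json.loads(unquote(fragment))


url = ('https://hudoc.echr.coe.int/eng#{"documentcollectionid2":'
       '["GRANDCHAMBER","CHAMBER"],"itemid":["001-186048"]}')
params = parse_hudoc_fragment(url)
item_ids = params.get("itemid", [])
```

Here `item_ids` holds the list stored under `"itemid"`, which can then serve as the key for each spreadsheet row.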

I have not attached the entire process, only the part that retrieves the text of the web page.

I hope I made clear what I am trying to achieve. If someone has a good idea for an approach, I would be very happy to hear from you :slight_smile:

Main.xaml (31.5 KB)

I am not clear on what the exact requirement is.

I think I should start from scratch.
I would like to download all case law (in PDF format) and its metadata from the HUDOC website and store it on my computer. My aim is to create a spreadsheet that lists the URL of each case along with the corresponding metadata.

I made an example to illustrate this.

Example.xlsx (8.5 KB)
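
For the spreadsheet step itself, a minimal Python sketch could look like the one below. It assumes the item IDs and their metadata have already been collected by some other means, rebuilds each case URL in the same shape as the example link earlier in this thread, and writes a CSV file that Excel can open. The column names and the helpers `build_case_url` and `write_cases_csv` are hypothetical, and the PDF download is deliberately left out because the download address is not given in this thread.

```python
import csv
import json

BASE_URL = "https://hudoc.echr.coe.int/eng#"


def build_case_url(item_id):
    """Rebuild a case URL in the same shape as the example link."""
    query = {
        "documentcollectionid2": ["GRANDCHAMBER", "CHAMBER"],
        "itemid": [item_id],
    }
    # Compact separators reproduce the JSON without spaces.
    return BASE_URL + json.dumps(query, separators=(",", ":"))


def write_cases_csv(path, cases, columns):
    """Write one row per case: its URL followed by the chosen metadata columns.

    `cases` maps item ID -> {column name: value}; missing values become "".
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["url"] + list(columns))
        writer.writeheader()
        for item_id, metadata in cases.items():
            row = {"url": build_case_url(item_id)}
            row.update({name: metadata.get(name, "") for name in columns})
            writer.writerow(row)
```

Calling `write_cases_csv("cases.csv", cases, ["Title", "Judgment Date"])` with a dictionary keyed by item ID would produce one row per case. Filtering by the user-chosen date can be done on that dictionary before writing.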