Webscraping from HUDOC website

trabart · September 17, 2018, 12:40pm

Dear all,

I have been trying to extract data from https://hudoc.echr.coe.int to put it in a spreadsheet. This concerns case law of the European Court of Human Rights. I want to do this for each case up to a date chosen by the user.

Unfortunately, I cannot retrieve the HTML source from this website, as most of the content is rendered by JavaScript.
I have tried Data Scraping, but all I have managed to do is extract the data and write it in a text file. My idea was to use regular expressions / Match & Replace to get all metadata and write it in a spreadsheet, but I think this is rather inefficient.

I would prefer to use Data Scraping for this purpose, but when I do, not all of the data gets scraped.

An example of an item I would like to extract can be found on this page. (click Case Details)

I have not attached the entire process, only the part that retrieves the text of the webpage.

I hope I have made clear what I am trying to achieve. If someone has a good idea for an approach, I would be very happy to hear from you.

Main.xaml (31.5 KB)

skini76 · September 18, 2018, 4:15am

I am not clear on what the exact requirement is.

trabart · September 18, 2018, 9:10am

I think I should start from scratch.
I would like to download all case law (PDF format) and metadata from the HUDOC website to store it on my computer. My aim is to create a spreadsheet that lists the URL of each case along with the corresponding metadata.

I made an example to illustrate this.

Example.xlsx (8.5 KB)
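
For anyone attempting the regular-expression route mentioned earlier in the thread, here is a rough sketch in Python of turning scraped text into spreadsheet rows. It is only an illustration under several assumptions: the scraped text is assumed to contain one `Label: value` pair per line, the field labels in `FIELDS` are hypothetical, and the URL is a placeholder. The real HUDOC output may be laid out differently, so the pattern would need adjusting.

```python
import csv
import io
import re

# Hypothetical field labels; replace with the labels in your scraped text.
FIELDS = ["Title", "App. No(s).", "Judgment Date"]

# Matches lines of the assumed form "Label: value".
LINE_RE = re.compile(
    r"^\s*(?P<label>[^:\n]+?)\s*:\s*(?P<value>.+?)\s*$", re.MULTILINE
)


def parse_metadata(text):
    """Return a dict of the known fields found in one case's scraped text."""
    found = {}
    for match in LINE_RE.finditer(text):
        label = match.group("label")
        # Keep only the first occurrence of each known label.
        if label in FIELDS and label not in found:
            found[label] = match.group("value")
    return found


def write_rows(cases, out):
    """Write one CSV row per (url, scraped_text) pair to the open file `out`."""
    writer = csv.DictWriter(out, fieldnames=["URL"] + FIELDS, lineterminator="\n")
    writer.writeheader()
    for url, text in cases:
        row = {"URL": url}
        row.update(parse_metadata(text))
        # Fields missing from the text are left as empty cells.
        writer.writerow(row)


if __name__ == "__main__":
    # Placeholder URL and made-up text, purely for demonstration.
    sample = [(
        "https://example.org/case-1",
        "Title: CASE OF A v. B\nApp. No(s).: 1/01\nJudgment Date: 01/01/2018\n",
    )]
    buffer = io.StringIO()
    write_rows(sample, buffer)
    print(buffer.getvalue())
```

The resulting CSV opens directly in Excel. Writing to a real file instead of the in-memory buffer only requires passing `open("cases.csv", "w", newline="", encoding="utf-8")` as `out`.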
