Can I scrape data on this web page?

Hi everyone.
I’m trying to use “Data Scraping” on the pages of the website below.
https://www.epo.org/searching-for-patents/data/coverage/legal-status.html
On this site, the pages for US, AU, NZ and FR work fine.



But on other pages (like EP), I cannot get the data as expected.

Are these pages out of scope for scraping?

Could someone please advise me?
Get_Legal-status-codes-US.zip (152.6 KB)
Get_Legal-status-codes-EP.zip (235.7 KB)

Hi,

The EP table has a Code column that consists of 3 columns, and this causes the Data Scraping wizard to fail to get the data correctly when you choose the whole table.
You can get the data by choosing each column one by one in the wizard instead of the whole table, as shown in the following image.
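
To illustrate why a header that spans several columns can confuse whole-table detection, here is a minimal Python sketch. It does not use UiPath, and the sample markup is a hypothetical stand-in, not the actual HTML of the EPO page. It only shows that such a header row has fewer cells than the data rows beneath it.

```python
from html.parser import HTMLParser

# Hypothetical markup (assumption): one "Code" header cell spanning three data columns.
SAMPLE = """
<table>
  <tr><th colspan="3">Code</th><th>Description</th></tr>
  <tr><td>A</td><td>1</td><td>x</td><td>First event</td></tr>
</table>
"""

class CellCounter(HTMLParser):
    """Counts the cells in each row and the number of columns they span."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one (cell_count, spanned_columns) pair per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append((0, 0))
        elif tag in ("td", "th") and self.rows:
            cells, width = self.rows[-1]
            span = int(dict(attrs).get("colspan", 1))
            self.rows[-1] = (cells + 1, width + span)

counter = CellCounter()
counter.feed(SAMPLE)
header, first_row = counter.rows
print(header)     # (2, 4): two header cells covering four columns
print(first_row)  # (4, 4): four data cells covering four columns
```

The header row and the data row cover the same width but have different cell counts, which is the kind of mismatch that picking columns one by one avoids.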

Regards,


Hi @patent-atanaka,

Please use the table definition below: open Edit table definition and replace the existing code with the following.

<extract>
<row exact='1'>
	<webctrl tag='tr'/>
</row>
<column exact='1' name='Column1' attr='text'>
	<webctrl tag='tr'/>
	<webctrl tag='td' idx='1'/>
	<webctrl tag='p' idx='1'/>
	<webctrl tag='b' idx='1'/>
</column>
<column exact='1' name='Column2' attr='text'>
	<webctrl tag='tr'/>
	<webctrl tag='td' idx='2'/>
</column>
<column exact='1' name='Column3' attr='text'>
	<webctrl tag='tr'/>
	<webctrl tag='td' idx='3'/>
</column>
<column exact='1' name='Column4' attr='text'>
	<webctrl tag='tr'/>
	<webctrl tag='td' idx='4'/>
</column>
<column exact='1' name='Column5' attr='text'>
	<webctrl tag='tr'/>
	<webctrl tag='td' idx='5'/>
</column>
<column exact='1' name='Column6' attr='text'>
	<webctrl tag='tr'/>
	<webctrl tag='td' idx='6'/>
</column>
<column exact='1' name='Column7' attr='text'>
	<webctrl tag='tr'/>
	<webctrl tag='td' idx='7'/>
</column>
</extract>
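
For readers who want to see what the definition above expresses, here is a rough Python equivalent: every tr is a row, and the text of the first seven td cells becomes Column1 to Column7. This is only a sketch under assumptions. The sample HTML is hypothetical, it is not how UiPath evaluates the definition internally, and it takes all of the text in the first cell rather than only the nested p/b element.

```python
from html.parser import HTMLParser

# Hypothetical row (assumption): seven <td> cells, the first value nested in <p><b>.
SAMPLE = """
<table>
  <tr><td><p><b>EP</b></p></td><td>A</td><td>1</td><td>x</td>
      <td>desc</td><td>2020</td><td>note</td></tr>
</table>
"""

class RowExtractor(HTMLParser):
    """Collects the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # list of rows, each a list of cell texts
        self.in_cell = False  # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data

extractor = RowExtractor()
extractor.feed(SAMPLE)

# Map the first seven cells of each row to Column1..Column7, as in the definition.
names = [f"Column{i}" for i in range(1, 8)]
records = [
    dict(zip(names, (cell.strip() for cell in row)))
    for row in extractor.rows
    if len(row) >= 7
]
print(records[0]["Column1"], records[0]["Column7"])
```

Rows with fewer than seven cells are skipped here. That is a choice made for this sketch, not something the table definition itself specifies.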

Regards,
Sasikumar K


Thanks for the advice, Yoichi!

Thanks for the advice, Sasikumar!

I made a step-by-step manual on this topic, based on your valuable advice. Instruction step by step using DataScraping.pdf (1.2 MB)
Get_Legal-status-codes-EP.zip (416.3 KB)
