Extract Data From Wikipedia Infobox Table

Hi everybody,

I have a question about how to implement data extraction from a website, in particular from Wikipedia.

I have a list of cities for which I need to obtain certain information, such as area, population, etc.
The problem is that this information sits in a different row of the infobox for each city, so automating the extraction is difficult for me.
For example, for this city the information is in row 18:

<html app='firefox.exe' title='Paris - Wikipedia' />
<webctrl aaname='Paris Country Region Department Canton Subdivisions Government •*' tag='TABLE' />
<webctrl isleaf='1' tableRow='18' tag='TD' />

For another city it is in row 16, and the selector is this one:

<html app='firefox.exe' title='Berlin - Wikipedia' />
<webctrl aaname='Berlin Country State Government • Body • Governing Mayor Area[1]*' tag='TABLE' />
<webctrl isleaf='1' tableRow='16' tag='TD' />

I also tried to extract the whole table, regardless of the city, and then filter it in the DataTable, but it doesn't work and I don't know why.

Thank you very much.

@JavierSanchis
Welcome to the forum

With some techniques we can retrieve the data. I only did some quick R&D:

With this extractMetadata, each row is returned as two columns (the header cell and the value cell):

<extract>
	<row exact="1">
		<webctrl tag="tbody" idx="1"/>
		<webctrl tag="tr"/>
	</row>
	<column exact="1" name="Column1" attr="text">
		<webctrl tag="tbody" idx="1"/>
		<webctrl tag="tr"/>
		<webctrl tag="th" idx="1"/>
	</column>
	<column exact="1" name="Column2" attr="text">
		<webctrl tag="tbody" idx="1"/>
		<webctrl tag="tr"/>
		<webctrl tag="td" idx="1"/>
	</column>
</extract>
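Once the table comes back as header/value pairs, the row position no longer matters, because the value can be looked up by its header text. Here is a minimal Python sketch of that idea, outside of the workflow. The HTML snippets, the class `InfoboxRows` and the helpers `parse_rows` and `find_value` are made up for illustration, and the values are invented, not real city data:

```python
from html.parser import HTMLParser


class InfoboxRows(HTMLParser):
    """Collect (header, value) pairs from rows that have both a <th> and a <td>."""

    def __init__(self):
        super().__init__()
        self.rows = []                        # finished (th_text, td_text) pairs
        self._cell = None                     # "th" or "td" while inside a cell
        self._current = {"th": "", "td": ""}  # text of the row being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = {"th": "", "td": ""}
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr":
            # Normalize whitespace; skip rows missing a header or a value.
            th = " ".join(self._current["th"].split())
            td = " ".join(self._current["td"].split())
            if th and td:
                self.rows.append((th, td))

    def handle_data(self, data):
        if self._cell:
            self._current[self._cell] += data


def parse_rows(html):
    parser = InfoboxRows()
    parser.feed(html)
    return parser.rows


def find_value(rows, label):
    """Return the value of the first row whose header contains `label`."""
    for header, value in rows:
        if label.lower() in header.lower():
            return value
    return None


# Invented sample tables: same labels, different row positions.
CITY_A = """<table><tbody>
<tr><th>Country</th><td>Exampleland</td></tr>
<tr><th>Area</th><td>123 km2</td></tr>
<tr><th>Population</th><td>456</td></tr>
</tbody></table>"""

CITY_B = """<table><tbody>
<tr><th>Population</th><td>789</td></tr>
<tr><th>Government</th></tr>
<tr><th>Area</th><td>10 km2</td></tr>
</tbody></table>"""

for html in (CITY_A, CITY_B):
    rows = parse_rows(html)
    print(find_value(rows, "Area"), find_value(rows, "Population"))
```

Real infoboxes are messier than this: a label such as "Area" is often a heading row on its own, with the figures in sub-rows below it, so the matching rule would probably need adjusting per label.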

Otherwise, Find Children gives us another option to work with.

However, also have a look at the MediaWiki / Wikipedia API, https://en.wikipedia.org/w/api.php, as it could be used as an alternative way to retrieve the content.
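As a rough sketch of the API route, the following Python snippet builds a request for the wikitext of a page's lead section, which is where the infobox usually lives. It uses the `action=parse` module with `prop=wikitext`. The helper names and the User-Agent string are placeholders of mine, and the field names inside the returned infobox depend on the template each article uses, so I have not tried to parse them here:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_wikitext_url(title):
    """Build a request URL for the wikitext of section 0 of a page."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "section": "0",        # lead section; assumed to contain the infobox
        "format": "json",
        "formatversion": "2",  # returns the wikitext as a plain string
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Pull the wikitext string out of a decoded JSON response."""
    return response["parse"]["wikitext"]


def fetch_lead_wikitext(title):
    """Call the API over the network and return the lead section's wikitext."""
    request = urllib.request.Request(
        build_wikitext_url(title),
        # Placeholder User-Agent; replace it with something identifying your tool.
        headers={"User-Agent": "infobox-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_wikitext(json.load(reply))
```

Calling `fetch_lead_wikitext("Paris")` should return the raw infobox markup along with the introduction text, from which the area and population fields can then be picked out by name rather than by row number.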
