I have a question about how to extract data from a website, specifically from Wikipedia.
I have a list of cities for which I need to obtain certain information, such as area, population, etc.
The problem is that this information sits in a different table row for each city, so I am finding it difficult to automate.
For example, for Paris the value is in row 18 of the infobox table:
<html app='firefox.exe' title='Paris - Wikipedia' />
<webctrl aaname='Paris Country Region Department Canton Subdivisions Government •*' tag='TABLE' />
<webctrl isleaf='1' tableRow='18' tag='TD' />
For Berlin, the same information is in row 16, so the selector is different:
<html app='firefox.exe' title='Berlin - Wikipedia' />
<webctrl aaname='Berlin Country State Government • Body • Governing Mayor Area[1]*' tag='TABLE' />
<webctrl isleaf='1' tableRow='16' tag='TD' />
I also tried extracting the whole table for each city, independently of the row number, and then filtering the resulting DataTable, but I could not get it to work and I don't know why.
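For what it's worth, the filtering idea can be made independent of the row number by looking rows up by their label instead of their position. Below is a rough Python sketch of that logic, not a UiPath workflow. It assumes the extracted table is available as a list of `[label, value]` rows, and that a heading row such as `Area[1]` has an empty value with the actual figure on a following row. The function names `normalize` and `find_value` are made up for illustration.

```python
import re


def normalize(label):
    """Drop footnote markers such as "[1]" and bullet characters, then lowercase."""
    label = re.sub(r"\[[^\]]*\]", "", label)
    return label.replace("•", "").strip().lower()


def find_value(rows, keyword):
    """Return the value belonging to the first row whose label starts with keyword.

    rows is assumed to be a list of [label, value] lists, one per table row.
    If the matching row is a heading with an empty value, the first later row
    that has a value is used instead (an assumption about the infobox layout).
    """
    keyword = keyword.lower()
    for i, row in enumerate(rows):
        if not row or not normalize(row[0]).startswith(keyword):
            continue
        # Case 1: the value sits on the same row as the label.
        if len(row) > 1 and row[1].strip():
            return row[1].strip()
        # Case 2: heading row, so take the first following row with a value.
        for later in rows[i + 1:]:
            if len(later) > 1 and later[1].strip():
                return later[1].strip()
        return None
    return None
```

With this approach the same call, for example `find_value(rows, "area")`, would be used for every city, whether the row is number 16 or 18. The same idea should carry over to a DataTable filter on the first column, although the exact layout of the extracted table would need checking for each infobox type.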
Alternatively, the Find Children activity gives us another option to work with.
Also have a look at the MediaWiki / Wikipedia API (MediaWiki API help - Wikipedia), as it could be used as an alternative way to retrieve the content.
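To give an idea of what the API route could look like, here is a minimal Python sketch that requests the wikitext of an article's lead section through the `action=parse` module and pulls one infobox parameter out of it. The helper names (`build_url`, `fetch_wikitext`, `infobox_field`) and the User-Agent string are made up for illustration, and the infobox parameter names (for example `area_total_km2`) are an assumption: they differ between infobox templates, so they would have to be checked for each city.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_url(title):
    """Build a MediaWiki API URL that returns the wikitext of section 0."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "section": "0",        # lead section, where the infobox normally lives
        "format": "json",
        "formatversion": "2",  # wikitext is returned as a plain string
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_wikitext(title):
    """Download the lead-section wikitext for one article (needs network access)."""
    request = urllib.request.Request(
        build_url(title),
        # Hypothetical User-Agent; replace with something identifying your tool.
        headers={"User-Agent": "city-infobox-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["parse"]["wikitext"]


def infobox_field(wikitext, field):
    """Return the raw text after "| field =" on its line, or None if absent."""
    pattern = r"^[ \t]*\|[ \t]*" + re.escape(field) + r"[ \t]*=[ \t]*(.*)$"
    match = re.search(pattern, wikitext, flags=re.MULTILINE)
    return match.group(1).strip() if match else None
```

Usage would be along the lines of `infobox_field(fetch_wikitext("Berlin"), "area_total_km2")`. The returned value is raw wikitext, so it may still contain templates or references that need cleaning, and the function returns `None` when the article's infobox does not use that parameter name.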