Data Scraping Google News and Editing the Extraction XML for Reliability

Hello Genius people,

I’m trying to scrape data from Google News, extracting the Name, URL and Summary of each search result.

For a while it worked like a charm, but now I’ve realized the XML is not reliable and I need to edit it a bit.

Can anyone help me edit this XML, and please explain how it should be done?
RunCmdAsDiffUsers.xaml (9.6 KB)

The structure of the page changed. Kindly note the g-card tag.

Give it a try by reconfiguring the extraction, or try the following extract XML:

<extract>
	<row exact='1'>
		<webctrl tag='div' />
		<webctrl tag='g-card' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='a' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='2' />
	</row>
	<column exact='1' name='Head' attr='text' name2='Url' attr2='href'>
		<webctrl tag='div' />
		<webctrl tag='g-card' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='a' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='2' />
		<webctrl tag='div' idx='2' />
	</column>
	<column exact='1' name='Summary' attr='text'>
		<webctrl tag='div' />
		<webctrl tag='g-card' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='a' idx='1' />
		<webctrl tag='div' idx='1' />
		<webctrl tag='div' idx='2' />
		<webctrl tag='div' idx='3' />
	</column>
</extract>
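
To see what the XML above actually describes, it can help to print each selector as a readable path. The following is a minimal Python sketch that does this with the standard library; `selector_paths` is a hypothetical helper written for illustration and is not part of UiPath. It only reads the `tag` and `idx` attributes that appear in the XML above.

```python
import xml.etree.ElementTree as ET

# Copy of the extract XML from the post above.
EXTRACT_XML = """
<extract>
  <row exact='1'>
    <webctrl tag='div' />
    <webctrl tag='g-card' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='a' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='div' idx='2' />
  </row>
  <column exact='1' name='Head' attr='text' name2='Url' attr2='href'>
    <webctrl tag='div' />
    <webctrl tag='g-card' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='a' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='div' idx='2' />
    <webctrl tag='div' idx='2' />
  </column>
  <column exact='1' name='Summary' attr='text'>
    <webctrl tag='div' />
    <webctrl tag='g-card' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='a' idx='1' />
    <webctrl tag='div' idx='1' />
    <webctrl tag='div' idx='2' />
    <webctrl tag='div' idx='3' />
  </column>
</extract>
"""


def selector_paths(xml_text):
    """Return {label: path} for the <row> and each named <column> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    paths = {}
    for node in root:
        # Label the row as "row" and each column by its name attribute.
        label = "row" if node.tag == "row" else node.get("name")
        steps = []
        for ctrl in node.findall("webctrl"):
            idx = ctrl.get("idx")
            steps.append(ctrl.get("tag") + (f"[{idx}]" if idx else ""))
        paths[label] = " > ".join(steps)
    return paths


for label, path in selector_paths(EXTRACT_XML).items():
    print(f"{label}: {path}")
```

Printed this way, the row and both columns share the same path through the `g-card` and `a` elements, and each column adds one more `div` step at the end (`div[2]` for Head/Url, `div[3]` for Summary). If the page layout changes again, those are the steps to compare against what the browser inspector shows.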

Thank you so much for your help. But may I ask how I can recognize this g-card tag? I couldn’t find it when I inspected the page.

Just right-click a card and select Inspect Element. In the browser's F12 developer tools you can check the structure.
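
If it is easier to read the nesting as text than to expand nodes in the inspector, the chain of parent tags above each link can also be printed from a saved copy of the page. This is only a rough sketch using Python's standard `html.parser`; the `SAMPLE_HTML` below is a hypothetical, heavily simplified stand-in for one result card and not Google's real markup, and `LinkAncestors` is a name made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for one result card; the real page differs.
SAMPLE_HTML = """
<div id="results">
  <g-card>
    <div><div>
      <a href="https://example.com/story">
        <div><div>Source</div><div>Headline</div><div>Summary</div></div>
      </a>
    </div></div>
  </g-card>
</div>
"""


class LinkAncestors(HTMLParser):
    """Record the chain of open tags leading to every <a> element."""

    # Elements that never get a closing tag, so they must not stay on the stack.
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.chains = []  # one tag chain per <a> encountered

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        self.stack.append(tag)
        if tag == "a":
            self.chains.append(list(self.stack))

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the matching open tag.
            while self.stack.pop() != tag:
                pass


parser = LinkAncestors()
parser.feed(SAMPLE_HTML)
for chain in parser.chains:
    print(" > ".join(chain))
```

A `g-card` entry in the printed chain is the custom element to look for; in the inspector it appears as an ordinary node a few levels above the link.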
