Screen scraping is not working when the web page changes

Hello Guys,

When I use screen scraping to get prices from a web page after filling in information, the page changes each time, so the ExtractMetaData property changes as well. Below is the XML for each page:

<extract>
	<column exact='1' name='Column1' attr='text'>
		<webctrl class='results-list__wrapper' idx='1' tag='div' />
		<webctrl class='is-enabled' tag='quote-row' />
		<webctrl class='result coverageType--ThirdParty' idx='1' tag='div' />
		<webctrl class='result__body' idx='1' tag='div' />
		<webctrl class='result__price price price--global is-visible' idx='1' tag='div' />
		<webctrl class='price__amount price__amount--primary' idx='1' tag='div' />
	</column>
</extract>

<extract>
	<column exact='1' name='Column1' attr='text'>
		<webctrl class='results-list__wrapper results-list__wrapper--narrow' tag='div' idx='1' />
		<webctrl tag='quote-row' />
		<webctrl class='result-minimalist ThirdParty' tag='div' idx='1' />
		<webctrl class='result-minimalist__body' tag='div' idx='1' />
		<webctrl class='result__cta result__cta--minimalist' tag='div' idx='1' />
		<webctrl class='result__omni-cta' tag='div' idx='1' />
		<webctrl class='btn-wrapper' tag='div' idx='1' />
		<webctrl tag='mer-subscription' idx='1' />
		<webctrl class='btn btn--primary' tag='button' idx='1' />
		<webctrl class='price' tag='div' idx='1' />
	</column>
</extract>

Any idea how to fix this issue?

Best Regards.
Hajar.

Hi @hhajar
Check whether your selector is working correctly on both pages.

cheers :smiley:

Happy learning :smiley:

Hi @pattyricarte ,
Thanks for replying,
I already checked my selector; it’s the same for both pages.

The class name for page 1 is results-list__wrapper, and for page 2 it is results-list__wrapper results-list__wrapper--narrow.

Are you sure that the selectors for both pages are not using the full names?

If your selector uses a wildcard for the class, e.g. class='results-list__wrapper*', then it will work for both pages.
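
As a rough sketch of what that could look like for the outermost node: page 1 uses the class results-list__wrapper and page 2 uses results-list__wrapper results-list__wrapper--narrow, so a trailing * covers both. This assumes the wildcard is honored here the same way it is in ordinary selectors, which is worth confirming in UI Explorer.

```xml
<!-- One node for both pages: * stands for zero or more trailing characters -->
<webctrl class='results-list__wrapper*' tag='div' idx='1' />
```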

I mean the ExtractMetaData is different for each page, as the XML shows.

But that is expected, isn’t it? Each of your pages has different class names, and if you scrape everything, the data structures will change.
However, if you scrape only the ones that are common between the two pages, then the ExtractDataTable structure will remain the same.
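
For illustration, a trimmed-down ExtractMetaData that keeps only the nodes both pages appear to share might look like the one below. It is an untested sketch built from the two XML dumps in the first post. It assumes that intermediate nodes can be left out, that wildcards are accepted in this property, and that price* picks out only the price element you want (page 1 ends in a div with class price__amount price__amount--primary, page 2 in a div with class price). If the property rejects XML comments, remove them before pasting.

```xml
<extract>
	<column exact='1' name='Column1' attr='text'>
		<!-- Prefix wildcard: intended to match the wrapper class on both pages -->
		<webctrl class='results-list__wrapper*' tag='div' idx='1' />
		<!-- Tag only: the class attribute is present on page 1 but not on page 2 -->
		<webctrl tag='quote-row' />
		<!-- Assumed pattern for the final price div; verify it matches a single element per row -->
		<webctrl class='price*' tag='div' />
	</column>
</extract>
```

If price* turns out to match more than one div inside a quote-row, a more specific common ancestor would need to be kept in the chain.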

But how do I scrape only the ones that are common between the two pages?

Did you follow the training on the Data Scraping Wizard?

Yes, I’ve followed the wizard step by step.
