Can you set up an exclude for the XML used for screen scraping?

LeftBrainCo · April 10, 2020, 8:15pm

Here is my question: I am using the Extract Metadata below for web scraping, and two identical items are getting picked up by the Manufacturer column. Is there a way I can set a tag as a "does not match" option? I tried using

<webctrl tag!='i' />

But this did not work.

<extract>
<row exact='1'>
	<webctrl tag='tr' />
</row>
<column name='Manufacturer' attr='text' exact='1'>
	<webctrl tag='tr' />
	<webctrl tag='td' idx='1' />
	<webctrl tag='p' idx='1' />
</column>
<column name='Part Number' attr='text' exact='1'>
	<webctrl tag='tr' />
	<webctrl tag='td' idx='2' />
</column>
</extract>

ppr · April 10, 2020, 9:58pm

@LeftBrainCo
Unfortunately, we don't have more details about your case (screenshots, structures, sample data) that we could use for solution ideas.

Let's assume we cannot exclude (a few days back I did some longer R&D on the Extract XML and found a lot of restrictions).

With a selector that captures the tag='i' content as an additional column, the retrieved information could perhaps be used later to subtract it from the other data.
Example:
P = Hello World
I = World
P - I: P.replace(I, "")
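
As a rough sketch of the subtraction idea, here it is in Python rather than as a UiPath VB.NET expression. The function name and the sample values are assumptions for illustration only.

```python
def subtract_text(p_text, i_text):
    """Remove the <i> text from the <p> text and collapse leftover whitespace."""
    return " ".join(p_text.replace(i_text, "").split())


# P = text of the whole <p>, I = text of the nested <i> (hypothetical values)
print(subtract_text("Hello World", "World"))
```

Note that replace removes every occurrence of I, so this only holds when the I text does not also appear in the part of P that should be kept.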

LeftBrainCo · April 13, 2020, 12:47pm

Unfortunately that will not work, as it would cause major issues with certain types of data such as numbers, etc. (example below). Currently, in these situations I run a loop that eliminates every odd row (or whatever pattern is created by the lack of the information I am seeking). Did you find some sort of official documentation on Extract XML? I provided all of the information needed for the question I asked. I am looking to learn more about using Extract XML, not a workaround for a problem that might not exist (a missing exclude control).

For example
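
To illustrate the concern with made-up values (these are not the poster's actual data), here is a small Python sketch of how string subtraction can destroy numeric data, together with the row-dropping loop described above. Both function names and the "every other row" pattern are assumptions.

```python
def subtract_text(p_text, i_text):
    """Naive subtraction: removes every occurrence of i_text from p_text."""
    return p_text.replace(i_text, "").strip()


def drop_every_other_row(rows):
    """Keep rows at positions 0, 2, 4, ... and drop the ones in between."""
    return rows[::2]


# The wanted value "1010" contains the unwanted value "10", so it is wiped out too
print(repr(subtract_text("1010 10", "10")))

# Hypothetical scrape result where each item was picked up twice
print(drop_every_other_row(["Acme", "Acme", "Globex", "Globex"]))
```

The row-dropping approach only works while the duplication pattern stays regular, which is why a real exclude option would be preferable.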

ppr · April 13, 2020, 4:26pm

@LeftBrainCo

Short answer:

Syntax for denying tags: NO
Options for reliably extracting the information: YES

Longer answer below:

"Did you find some sort of official documentation on Extract XML?"

What we find are mostly hints about what the ExtractMetadata XML is missing, rather than an overview of the supported tags and attributes.

A while back I did some heavy R&D exploring the possibilities and found that we can do only limited things compared with what selectors otherwise allow (e.g. nav up for anchoring).

So in short:

A syntax for denying tags is not available, or at least not known.

I also do not expect regex selectors to be supported, but you can test that on your own.

"I am looking to learn more about using Extract XML, not a workaround for a problem that might not exist (Missing exclude control)."

Since we have to assume that another option or approach is needed for retrieving the information, we can look at other working options, and such options are available.

"I provided all of the information needed for the question I asked."

Yes, you did, for the YES/NO question: "Is there a way I can set a tag to be a does not match option?"
But as that is not available and you need to check other options, more details about your case need to be known.

"That will not work unfortunately as it would cause major issues with certain types of data like numbers etc. (Example below) Currently in these situations I run a loop that eliminates every odd row (Or whatever pattern is created by the lack of the information I am seeking)."

Just provide us with details that we can rely on for developing a solution, instead of incomplete, partial information on what is not working.

What we did in several such scenarios was:

1. Grab the data with data scraping, using the configured ExtractMetadata XML plus one additional column.
2. Iterate again over the web data table with Find Children and collect the inner/outer HTML string.
3. Parse the string with the XML API/methods and pull out the information as required.
4. Store the extracted info in the column intended for it.
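
The parsing step above can be sketched roughly as follows. This uses Python's standard xml.etree.ElementTree instead of the .NET XML classes a UiPath workflow would use, and the function name, the sample inner HTML strings, and the assumption that the unwanted text sits in an <i> inside the <p> are all hypothetical.

```python
import xml.etree.ElementTree as ET


def text_without_tag(fragment, excluded_tag="i"):
    """Return the text of a well-formed fragment, skipping <excluded_tag> content."""
    parts = []

    def walk(node):
        if node.tag == excluded_tag:
            return  # skip this element and everything inside it
        if node.text:
            parts.append(node.text)
        for child in node:
            walk(child)
            if child.tail:
                # text after a child belongs to the parent, so keep it
                parts.append(child.tail)

    walk(ET.fromstring(fragment))
    # collapse whitespace left behind where the excluded element was
    return " ".join("".join(parts).split())


# Hypothetical inner HTML strings, one per scraped row
rows = [
    "<p>Acme <i>Acme</i></p>",
    "<p>1010 <i>10</i></p>",
]
manufacturers = [text_without_tag(html) for html in rows]
print(manufacturers)
```

Because the <i> element is skipped structurally rather than by string replacement, a value like 1010 survives even when the excluded text is 10. Real page HTML is often not well-formed XML, in which case ET.fromstring raises a ParseError and an HTML-tolerant parser would be needed instead.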

LeftBrainCo · April 14, 2020, 12:55pm

This is all I needed thanks!

system · April 17, 2020, 12:55pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
