How to pass the innerHTML attribute in browser scraping?

I am trying to extract GitHub stars with Data Scraping, as shown in the image below:

What I realized is that the star and fork icons are indistinguishable (the idx also shifts when a repository has forks but no stars) except for their innerhtml attribute in the Property Explorer:

There we can see aria-label='star', which is different for the fork:

Unfortunately, I am not allowed to set the innerHTML property on the selector. Can someone help me achieve this?

Here's the link to my repositories: sedhha (sedhha) / Repositories · GitHub

I want to use Data Scraping to extract repositoryName, primaryLanguage (if it exists), GitHub stars (if they exist), and GitHub forks (if they exist). I am stuck on distinguishing GitHub stars from GitHub forks.

Here's how a single repository card looks (note that the maximum number of entries I want to scrape is under 1000, and there is a Next button at the bottom to go to the next page):

Did you try Get Attribute, and is it empty?

To be precise, I don't want to get the attribute; rather, I want to filter the elements whose innerHTML contains aria-label='star'.
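The filter described above can also be applied as a post-processing step, provided the innerHTML of each counter link can be captured next to its text. A minimal Python sketch, assuming the scraped cells arrive as `(inner_html, text)` pairs; `is_star_element` and `filter_star_cells` are hypothetical helper names, and the regex accepts either quote style around `star`:

```python
import re
from typing import List, Tuple

# Matches aria-label="star" or aria-label='star' (either quote style, any case).
# Assumption: the star icon is identified by exactly this attribute value.
STAR_LABEL = re.compile(r"""aria-label\s*=\s*["']star["']""", re.IGNORECASE)


def is_star_element(inner_html: str) -> bool:
    """Return True if an element's innerHTML carries aria-label='star'."""
    return STAR_LABEL.search(inner_html) is not None


def filter_star_cells(cells: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the hypothetical (inner_html, text) pairs marked as stars."""
    return [(html, text) for html, text in cells if is_star_element(html)]


if __name__ == "__main__":
    # Made-up sample cells: one star counter and one fork counter.
    cells = [
        ('<svg aria-label="star" class="octicon"></svg> 12', "12"),
        ('<svg aria-label="fork" class="octicon"></svg> 3', "3"),
    ]
    print(filter_star_cells(cells))  # only the star cell remains
```

Because the match requires the closing quote right after `star`, a label such as `stargazers` is not treated as a star icon.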

Hi,

If you want to create a table for the page, can you try the following settings?

<extract>
<row exact='1'>
	<webctrl tag='li' />
	<webctrl tag='div' idx='1' />
</row>
<column exact='1' name='Column1' attr='text'>
	<webctrl tag='li' />
	<webctrl tag='div' idx='1' />
	<webctrl tag='div' idx='1' />
	<webctrl tag='h3' idx='1' />
	<webctrl tag='a' idx='1' />
</column>
<column exact='1' name='Column2' attr='text'>
	<webctrl tag='li' />
	<webctrl tag='div' idx='1' />
	<webctrl tag='div' text='Updated' idx='1' />
	<webctrl tag='a' idx='1' />
</column>
<column exact='1' name='Column3' attr='href'>
	<webctrl tag='li' />
	<webctrl tag='div' idx='1' />
	<webctrl tag='div' text='Updated' idx='1' />
	<webctrl tag='a' idx='1' />
</column>
<column exact='1' name='Column4' attr='text'>
	<webctrl tag='li' />
	<webctrl tag='div' idx='1' />
	<webctrl tag='div' text='Updated' idx='1' />
	<webctrl tag='a' idx='2' />
</column>
<column exact='1' name='Column5' attr='href'>
	<webctrl tag='li' />
	<webctrl tag='div' idx='1' />
	<webctrl tag='div' text='Updated' idx='1' />
	<webctrl tag='a' idx='2' />
</column>
</extract>

This returns the following.

We can tell whether each value is a star count or a fork count by checking whether its URL (the href columns, Column3 and Column5) ends with stargazers or members, and then restructure the table.
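The restructuring step described above can be sketched in Python, assuming each scraped row is turned into a list of `(text, href)` pairs taken from Column2/Column3 and Column4/Column5. The helper names are hypothetical, the `/stargazers` and `/members` suffixes follow the reply, and `/forks` is an added assumption in case GitHub links the fork counter there instead:

```python
from typing import Dict, List, Tuple, Union

# Suffixes that identify each counter link. "/stargazers" and "/members" come from
# the reply above; "/forks" is an assumption for newer GitHub markup.
STAR_SUFFIXES = ("/stargazers",)
FORK_SUFFIXES = ("/members", "/forks")


def classify_link(href: str) -> str:
    """Return 'stars', 'forks', or '' for an href scraped from a repository card."""
    path = href.strip().rstrip("/")
    if path.endswith(STAR_SUFFIXES):
        return "stars"
    if path.endswith(FORK_SUFFIXES):
        return "forks"
    return ""


def parse_count(text: str) -> int:
    """Parse counter text such as '1,234'; abbreviated forms like '1.2k' are not handled."""
    return int(text.strip().replace(",", ""))


def restructure_row(name: str, links: List[Tuple[str, str]]) -> Dict[str, Union[str, int]]:
    """Build one output row from the (text, href) pairs scraped for a repository.

    Both counters default to 0, so a repo with only stars, only forks,
    or both always yields a stars and a forks field.
    """
    row: Dict[str, Union[str, int]] = {"repositoryName": name, "stars": 0, "forks": 0}
    for text, href in links:
        kind = classify_link(href)
        if kind:  # empty columns (href == "") are skipped
            row[kind] = parse_count(text)
    return row


if __name__ == "__main__":
    # Made-up row: only a fork link was scraped, the second column pair is empty.
    links = [("4", "https://github.com/user/example-repo/network/members"), ("", "")]
    print(restructure_row("example-repo", links))
    # {'repositoryName': 'example-repo', 'stars': 0, 'forks': 4}
```

Defaulting both counters to 0 means a repository that is only starred, only forked, or both needs no separate branch: whichever link is missing simply leaves its counter at 0.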

Regards,

I didn't follow. Can you share the XAML? Also, can you explain how it handles the following cases:

  1. When the repo is only starred, it should return n stars and 0 forks
  2. When the repo is only forked, it should return 0 stars and n forks
  3. When the repo is both starred and forked, it should return n stars and m forks

Do you mean that when the link is stargazers, both values are present (Column2 is the stars and Column3 is the forks), and when it is not stargazers, only one is present (if Column2 is present, it is the stars, and Column3 is the forks)?

By the way, can you please share the XAML?

Hi,

Can you try the following sample?

Sample20211206-4.zip (4.0 KB)

Regards,
