Data Scraping inside DOM / Shadow DOM - help!

Hi all,

I’ve hit an issue with data scraping a website.

In the attached screen grab of the HTML, I need to get the order number (0852/1010 in this example). Unfortunately, data scraping refuses to grab the ‘text’ attr from the tag-text element. I’m assuming this is because the text sits inside the tag itself, as an attribute, rather than between the opening and closing tags (if I manually add the text between the tags, it extracts the text).

The same information is available further down the DOM, inside a shadow DOM (under the span), but I can’t work out how to get into the shadow DOM.

Does anyone know 1) why it simply refuses to grab the text attr from the original tag-text element, 2) if there’s any way to force it to do so, or 3) if I can grab it from the shadow DOM somehow instead?

Try taking it from the span with the class “accent–title” by manually editing the extract metadata XML,
or try fetching from a higher level, e.g. td idx='2'.
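
As a rough sketch of those two suggestions, the hand-edited extract metadata might look something like the following. The column names are made up for illustration, the class value is copied as it appears in this thread (the forum may have altered the punctuation, so copy it from the page itself), and since the span reportedly sits inside the shadow DOM, the first column may well come back empty:

```xml
<extract>
  <!-- Option 1: target the span by class (hypothetical column name) -->
  <column exact="1" name="OrderNumberFromSpan" attr="text">
    <webctrl tag="span" class="accent–title"/>
  </column>
  <!-- Option 2: stop one level higher, at the table cell -->
  <column exact="1" name="OrderNumberFromCell" attr="text">
    <webctrl tag="td" idx="2"/>
  </column>
</extract>
```

If the cell-level column returns extra text alongside the order number, it can be trimmed afterwards in the workflow.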

Hi,

You might have more success with the following (you’ll get the idea):

<extract>
  <column exact="1" name="OrderNumber" attr="text">
    <webctrl tag="tbody" idx="1"/>
    <webctrl tag="tr" class="selectable*"/>
    <webctrl tag="td" idx="2"/>
    <webctrl tag="tag-text" idx="1"/>
  </column>
</extract>