How to identify and deal with network noise on the page

Hello everyone,

The text content of some sites contains network noise. Here is an example:

<span style='font-family: myfont;'>&#xed5e;</span>

Each one starts with “&#”, followed by five letters or digits, and stands for one character.
Neither Get Text nor Get Full Text scrapes the correct result.
Even reading innerHtml with Get Attribute does not return the original HTML code.
If it did, I could at least replace the codes using a reference table.
I would appreciate your advice. Thanks in advance!
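
The reference-table idea could be sketched in Python as follows, assuming the raw HTML is already available as a string. The table values "A" and "B" and the helper name `replace_entities` are hypothetical placeholders; the real characters would have to be read from the glyphs in the `myfont` font.

```python
import re

# Hypothetical reference table: code point -> real character.
# "A" and "B" are placeholders, not the real meaning of these glyphs.
REFERENCE_TABLE = {0xEDDD: "A", 0xED5E: "B"}

# Matches hexadecimal numeric character references such as &#xeddd;
ENTITY_RE = re.compile(r"&#x([0-9a-fA-F]+);")


def replace_entities(raw_html, table):
    """Replace each known &#x....; reference; leave unknown ones untouched."""
    def _sub(match):
        code_point = int(match.group(1), 16)
        return table.get(code_point, match.group(0))
    return ENTITY_RE.sub(_sub, raw_html)


sample = "最满意<span style='font-family:myfont'>&#xeddd;</span>动力很好"
result = replace_entities(sample, REFERENCE_TABLE)
print(result)
```

References that are missing from the table are kept as they are, so incomplete tables stay easy to spot in the output.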

Hi @wusiyangjia ,

Try UI Explorer: find the selector that returns the required value and use that selector.

Hi @manjula_rajendran

Thank you for your answer.
I have tried many methods to get the original HTML code, including the “&#” followed by five letters or digits.
Unfortunately, nothing in UI Explorer returns it either. It looks like no method can get it, because the codes have already been converted.
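
If the scraped text still contains the converted characters, the “&#” form can be rebuilt from them. Codes such as `&#xed5e;` and `&#xeddd;` fall in the Unicode Private Use Area (U+E000 to U+F8FF). The Python sketch below assumes that Get Text returns those characters rather than dropping them, which is not confirmed in this thread; the function name `reencode_private_use` is made up for illustration.

```python
def reencode_private_use(text):
    """Turn private-use characters (U+E000..U+F8FF) back into &#x....; form."""
    parts = []
    for ch in text:
        if 0xE000 <= ord(ch) <= 0xF8FF:
            # Re-create the hexadecimal character reference for this code point.
            parts.append(f"&#x{ord(ch):x};")
        else:
            parts.append(ch)
    return "".join(parts)


# Assumed shape of the scraped text: the entity has already been decoded.
scraped = "最满意\ueddd动力很好"
reencoded = reencode_private_use(scraped)
print(reencoded)
```

This only restores the character reference itself. The surrounding `<span>` tag is not part of the scraped text and cannot be recovered this way.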

Hi @wusiyangjia ,

Is it possible to share the URL you are scraping from? I can try from my end.

Hi @manjula_rajendran

Maybe something like this.
TestPage.html (99 Bytes)
Thank you for your help!

Hi @wusiyangjia ,

I tried the Get Text activity and I’m getting the text “欢迎来到Siyang的UiPath博客!” (“Welcome to Siyang’s UiPath blog!”).

<webctrl tag='SPAN' aaname='*' />

Hi @manjula_rajendran

Sorry, I only provided an HTML file containing “&#”. What I want is to get the original HTML code.
Maybe you can check this page:
https://k.autohome.com.cn/detail/view_01f5rdcpxh6cv36e9j6ms00000.html#pvareaid=2112108
For example:

【最满意】<br>最满意<span style='font-family:myfont'>&#xeddd;</span>动力很好

It is impossible to get the text of the specific “&#” content.
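
To see which codes a reference table would need, the obfuscated spans could be listed from the raw page source. This is only a sketch in Python: it assumes the raw HTML is available as a string and that every obfuscated character sits alone in a `myfont` span like the examples in this thread. The name `collect_code_points` is hypothetical.

```python
import re

# Matches spans like <span style='font-family:myfont'>&#xeddd;</span>,
# tolerating an optional space after the colon and an optional semicolon.
SPAN_RE = re.compile(
    r"<span\s+style=['\"]font-family:\s*myfont;?['\"]>\s*"
    r"&#x([0-9a-fA-F]+);\s*</span>",
    re.IGNORECASE,
)


def collect_code_points(raw_html):
    """Return the distinct code points used inside myfont spans, sorted."""
    return sorted({int(m.group(1), 16) for m in SPAN_RE.finditer(raw_html)})


raw = (
    "<span style='font-family: myfont;'>&#xed5e;</span>"
    "【最满意】<br>最满意<span style='font-family:myfont'>&#xeddd;</span>动力很好"
)
codes = [f"{cp:x}" for cp in collect_code_points(raw)]
print(codes)
```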

Hi @wusiyangjia ,

Which line are you trying to scrape on the website? Please change the language to English and give me the word you are trying to get.

I tried to get some text from the website and got the output shown in the attached screenshot.

Is this what you are looking for?

Hi @manjula_rajendran

Yes. As your output shows, some characters are not retrieved correctly,
for example the character right before “不列” or right after “片拍”.
I need to process them later, so I want to get the original HTML code,
such as:

<span style='font-family:myfont'>&#xeddd;</span>不列