I'm trying to extract the description in a website from html code through Orchestrator HTTP activity, but I could not extract the full description of that site

Hello,

I’m trying to extract the description of a website from its HTML code through the Orchestrator HTTP activity, but I could not extract the full description of that site.

I’m able to extract only the first paragraph from the website, but I need to capture everything under the div class.

Can anyone help me with this?

Hi @deepankar,
Thanks for reaching out to the UiPath community.
You can use Regular Expressions (Regex) to extract the text under the div class.

Try to follow these steps:

  • Use the Orchestrator HTTP activity to retrieve the HTML code of the website. The output of this activity should be a string variable containing the HTML code.
  • Add an Assign activity to create a new string variable and assign it the value of the HTML code from the previous step. For example, let’s call this variable htmlCode.
  • Use Regex to extract the text under the div class. The example pattern given after these steps extracts all the text under a div with the class name “description”.
  • Add another Assign activity to create a new string variable and assign it the value of the text extracted by the Regex pattern. For example, let’s call this variable ‘description’.

You can use this regex in the flow to get the results:
(?<=<div class="description">)(.*?)(?=</div>)
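To see how a lookbehind/lookahead pattern of this kind behaves, here is a minimal sketch in Python rather than in a UiPath workflow. The sample HTML and the variable names are assumptions made up for illustration; only the class name "description" comes from the steps above. It also shows one likely reason for partial or missing matches: by default "." does not match line breaks.

```python
import re

# Hypothetical sample page; only the class name "description" comes from the thread.
html_code = """<html><body>
<div class="description">
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div>
</body></html>"""

# Lookbehind for the opening tag, lazy capture, lookahead for the closing tag.
pattern = r'(?<=<div class="description">)(.*?)(?=</div>)'

# Without DOTALL, "." stops at line breaks, so content spanning lines is not matched.
no_flag = re.search(pattern, html_code)

# With DOTALL (RegexOptions.Singleline in .NET), "." also matches newlines.
with_flag = re.search(pattern, html_code, re.DOTALL)

# Remove the remaining tags so that only the text of the paragraphs is left.
description = re.sub(r"<[^>]+>", "", with_flag.group(1)).strip()
print(description)
```

Note that the lazy capture stops at the first closing div tag, so a description that contains nested div elements would still be cut short. In that case parsing the HTML is usually the safer route.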

Regards,
@pratik.maskar

This activity is meant for calling Orchestrator REST API endpoints.

To extract the inner text of the section element, we can do the following:

  • use the Get Text activity
  • use the Get Attribute activity
  • parse the HTML: as we do have the HTML code (retrieved e.g. via the outerhtml attribute), parsing it could also help. For this, we would use the following:
    https://html-agility-pack.net/
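The parsing idea can be sketched as follows. This is not Html Agility Pack (a .NET library); it uses Python's standard html.parser as a stand-in to show the principle. The class DivTextExtractor and the function extract_div_text are hypothetical names, and the sketch assumes the description sits in the first div carrying the given class.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect all text inside the first <div> with a given class, nested tags included."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while we are inside the target div
        self.done = False   # set once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            # Track nested divs so the matching closing tag is found correctly.
            if tag == "div":
                self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Keep only non-empty text found inside the target div.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_div_text(html_code, class_name):
    """Return the text of the first div with class_name, one text chunk per line."""
    parser = DivTextExtractor(class_name)
    parser.feed(html_code)
    parser.close()
    return "\n".join(parser.chunks)
```

Calling extract_div_text(html_code, "description") returns every paragraph under that div rather than only the first one, because the parser keeps collecting text until the div that opened the section is closed.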

Hello @Palaniyappan @pratik.maskar

From the HTTP Request activity I’m getting only some of the text; only 10% of the content comes back when I use this activity.
Please find the attached screenshots for reference.

It could be that the preview cuts off after a certain text length. However, what is motivating you to use HTTP Request instead of Use Browser / Get Text / Get Attribute, as mentioned above? From the description it sounded as if you are only interested in a part of the page.

Here we need to extract the description of each webpage without hitting the URLs from the page list.

Scenario:
We open a main web application and search with a keyword, and the application returns a list of URLs for that search. I now need to extract the description for every URL in the list without hitting any of them. The idea is to reduce the hits on the web application and extract everything in a single task.

Hope you got it.

Sounds like you want to grab the description without using Use Browser / Get XX.

  • HTTP Request, grabbing the page content and processing it with the approaches above

could do it.

Keep in mind: HTTP Request also calls the web/application server behind the URL, so it creates request and response traffic similar to the Use Browser / Get XX activities.
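A rough sketch of that loop, written in Python instead of a UiPath workflow: the function names fetch_html and collect_descriptions and the extract parameter are assumptions for illustration, not part of any UiPath API. It makes the point above concrete, since every URL in the list still costs one request to the server.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download one page; each call is a real request to the web server."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def collect_descriptions(urls, extract, fetch=fetch_html):
    """Return {url: description}, issuing exactly one request per URL.

    extract is a hypothetical callable that turns the page HTML into the
    description text, for example a regex- or parser-based helper.
    """
    results = {}
    for url in urls:
        html_code = fetch(url)
        results[url] = extract(html_code)
    return results
```

While debugging, it can help to check len(html_code) for one page: that shows whether the response really is short or whether only the preview was cut off.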