I’m trying to extract the description of a website from its HTML code using the Orchestrator HTTP activity, but I can’t extract the full description of that site.
I’m able to extract only the first paragraph from the website, but I need to capture everything under the div class.
Can anyone help me with this?
Hi @deepankar,
Thanks for reaching out to UiPath community.
You can use Regular Expressions (Regex) to extract the text under the div class.
Try to follow these steps:
- Use the Orchestrator HTTP activity to retrieve the HTML code of the website. The output of this activity should be a string variable containing the HTML code.
- Add an Assign activity to create a new string variable and assign it the value of the HTML code from the previous step. For example, let’s call this variable
- Use Regex to extract the text under the div class. Here’s an example Regex pattern that will extract all the text under a div class with the class name “description”:
- Add another Assign activity to create a new string variable and assign it the value of the text extracted by the Regex pattern. For example, let’s call this variable ‘description’.
You can use this regex in the flow to get the results.
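As a rough sketch of the regex steps above, written in Python rather than as a UiPath workflow: the class name `description` and the function name `extract_description` are assumptions for illustration, not taken from the actual page. One common reason a pattern stops after the first paragraph is that `.` does not match newlines by default, so the sketch turns on `re.DOTALL`.

```python
import html
import re

# Hypothetical class name "description"; adjust it to the real page.
DESCRIPTION_RE = re.compile(
    r'<div[^>]*class="[^"]*\bdescription\b[^"]*"[^>]*>(.*?)</div>',
    re.DOTALL | re.IGNORECASE,  # DOTALL lets "." span multiple lines
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_description(page_html: str) -> str:
    """Return the text inside the first matching div, or "" if none is found."""
    match = DESCRIPTION_RE.search(page_html)
    if match is None:
        return ""
    inner = TAG_RE.sub(" ", match.group(1))        # drop inner tags such as <p>
    return " ".join(html.unescape(inner).split())  # decode entities, collapse whitespace


sample = """<div class="description">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>"""
print(extract_description(sample))
```

Note that the non-greedy `(.*?)</div>` stops at the first closing `</div>`, so this only works when the description div contains no nested divs. For nested markup, an HTML parser is usually the safer choice.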
This activity is for calling Orchestrator REST API endpoints.
For example, to extract the inner text of the section element, we can do one of the following:
- use the Get Text activity
- use the Get Attribute activity
- As we do have the HTML code, parsing the HTML (retrieved, e.g., via the outerhtml attribute) could also help. For this, we would use the following:
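To illustrate the parsing idea only, here is a minimal Python sketch using the standard library `html.parser` module. It is not a UiPath activity, and the class name `description` plus the names `DivTextExtractor` and `div_text` are hypothetical. It counts div nesting depth so that nested divs inside the target are captured too.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect text inside the first <div> whose class list contains target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # div nesting depth inside the target; 0 means outside
        self.done = False   # True once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1  # nested div inside the target
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entered the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def div_text(page_html: str, target_class: str = "description") -> str:
    """Return the whitespace-normalized text of the first matching div."""
    parser = DivTextExtractor(target_class)
    parser.feed(page_html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Calling `div_text(html_string)` on the retrieved HTML would return all paragraphs under the matching div as one string, assuming the markup is reasonably well-formed.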
Hello @Palaniyappan @pratik.maskar
From the HTTP Request activity I’m getting only some of the text.
Please find the attached screenshot for reference
I’m getting only 10% of the content when using this activity.
Please find the attached screenshot for your reference
It could be that the preview is cut off after a certain text length. However, what is motivating you to use HTTP Request instead of Use Browser / Get Text / Get Attribute as mentioned above? From the description it sounded like you are only interested in a part of the page.
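One way to tell a truncated preview from a truncated response is to inspect the string itself instead of the preview panel. A small Python sketch of that check follows; the function name `summarize_response` and the `</html>` marker are assumptions for illustration.

```python
def summarize_response(body: str, closing_marker: str = "</html>") -> dict:
    """Report the real size of the body instead of relying on a shortened preview."""
    return {
        "length": len(body),  # total characters actually received
        "has_closing_marker": closing_marker.lower() in body.lower(),
        "tail": body[-80:],   # last characters of the response
    }
```

If the length is large and the closing marker is present, the response is probably complete and only the preview is shortened.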
Here we need to extract the description of that webpage without hitting the url from the page list.
We open a main web application and search with a keyword, and that web application returns a list of URLs for that search. Now I need to extract the description of all the URLs in the list without hitting any URL. It’s like I’m reducing the hits on the web application and trying to extract everything in a single task.
Hope you got it.
Sounds like you want to grab the description without using Use Browser / Get XX.
- HTTP Request, grabbing the page content & processing it with the above:
could do it.
Keep in mind: HTTP Request also makes a call to the web/application server behind the URL, so it creates request and response traffic similar to the Use Browser / Get XX activities.
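The overall idea (fetch each URL from the list over HTTP, then process the returned HTML) could look roughly like this in Python. It is a sketch under assumptions, not the UiPath implementation: `fetch_html`, `collect_descriptions`, and the `User-Agent` value are hypothetical, and the `extract` argument stands for whichever regex or parsing step is used.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page. Each call is still one request to the server."""
    request = Request(url, headers={"User-Agent": "description-fetcher/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def collect_descriptions(
    urls: Iterable[str],
    extract: Callable[[str], str],
    fetch: Callable[[str], str] = fetch_html,
) -> Dict[str, Optional[str]]:
    """Map each URL to its extracted description, or None if the fetch failed."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            results[url] = extract(fetch(url))
        except OSError:  # urllib's URLError and socket timeouts are OSError subclasses
            results[url] = None
    return results
```

Passing `fetch` as a parameter keeps the loop testable without network access. It does not reduce the number of requests: every URL in the list is still requested once.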