How to retrieve specific data from HTML

jferre · July 6, 2023, 11:55am

Hi guys,

I have given up on this case and need your support. Let me try to explain it.

The use case is based on e-mails reaching a shared mailbox. The body of those mails is as follows:

Each line contains information that has to be retrieved and introduced in a web page afterwards.

My problem is that the body of those mails comes in HTML format, and I haven’t been able to extract the data with any method I can think of. Here are the “insides” of the HTML file. I marked in yellow the relevant tag for every piece of data I want to retrieve (in green):

I tried several methods (get full text, screen scraping…) and I’m not able to get a structured entity to start working from. If Regex is the best solution here, I need your knowledge and expertise, as I’m not that good with Regex.

Thanks everyone for your suggestions and, please, let me know if something is unclear.

supriya117 · July 6, 2023, 11:58am

Hi @jferre

Open the file using the Use Application/Browser activity, then extract the data from it.

Regards,

jferre · July 6, 2023, 12:04pm

Hi @supriya117 ,

I already tried that, and I got no text as a result. I am surely doing something wrong, so I’d appreciate it if someone could shed some light here.

supriya117 · July 6, 2023, 12:08pm

@jferre

Save the Outlook mail in “*.mht” format using the save Outlook mail activity, then open it in the browser.

jferre · July 6, 2023, 12:10pm

I am not using the Outlook client at all. Everything is done with Exchange activities.

And I would prefer not to use Outlook if it can be avoided.

ppr · July 6, 2023, 12:13pm

Let’s assume the email is sent with an HTML body.

We have:
myMailVar.BodyAsHtml - returns the body as HTML
myMailVar.Body - returns the body as text

Then we can extract the values, e.g. with a regex.

Check this at your end by doing the following:

set a breakpoint after retrieving the email
debug until execution pauses at the breakpoint
in the Immediate panel, evaluate: yourMailVar.Body
Understanding the 6 Debugging Panels of UiPath in the easiest way possible! - News / Tutorials - UiPath Community Forum
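
As a rough illustration of the regex idea above, here is a minimal sketch. It is written in Python only to keep it self-contained; the same pattern should port to a .NET regex with little change. The sample body, the "Label: value" layout, and the names PAIR and extract_fields are all assumptions, since the real mail body is not shown in this thread.

```python
import re

# Hypothetical plain-text body; the real layout of the mails is not shown here.
body = """Order number: 12345
Customer: ACME Ltd
Delivery date: 2023-07-06
"""

# One "Label: value" pair per line. MULTILINE lets ^ and $ match at every line,
# and the optional \r tolerates Windows line endings.
PAIR = re.compile(
    r"^[ \t]*(?P<label>[^:\r\n]+?)[ \t]*:[ \t]*(?P<value>[^\r\n]*?)[ \t]*\r?$",
    re.MULTILINE,
)

def extract_fields(text):
    """Return a dict mapping each label to its (possibly empty) value."""
    return {m.group("label"): m.group("value") for m in PAIR.finditer(text)}

fields = extract_fields(body)
print(fields["Customer"])
```

The label stops at the first colon on the line, so values that contain colons themselves (times, URLs) stay intact.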


jferre · July 6, 2023, 12:17pm

Hi @ppr ,

Thanks for your suggestion. I was already able to save the body of the mail in a text file. My problem comes just afterwards: I don’t know how to get rid of all the rubbish (unwanted tags and symbols) and keep only the data I need.

If some Regex ‘guru’ could lend a hand here I would really appreciate it.

ppr · July 6, 2023, 12:21pm

As shown above, we can get the text only. Can you share with us how the body text was retrieved at your end? Thanks

jferre · July 6, 2023, 12:28pm

Sure. Basically, I get all the e-mails in scope with a ‘Get Exchange Mail Message’ activity and I store them in a list of Mail Messages.

Then I run a For Each and process each mail in that list. I can read the body with an item.Body.ToString assignment.

So, the last step is to save that information with a ‘Write Text File’ activity. And now I have a plain-text file (which I can save as *.txt or *.html) containing the information I mentioned in my first post.

I need to extract the information from this file, and that is where I am stuck.

ppr · July 6, 2023, 12:31pm

item.Body is the same expression we evaluated in the Immediate panel, but there we got plain text only.
Please do the same at your end and share a screenshot with us, thanks.

jferre · July 6, 2023, 12:48pm

I can’t see the difference between item.Body and item.Body.ToString. I guess they are exactly the same, and I insist that this is not the real issue.

My need here is to extract the information from the plain text file which contains the HTML code. And I guess that’s a topic for Regex.

ppr · July 6, 2023, 12:52pm

OK, no screenshot was shared, but you are stating that Body also returns HTML-tagged text.

If so, we can check yourMailVar.Headers("PlainText") to get the text only.

Otherwise, we can also use https://html-agility-pack.net to help.
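
To show what the parser-based route looks like compared with regex on raw tags, here is a minimal sketch. It uses Python's standard html.parser as a stand-in and is not HTML Agility Pack code. The sample HTML, the assumption that labels and values strictly alternate, and the names TextCollector, html_to_chunks and pair_up are all hypothetical, because the real markup of the mails is not shown in this thread.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect the visible text chunks of an HTML document, in order."""
    SKIP = {"script", "style"}  # contents of these tags are not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.chunks.append(text)

def html_to_chunks(html):
    """Return the non-empty text chunks found between the tags."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks

def pair_up(chunks):
    """Assumes chunks strictly alternate: label, value, label, value, ..."""
    it = iter(chunks)
    return {label.rstrip(":"): value for label, value in zip(it, it)}

# Hypothetical mail body; the real markup will differ.
sample = """<html><head><style>p { margin: 0; }</style></head><body>
<p><b>Order number:</b> <span>12345</span></p>
<p><b>Customer:</b> <span>ACME &amp; Co</span></p>
</body></html>"""

chunks = html_to_chunks(sample)
fields = pair_up(chunks)
print(fields)
```

Letting a parser handle the tags and entities means the only mail-specific logic left is how labels and values are paired, which is easier to adjust than a regex written against the raw HTML.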
