Extract specific text from the HTML body string (from the HTTP request)

Hi,
I am new to UiPath. Could anyone please help with this?
I use the HTTP Request activity to get the response body, and it appears to be a string when I save it to a variable.
I am trying to extract the address from the string, but I am not good at this.

For example, this is what I want to get:
38, AVENUE JOHN F. KENNEDY L-1855 LUXEMBOURG
(The address can be different every time, but the page structure is always the same.)

Here is part of the HTML body string:
"…

Date when request received 2021/02/02 20:51:47
			<tr>
   				<td class="labelStyle">Name</td> 
   				<td>AMAZON EU SARL
   			</tr>
		 
		 
		 
		
			<tr>
   				<td class="labelStyle">Address</td> 
   				<td>38, AVENUE JOHN F. KENNEDY<br />L-1855  LUXEMBOURG
			<tr>
   				<td class="labelStyle">Consultation Number</td> 
   				<td></td>
   			</tr>
		
   	</table>
   	<br />
   	<p><a href="vatRequest.html">Back</a></p>
</fieldset>

…"

Thanks.

@MiaS - Could you please check this… This uses the Regex method.

Group 1 and Group 3 will give you the desired output.
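For readers without the attachment: the actual pattern lives in Regex_Mia.zip and is not shown in the thread, so what follows is only a hypothetical Python sketch of a pattern whose groups 1 and 3 hold the two halves of the address. UiPath itself uses .NET regular expressions. The constructs used here (`\s`, `[^<]*`, capturing groups) behave the same way there. The `<td class="labelStyle">Address</td>` anchor is an assumption based on the HTML posted above, and `extract_address_v1` is a made-up name.

```python
import re
from typing import Optional

# Hypothetical pattern: the real one is inside Regex_Mia.zip and not shown here.
# Group 1 = text before <br />, group 2 = the <br /> tag itself,
# group 3 = text after <br /> up to the next "<".
ADDRESS_WITH_BR = re.compile(
    r'<td class="labelStyle">Address</td>\s*<td>([^<]*)(<br\s*/>)([^<]*)'
)


def extract_address_v1(html: str) -> Optional[str]:
    """Return group 1 and group 3 joined by a space, or None if nothing matched."""
    m = ADDRESS_WITH_BR.search(html)
    if m is None:
        return None
    # split()/join collapses the trailing newline and the double space
    # in "L-1855  LUXEMBOURG".
    return " ".join(f"{m.group(1)} {m.group(3)}".split())


sample = (
    '<tr>\n\t\t<td class="labelStyle">Address</td> \n'
    '\t\t<td>38, AVENUE JOHN F. KENNEDY<br />L-1855  LUXEMBOURG\n</td>\n</tr>'
)
print(extract_address_v1(sample))  # 38, AVENUE JOHN F. KENNEDY L-1855 LUXEMBOURG
```

Note that on an address with no `<br />` (like the second example later in the thread) this pattern finds no match, and the function returns `None`.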

Please find a starter workflow here: Regex_Mia.zip (34.9 KB)

Please refer to this post to learn more about regular expressions…

Wow, it works. Thanks so much for the explanation and the xaml file!
Also, thanks for the link to the post on regular expressions. I will keep learning.
Have a good night!

@MiaS - Glad to know… Please mark my post as the solution. That will close this thread and help others find the solution easily. Thanks

Hi, I just realised that there is actually a slight difference between addresses…

For example, there is no <br /> in the middle of the address below:

<tr>
	   				<td class="labelStyle">Address</td> 
	   				<td>3RD FLOOR, GORDON HOUSE, BARROW STREET, DUBLIN 4
</td>

Could you please advise how I can modify the code to handle both situations? Thanks!

@MiaS - Let me check…

In the meantime, could you please confirm whether there is a closing tag after this line… I don’t see an ending tag like in the second one:

<td>38, AVENUE JOHN F. KENNEDY<br />L-1855 LUXEMBOURG

Hiya,
Yes, there is an ending tag. I didn’t notice how the forum formatted my original post.
This is the first one:

<tr>
	   				<td class="labelStyle">Address</td> 
	   				<td>38, AVENUE JOHN F. KENNEDY<br />L-1855  LUXEMBOURG
</td>
	   			</tr>
	   		
	   		 
			
				<tr>

This is the second one:

	</tr>
			 
			 
			 
			
				<tr>
	   				<td class="labelStyle">Address</td> 
	   				<td>3RD FLOOR, GORDON HOUSE, BARROW STREET, DUBLIN 4
</td>
	   			</tr>
	   		
	   		 
			
				<tr>
	   				<td class="labelStyle">Consultation Number</td> 
	   				<td></td>

This is the third example:

				<tr>
	   				<td class="labelStyle">Address</td> 
	   				<td>---
</td>
	   			</tr>
	   		
	   		 
			
				<tr>
	   				<td class="la

In some cases only --- is found, and I would like that to be extracted too.

Thanks.

@MiaS - Since it’s very complex to ignore the <br /> in the middle, what I did was extract the text between the td tags and then simply replace <br /> in the output with a space.
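The two-step approach described here (capture the whole cell, then replace `<br />` with a space) can be sketched in Python as follows. The pattern is an assumption, not copied from the xaml, and the function name is made up. `re.DOTALL` lets `.` match newlines; the .NET equivalent is `RegexOptions.Singleline`. It relies on the closing `</td>` being present, as confirmed above.

```python
import re
from typing import Optional

# Assumed pattern (not taken from the xaml): capture everything inside the
# value cell that follows the "Address" label, across line breaks.
ADDRESS_CELL = re.compile(
    r'<td class="labelStyle">Address</td>\s*<td>(.*?)</td>', re.DOTALL
)
# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def extract_address(html: str) -> Optional[str]:
    m = ADDRESS_CELL.search(html)
    if m is None:
        return None
    # Step 2: replace each <br /> with a space, then collapse whitespace.
    text = BR_TAG.sub(" ", m.group(1))
    return " ".join(text.split())


examples = [
    '<td class="labelStyle">Address</td> \n\t<td>38, AVENUE JOHN F. KENNEDY<br />L-1855  LUXEMBOURG\n</td>',
    '<td class="labelStyle">Address</td> \n\t<td>3RD FLOOR, GORDON HOUSE, BARROW STREET, DUBLIN 4\n</td>',
    '<td class="labelStyle">Address</td> \n\t<td>---\n</td>',
]
for html in examples:
    print(extract_address(html))
```

All three cases from the thread go through the same code path: with a `<br />`, without one, and the `---` placeholder.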

Here is the updated xaml and the outputs: Regex_Mia.zip (35.5 KB)

That makes sense. Thanks very much for your help. :slight_smile:

Hi again,
The solution you provided works very well when I test the cases one by one.

BUT when I put this regex in a loop, it only works for address example 1.
No matter what order I put the 3 cases in, it keeps throwing this error for address examples 2 and 3:

"Object reference not set to an instance of an object."


Could you please help? Thanks very much!

@MiaS … it means the Regex output is empty.

That tells me something changed in your HTML body.
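The error above is the message .NET gives when code dereferences a null (missing) result. A defensive habit inside a loop is to check for a match before reading any group. Here is a minimal Python sketch of that check. It is an illustration only: the pattern and the name `extract_in_loop` are assumptions. In Python, the unguarded version would fail with `AttributeError: 'NoneType' object has no attribute 'group'`, which is roughly the counterpart of the .NET error.

```python
import re
from typing import List, Optional

# Assumed pattern; any extraction pattern can be plugged in here.
ADDRESS_CELL = re.compile(
    r'<td class="labelStyle">Address</td>\s*<td>(.*?)</td>', re.DOTALL
)


def extract_in_loop(pages: List[str]) -> List[Optional[str]]:
    results: List[Optional[str]] = []
    for i, html in enumerate(pages):
        m = ADDRESS_CELL.search(html)
        if m is None:
            # Without this check, m.group(1) would raise AttributeError on None.
            print(f"page {i}: no Address cell matched")
            results.append(None)
            continue
        cell = re.sub(r"<br\s*/?>", " ", m.group(1))
        results.append(" ".join(cell.split()))
    return results


pages = [
    '<td class="labelStyle">Address</td>\n<td>---\n</td>',
    '<td class="labelStyle">Name</td>\n<td>AMAZON EU SARL</td>',  # no Address row
]
print(extract_in_loop(pages))  # prints a warning for page 1, then ['---', None]
```

Logging which page failed also makes it easier to spot the kind of mismatch between pattern and HTML that is discussed below.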

Is it possible to share the xaml and HTML file(s), if they don’t contain any PII?

@MiaS - Are you all set? Let me know…

Hi @prasath17
It’s all set for this task. Your solution is totally fine. I made a mistake: I didn’t realise that the match regular expression had also changed in your second solution. I only swapped the group function (from the first solution you provided) for the replace function (from the second solution), which is why the output was empty.
Thanks so much for your help. :slight_smile:

@MiaS - Oh yes, I tried hard to add an OR condition to ignore the br tag, but it was not working, so I changed my initial solution. Sorry, I thought you would notice since I gave you the xaml.
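For anyone who still wants a single pattern, one possible alternative to an OR (alternation) is to make the `<br />` part optional with a non-capturing group `(?:...)?`. This is not the solution from the thread. It is a hypothetical Python sketch, and the same syntax is valid in .NET. Group 2 is simply empty (`None` in Python) when there is no `<br />`.

```python
import re
from typing import Optional

# Hypothetical single pattern: group 1 = text before an optional <br />,
# group 2 = text after it (None when the address has no <br />).
ADDRESS_OPTIONAL_BR = re.compile(
    r'<td class="labelStyle">Address</td>\s*<td>([^<]*)(?:<br\s*/>([^<]*))?'
)


def extract_address_one_pattern(html: str) -> Optional[str]:
    m = ADDRESS_OPTIONAL_BR.search(html)
    if m is None:
        return None
    parts = [p for p in (m.group(1), m.group(2)) if p]  # drop a missing group 2
    return " ".join(" ".join(parts).split())


for html in [
    '<td class="labelStyle">Address</td>\n<td>38, AVENUE JOHN F. KENNEDY<br />L-1855  LUXEMBOURG\n</td>',
    '<td class="labelStyle">Address</td>\n<td>3RD FLOOR, GORDON HOUSE, BARROW STREET, DUBLIN 4\n</td>',
    '<td class="labelStyle">Address</td>\n<td>---\n</td>',
]:
    print(extract_address_one_pattern(html))
```

This version handles only one `<br />`. The replace-based approach from earlier in the thread copes with any number, which is a good reason to prefer it.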

Anyway, happy ending… Cheers… Happy Automation…

@MiaS - Please mark my post as the solution. That will close this thread.
