Get text not recognizing a new line

Hello,
I am trying to extract this address using a Get Text activity.
This is what Get Text extracts: “721 Green Street AtlanticCity NJ 08401”
but this is what it should be: “721 Green Street Atlantic City NJ 08401”

There should be a space between “Atlantic” and “City”. The missing space breaks the address lookup. How do I solve this? I am out of ideas.

@MasterOfLogic

Can you try the Get Full Text activity instead?

@MasterOfLogic

Another option would be to check in UI Explorer whether any property contains the full address with the space included. If so, you can use a “Get Attribute” activity to get the address.

It looks like it grabs from a web page.

Using Get Attribute with the outerhtml attribute gives us a chance to see where the text is split.

Hello @ppr, thanks so much for this, it worked. I can now see <br>, which indicates a line break, and from here I can do a simple string manipulation to replace the <br> with a space. @ignasi.peiris, I also appreciate your suggestion; it was close as well, since it led me to explore more attributes within the Get Attribute activity.

However, I am faced with another issue. This time the innerhtml can be either an HTML link or plain HTML text, i.e. if the address is a hyperlink it would be

<td uipath_custom_id="23">
<a href="Google Maps" target="_blank">721 Green Street<br>Atlantic City NJ 08401

and if the address is plain HTML it would be “721 Green Street<br>Atlantic City NJ 08401”.

How do I extract “721 Green Street<br>Atlantic City NJ 08401” reliably in both cases?

Hi @MasterOfLogic ,

If the format of the data is always going to be the same, try the regex expression below:

<td.*\n<a.*?(?<=>)

We could use a Regex Replace to remove only the tags and keep the data. Here YourExtractedData is the String variable that holds the extracted value:

System.Text.RegularExpressions.Regex.Replace(YourExtractedData, "<td.*\n<a.*?(?<=>)", "").Trim
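
To see what this expression matches, here is a minimal sketch in Python, used only for illustration; the pattern is unchanged and uses only syntax that Python's re module and .NET regex both support. The helper name strip_link_prefix and the two sample strings are assumptions modelled on the two shapes described in the question.

```python
import re

# Pattern from the reply above: the <td ...> line, a newline,
# then the opening <a ...> tag up to and including its closing ">".
PREFIX = re.compile(r"<td.*\n<a.*?(?<=>)")

def strip_link_prefix(extracted: str) -> str:
    """Hypothetical helper: drop a leading <td ...>/<a ...> pair if present."""
    return PREFIX.sub("", extracted).strip()

# Sample inputs (assumed, based on the two cases in the question).
linked = ('<td uipath_custom_id="23">\n'
          '<a href="Google Maps" target="_blank">'
          '721 Green Street<br>Atlantic City NJ 08401')
plain = "721 Green Street<br>Atlantic City NJ 08401"

print(strip_link_prefix(linked))
print(strip_link_prefix(plain))
```

Both calls leave the same text, with the <br> still in place. The pattern only removes the leading tags, so any closing tags after the address would remain, and the <br> still has to be replaced with a space afterwards.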

For a general approach, we can use the following strategy:

replace <br> with a space
replace every tag (opening, closing) with an empty string

For the latter we can use regex.
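
The two steps above can be sketched as follows. This is an illustration in Python rather than a UiPath expression; the function name html_to_address, the two patterns, and the sample inputs are assumptions, not something given in the thread. The same patterns should also work with System.Text.RegularExpressions.Regex.Replace, but they have not been tested in UiPath here.

```python
import re

def html_to_address(fragment: str) -> str:
    """Hypothetical helper: turn an HTML fragment into a one-line address."""
    # Step 1: replace every <br>, <br/> or <br /> with a single space.
    text = re.sub(r"<br\s*/?>", " ", fragment, flags=re.IGNORECASE)
    # Step 2: replace every remaining opening or closing tag with nothing.
    text = re.sub(r"<[^>]+>", "", text)
    # Collapse leftover newlines and repeated spaces into single spaces.
    return " ".join(text.split())

# Sample inputs (assumed): the hyperlink case and the plain case.
linked = ('<td uipath_custom_id="23">\n'
          '<a href="Google Maps" target="_blank">'
          '721 Green Street<br>Atlantic City NJ 08401</a></td>')
plain = "721 Green Street<br>Atlantic City NJ 08401"

print(html_to_address(linked))
print(html_to_address(plain))
```

The order matters: if the tags were stripped first, the <br> would disappear along with them and “Street” and “Atlantic” would run together. Because the second step does not care which tags are present, the same call handles both the hyperlink and the plain-text case.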
