Regex help

I'm extracting content from an email and I want to get the sender's position, e.g. advanced user, rookie, orchestrator, etc.

The pattern is that the person's name sits on the line right above the position.

I've tried a lot of ways to get the position, but I always miss something.

Any ideas? Thank you.

The From part is not a problem: you can search for "From:" and extract the sender name using string functions.

The position is trickier. Maybe once you find the From user name, search for it again and assume the next line is the position. But I'm not sure how reliably that will work.
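The "find the From name, then search again and take the next line" idea can be sketched with plain string handling. This is a minimal Python sketch, assuming a hypothetical email body (the real input isn't shown in the thread) in which the sender's name also appears alone on a line in the signature; `find_position` is a made-up helper name. Because the "From:" line contains more than the bare name, comparing whole lines skips it automatically.

```python
from typing import Optional

# Hypothetical email body (the real one isn't shown): CRLF line endings,
# sender name in the "From:" line and again, alone, in the signature.
EMAIL = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
         "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")


def find_position(body: str) -> Optional[str]:
    """Return the first non-empty line after the line holding only the sender name."""
    lines = [line.strip() for line in body.splitlines()]  # splitlines() handles \r\n too
    # Step 1: take the sender name from the "From:" line.
    sender = next((line[len("From:"):].strip() for line in lines
                   if line.startswith("From:")), None)
    if not sender:
        return None
    # Step 2: search again for a line that is exactly the name (the signature);
    # the "From: ..." line never equals the bare name, so it is skipped.
    for i, line in enumerate(lines):
        if line == sender:
            # Step 3: assume the next non-empty line is the position.
            return next((nxt for nxt in lines[i + 1:] if nxt), None)
    return None


print(find_position(EMAIL))  # UiPath Rokiee
```

As the reply warns, this depends on the signature: if the name there differs from the "From:" name, step 2 finds nothing and the function returns None.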

@Ibra Can you share the input and the expected output, with an example?

The question is: since the name is always right above the sender's position, and the sender's name also appears in the first line of the email content (like "From: Ibra Uipath" in the screenshot above), how can I tell the program to ignore the first occurrence of the name and use the second one?

Just a little thought, idk…

The input is the email content, and the output should be only the position, "UiPath Rokiee" (the only thing I want to retrieve here).

I tried using this syntax:

(?<=Ibra Uipath)\s+.+

But it's returning two values:

  1. (newline) Test Email 1
  2. (newline) UiPath Rokiee

How can I remove the newlines and ignore the first value?
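The two values come from the name appearing twice (in the "From:" line and in the signature), and the newlines show up because `\s+` is part of the matched text rather than part of the lookbehind. A minimal Python sketch of "trim each match and keep the last one", on a hypothetical body with CRLF line endings; Python's `re` accepts this particular lookbehind because `Ibra Uipath` has a fixed width.

```python
import re

# Hypothetical email body (the real one isn't shown): CRLF line endings.
EMAIL = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
         "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")

# The pattern from the question: one match per occurrence of the name.
raw = re.findall(r"(?<=Ibra Uipath)\s+.+", EMAIL)
# "." also matches "\r", so each raw match carries line-break characters at both ends.
cleaned = [m.strip() for m in raw]
# The signature comes after the "From:" line, so the position is the last match.
position = cleaned[-1] if cleaned else None
print(cleaned)   # ['Test Email 1', 'UiPath Rokiee']
print(position)  # UiPath Rokiee
```

Taking the last match assumes the signature is the last place the name appears, which is an assumption about the email layout.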

@Ibra

The name won't always stay the same, right? So instead of the name, anchor on "Best Regards".

Regards,
Mahesh
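Anchoring on the closing instead of the name could look like the Python sketch below. It assumes, hypothetically, that the closing line reads "Best Regards", the next line is the name, and the position follows after optional blank lines. The name line is matched as "any text", so a short name there works as well.

```python
import re

# Hypothetical email body (the real one isn't shown): CRLF line endings.
EMAIL = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
         "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")

BEST_REGARDS = re.compile(
    r"Best Regards,?[ \t]*\r?\n"  # the closing line, optional comma
    r"[^\r\n]*\r?\n"              # the name line, whatever it contains
    r"\s*"                        # blank lines between name and position, if any
    r"(\S[^\r\n]*)",              # the position: rest of the first non-blank line
    re.IGNORECASE,
)

match = BEST_REGARDS.search(EMAIL)
print(match.group(1) if match else None)  # UiPath Rokiee
```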

In UiPath Studio I will use .Mail.Sender.DisplayName instead of a static value like in the example I posted.
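When the name comes from a variable such as the sender's display name, it's safer to escape it before putting it into the pattern, so characters like "." or "(" in a display name are matched literally. In .NET that is `Regex.Escape`; this Python sketch uses `re.escape` and a hypothetical `name_pattern` helper. An escaped literal name also keeps the lookbehind fixed-width, which Python requires.

```python
import re

# Hypothetical email body (the real one isn't shown): CRLF line endings.
EMAIL = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
         "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")


def name_pattern(display_name: str) -> str:
    """Build the lookbehind pattern from a display name instead of a hard-coded one."""
    return rf"(?<={re.escape(display_name)})\s+.+"


display_name = "Ibra Uipath"  # stands in for the display name read from the mail
hits = [m.strip() for m in re.findall(name_pattern(display_name), EMAIL)]
print(hits)  # ['Test Email 1', 'UiPath Rokiee']

# Escaping matters ("J. Doe" is a made-up name): an unescaped "." matches any character.
print(re.search(r"(?<=J. Doe)\s+.+", "JX Doe\nfoo") is not None)     # True
print(re.search(name_pattern("J. Doe"), "JX Doe\nfoo") is not None)  # False
```

Note that this still matches both occurrences of the name, so the "From:" line has to be filtered out separately.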

@Ibra

What if they use their short names in the Best Regards section?

Regards,
Mahesh

I'll ignore that case, as long as I can get their name from .mail.displayName.

@Ibra

Oh Ok.

Regards,
Mahesh

Anyone?

Did you try adding the newline inside the positive lookbehind? For example:

(?<=Ibra Uipath[\n\r]).*(?=[\n\r])

It gave me an empty iterator, so I modified your pattern to this:

"(?<=Ibra Uipath[\n\r][\n\r][\n\r][\n\r]).*(?=[\n\r])"

And it worked.

But why does UiPath need MORE newline characters (\n\r) than regex101.com?
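We can't see the real mail body, so this is only a likely explanation: mail bodies often use "\r\n" line endings, while text pasted into regex101's test box typically ends up with plain "\n". With one blank line between the name and the position, that is two newline characters on regex101 but four in UiPath. The Python sketch compares fixed-width lookbehinds on both versions of a hypothetical body.

```python
import re

# Hypothetical body as UiPath might receive it: CRLF line endings and a blank
# line between the signature name and the position (the real input isn't shown).
crlf = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
        "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")
# Roughly what a test string pasted into regex101 contains: "\n" only.
lf = crlf.replace("\r\n", "\n")

two = r"(?<=Ibra Uipath[\n\r][\n\r]).*(?=[\n\r])"
four = r"(?<=Ibra Uipath[\n\r][\n\r][\n\r][\n\r]).*(?=[\n\r])"

print(re.findall(two, lf))     # ['UiPath Rokiee']
print(re.findall(two, crlf))   # ['Test Email 1\r', '\r']  (wrong lines)
print(re.findall(four, crlf))  # ['UiPath Rokiee\r']  ("." keeps the trailing "\r")
```

The four-character version also skips the "From:" line, because on a CRLF body only two newline characters follow the name there.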

Weird, but good to hear that it worked. Can you try changing it to this:

"(?<=Ibra Uipath[\n\r]+).*(?=[\n\r])"

That should work for any number of consecutive newline characters (one or more). I think that'll be safer.

It returned the first value, "Test Email 1", instead of "UiPath Rokiee".
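Python's `re` rejects a variable-length lookbehind such as `[\n\r]+` (.NET, which UiPath uses, accepts it), so this sketch swaps in a capture group with the same effect. On a hypothetical CRLF body it reproduces the result above: the first occurrence of the name is the one in the "From:" line, so the first match is "Test Email 1".

```python
import re

# Hypothetical email body (the real one isn't shown): CRLF line endings.
EMAIL = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
         "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")

# Python's re only accepts fixed-width lookbehinds, so "[\n\r]+" inside one is rejected.
try:
    re.compile(r"(?<=Ibra Uipath[\n\r]+).*(?=[\n\r])")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
print(variable_lookbehind_ok)  # False

# Same idea as a capture group: the name, one or more line breaks, then that line.
pattern = re.compile(r"Ibra Uipath[\n\r]+(.*)")
first = pattern.search(EMAIL)
print(repr(first.group(1)))    # 'Test Email 1\r'  (the line after "From: Ibra Uipath")
print(pattern.findall(EMAIL))  # ['Test Email 1\r', 'UiPath Rokiee\r']
```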

Ahhh, of course. This should be the one:

"(?<=[\n\r]Ibra Uipath[\n\r]+).*(?=[\n\r])"

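Requiring a line break in front of the name works because in the "From:" line the name is preceded by ": ", not by a newline. A Python sketch of the same idea with a capture group (Python can't run the variable-length lookbehind), plus the `(?m)^` form, which says "the name starts a line" directly; the body is hypothetical.

```python
import re

# Hypothetical email body (the real one isn't shown): CRLF line endings.
EMAIL = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
         "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")

# A line break must come right before the name, so "From: Ibra Uipath" is skipped.
newline_first = re.search(r"[\n\r]Ibra Uipath[\n\r]+(.*)", EMAIL)
# Similar anchor: with re.MULTILINE, "^" matches at the start of every line.
line_start = re.search(r"^Ibra Uipath[\n\r]+(.*)", EMAIL, re.MULTILINE)

print(repr(newline_first.group(1)))  # 'UiPath Rokiee\r'
print(repr(line_start.group(1)))     # 'UiPath Rokiee\r'
```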
Is it supposed to return the value at index 3?

No, that's because .* can also match an empty string at each line break. Instead of the .* between the two groups, try \S.* or \w.*
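Why `\S` helps: `.*` may match zero characters, so right after a line break it can produce an empty match, and on CRLF text also a match consisting only of "\r", since "." matches "\r". Extra matches like these ahead of the real one would explain why the wanted value isn't at index 0. Starting with `\S` forces at least one visible character, and `[^\r\n]*` instead of `.*` keeps the trailing "\r" out as well. The Python sketch uses a one-character lookbehind `(?<=[\n\r])` on a hypothetical CRLF snippet, just to show the effect.

```python
import re

# Hypothetical CRLF snippet: name, blank line, position.
SNIPPET = "Ibra Uipath\r\n\r\nUiPath Rokiee\r\n"

loose = re.findall(r"(?<=[\n\r]).*(?=[\n\r])", SNIPPET)
strict = re.findall(r"(?<=[\n\r])\S[^\r\n]*", SNIPPET)

print(loose[0] == "")              # True: the first match is empty
print("UiPath Rokiee\r" in loose)  # True: the wanted line, with "\r" attached
print(strict)                      # ['UiPath Rokiee']
```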

OK, I just tried \S* and \w.* but it returned null.

So I modified your pattern to this:

"(?<=Ibra UiPath[\n\r]\s+)\s\S.*"

And it returned only "UiPath Rokiee".

I don't know if the pattern I just used is too "simple" and might be unreliable, but it worked.

EDIT: I attached the .txt file as a zip: 123.zip (226 Bytes)
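On reliability: the last pattern needs at least three whitespace characters after the name (on a CRLF body, the line after "From: Ibra Uipath" has only two, which is what skips it), its match starts with the `\s` before `\S`, so there's a leading line-break character to trim, and the thread spells the name both "Uipath" and "UiPath", which matters for a case-sensitive match. One more defensive variant, sketched in Python on a hypothetical CRLF body; `extract_position` is a made-up helper, and `re.escape` stands in for `Regex.Escape` in .NET. The same pattern string should also be accepted by .NET's `Regex`, though that is untested here.

```python
import re
from typing import Optional

# Hypothetical email body (the real one isn't shown): CRLF line endings.
EMAIL = ("From: Ibra Uipath\r\nTest Email 1\r\n\r\n"
         "Best Regards,\r\nIbra Uipath\r\n\r\nUiPath Rokiee\r\n")


def extract_position(body: str, display_name: str) -> Optional[str]:
    """Return the first non-blank line after a line holding only display_name."""
    pattern = (
        rf"(?mi)^[ \t]*{re.escape(display_name)}[ \t]*\r?\n"  # name alone on its own line
        r"\s*"           # skip blank lines between the name and the position
        r"(\S[^\r\n]*)"  # position: starts with a non-space, stops before \r or \n
    )
    match = re.search(pattern, body)
    return match.group(1).rstrip() if match else None


print(extract_position(EMAIL, "Ibra Uipath"))   # UiPath Rokiee
print(extract_position(EMAIL, "Ibra UiPath"))   # UiPath Rokiee (case-insensitive)
print(extract_position(EMAIL, "Someone Else"))  # None
```

Anchoring with `^` instead of counting newline characters means the result no longer depends on how many blank lines sit between the name and the position.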