Extract email address with vba formula

Hello,
An email address is contained between angle brackets email@host.domain, so that I can extract it based on such format. But is there a way to be able to get email address when the string comes with no brackets at all?
I used to have .split and .regex functions. The first one isn’t that robust (always looks for <>), the latter I might be using wrong but at least I can associate it with a keyword ‘From:’.
Now, can anyone suggest please some string vba expression to provide as value to my variable?

You can make your Regex more precise right?

Use an expression to match

  1. the special character @
  2. Should contain ‘.’
  3. can contain any character before and after ‘a’ till a space is encountered to separate it from other words
  4. Look for characters like <> or “” that might contain the email enclosed inside it
1 Like

@songoel have you got any idea if Systemtext.Regex(…) will give me an exact match?

Regex will be a perfect solution.
Use System.Regex.RegularExpression.Match(stringinput, regex)
Define your regex as:
[^<|].@...*[^>]
This will match any email:
1.may or may not be enclosed between <>
2.contains @ and .

1 Like

Use this RFC
(?:[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Reference : RegexValidation
Source 2: RailRoad Diagram

1 Like

@Lakshay_Verma thank you for your suggestion, wondering why I can’t use .Split to get text in between “FROM:EMAIL@HOST.DOMAINSUBJECT:” ?
I try as I might different combinations of .split but none of them gives me just an email, do you have any idea?

1 Like

@ykuzin

You want to use split for FROM:EMAIL@HOST.DOMAIN ?

Then you can try as “FROM:EMAIL@HOST.DOMAIN”.split(“:”)(1)

Hope this helps you

Thanks

try this

“FROM:EMAIL@HOST.DOMAINSUBJECT:”.split(":"c)(1)

1 Like

@Lakshay_Verma thanks, i tried it and subject remains appended to domain name, I can get rid of it by .replace(“SUBJECT”). Wondering, is there any other way around?

There is a way out, if your domain is specific like .com/.org
Is that a case ?

It does deal with mentioned domains, yes
Except it would be better to have rather robust mechanism to extract client’s emails of a different format, .replace helps cut an unwanted part, considering recent tests it’s ok, thanks

1 Like

Removing a partial part is little tricky, only way to do is using ML otherwise best to move with replace as it fit your purpose.

1 Like