Extracting text between two string delimiters

It will ALWAYS be after the third instance of “Email”. So can I somehow set StringStart to the third occurrence of “Email”, and then try to use (" ") for StringEnd?

I’ve realised (" ") won’t work. I should also mention that after the 3rd occurrence of “Email” there is not always an email address, just the continuing text.

So: StartString = 3rd occurrence of “Email”. Does the next word include “@”? If so, take that email address; if not, I will just put “No Data” in that Excel cell.

Hope this can be achieved

Hi @MarkC1500 - the regex that should get every run of non-whitespace characters surrounding the “@” sign is: \S+@\S+
Please check if it works for you.
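
As a rough illustration of what `\S+@\S+` captures, here is a minimal Python sketch (the thread's workflow runs on .NET, but the pattern behaves the same way here). The sample text and the helper name `find_email_like_tokens` are made up for the example.

```python
import re

# Same pattern as suggested above: one or more non-whitespace characters,
# an "@", then one or more non-whitespace characters.
EMAIL_TOKEN = re.compile(r"\S+@\S+")

def find_email_like_tokens(text):
    """Return every whitespace-delimited token that has an '@' inside it."""
    return EMAIL_TOKEN.findall(text)

# Hypothetical sample text, not taken from the original PDFs.
sample = "Contact Email jane.doe@example.com or Email bob@example.org, thanks"
tokens = find_email_like_tokens(sample)
print(tokens)
```

Note that the pattern takes everything up to the next whitespace, so punctuation glued to an address (such as the comma after the second address in the sample) is captured as well and may need trimming afterwards.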

You can test regex patterns e.g. here: https://regex101.com/

@MarkC1500,
Please check the workflow below - I have applied both regex patterns for you, and both names and addresses are then written to an Excel spreadsheet in columns B and C. Does this solve your problem? :slight_smile:
string in between.xaml (16.7 KB)

Thank you @PAD. I haven’t used regex yet and need to learn these things. Could you share the code that I would use? I need it to get the text surrounding “@” after the 3rd occurrence of “email”.

I also think I will need a Boolean for “does @ occur within the 30 characters after the 3rd occurrence of ‘email’?” so that I can just set the data in Excel to “No Data” otherwise.
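
The check described here (take a window of text after the 3rd occurrence of the marker, test it for “@”, and fall back to “No Data”) could look roughly like the Python sketch below. The function names, the case-insensitive matching, and the sample strings are assumptions for illustration; only the 30-character window and the “No Data” fallback come from the posts above.

```python
import re

def text_after_nth(text, marker, n, window=30):
    """Return up to `window` characters that follow the n-th occurrence of
    `marker`, or None when `marker` occurs fewer than n times.
    Matching is case-insensitive (an assumption, not stated in the thread)."""
    hits = list(re.finditer(re.escape(marker), text, flags=re.IGNORECASE))
    if len(hits) < n:
        return None
    start = hits[n - 1].end()
    return text[start:start + window]

def email_or_no_data(text, marker="Email", n=3, window=30):
    """Return the first '@' token inside the window, else 'No Data'."""
    snippet = text_after_nth(text, marker, n, window)
    has_at = snippet is not None and "@" in snippet  # the Boolean check
    if not has_at:
        return "No Data"
    match = re.search(r"\S+@\S+", snippet)
    return match.group(0) if match else "No Data"

# Hypothetical inputs for illustration only.
with_address = "Email: a Email: b Email: mark@example.com more text"
without_address = "Email: a Email: b Email: nothing here to see at all"
print(email_or_no_data(with_address))
print(email_or_no_data(without_address))
```

One caveat of the fixed window: an address that runs past the 30th character would be cut off, so the window size has to be chosen with the real data in mind.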

Sorry. Didn’t see your third post.

Here is where you would apply this regex:


Let me know if this worked for you :slight_smile:

@PAD Thank you. I am going to leave writing each section to Excel for now, as files with no data will be deleted after the names are extracted, so by the time I get to the email section the file references will be different. That one is not important for now anyway.

The regex will work just fine once I have extracted the text after the 3rd occurrence of the word “email” (let’s say 30 characters’ worth, as this will be safe for my data). Then I can apply the regex to that section. Please advise how I do this bit; then I think it will be good to go.

Please see these threads - when you arrange your items in a list of addresses, then you can use an invoked method to remove the unneeded elements:


That uses a lot of indexing, and my indexes will be different for each PDF file.

I need to somehow just find the 3rd occurrence of “email” then split the string.

You could also trim the string that contains the whole text by re-assigning it, e.g. newString = System.Text.RegularExpressions.Regex.Replace(value, "^.*?\S+@\S+.*?\S+@\S+.*?\S+@\S+.*?", "") - where “value” is the old full string variable (as I named it in my workflow). This should cut off everything from the start of the string to the end of the third email address. You would then work with this newString variable in your later steps.
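
To show what that replacement does, here is a hedged Python sketch of the same idea: lazily skip to the first, second and third `\S+@\S+` token and delete everything up to the end of the third. The function name and sample text are invented for the example. `re.DOTALL` is added on the assumption that the extracted PDF text spans several lines; the .NET counterpart would be `RegexOptions.Singleline`.

```python
import re

# Lazily consume text up to and including the third "@" token.
# re.DOTALL lets "." cross line breaks (assumed to be needed for PDF text).
THROUGH_THIRD_EMAIL = re.compile(
    r"^.*?\S+@\S+.*?\S+@\S+.*?\S+@\S+", flags=re.DOTALL
)

def drop_through_third_email(value):
    """Remove everything from the start of `value` to the end of the third
    email-like token. If there are fewer than three, return it unchanged."""
    return THROUGH_THIRD_EMAIL.sub("", value, count=1)

# Hypothetical sample with a line break before the third address.
value = "a one@x.com b two@y.com\nc three@z.com rest of text"
new_string = drop_through_third_email(value)
print(repr(new_string))
```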

Hi @PAD

Sorry if I’ve made it confusing. It’s not actually the third email address I’m after out of my data. It’s the email address after the 3rd occurrence of the word “email”. I have done this by splitting the string with “email” and then utilising index 2, then applying your regex. Works like a charm.

I do have more data to extract, so maybe your last suggestion will come in handy still.

Learnt a lot, thanks for your help

Mark
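
The split-then-regex approach Mark describes can be sketched in Python as follows. The function name, the case-insensitive split, and the sample text are assumptions. With a zero-based split, the piece that follows the third occurrence of the marker sits at index 3 (index 2 is the text between the second and third occurrences), so the index is left as a parameter; which value fits depends on how the occurrences are counted in the real data.

```python
import re

EMAIL_TOKEN = re.compile(r"\S+@\S+")

def email_after_marker(text, marker="email", piece_index=3):
    """Split `text` on `marker` (case-insensitive) and look for an email-like
    token in the chosen piece. Returns 'No Data' when the piece is missing
    or contains no '@' token."""
    pieces = re.split(re.escape(marker), text, flags=re.IGNORECASE)
    if piece_index >= len(pieces):
        return "No Data"
    match = EMAIL_TOKEN.search(pieces[piece_index])
    return match.group(0) if match else "No Data"

# Hypothetical sample text for illustration.
sample = "Email: n/a Email: n/a Email: mark@example.com Mobile: 05650456654"
result = email_after_marker(sample)
print(result)
```

One thing to watch with this approach: an address that itself contains the marker word would be split in the middle.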

Hi @PAD

Moving on to the next issue: I am extracting a phone number. It is similar to extracting the email in the first step, but in the second step I don’t have an “@” to play with.

I have split the string after the third occurrence of the word “Mobile”, so my outputs either start with a phone number (“05650456654 random text…”) or, when no number is present, are just “random text…”.

The first thing I need to do is check whether there is a number present at index 6 of the string, and return a Boolean. Please advise.

The second step will be to extract the phone number. I can do this by splitting the string again with the " " delimiter and then using index(0); this will be fine.

Thanks
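
One possible way to combine the two steps (the Boolean check and the extraction) is sketched below in Python. Instead of inspecting a single character position, it tests whether the first space-delimited token consists only of digits. The function name and the minimum length of 6 digits are assumptions chosen for the example, not requirements stated in the thread.

```python
import re

def leading_phone_number(snippet, min_digits=6):
    """Return the first whitespace-delimited token if it is made up only of
    ASCII digits and is at least `min_digits` long; otherwise return None."""
    tokens = snippet.split()
    if not tokens:
        return None
    first = tokens[0]
    # "{N,}" means N or more repetitions of the digit class.
    if re.fullmatch(r"[0-9]{%d,}" % min_digits, first):
        return first
    return None

# The two output shapes described in the post above.
number = leading_phone_number("05650456654 random text")
has_number = number is not None            # the Boolean asked for
missing = leading_phone_number("random text")
print(has_number, number, missing)
```

Numbers written with spaces, dashes or a leading “+” would not pass this check and would need a looser pattern.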

It’s OK, I have done it. Just a lot of string splitting. The joys of unstructured data/text!!

A simpler approach would be to use ABBYY Flexi Engine and Studio to create templates. Then you won’t need to do string manipulation and can get the field values directly into Excel under your specified column names :slight_smile:

Don’t tell me this now :slight_smile:

Will look into that.

I’m learning some stuff, which is most important. And I will be glad when it’s done!

It’s a time vs cost scenario. If your business needs the automation fast, the better approach is ABBYY. If you have the luxury of time… then of course regex :slight_smile:

Apart from the matter of time, what might affect our choice of a solution is the fact that ABBYY FlexiCapture requires a separate license (to the best of my knowledge) :grinning: