Regex expression to match a keyword within a string, delimited by spaces, commas and semicolons

Hi all!

So the scenario is the following one:

I have an Excel file with several keywords.
I have to find all of these keywords within a lot of CVs, each of which is completely different.
I only need to detect whether each keyword appears in the CV.

What I did:
I used the PDF Read activity, which puts the whole text into the string “PDFText”.
Then I use a For Each activity that reads each keyword from the Excel file, and inside it I use PDFText.Contains(keyword) to see if the keyword is there.

This seems fairly simple. But imagine that the keyword is “ERP” and that “PDFText” contains the word “PowerPoint”. Ignoring case, “ERP” is a substring of “PowerPoint”, so a substring check that ignores case (for example, after lower-casing both strings) would give TRUE. This is not what I want. I want to detect “ERP” only as a separate word, not as a substring of another word.
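To make the false positive concrete, here is a minimal sketch in Python rather than in a UiPath expression. The function name `contains_ignore_case` and the sample text are assumptions for illustration only.

```python
def contains_ignore_case(text: str, keyword: str) -> bool:
    """Plain substring test that ignores case (no word-boundary check)."""
    return keyword.lower() in text.lower()

# Hypothetical CV text standing in for PDFText.
pdf_text = "Skills: PowerPoint, Excel"

# Reports a hit because "erp" sits inside "powerpoint".
print(contains_ignore_case(pdf_text, "ERP"))
# A case-sensitive substring test does not hit here, but it would
# still hit on text such as "ERPs" or "SUPERPOWER".
print("ERP" in pdf_text)
```

Either way, a substring test cannot tell a whole word from a fragment of a longer word, which is why a pattern with explicit delimiters or word boundaries is needed.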

So I think that the best solution would be to match the string “PDFText” against a regex pattern.
The pattern would consist of the “keyword”, and before and after the “keyword” there must be either a space, a comma or a semicolon.
Furthermore, if the keyword is at the beginning of the string (or at the beginning of a line?), there will be no space, comma or semicolon before the “keyword”, so this should also be reflected in the regex pattern. The same applies if the keyword is at the end of the string (or at the end of a line?).
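The pattern described above could be sketched roughly as follows, written in Python for illustration. The function name `has_delimited_keyword` is hypothetical, and the set of delimiters (space, comma, semicolon, line start, line end) is taken literally from the description.

```python
import re

def has_delimited_keyword(text: str, keyword: str) -> bool:
    """True if keyword is surrounded by a space, comma, semicolon,
    or the start/end of a line."""
    # re.escape keeps characters such as '#' or '+' literal.
    pattern = r"(?:^|[ ,;])" + re.escape(keyword) + r"(?:$|[ ,;])"
    # MULTILINE lets ^ and $ match at every line break, not only at
    # the very start and end of the whole text.
    return re.search(pattern, text, flags=re.MULTILINE) is not None

print(has_delimited_keyword("PowerPoint, ERP; SAP", "ERP"))
print(has_delimited_keyword("PowerPoint", "erP"))
```

One limitation of listing the delimiters explicitly: a keyword followed by a period, a tab or a closing bracket (for example “ERP.”) would not be found unless those characters are added to the character class.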

Do you think this is the best way to detect a given keyword as being an independent word and not being a substring of a given string?

Could someone please indicate what the exact pattern expression would be?

Thanks in advance!

Hi again,

I don’t want to scare you with the large text I wrote in my previous message :)

I just want to find the word “keyword” within a string with a regex pattern. The pattern would consist of the “keyword”, and before and after the “keyword” there must be either a space, a comma or a semicolon.

Thanks :)

Hi @jcab,

I’m not an expert here, but I have an idea. If you know the keywords, you could put a Switch inside the For Each that takes the string and assigns a value to a variable, although you would need to split the string first. Or, if you know all the needed keywords, you could put them in a string array such as {“ERP”, “PowerPoint”} and check in a loop whether each one is present.

Regards
@fudi5

Hi @jcab,

Can you provide a sample input string and your expected output, so we can understand your requirement clearly?

Regards,
Arivu

Hey @jcab,

If you want to use regex, you can use something like

"\b(?i)" & keyword & "\b"

for your match pattern.

\b - The match must occur on a boundary between a \w (word character: letter, digit or underscore) and a \W (non-word) character. It also matches at the start or end of the string or line when the adjacent character is a word character.
(?i) - Use case-insensitive matching.
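As a rough illustration of the two constructs above, here is a small Python sketch. The function name `has_whole_word` is hypothetical. Python expects an inline `(?i)` at the very start of the pattern, so this sketch passes `re.IGNORECASE` as a flag instead, which has the same effect.

```python
import re

def has_whole_word(text: str, keyword: str) -> bool:
    """Case-insensitive search for keyword between word boundaries."""
    # re.escape guards against keywords containing regex metacharacters.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(has_whole_word("Experience with erp systems", "ERP"))
print(has_whole_word("PowerPoint", "ERP"))
```

Because `\b` only needs a word character on one side and a non-word character (or nothing) on the other, this covers spaces, commas, semicolons and line ends without listing them.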

I would call my regex skills mediocre at best, so I always use these two resources. I find an online regex tester a very useful tool when trying to work out the right pattern.

Regex Tester
Regular Expression Language - Quick Reference

Thanks.

I’m currently using the following regex pattern:
"\b" + keyword + "\b"

But for some reason it fails to detect the keyword C#: with the pattern "\bC#\b", C# is not matched as a separate whole word.
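A likely explanation: `#` is not a word character, so the trailing `\b` only matches if the character right after `#` is a word character. In text such as “C#, Java” the `#` is followed by a comma, so there is no boundary at that position. One possible workaround, sketched here in Python with a hypothetical function name `has_keyword`, is to replace `\b` with lookarounds that only require "no word character next to the keyword". The same lookaround syntax is available in .NET regex.

```python
import re

def has_keyword(text: str, keyword: str) -> bool:
    """Case-insensitive search for keyword not glued to a word character."""
    # (?<!\w): no word character immediately before the keyword.
    # (?!\w):  no word character immediately after the keyword.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

cv_text = "Languages: C#, Java"  # hypothetical sample text
print(re.search(r"\bC#\b", cv_text) is not None)  # the \b version
print(has_keyword(cv_text, "C#"))                 # the lookaround version
```

A caveat with this sketch: the keyword “C” on its own would also match inside “C#”, because `#` does not count as a word character.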

@tmays - I have a table of rows (String) containing a date labelled “EndDate”. The label is sometimes written as “enddate”, and the date format also varies, from 08/21/2018 to 21 August 2018. How can I extract just the date? Any ideas?