Shorten Regex Pattern

Hello,

The bot needs to find a target word in a string, even if the letters of the target word have been separated by a hyphen, a line break, or a digit.

The target word can pick up different characters between its letters after it’s extracted from a tabular PDF. Here are two examples of what the extracted string could be:

  1. “This is a Fai - airies example.”
  2. “This is a Fa- 8 -
    iries example” (new line)

The bot needs to match Fairies in both examples.
The words before and after “Fairies” are dynamic. The next word won’t always be “example”, so I can’t use “example” as a keyword. I also won’t know beforehand where the line break will be.

This regex pattern works well:

F\s*\S*\sa\s\S*\si\s\S*\sr\s\S*\si\s\S*\se\s\S*\s*s

I’m wondering if there is a more efficient or shorter way to write the pattern, because some targets are five-word phrases.

I’ve attached a screenshot of the pattern because it’s being altered in my post.

Thanks

Hey @6027ae06be5a67a04d29acc18,

Use: Fa.+(?=example)|Fa.+\n.+(?=example)

Thanks,
Sanjit

Hey,

Thanks for the suggestion. I should have clarified that the words before and after “Fairies” are dynamic. The next word won’t always be “example”, so I can’t use “example” as a keyword.

String could also be:
“Another Fai-ries story”

Can you check:
\bF(([aire]+[\s\S]*?)*?)s\b
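To see what this pattern does with the strings from this thread, here is a small check. It is written in Python rather than in the bot's own expression language, on the assumption that this particular pattern behaves the same in both regex flavours. The helper name `first_match` is made up for the illustration.

```python
import re

# The pattern suggested above, for the target word "Fairies".
pattern = re.compile(r"\bF(([aire]+[\s\S]*?)*?)s\b")

def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

samples = [
    "This is a Fai - airies example.",
    "This is a Fa- 8 -\niries example",
    "Another Fai-ries story",
    "Far away it is",  # does not contain the word at all
]

for text in samples:
    print(repr(text), "->", repr(first_match(text)))
```

The first three strings give the spans that start at "F" and end at the final "s" of the word. The fourth string also matches in full, because `[\s\S]*?` may consume any characters before the closing `s\b`, so the pattern is best suited to text where such false positives are unlikely.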


Hello @6027ae06be5a67a04d29acc18
Kindly refer to these regex patterns:

F.*(?<=s\s)|F.*\n\S+
F.*(?<=ies\s)|F.*\n\S+



Thank you, this works!

Thanks, this works!

@6027ae06be5a67a04d29acc18

Happy Automation :smiley:

Regards,
Gokul Jai


This will generate Peter’s regex pattern for any string as input, in case the word you are searching for is a variable:

"\b" + WordToSearch.First + "(([" + String.Join("", (WordToSearch.Substring(1, WordToSearch.Length-2)).Distinct()) + "]+[\s\S]*?)*?)" + WordToSearch.Last + "\b"
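For anyone who wants to try the same construction outside the workflow, here is a rough Python equivalent of the expression above. The function name `build_pattern` is hypothetical, and the `re.escape` calls and the length check are additions that the original expression does not have.

```python
import re

def build_pattern(word: str) -> str:
    """Build the first-letter / middle-letters / last-letter pattern for `word`."""
    if len(word) < 3:
        # With no middle letters the character class would be empty.
        raise ValueError("word needs at least three characters")
    first, middle, last = word[0], word[1:-1], word[-1]
    # Distinct middle letters, keeping their first-seen order.
    distinct = "".join(dict.fromkeys(middle))
    return (
        r"\b" + re.escape(first)
        + "(([" + re.escape(distinct) + r"]+[\s\S]*?)*?)"
        + re.escape(last) + r"\b"
    )

pattern = build_pattern("Fairies")
print(pattern)
print(re.search(pattern, "Another Fai-ries story").group(0))
```

For "Fairies" this produces the same pattern text as the one posted earlier in the thread.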


Let’s look at another approach as well, one that more closely simulates how a human would handle it.

The idea:

  • the word of interest is known
  • we start with the first char and omit anything that is not a letter
  • then the next letter found in the text is compared with the next letter of the searched word
  • same: continue with the following letter…
  • different: abort
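The steps above could be sketched roughly as follows. This is only one possible reading of the idea, written in Python for illustration. The function name `find_spread_word` is made up, and the sketch assumes a non-empty search word, treats every non-letter as skippable, and does not check word boundaries.

```python
from typing import Optional, Tuple

def find_spread_word(text: str, word: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first place where the letters of `word`
    appear in order, skipping non-letter characters in between."""
    for start in range(len(text)):
        if text[start] != word[0]:
            continue
        pos = start   # current index in text
        matched = 0   # number of letters of `word` matched so far
        while pos < len(text) and matched < len(word):
            ch = text[pos]
            if not ch.isalpha():
                pos += 1          # separator (hyphen, digit, whitespace): skip
            elif ch == word[matched]:
                matched += 1      # same letter: continue with the next one
                pos += 1
            else:
                break             # different letter: abort this attempt
        if matched == len(word):
            return start, pos
    return None

for text in ["This is a Fa- 8 -\niries example", "Another Fai-ries story"]:
    span = find_spread_word(text, "Fairies")
    print(repr(text[span[0]:span[1]]) if span else None)
```

Because every letter is checked, the first example string from the question as posted (“Fai - airies”) is not matched: its letters spell “Faiairies”, so the comparison aborts at the second “a”.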

A strategy that takes the first and last characters and extracts the part of the text spanning them will work for your word, but not for a word like “houses”, because the ending character also occurs inside the word.
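A small Python illustration of the “houses” problem, next to a stricter generated pattern that requires every letter in order. The helper `spaced_pattern` and the choice of `[^A-Za-z]*` as the separator are assumptions made for this sketch, not something posted in the thread.

```python
import re

def spaced_pattern(phrase: str) -> str:
    """Join the letters of `phrase` with an 'any non-letters' separator.
    Spaces in the phrase are dropped, since the separator already covers them."""
    letters = [re.escape(ch) for ch in phrase if not ch.isspace()]
    return r"[^A-Za-z]*".join(letters)

text = "two ho-us es here"

# First char, anything (lazy), last char: stops at the "s" inside the word.
naive = re.search(r"h[\s\S]*?s", text).group(0)

# Every letter in order, with only non-letters allowed in between.
strict = re.search(spaced_pattern("houses"), text).group(0)

print(repr(naive))   # 'ho-us'
print(repr(strict))  # 'ho-us es'
```

Since the pattern is generated from the phrase, it stays a single short expression even for a five-word phrase. The trade-off is that it no longer tolerates stray letters between the letters of the target.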

I see, thanks
