The bot needs to find a target word in a string, even if the letters of the target word have been separated by a hyphen, a line break, or a digit.
The target word can pick up different characters between its letters after it is extracted from a tabular PDF. Here are two examples of what the extracted string could look like:
“This is a Fai - airies example.”
“This is a Fa- 8 -
iries example” (new line)
The bot needs to match “Fairies” in both examples.
The words before and after “Fairies” are dynamic. The next word won’t always be “example”, so I can’t use “example” as a keyword. I also won’t know beforehand where the line break will be.
Thanks for the suggestion. I should have clarified that the words before and after “Fairies” are dynamic. The next word won’t always be “example”, so I can’t use it as a keyword.
Let’s also look at another approach, one that more closely simulates how a human would handle this.
The idea:
the word of interest is known
we start at its first char and omit anything in the text that is not a letter
then the next letter found in the text is checked against the n+1 char of the searched word.
yes: continue with n+2…
no: abort
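The steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not a ready-made activity: the function name `find_scattered` is made up for this example, and the case-insensitive comparison is an assumption the thread does not state.

```python
from typing import Optional, Tuple


def find_scattered(text: str, word: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first span in `text` where the letters of
    `word` appear in order, separated only by non-letter characters.
    Hypothetical helper; comparison is case-insensitive by assumption."""
    if not word:
        return None
    target = word.lower()
    for start, ch in enumerate(text):
        if ch.lower() != target[0]:
            continue  # a candidate must begin with the first char of the word
        matched = 1  # how many chars of the word have been matched so far
        pos = start + 1
        while matched < len(target) and pos < len(text):
            c = text[pos]
            if not c.isalpha():
                pos += 1  # omit hyphens, digits, spaces, line breaks
            elif c.lower() == target[matched]:
                matched += 1  # yes: continue with the next char
                pos += 1
            else:
                break  # no: abort this candidate, try a later start
        if matched == len(target):
            return start, pos
    return None


# Second sample string from the question (line break after the hyphen).
text = "This is a Fa- 8 -\niries example"
span = find_scattered(text, "Fairies")
if span is not None:
    start, end = span
    print(repr(text[start:end]))  # raw slice that contains the scattered word
```

Because every non-letter is skipped, a space counts as a separator too, so this sketch would also accept something like “Fair ies”, and it does not check that the match ends at a word boundary. Whether that is acceptable depends on the extracted data.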
A strategy such as taking the first and the last char and extracting the part of the text spanning between them will work for your word, but not for a word like “houses”, because the ending char also occurs inside the word.
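To make the “houses” problem concrete, here is a small sketch of one possible reading of that first-char/last-char strategy, written as a lazy regex in Python. The helper name `span_first_to_last` and the sample sentence are invented for the illustration.

```python
import re
from typing import Optional


def span_first_to_last(text: str, word: str) -> Optional[str]:
    """Take the shortest span from the word's first char to the nearest
    following occurrence of its last char, then keep only the letters.
    Hypothetical helper illustrating the naive strategy."""
    pattern = re.escape(word[0]) + r".*?" + re.escape(word[-1])
    m = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    if m is None:
        return None
    return "".join(c for c in m.group(0) if c.isalpha())


# Works when the last char occurs only at the end of the word.
assert span_first_to_last("This is a Fa- 8 -\niries example", "Fairies") == "Fairies"

# Stops too early for "houses": the first "s" after "h" is inside the word.
assert span_first_to_last("Two hou-ses here", "houses") == "hous"
```

Making the pattern greedy does not really help either, since it would then run to the last matching char anywhere in the text, which is why the char-by-char check is the safer route.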