Regex pattern to find a string separated by a line break

Hello,

Can someone please help me figure out a regex pattern the Bot can use to search a text file and find a specific phrase, even when its words are separated by a line break?

I have extracted the text from a PDF file to a text file and preserved the formatting. For example, as shown in the screenshot, if the Bot searches for “provincial and international borders”, it doesn't find a match because of the line break after “and”. I've preserved the PDF layout because the Bot shouldn't search the French text in the right-hand column.

Input: please see the screenshot
Expected output: “provincial and international borders”

The Bot has to search the text to determine whether it contains a predefined phrase stored in a dynamic variable (usually 3-4 words), which could be found on several pages in the file.

Thanks

Hi,

For now, can you try the following pattern?

\bprovincial\s+and\s+international\s+borders\b
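Why this works: \s+ matches one or more whitespace characters, and a line break counts as whitespace, so the pattern still matches when the phrase wraps onto the next line. Here is a minimal Python sketch of the idea (UiPath's Regex activities use .NET regex, where \b and \s+ behave the same way for this pattern). It also builds the pattern from the phrase instead of hard-coding it, assuming the dynamic variable holds plain words separated by spaces; phrase_to_pattern and contains_phrase are hypothetical helper names.

```python
import re


def phrase_to_pattern(phrase):
    """Build a regex matching the words of `phrase` separated by any run of
    whitespace (spaces, tabs or line breaks)."""
    words = phrase.split()
    # re.escape guards against words that contain regex metacharacters.
    return r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"


def contains_phrase(text, phrase):
    """True if `phrase` occurs in `text`, even when wrapped across lines."""
    return re.search(phrase_to_pattern(phrase), text) is not None


sample = "along provincial and\ninternational borders, the"
print(phrase_to_pattern("provincial and international borders"))
print(contains_phrase(sample, "provincial and international borders"))  # True
```

In .NET the same construction would use Regex.Escape on each word before joining them with \s+.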

Regards,

Hi,

If possible, can you share the original PDF file?

Regards,

Hi,

Can you try the following sample?

First, read the PDF file as text using the Read PDF Text activity with the PreserveFormatting option enabled.
Then, remove the French part. In this file, the French text always starts at the 74th character of each line.
Next, remove the line numbers (5, 10, 15, 20, 25, 30) using Regex.Replace.
Finally, extract the target phrase using the regex above.
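The steps above can be sketched in Python roughly as follows. This only illustrates the logic and is not the attached UiPath sample: it assumes the layout text is already available as a string, that the French column starts at the 74th character (index 73) as described, and that each margin line number (5, 10, ..., 30) stands alone at the start or end of a line. The LINE_NUMBER pattern and the helper names are hypothetical; a line of body text that begins or ends with one of those numbers would also lose it, so check the pattern against the real file.

```python
import re

# From the thread: French text starts at the 74th character (index 73),
# so the English column is line[:73].
FRENCH_COLUMN_START = 73

# Hypothetical: margin line numbers (5, 10, ..., 30) stand alone at the start
# or at the end of a line, separated from the text by spaces or tabs.
LINE_NUMBER = re.compile(
    r"^[ \t]*(?:5|10|15|20|25|30)(?=[ \t]|$)"
    r"|[ \t]+(?:5|10|15|20|25|30)[ \t]*$",
    re.MULTILINE,
)


def english_text(layout_text):
    """Steps 2 and 3: keep the left column, then strip the line numbers."""
    left = "\n".join(line[:FRENCH_COLUMN_START] for line in layout_text.splitlines())
    return LINE_NUMBER.sub("", left)


def find_phrase(layout_text, phrase):
    """Step 4: search for the phrase, allowing any whitespace (including
    line breaks) between its words. Returns a re.Match or None."""
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
    return re.search(pattern, english_text(layout_text))


layout = (
    "along provincial and".ljust(73) + "texte français\n"
    + "international borders, the".ljust(73) + "texte français"
)
match = find_phrase(layout, "provincial and international borders")
print(match is not None)  # True
```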

Sample20220820-1.zip (35.7 KB)

Regards,

Thank you so much, that works!

Hi,

If you also need the page number, it might be better to split the file into pages beforehand, using the Extract PDF Page Range activity, a regex, etc.

Then, use the Matches activity or the Regex.Matches method to extract all matches, not just the first one.
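A rough Python sketch of this per-page approach, assuming the pages are already available as a list of strings (for example, by reading each page separately); find_all_with_pages is a hypothetical helper. Python's re.finditer plays the role of .NET's Regex.Matches here, returning every occurrence rather than only the first. One caveat of searching page by page: a phrase that starts on one page and ends on the next will not be found.

```python
import re


def find_all_with_pages(pages, phrase):
    """Return (page_number, matched_text) for every occurrence of `phrase`.

    `pages` is assumed to be a list of per-page text strings, page 1 first.
    """
    pattern = re.compile(
        r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
    )
    hits = []
    for page_no, text in enumerate(pages, start=1):
        # finditer yields every match on the page, not just the first one.
        for m in pattern.finditer(text):
            hits.append((page_no, m.group(0)))
    return hits


pages = [
    "intro text",
    "provincial and\ninternational borders ... provincial and international borders",
    "nothing here",
]
for page_no, text in find_all_with_pages(pages, "provincial and international borders"):
    print(page_no, repr(text))
```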

Regards,
