Regex advice

Hi all

I have a problem I’m trying to solve and wondered if anyone could help. I’m reading through a PDF document, looking for a keyword, and I want to extract the whole sentence containing that word. For example, the PDF will look like this:

56.8. Liaise with the incumbent Service Provider to enable the full completion of the mobilisation period;
56.9. Produce and implement a communications plan , to be agreed with the Client, including the frequency, responsibility for and nature of communication with the Client and end users of the service;
56.10. Produce a mobilisation report for each Affected Property to encompass programmes that will fulfil all the Client’s obligations to landlords and other tenants. The format of reports and programmes shall be in accordance with the Client’s requirements. Particular attention shall be paid to establishing the operating requirements of the occupiers in drawing up these programmes for agreement with the Client;

If I were looking for the word “Communication”, I’d want to extract this part: 56.9. Produce and implement a communications plan , to be agreed with the Client, including the frequency, responsibility for and nature of communication with the Client and end users of the service;

At the moment, my process searches the PDF page by page, splits the text on Environment.NewLine into an array, and then looks through each element for “Communication”. Because the text is split on new lines, a sentence that wraps over several lines is not picked up in full.

I wondered if you knew how to split the text from number to number, e.g. the text from 56.8 to 56.9 (including the number at the start), the text from 56.9 to 56.10, etc. The only issue is that the numbers vary in format, i.e. examples of numbers in the PDF are:
1.2.
1.2.3.
34.2.
34.3.16.
117.14.3.
216.2.13.3.

Any help would be greatly appreciated.

Hi,

I wondered if you knew how to split the text from number to number, e.g. the text from 56.8 to 56.9 (including the number at the start), the text from 56.9 to 56.10, etc.

How about the following expression?

arrString = System.Text.RegularExpressions.Regex.Matches(yourString,"(\d+\.)+[\s\S]+?(?=(\d+\.)+|$)").Cast(Of System.Text.RegularExpressions.Match).Select(Function(m) m.Value).ToArray

Note: arrString is a String array.
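
For anyone who wants to try the idea outside UiPath, here is a minimal sketch of the same pattern using Python's re module rather than .NET. That substitution, the shortened sample text, and the names text, clauses and hits are assumptions for illustration, not part of the expression above. It collects each numbered clause and then keeps the ones that mention the keyword.

```python
import re

# Shortened sample text (an assumption, modelled on the PDF extract above).
text = (
    "56.8. Liaise with the incumbent Service Provider;\n"
    "56.9. Produce and implement a communications plan, including the nature of\n"
    "communication with the Client;\n"
    "56.10. Produce a mobilisation report."
)

# Same pattern as the expression above: a clause number, then as little text
# as possible up to the next "digits + dot" or the end of the text.
pattern = re.compile(r"(\d+\.)+[\s\S]+?(?=(\d+\.)+|$)")

# finditer + group(0) returns the whole match, like m.Value in .NET.
clauses = [m.group(0).strip() for m in pattern.finditer(text)]

# Keep only the clauses that mention the keyword, ignoring case.
hits = [c for c in clauses if "communication" in c.lower()]
```

One thing to watch: the lookahead is not anchored to the start of a line, so any digits followed by a dot inside a clause (for example a sentence ending in a year) would end that clause early.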

Regards,

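You could use Regex.Split() with the following pattern: (?=^\d+\.)
You need the RegexOptions.Multiline option for this to work, so that ^ matches at the start of every line rather than only at the start of the text.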
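
As a rough illustration of this split-based approach, here is a sketch in Python's re module rather than .NET (an assumption made so the example is easy to run; the sample text and the names parts, clauses and hits are also hypothetical). The lookahead splits the text just before every line that starts with a clause number, so wrapped lines and numbers in the middle of a line stay inside their clause.

```python
import re

# Sample text (an assumption): the second clause wraps onto a new line and
# also mentions another clause number ("3.2.") in the middle of a line.
text = (
    "56.8. Liaise with the incumbent Service Provider;\n"
    "56.9. Produce a communications plan as described in 3.2. and agree the\n"
    "nature of communication with the Client;\n"
    "56.10. Produce a mobilisation report for each Affected Property;\n"
)

# Zero-width split before each line that begins with digits and a dot.
# re.MULTILINE makes ^ match at every line start (needs Python 3.7+ for
# splitting on an empty match).
parts = re.split(r"(?=^\d+\.)", text, flags=re.MULTILINE)

# The split at position 0 leaves an empty first piece, so drop empties.
clauses = [p.strip() for p in parts if p.strip()]

# Keep only the clauses that mention the keyword, ignoring case.
hits = [c for c in clauses if "communication" in c.lower()]
```

Because the pattern is anchored with ^, the "3.2." reference in the middle of the second clause does not trigger a split.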

You absolute genius, thank you!

Do you know why it doesn’t work on regex101? I’ve tested it in UiPath and it works perfectly, but I just wondered why it’s not working there.

Hi,

When we use regex101 for UiPath, we need to set the Flavor to .NET (C#).
In this case, we also need to enable the global option so that all matches are returned rather than only the first.
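
To show what the global option changes, here is a small sketch in Python (an analogy only, since regex101's flags are set in its UI; the pattern and the names first_only and all_numbers are hypothetical). A single search stops at the first match, much like Regex.Match in .NET, while iterating over all matches corresponds to the global option, or to Regex.Matches in .NET.

```python
import re

# Hypothetical pattern: a clause number at the start of a line,
# made of one or more "digits + dot" groups.
pattern = re.compile(r"^(?:\d+\.)+", re.MULTILINE)

# Sample lines using number formats from the question.
text = "1.2. First\n1.2.3. Second\n216.2.13.3. Third\n"

# Without "global": only the first match is reported.
first_only = pattern.search(text).group(0)

# With "global": every match in the text is reported.
all_numbers = [m.group(0) for m in pattern.finditer(text)]
```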

Regards,

You’re amazing, thank you so much!
