Hi all
I have a problem I’m trying to solve and wondered if anyone could help. I’m trying to read through a PDF document, look for a keyword, and extract the whole sentence (in practice, the whole numbered clause) containing that word. For example, the PDF looks like this:
56.8. Liaise with the incumbent Service Provider to enable the full completion of the mobilisation period;
56.9. Produce and implement a communications plan , to be agreed with the Client, including the frequency, responsibility for and nature of communication with the Client and end users of the service;
56.10. Produce a mobilisation report for each Affected Property to encompass programmes that will fulfil all the Client’s obligations to landlords and other tenants. The format of reports and programmes shall be in accordance with the Client’s requirements. Particular attention shall be paid to establishing the operating requirements of the occupiers in drawing up these programmes for agreement with the Client;
If I were looking for the word “Communication”, I’d want to extract this part:
56.9. Produce and implement a communications plan , to be agreed with the Client, including the frequency, responsibility for and nature of communication with the Client and end users of the service;
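One detail in this example: the search word is “Communication” with a capital C, but the clause itself uses lowercase “communications” and “communication”. The match therefore needs to ignore case.

Below is a minimal Python sketch of that filtering step. It assumes the clauses have already been separated into a list, and the function name `clauses_containing` is hypothetical. In .NET, `String.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0` performs the same kind of case-insensitive test.

```python
def clauses_containing(clauses: list[str], keyword: str) -> list[str]:
    """Return every clause that contains keyword, ignoring case.

    This is a plain substring test, so "communication" also matches
    "communications".
    """
    needle = keyword.casefold()
    return [clause for clause in clauses if needle in clause.casefold()]


# The first two clauses from the question, already separated (assumption).
clauses = [
    "56.8. Liaise with the incumbent Service Provider to enable the full "
    "completion of the mobilisation period;",
    "56.9. Produce and implement a communications plan , to be agreed with "
    "the Client, including the frequency, responsibility for and nature of "
    "communication with the Client and end users of the service;",
]
print(clauses_containing(clauses, "Communication"))
```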
At the moment, my process searches the PDF page by page: I split each page’s text on Environment.NewLine into an array, then look through each element for “Communication”. Because the text is split on line breaks, I only get the line containing the word, not the full sentence.
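Each array element is a single physical line, so any clause that wraps onto several lines gets cut up. One small change keeps the existing split-by-line step and then glues the lines back together:

- a line that starts with a clause number begins a new clause;
- any other line is appended to the current clause.

Below is a minimal Python sketch of that grouping. It makes two assumptions:

- every clause starts at the beginning of a line;
- its number has at least two numeric parts followed by a dot, as in every format listed in the question.

`group_clauses` and `CLAUSE_START` are hypothetical names. In .NET, the input would be your existing line array, and `Regex.IsMatch` can play the role of `CLAUSE_START.match`.

```python
import re

# A clause number such as 56.9. or 216.2.13.3. at the start of a line,
# followed by whitespace or the end of the line. Assumes at least two
# numeric parts, as in every format listed in the question.
CLAUSE_START = re.compile(r"^\d+(?:\.\d+)+\.(?=\s|$)")


def group_clauses(lines: list[str]) -> list[str]:
    """Glue wrapped lines back together so each entry is one numbered clause."""
    clauses: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if CLAUSE_START.match(line) or not clauses:
            # A numbered line starts a new clause; any text before the first
            # numbered line also gets its own entry.
            clauses.append(line)
        else:
            # Otherwise this line continues the current clause.
            clauses[-1] += " " + line
    return clauses


# Hypothetical wrapping: where the real PDF breaks its lines will differ.
page_text = (
    "56.8. Liaise with the incumbent Service Provider to enable the full\n"
    "completion of the mobilisation period;\n"
    "56.9. Produce and implement a communications plan , to be agreed with\n"
    "the Client, including the frequency, responsibility for and nature of\n"
    "communication with the Client and end users of the service;\n"
)

for clause in group_clauses(page_text.splitlines()):
    print(clause)
```

The keyword search can then run over the grouped clauses instead of the raw lines.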
Does anyone know how to split the text from one number to the next, e.g. the text from 56.8 up to 56.9 (including the number at the start), the text from 56.9 up to 56.10, etc.? The only issue is that the numbers vary in format; for example, numbers in the PDF include:
1.2.
1.2.3.
34.2.
34.3.16.
117.14.3.
216.2.13.3.
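One regular expression covers every format listed above: `\d+(?:\.\d+)+\.`. That means digits, then one or more groups of a dot followed by digits, then a closing dot, which matches everything from `1.2.` to `216.2.13.3.`.

Below is a minimal Python sketch that splits "from number to number" with that pattern. Each match runs from one clause number up to, but not including, the next clause number or the end of the text. Line breaks inside a clause are then collapsed to single spaces. It assumes:

- each clause number sits at the start of a line in the extracted text, so a number appearing mid-sentence is not treated as a boundary;
- a number is followed by whitespace.

`split_clauses` and `CLAUSE` are hypothetical names, and the sample clause text is made up for illustration. .NET's `Regex` also supports the inline `(?m)` and `(?s)` options and lookahead, so the pattern should port with little change.

```python
import re

# From a clause number at the start of a line (1.2., 117.14.3., ...) up to,
# but not including, the next clause number at the start of a line, or the
# end of the text. (?m): ^ matches at every line start; (?s): . matches
# newlines. Any text before the first clause number is ignored.
CLAUSE = re.compile(
    r"(?ms)^\d+(?:\.\d+)+\.(?=\s).*?(?=^\d+(?:\.\d+)+\.(?=\s)|\Z)"
)


def split_clauses(text: str) -> list[str]:
    """Return each numbered clause with its line breaks collapsed to spaces."""
    return [" ".join(m.group().split()) for m in CLAUSE.finditer(text)]


# Made-up clause text using each number format from the question.
sample = (
    "1.2. First clause\n"
    "1.2.3. Second clause that wraps\n"
    "onto a second line\n"
    "34.2. Third clause\n"
    "34.3.16. Fourth clause\n"
    "117.14.3. Fifth clause\n"
    "216.2.13.3. Sixth clause, which mentions 34.2. mid-sentence\n"
)

for clause in split_clauses(sample):
    print(clause)
```

Filtering the result with a case-insensitive test, such as `"communication" in clause.lower()`, then keeps only the clauses that mention the keyword.

Searching page by page would still cut a clause that continues across a page break. Joining the pages' text with a newline before calling the function avoids that.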
Any help would be greatly appreciated.