In my process I need to extract specific text surrounding a keyword. In the example below, the keyword is ‘process’, and I need the text from the first number up to the full stop after the word services. I have managed to get the text between the full stop after the number 4 and the full stop after the word services, but I also need the paragraph number in front of it (107.2.4.). Any help very much appreciated, thanks.
107.2.4. Be required to manage and agree all consents of Others as part of this process (for example landlords) before commencing works or services. All permits to work shall be supported by full risk assessments and method statements for undertaking the work.
I’m assuming you got the text between the full stop after the number 4 and the full stop after the word services by splitting the initial text on ". "
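To show why the paragraph number goes missing with that approach, here is a minimal sketch in Python (the language is an assumption; the thread's own snippets look like VB.NET) that splits the sample paragraph on ". ". The full stop after "107.2.4" is treated as a sentence end, so the number ends up as its own item, separate from the sentence that follows it.

```python
# Sample paragraph from the question.
text = ("107.2.4. Be required to manage and agree all consents of Others as part "
        "of this process (for example landlords) before commencing works or "
        "services. All permits to work shall be supported by full risk "
        "assessments and method statements for undertaking the work.")

# Splitting on ". " cuts after "107.2.4" as well as after "services",
# so the paragraph number becomes a separate item.
parts = text.split(". ")
for part in parts:
    print(repr(part))
```

The first item printed is '107.2.4', so the number has to be glued back onto the next item if it is needed in the result.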
You could amend the if item.Substring(0,1).IsNumeric condition to fit your needs. Here the check tests whether the first character is numeric, on the assumption that a normal sentence will not start with a digit…
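A rough Python equivalent of that idea is sketched below. It is not the original workflow, and the helper name sentences_with_numbers is hypothetical. After splitting on ". ", any item whose first character is a digit is held back as a paragraph number and prefixed to the next sentence. Like the original check, it assumes ordinary sentences never start with a digit.

```python
# Sample paragraph from the question.
text = ("107.2.4. Be required to manage and agree all consents of Others as part "
        "of this process (for example landlords) before commencing works or "
        "services. All permits to work shall be supported by full risk "
        "assessments and method statements for undertaking the work.")
keyword = "process"


def sentences_with_numbers(paragraph):
    """Split on ". " and glue a leading number item (e.g. "107.2.4") back
    onto the sentence that follows it."""
    result = []
    pending_number = None
    for item in paragraph.split(". "):
        # Same idea as the item.Substring(0,1).IsNumeric check: an item that
        # starts with a digit is treated as a paragraph number, not a sentence.
        if item[:1].isdigit():
            pending_number = item + "."
            continue
        # split() removed the full stop from every item except the last one.
        sentence = item if item.endswith(".") else item + "."
        if pending_number:
            sentence = pending_number + " " + sentence
            pending_number = None
        result.append(sentence)
    return result


matches = [s for s in sentences_with_numbers(text) if keyword in s.lower()]
print(matches[0])
```

Note that keyword in s.lower() is a plain substring test, so it would also match words such as "processes"; a word-boundary regex is stricter.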
The regex will have to be adapted to your actual requirements, though. You can try it out on the .NET Regex Tester website, Regex Storm. Testing it against multiple examples will make the regex more robust…
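As one possible starting point, the sketch below uses a single regex built only from the two examples in this thread: a paragraph number made of digit groups separated by dots and ending in a full stop, followed by the sentence containing the keyword, up to the next full stop. It is written with Python's re module and a hypothetical helper named extract. The pattern syntax itself is also valid in .NET, so for Regex Storm the version with the keyword written in is (\d+(?:\.\d+)*\.)\s+([^.]*\bprocess\b[^.]*\.) with IgnoreCase enabled.

```python
import re


def extract(paragraph, keyword):
    """Return the paragraph number plus the sentence containing keyword,
    or None if no such sentence follows the number.

    Assumptions (taken only from the examples in this thread):
      * the number is digit groups separated by dots, ending in "."
        (e.g. "107.2.4." or "143.4.");
      * the keyword is in the sentence right after the number;
      * that sentence contains no other full stops.
    """
    pattern = (r"(\d+(?:\.\d+)*\.)\s+"                            # group 1: number
               r"([^.]*\b" + re.escape(keyword) + r"\b[^.]*\.)")  # group 2: sentence
    m = re.search(pattern, paragraph, flags=re.IGNORECASE)
    return m.group(1) + " " + m.group(2) if m else None


text1 = ("107.2.4. Be required to manage and agree all consents of Others as part "
         "of this process (for example landlords) before commencing works or "
         "services. All permits to work shall be supported by full risk "
         "assessments and method statements for undertaking the work.")
text2 = ("143.4. The services delivered outside of the agreed cleaning "
         "operational hours shall be managed via the Service Order process on "
         "instruction by the Service Manager.")

print(extract(text1, "process"))
print(extract(text2, "process"))
```

Because [^.]* cannot cross a full stop, a keyword that only appears in a later sentence (e.g. "permits" in the first example) gives None. Whether that is acceptable depends on the real documents.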
Step 1: Import all libraries.
Step 2: Convert PDF file to text format and read data.
Step 3: Use the findall() function of the regular expressions module to extract keywords.
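Step 3 could look roughly like this in Python, assuming the PDF has already been converted to plain text (the conversion in Step 2 needs a third-party library and is left out here). The document string is a hypothetical stand-in made from the two paragraphs quoted in this thread. Because the pattern has two capture groups, re.findall returns a list of (number, sentence) tuples, one per paragraph whose first sentence contains the keyword.

```python
import re

# Stand-in for the text extracted from the PDF in Step 2 (hypothetical sample).
document = (
    "107.2.4. Be required to manage and agree all consents of Others as part "
    "of this process (for example landlords) before commencing works or "
    "services. All permits to work shall be supported by full risk "
    "assessments and method statements for undertaking the work.\n"
    "143.4. The services delivered outside of the agreed cleaning operational "
    "hours shall be managed via the Service Order process on instruction by "
    "the Service Manager.\n"
)

keyword = "process"
# Group 1: paragraph number such as "107.2.4." or "143.4."
# Group 2: the sentence after it that contains the keyword.
pattern = (r"(\d+(?:\.\d+)*\.)\s+"
           r"([^.]*\b" + re.escape(keyword) + r"\b[^.]*\.)")

results = re.findall(pattern, document, flags=re.IGNORECASE)
for number, sentence in results:
    print(number, sentence)
```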
The first step of the process is to locate a keyword; in the previous example it is ‘process’.
Next, I need to get the full sentence that contains the keyword; in the previous example it is ‘Be required to manage and agree all consents of Others as part of this process (for example landlords) before commencing works or services’.
Then I need to include the paragraph number; in the previous example it is ‘107.2.4.’
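These three steps can also be done one at a time instead of with one large regex. The Python sketch below (extract_with_number is a hypothetical name) locates the keyword, widens the match to the surrounding sentence by looking for the nearest ". " before it and the first full stop after it, and then puts back the paragraph number taken from the start of the paragraph. It assumes each paragraph is passed in separately and begins with its number.

```python
import re


def extract_with_number(paragraph, keyword):
    """Hypothetical helper: keyword -> full sentence -> paragraph number."""
    # Step 1: locate the keyword (whole word, case-insensitive).
    hit = re.search(r"\b" + re.escape(keyword) + r"\b", paragraph, re.IGNORECASE)
    if hit is None:
        return None

    # Paragraph number at the very start, e.g. "107.2.4." or "143.4.".
    num = re.match(r"\d+(?:\.\d+)*\.\s*", paragraph)
    body_start = num.end() if num else 0

    # Step 2: widen to the whole sentence. Assumed: a sentence starts after
    # the last ". " before the keyword (or right after the number) and ends
    # at the first "." after the keyword.
    start = paragraph.rfind(". ", body_start, hit.start())
    start = body_start if start == -1 else start + 2
    end = paragraph.find(".", hit.end())
    end = len(paragraph) if end == -1 else end + 1
    sentence = paragraph[start:end]

    # Step 3: put the paragraph number back in front.
    return num.group().strip() + " " + sentence if num else sentence


text1 = ("107.2.4. Be required to manage and agree all consents of Others as part "
         "of this process (for example landlords) before commencing works or "
         "services. All permits to work shall be supported by full risk "
         "assessments and method statements for undertaking the work.")
text2 = ("143.4. The services delivered outside of the agreed cleaning "
         "operational hours shall be managed via the Service Order process on "
         "instruction by the Service Manager.")

print(extract_with_number(text1, "process"))
print(extract_with_number(text1, "permits"))
```

If the keyword sits in a later sentence of the paragraph, this version still returns that sentence with the paragraph number in front, which may or may not be what the real documents need.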
So out of the paragraph…
‘107.2.4. Be required to manage and agree all consents of Others as part of this process (for example landlords) before commencing works or services. All permits to work shall be supported by full risk assessments and method statements for undertaking the work.’
As the keyword is ‘process’, I need to pull out the part that says…
‘107.2.4. Be required to manage and agree all consents of Others as part of this PROCESS (for example landlords) before commencing works or services.’
In addition, the paragraph numbers may not always be in the above format; below is another example:
‘143.4. The services delivered outside of the agreed cleaning operational hours shall be managed via the Service Order process on instruction by the Service Manager.’
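Both number formats fit one pattern: one or more digit groups separated by dots, with a trailing full stop. The Python sketch below checks that pattern against the two numbers from this thread plus a few hypothetical variants (a single-level "12.", a number without the trailing full stop, and one containing a letter) to show where it stops matching.

```python
import re

# One or more digit groups separated by dots, ending with a full stop.
PARA_NUMBER = re.compile(r"\d+(?:\.\d+)*\.")

samples = ["107.2.4.", "143.4.", "12.", "12", "1.2.a."]
results = {s: bool(PARA_NUMBER.fullmatch(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
```

If some paragraphs use numbers without the trailing full stop, making it optional (\.? instead of \.) is one option, at the cost of also matching ordinary numbers inside sentences.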