In the Document Understanding “Regex Based Extractor”, I need to extract a name.
I’ve used the regex “[A-Z][a-zA-Z.]+\s[A-Z][a-zA-Z.]+”, but it matches multiple occurrences, and I need to extract only the first two.
Is there a way to limit the regex to a specific portion of text?
It seems that your data was extracted by OCR. However, we need to know whether the data follows a set pattern. For example, will the words OFEICE/OFFICE, SCHOOL always be present in the second line?
To target the required values, we need more information on the pattern of the data. You can find this by analysing several of the text inputs that you expect to receive.
For now, could you try the expression below (for extracting only the first occurrence of the words)?
Regex:
(\s[A-Z][a-zA-Z. ]+)+
Although the regex still matches multiple occurrences, we split the text on new lines, keep only the first line, and extract the continuous alphabetical text from it.
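To illustrate the two ideas in this thread outside of the extractor, here is a minimal Python sketch. It is not the Regex Based Extractor itself: the function names and the sample text are hypothetical, and it reuses the pattern from the question rather than the one suggested above. It shows (a) keeping only the first two matches and (b) restricting the search to the first non-empty line.

```python
import re
from itertools import islice

# Pattern from the original question: two capitalised words separated by whitespace.
NAME_PATTERN = re.compile(r"[A-Z][a-zA-Z.]+\s[A-Z][a-zA-Z.]+")


def first_n_matches(text, n=2):
    """Return at most the first n matches, in document order."""
    return [m.group(0) for m in islice(NAME_PATTERN.finditer(text), n)]


def match_in_first_line(text):
    """Search only the first non-empty line; return the match or None."""
    non_empty = [line for line in text.splitlines() if line.strip()]
    if not non_empty:
        return None
    m = NAME_PATTERN.search(non_empty[0])
    return m.group(0) if m else None


# Hypothetical OCR output, used only for illustration.
sample = "John Smith\noffice school\nMary Jones\nPeter Brown"

if __name__ == "__main__":
    print(first_n_matches(sample))
    print(match_in_first_line(sample))
```

Note that \s also matches a line break, so on real OCR text the pattern can join the last word of one line with the first word of the next. Replacing \s with a literal space is one way to avoid that, if the names are known to sit on a single line.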