Matches Activity Works but Regex Based Extractor with Same Expression Not Working

I am using Document Understanding. Digitization and Classification are working as expected. I am attempting to use a RegEx Based Extractor to get the 4 fields I need. When I run the process, the extractor returns empty values for all of the fields. However, if I copy the Document Text being fed into the Data Extraction Scope and paste it into the “Test Text” section of the RegEx Builder, the expressions match as expected. I even added a Matches activity with the exact same expression right before the Data Extraction Scope and passed in the Document Text variable, and it works just fine there, so the issue seems to be with the Document Understanding Data Extraction Scope. Any ideas what could be causing this? Does the Regex Extractor act on the Document Text variable? If so, I can’t make sense of why it isn’t working.


Hi @LaHood_AM ,

Could you enclose .*? in parentheses (a capture group) and check? Like below:

(.*?)

Could it be that the value to extract spans a line break?
Since . does not match \n by default, we can rewrite the expression as:

(?<=Patient:\s)[\s\S]*?(?=\sAddress)
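
To illustrate the difference, here is a minimal sketch in Python rather than in UiPath itself. Python's `.` also skips `\n` by default, so the behaviour should be comparable, but this is not a test of the RegEx Based Extractor. The sample text, the patient name, and the variable names are made up for the example.

```python
import re

# Hypothetical sample text: the value to extract spans a line break.
text = "Patient: John\nDoe\nAddress: 12 Main St"

# '.' does not match '\n', so the lazy '.*?' cannot reach " Address".
dot_pattern = re.compile(r"(?<=Patient:\s).*?(?=\sAddress)")

# '[\s\S]' matches any character, including '\n'.
any_char_pattern = re.compile(r"(?<=Patient:\s)[\s\S]*?(?=\sAddress)")

dot_match = dot_pattern.search(text)
any_char_match = any_char_pattern.search(text)

print(dot_match)                      # no match is found
print(repr(any_char_match.group(0)))  # the value, line break included
```

The first pattern finds nothing because the only valid start position is right after "Patient: ", and from there `.` stops at the first line break.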

Thanks for the quick reply! This helped for all fields except one that spans multiple lines. The text from the document looks something like:

Refill Request
1234 Address St.
City, ST 54455
Tel: 555-555-5555

What I currently have is (?<=Request\s\n*)(.*?)(?=\sTel:)

I am trying to extract the middle two lines. I also have the “Multiline” Regex Option selected. Any suggestion on this one?

As mentioned above, . does not match \n, so .*? cannot span the two lines; use [\s\S]*? for that part instead.

Also keep in mind that Windows line breaks consist of \r\n, which we can match defensively with \r?\n
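
A rough sketch of both points, again in Python as a stand-in for the .NET regex engine. It assumes the document text uses \r\n line breaks. Python's `re` does not allow variable-length lookbehind, so the sketch uses a capture group instead of `(?<=...)`. Note that the Multiline option only changes how ^ and $ behave; it does not make . match \n (in .NET that is the Singleline option, `re.DOTALL` in Python). Variable names are illustrative only.

```python
import re

# Sample text from the post, assuming Windows line breaks (\r\n).
text = "Refill Request\r\n1234 Address St.\r\nCity, ST 54455\r\nTel: 555-555-5555"

# MULTILINE only affects ^ and $, so '.*?' still cannot cross '\n'.
dot_pattern = re.compile(r"Request[ \t]*\r?\n(.*?)\r?\nTel:", re.MULTILINE)

# '[\s\S]*?' crosses line breaks; '\r?\n' accepts both \r\n and \n.
block_pattern = re.compile(r"Request[ \t]*\r?\n([\s\S]*?)\r?\nTel:")

dot_match = dot_pattern.search(text)
block_match = block_pattern.search(text)

# Split the captured block into its individual lines.
address_lines = block_match.group(1).splitlines()

print(dot_match)
print(address_lines)
```

In .NET, where variable-length lookbehind is supported, an equivalent with lookarounds might be (?<=Request\s*\r?\n)[\s\S]*?(?=\r?\nTel:), though I have not verified that inside the RegEx Based Extractor.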
