Regex Based Extractor - Improvement Idea

I have been using the Regex Based Extractor activity as an alternative to the Form Extractor, and I found some great features, such as using Explicit Capture to define the extraction groups.


However, I also found that the Document Text that comes from the Digitize Document activity can contain words out of order (i.e. the text is not always read from top to bottom and left to right). This increases the complexity of our projects when we try to extract text based on anchors or delimiter text.

Idea: Use GetVisualTextProjection on the Document Object Model variable and pass its ProjectedText property as the input text for the Regex Based Extractor. This would let us extract from a projection of the document in which the text is sorted in reading order.
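
To illustrate why the reading order matters, here is a minimal Python sketch of the general idea. It is not UiPath's implementation: the `Word` type, the `project_text` helper, the coordinates, and the `line_tolerance` value are all hypothetical, made up only for this example. It groups words into lines by their vertical position, sorts each line from left to right, and then applies an anchor-based regex with a named group.

```python
import re
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical OCR word: text plus the top-left corner of its box."""
    text: str
    x: float
    y: float


def project_text(words, line_tolerance=5.0):
    """Rebuild the text top to bottom, left to right from word positions."""
    lines = []
    # Walk the words from top to bottom and group them into visual lines.
    for word in sorted(words, key=lambda w: w.y):
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Inside each line, order the words from left to right.
    return "\n".join(
        " ".join(w.text for w in sorted(line, key=lambda w: w.x))
        for line in lines
    )


# Made-up digitization output whose word order does not match the layout.
words = [
    Word("Total:", 10, 100),
    Word("INV-001", 120, 50),
    Word("Invoice", 10, 50),
    Word("42.50", 120, 101),
    Word("No:", 70, 50),
]

raw_text = " ".join(w.text for w in words)
projected = project_text(words)

# Anchor-based pattern with a named group for the value to extract.
pattern = re.compile(r"Invoice No:\s*(?P<invoice>\S+)")

print(pattern.search(raw_text))                      # no match on the unsorted text
print(pattern.search(projected).group("invoice"))    # match on the projected text
```

On the unsorted text the anchor "Invoice No:" is never contiguous, so the pattern finds nothing. On the projected text the same pattern finds the value right after the anchor.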

Hi @Ioana_Gligan

I would appreciate it if the Document Understanding team could consider this improvement idea for the Regex Based Extractor activity. It could be an optional feature, for comparison.



Hello @AndresTarazona,

This is already in: check the "use visual alignment" flag on the activity. At run time, it will then use the top to bottom, left to right alignment of the text before applying the regex :slight_smile:


Thank you!!!