Hi Team:
I recently tested the Document Understanding capabilities by training a model on local invoices, and the results were very interesting.
The performance of the Document Understanding ML models is remarkable. Still, in my experience there are situations where you also need to apply specific extraction rules. It is true that in the Extraction Scope you can combine multiple extractors, such as the ML Extractor and the RegEx Extractor, if necessary. However, the RegEx Extractor is not suitable when you need more than a single pattern-recognition rule, for example keyword proximity or other more advanced combinations of rules.
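To illustrate what I mean by keyword proximity, here is a minimal Python sketch. It is not UiPath code: the function name `extract_near_keyword`, the `max_distance` parameter, and the sample text are all made up for illustration. The rule is "take the amount that follows a given keyword most closely", which a single pattern cannot express when several amounts appear in the document.

```python
import re


def extract_near_keyword(text, keyword, pattern, max_distance=40):
    """Return the pattern match that follows `keyword` most closely.

    Hypothetical helper: only matches starting within `max_distance`
    characters after the end of the keyword are considered.
    """
    best = None  # (distance, matched text)
    for kw in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        for m in re.finditer(pattern, text):
            # Characters between the end of the keyword and the start of the value.
            distance = m.start() - kw.end()
            if 0 <= distance <= max_distance:
                if best is None or distance < best[0]:
                    best = (distance, m.group(0))
    return best[1] if best else None


# Made-up invoice text containing three amounts.
sample = "Subtotal: 1,000.00\nTax: 250.00\nTotal due: 1,250.00"
amount_pattern = r"\d{1,3}(?:,\d{3})*\.\d{2}"

total = extract_near_keyword(sample, "Total due", amount_pattern)
print(total)
```

A plain regular expression search for the amount pattern would return the first amount in the text, while the proximity rule picks the one next to the keyword.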
I propose two things:
- Custom Code Extractor: A custom code extractor (similar to the Invoke Code activity) that lets you generate your own result values, which can be included in the Extraction Scope and combined with the existing extractors. This Custom Code Extractor could receive the document text from a previously executed Digitize Document activity (and from later extraction and manipulation steps), from barcode recognition activities, or from other inputs.
- Extraction Scope – Enhancement to Configure Extractor Wizard: This wizard lets you set which extractors to use for each field. When more than one extractor is configured for a field, the resulting value is the best one among the available and enabled extractors. In situations where you have a preferred extractor, it would also be useful to place that extractor first and take the first sufficiently acceptable result, in order of preference, rather than always the best one.
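
The difference between the two selection strategies in the second proposal can be sketched as follows. This is only an illustration in Python under my own assumptions: the `Candidate` class, the confidence values, and the `min_confidence` threshold are hypothetical and do not reflect how the Extraction Scope is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    extractor: str     # hypothetical extractor name
    value: str         # extracted field value
    confidence: float  # assumed to be in the range 0.0 to 1.0


def pick_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """'Best of all' strategy: the highest confidence wins."""
    return max(candidates, key=lambda c: c.confidence, default=None)


def pick_first_acceptable(candidates: List[Candidate],
                          min_confidence: float) -> Optional[Candidate]:
    """Proposed strategy: candidates are listed in order of preference,
    and the first one that reaches the threshold wins."""
    for candidate in candidates:
        if candidate.confidence >= min_confidence:
            return candidate
    return None


# Made-up results for one field, with the preferred extractor listed first.
results = [
    Candidate("CustomCode", "1,250.00", 0.82),
    Candidate("ML", "1,000.00", 0.91),
]

print(pick_best(results).extractor)
print(pick_first_acceptable(results, min_confidence=0.80).extractor)
```

With these made-up numbers, the "best of all" strategy selects the ML result, while the "first acceptable" strategy keeps the preferred custom result because it already meets the threshold.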
I hope my feedback is helpful for improving this great IDP solution. Please feel free to reply if my idea was not expressed clearly.
Regards
Gabriel Marin