To paint the full picture of a Document Type, including information that is not necessarily available in the document itself, we have added the ability to attach additional information to it. As part of this effort, I’m happy to report that we have started enriching the Document Type with Business Rules at the field level, starting with the Intelligent OCR package version 6.6.0-preview.
The extracted field values must respect these rules for the extraction to be considered successful (e.g. the Invoice Number must have a given form, certain fields are mandatory, the Vendor Name can be “Uber”, “Lyft” or “Bolt”, etc.).
Rule Definition
To define the rules, open the Taxonomy Manager, select the corresponding Document Type & Field, and navigate to the “Business Rules” tab.
One can add multiple rules and define one rule operator type:
- AND - all rules need to be fulfilled; if any of the rules is broken, an exception is raised
  - e.g. Invoice Number starts with A AND ends with X
  - valid field value: A123X; invalid field value: A123
- OR - at least one of the rules needs to be fulfilled; an exception is raised only if all rules are broken
  - e.g. when qualifying invoices by Vendor, the Vendor must start with A OR be one of BCA, CCC or DEB
  - valid field values: ABC, AAA, CCC; invalid field value: XYZ
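The two operators can be illustrated with a small Python sketch. This is not the package's implementation: the `Rule` type and the `evaluate` function are hypothetical names introduced here, and the sketch only assumes that each rule reduces to a yes/no check on the extracted value.

```python
from typing import Callable, List

# Hypothetical representation: a rule returns True when the value satisfies it.
Rule = Callable[[str], bool]


def evaluate(value: str, rules: List[Rule], operator: str = "AND") -> bool:
    """Return True when the value passes the rule set under the given operator."""
    results = [rule(value) for rule in rules]
    if operator == "AND":
        return all(results)   # every rule must hold
    if operator == "OR":
        return any(results)   # one fulfilled rule is enough
    raise ValueError(f"unknown operator: {operator}")


# AND example from above: Invoice Number starts with A AND ends with X
invoice_rules = [lambda v: v.startswith("A"), lambda v: v.endswith("X")]

# OR example from above: Vendor starts with A OR is one of BCA, CCC, DEB
vendor_rules = [lambda v: v.startswith("A"), lambda v: v in {"BCA", "CCC", "DEB"}]
```

With these definitions, `evaluate("A123", invoice_rules, "AND")` is false because the second rule is broken, while `evaluate("CCC", vendor_rules, "OR")` is true because the second rule is fulfilled.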
The following rule types will be available:
- Is not empty: the extracted value is not empty (may represent a mandatory field); if the value is missing, it requires validation/manual input
- Possible Values: the extracted value is one of a set of possible values defined by the user (e.g. Employer Type is either “full-time”, “part-time” or “internship”)
- Fixed Format: the extracted value has a fixed format: starts with, ends with, has X characters, is numeric, is a date, is an email; in other words, it matches a RegEx (e.g. Invoice Number starts with INV-, Postal Code has 6 characters). Applicable to text fields only
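To make the three rule types concrete, here is a rough Python sketch of each one as a plain check on a string. The function names and the regular expressions are assumptions chosen for illustration; the patterns the product actually generates for "starts with" or "has X characters" may differ.

```python
import re


def is_not_empty(value):
    """Is not empty: the value exists and is more than whitespace."""
    return value is not None and value.strip() != ""


def possible_values(allowed):
    """Possible Values: build a check against a user-defined set."""
    allowed_set = set(allowed)
    return lambda value: value in allowed_set


def fixed_format(pattern):
    """Fixed Format: build a check that the whole value matches a RegEx."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


# Examples taken from the list above (patterns are illustrative assumptions)
employer_type = possible_values(["full-time", "part-time", "internship"])
invoice_number = fixed_format(r"INV-.*")   # starts with INV-
postal_code = fixed_format(r".{6}")        # has exactly 6 characters
```

The sketch uses `fullmatch` so that a pattern such as `.{6}` rejects longer values instead of matching their first six characters.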
One is also able to define a criticality level for all rules of the field:
- SHOULD (default)
- MUST
Rule Execution
The rules are executed after the data has been extracted from the documents. Their result is available on the Extraction Result as a collection of FieldRuleException, which can be evaluated in the workflow via the following methods:
- ExtractionResult#hasExceptions
- ExtractionResult#hasExceptionsOfType["X"], where X = field type
- ExtractionResult#hasExceptionsOfCriticality["X"], where X = criticality type (SHOULD, MUST)
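As a mental model of what these three checks answer, the following Python stand-in may help. It is not the package's API: the classes below only mirror the names mentioned above, and the attributes `field_name`, `field_type` and `criticality`, as well as the example value "Text", are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldRuleException:
    field_name: str    # assumed attribute: which field broke a rule
    field_type: str    # assumed attribute: e.g. "Text"
    criticality: str   # "SHOULD" or "MUST"


@dataclass
class ExtractionResult:
    exceptions: List[FieldRuleException] = field(default_factory=list)

    def has_exceptions(self) -> bool:
        """True when at least one rule was broken."""
        return len(self.exceptions) > 0

    def has_exceptions_of_type(self, field_type: str) -> bool:
        """True when a field of the given type broke a rule."""
        return any(e.field_type == field_type for e in self.exceptions)

    def has_exceptions_of_criticality(self, criticality: str) -> bool:
        """True when a rule of the given criticality was broken."""
        return any(e.criticality == criticality for e in self.exceptions)
```

In a workflow, checks like these would typically drive a branch, for example deciding whether a document can continue automatically or needs a look in Validation Station.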
Rule Validation🕵️
The execution of the rules will also be displayed in the Validation Station (both in Studio and in Action Center), so that Users can easily see and navigate through them.
As a result, the fields are grouped into the following categories:
- fields with broken rules (first group); once the extracted value is fixed so that it respects the defined rule, the field moves to the second group
- fields without broken rules (last group)
Note that one can submit and save a document when only SHOULD rules are broken; broken rules of criticality MUST, however, need to be resolved first.
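The grouping and the submit condition described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not Validation Station code: `group_fields`, `can_submit` and the dictionary layout (field name mapped to the criticalities of its broken rules) are hypothetical.

```python
from typing import Dict, List, Tuple


def group_fields(all_fields: List[str],
                 broken: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Split fields into (with broken rules, without broken rules)."""
    with_broken = [f for f in all_fields if broken.get(f)]
    without_broken = [f for f in all_fields if not broken.get(f)]
    return with_broken, without_broken


def can_submit(broken: Dict[str, List[str]]) -> bool:
    """Submission is blocked only while some MUST rule is still broken."""
    return all("MUST" not in crits for crits in broken.values())


fields = ["Invoice Number", "Vendor Name", "Total"]
broken_rules = {"Invoice Number": ["MUST"], "Vendor Name": ["SHOULD"]}
```

Here `group_fields(fields, broken_rules)` puts "Invoice Number" and "Vendor Name" in the first group, and `can_submit(broken_rules)` stays false until the Invoice Number value is corrected.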
In this first version of the feature, the rules will be re-evaluated when the user attempts to submit a Validation Station session - however, in the future, we plan on improving the experience.
Future
These 3 rule types are just a start: we plan on adding many more (for example, for mathematical operations or connections to third parties). With that in mind, please do give them a try & share your feedback! How do you find their configuration so far? What other rules would you like to see?