Kindly help with this: Data Extraction Scope - Failed to consume License
Hey @Rachita_Chauhan! Sorry, but I can’t help on that issue… You should probably raise a support ticket for it.
PS: Sorry Tomlin, I missed your message; only saw it now
Hi @Alexandru-Luca ,
I’m finally back with focus on using the DU Framework template. First off, it is very easy to use. And thanks for creating this for us!
This may not be something directly related to the template, but rather to the DU product itself. I’m trying to come up with a configurable rule that I could use to decide whether a document has to be sent off to a Data Validation HITL action in the Action Center.
I’m not sure if this already exists. If not, I was wondering whether the ExtractionResults object by itself could summarize a base statistic of the overall confidence it has when extracting results from a document.
Here is one example in which I have 10 fields. Of these, 8 have been extracted with 99-100% confidence, whilst the other 2 fall between 66% and 8%!
The overall confidence is therefore definitely lower in this case. But I wouldn’t know it unless I looked at the Confidence or OcrConfidence attribute of the Fields collection of the ExtractionResults object. The second screenshot shows this in the Debug window in Studio.
I just want to make clear that this confidence is a technical measure and must not be mistaken for a business measure of confidence. In this example, the field that has been extracted with 8% confidence may not be very important from a business point of view, but it is certainly a factor in deciding whether this document needs to be sent off to HITL.
So what is the ask?
Can these individual confidence measures be summarized by the DU product somehow into a single measurable quantity that could be used in the decision-making process to spin up a HITL action?
If there is an easier way to do this, please do let us know.
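In the meantime, the summary can be computed by hand from the per-field confidences. Below is a minimal sketch in Python of one way to do it. The dictionary shape, field names, and threshold are all illustrative assumptions, not the actual UiPath ExtractionResults object model; in a workflow you would read the Confidence values out of the Fields collection first.

```python
# Hypothetical sketch: collapsing per-field confidences into one score.
# The field names and the 0.80 threshold below are made up for illustration.

def summarize_confidence(field_confidences, weights=None):
    """Return summary statistics for a {field_name: confidence} mapping.

    A plain average hides outliers (8 high fields can mask 2 low ones),
    so the minimum, or a weighted average, is often more useful when
    deciding whether to route a document to HITL.
    """
    if weights is None:
        weights = {name: 1.0 for name in field_confidences}
    total_weight = sum(weights.values())
    weighted_avg = sum(
        conf * weights.get(name, 1.0)
        for name, conf in field_confidences.items()
    ) / total_weight
    return {"min": min(field_confidences.values()), "weighted_avg": weighted_avg}

# The example from the post: 8 fields at ~99%, plus one at 66% and one at 8%.
confidences = {f"field{i}": 0.99 for i in range(8)}
confidences.update({"field8": 0.66, "field9": 0.08})

summary = summarize_confidence(confidences)
needs_hitl = summary["min"] < 0.80  # the threshold itself is a business decision
```

Weights let the business mark some fields as mattering more than others, which partially addresses the “8% field may not be important” point.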
Simply put: no. And you already guessed the reason: the confidence output is a technical result, an algorithm’s (classifier’s/extractor’s) computation of how well it performed.
Users are all too often tempted to interpret this simple numerical output as a business measure of confidence, like you mentioned. This should never be the case. The decision of whether documents should go through HITL is, after all, a business-rule decision that should be properly discussed and agreed upon with the business SMEs.
And that is exactly what I’m trying to get my head around. Different documents are recognized by the machine accurately, but with different levels of confidence. For example, a field extracted with 88% confidence may still show accurate extraction results 100% of the time.
I then started looking at individual fields to see if I can arrive at a Business rule.
But what if the document has tens of fields? And what if we have dozens of taxonomies?
If the mathematical interpretation of confidence isn’t to be relied upon, then how do we maintain complicated rules to decide which documents must be routed to HITL?
Writing complex rules is not the problem. Making them maintainable & adaptable seems to be.
Thanks for your response!
Hi @AndyMenon !
As devs, we do the testing to see what confidence levels translate into good extraction accuracy. But beyond that, it’s the business SMEs’ decision - for every document type, for every field to be extracted, for every table column or even cell to be extracted. It’s not only a matter of confidence - it’s about which fields are optional and which should always be found; it’s about cross-referencing and checking the validity of extracted data.
Simple example, for invoices:
- Is the vendor name a match in a pre-existing list of approved vendors?
- Does the quantity * item-price = line-amount for every item?
- Do the line-amounts sum up to the invoice total?
We don’t decide the rules, because we don’t know them; the business does. We “translate” the business rules into code and implement them.
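The three invoice checks above can be sketched in a few lines of Python. The field names, the approved-vendor list, and the 0.01 rounding tolerance are illustrative assumptions for this sketch, not anything prescribed by DU:

```python
# Minimal sketch of the invoice business rules above, on made-up field names.

APPROVED_VENDORS = {"Acme Corp", "Globex"}  # would come from a master vendor list

def validate_invoice(invoice):
    """Return the names of failed business rules (empty list = all checks pass)."""
    failures = []
    # Rule 1: vendor must appear in the approved list.
    if invoice["vendor_name"] not in APPROVED_VENDORS:
        failures.append("unknown_vendor")
    # Rule 2: quantity * item price must equal the line amount, per item.
    for line in invoice["lines"]:
        if abs(line["quantity"] * line["unit_price"] - line["amount"]) > 0.01:
            failures.append("line_amount_mismatch")
            break
    # Rule 3: line amounts must sum to the invoice total.
    if abs(sum(l["amount"] for l in invoice["lines"]) - invoice["total"]) > 0.01:
        failures.append("total_mismatch")
    return failures

invoice = {
    "vendor_name": "Acme Corp",
    "total": 30.0,
    "lines": [
        {"quantity": 2, "unit_price": 5.0, "amount": 10.0},
        {"quantity": 4, "unit_price": 5.0, "amount": 20.0},
    ],
}
failures = validate_invoice(invoice)  # empty list: no HITL needed for these rules
```

Any non-empty result would then be grounds for routing the document to a validation action, regardless of how confident the extractor was.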
Agree! No doubt about it. I just did something similar in a process that I am currently working on.
Just trying to see how to bring the rules + taxonomy together in a more manageable way so that SMEs can manage them without the DU flow being impacted.
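One common way to keep rules SME-manageable is to move the per-field thresholds out of the workflow into a data file the SMEs own. A rough sketch of the idea, assuming a JSON rules file with invented document-type and field names (nothing here is a DU feature):

```python
import json

# Sketch: per-field thresholds externalized so SMEs can edit them
# (e.g. a JSON file or storage-bucket asset) without touching the flow.
# Document-type names, field names, and thresholds are all illustrative.

RULES_JSON = """
{
  "Invoice": {
    "VendorName": {"min_confidence": 0.95, "required": true},
    "Total":      {"min_confidence": 0.90, "required": true},
    "PONumber":   {"min_confidence": 0.50, "required": false}
  }
}
"""

def needs_hitl(doc_type, extracted, rules=None):
    """True if any required field is missing or any field is below its threshold.

    `extracted` maps field name -> extraction confidence for one document.
    """
    rules = rules or json.loads(RULES_JSON)
    for field, rule in rules.get(doc_type, {}).items():
        conf = extracted.get(field)
        if conf is None:
            if rule["required"]:
                return True  # a mandatory field was not extracted at all
        elif conf < rule["min_confidence"]:
            return True  # extracted, but below the SME-set confidence bar
    return False

# A PONumber at 55% clears its 50% bar, but a Total at 85% falls below 90%.
print(needs_hitl("Invoice", {"VendorName": 0.99, "Total": 0.85, "PONumber": 0.55}))
```

Because the rules live in data rather than code, adding a new taxonomy or tightening a threshold becomes an edit to the JSON, not a redeploy of the flow.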
Hi there Alex,
Any news on the Queue support for the LRWFs?
We’re working on it in order to have the solution officially released ASAP.
Thanks for the answer @Alex
Awesome work with the Framework so far, I can’t imagine building DU processes without it!
thank you:) this was a quick and easy process to follow.
you’re welcome :))
Thank you for this post, it helps a lot for beginners like me.
I clicked on the template, however I get the following error:
The package ‘DocumentUnderstandingFrameworkTemplate 1.0.2’ does not contain any project template.
Nevermind. Figured it out.
It’s been a while, but the solution is now official and available in a public preview release: