I’m working on a project involving document understanding. My documents can contain anywhere from 5 to 10+ pages, but all the pages in a file belong to one document. The classifier splits one document into 2 or 3 parts, based on I don’t know what, and then extracts from each part. This messes everything up, because when the document is split, related data ends up separated. I have never combined documents. I’m using the Intelligent Keyword Classifier and have trained it with more than 200 documents. Please let me know if we can somehow configure the classifier to always classify one file as a single document and not split it.
Hi
This may happen because the document has the same layout across pages, so DU treats it as several documents of the same type in one file. For example, a PDF that contains 10 invoices.
You need to improve your classifier, but if the pages have the same layout, that is a little difficult. You also have the option of using the Keyword Based Classifier.
Or you can add a check to your classification rules to see whether the classified document has the same number of pages as the original file:
in_ClassificationResults(0).DocumentBounds.PageCount, StartPage, etc. Based on the result, even if the classifier returns split classifications, you can send the document to a manual action so a human can fix it before it continues to the extraction flow.
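To make that rule concrete, here is a rough, language-neutral sketch in Python of the check described above. It is not UiPath code: the `ClassificationResult` and `DocumentBounds` classes below are hypothetical stand-ins that only mirror the `StartPage` and `PageCount` fields mentioned, and I am assuming `StartPage` is zero-based and that you already know the total page count of the original file. In a real workflow the same comparison would live in an If activity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentBounds:
    # Hypothetical stand-in for the bounds of one classified part.
    start_page: int  # assumed zero-based index of the part's first page
    page_count: int  # number of pages in the part


@dataclass
class ClassificationResult:
    # Hypothetical stand-in for one entry of the classification results.
    document_type_id: str
    bounds: DocumentBounds


def was_split(results: List[ClassificationResult], total_pages: int) -> bool:
    """True unless there is exactly one result covering every page of the file."""
    if len(results) != 1:
        return True
    only = results[0].bounds
    return only.start_page != 0 or only.page_count != total_pages


def route(results: List[ClassificationResult], total_pages: int) -> str:
    """Send split classifications to a human, everything else to extraction."""
    return "manual_review" if was_split(results, total_pages) else "extraction"


if __name__ == "__main__":
    whole = [ClassificationResult("invoice", DocumentBounds(0, 10))]
    parts = [
        ClassificationResult("invoice", DocumentBounds(0, 4)),
        ClassificationResult("invoice", DocumentBounds(4, 6)),
    ]
    print(route(whole, 10))
    print(route(parts, 10))
```

With a rule like this, only the files that were actually split go to a manual action, and files classified as one whole document go straight to extraction.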
In the new update of Document Understanding, we will have an option to disable splitting in the classifier, see here:
Thanks for the reply, Henrique. I think the new update should fix it. And yes, I am already sending the classifications to Action Center as per the rule you mentioned, but I wanted to reduce this extra validation, as the client usually gets annoyed by the extra work. Anyway, I’ll look forward to this update to fix my issue.
This update will make things easier. We have the same problem you mentioned: too much human validation at the classification step.