Classification Results dividing one document into multiple documents based on Pages

Mohammed_Anas · February 7, 2023, 2:47pm

I’m working on a Document Understanding project. My files can run to 5, 10, or more pages, but all the pages in a file belong to a single document. The classifier splits one file into 2 or 3 documents, on a basis I can’t identify, and then runs extraction on each piece. This breaks things, because data that belongs together ends up separated. None of my files combine multiple documents. I’m using the Intelligent Keyword Classifier and have trained it on more than 200 documents. Is there a way to configure the classifier so that it always classifies a file as one document and never splits it?

rikulsilva · February 7, 2023, 4:28pm

Hi

This can happen when the pages of the document share the same layout, so DU treats them as several documents of the same type. For example, a PDF that contains 10 invoices.

You need to improve your classifier, but if the pages share the same layout that is a little difficult. You also have the option of using the Keyword Based Classifier.

Or, in your classification rules, you can check whether the classified document has the same number of pages as the original file, using:

in_ClassificationResults(0).DocumentBounds.PageCount, StartPage, etc. Based on the result, even if the classifier returns split classifications, you can send the file to a manual action so a human fixes it before it continues to the extraction flow.
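As a rough illustration of that check, here is a minimal sketch in Python rather than a Studio expression. The class and function names are hypothetical stand-ins that only mirror the StartPage and PageCount fields mentioned above, and treating the start page as 0-based is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentBounds:
    # Hypothetical stand-in for the DocumentBounds of a classification result.
    start_page: int   # assumed 0-based index of the first page covered
    page_count: int   # number of pages covered by this result


@dataclass
class ClassificationResult:
    # Hypothetical stand-in for one entry of in_ClassificationResults.
    document_type_id: str
    bounds: DocumentBounds


def needs_manual_review(results: List[ClassificationResult], total_pages: int) -> bool:
    """Return True unless there is exactly one result spanning the whole file."""
    if len(results) != 1:
        # Zero results (unclassified) or several results (file was split).
        return True
    bounds = results[0].bounds
    # A single result must start at the first page and cover every page.
    return bounds.start_page != 0 or bounds.page_count != total_pages


# Example: a 10-page file that the classifier split into two parts.
split_results = [
    ClassificationResult("Invoice", DocumentBounds(start_page=0, page_count=4)),
    ClassificationResult("Invoice", DocumentBounds(start_page=4, page_count=6)),
]
route_to_human = needs_manual_review(split_results, total_pages=10)
```

When the function returns True, the file would go to the manual action; otherwise it can continue straight to extraction.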

The new update of Document Understanding will include an option to disable splitting in the classifier, see here:

Mohammed_Anas · February 8, 2023, 11:03am

Thanks for the reply, Henrique. I think the new update should fix it. And yes, I am already sending the classifications to Action Center per the rule you mentioned. I just wanted to reduce this extra validation, since the client usually gets annoyed by the extra work. Anyway, I’ll look forward to this update to fix my issue.

rikulsilva · February 8, 2023, 11:41am

@Mohammed_Anas

This update will make things easier. We have the same problem you mentioned: too much human validation at the classification step.

system · February 11, 2023, 11:42am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
