DU License consuming - limit for pages with extracted information

Camila_Caldas · September 22, 2022, 5:29pm

I'm extracting information from a PDF file. In some cases the PDF has more than one page and the information to extract is not on every page, but there is no pattern to which pages contain it. I saw that in these cases the framework processes all pages and consumes DU license units. Is it possible to indicate which pages I want to extract? I'm using only the Machine Learning Extractor.

sharon.palawandram · September 22, 2022, 6:11pm

The Machine Learning Extractor consumes one unit per processed page, even if the information to extract is not found on that page.

You can use a keyword-based classifier to identify the pages or PDFs you want to extract from. Filter the PDFs first and send only the relevant ones to the ML Extractor; this might help reduce your DU license consumption.
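A rough sketch of that filtering idea in plain Python, to show the logic only. This is not UiPath code: `pages_to_extract`, the sample page texts, and the keyword list are all hypothetical, and it assumes you already have the text of each page available.

```python
from typing import List, Sequence


def pages_to_extract(page_texts: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """Return 1-based page numbers whose text contains at least one keyword."""
    lowered = [k.lower() for k in keywords]
    selected = []
    for number, text in enumerate(page_texts, start=1):
        haystack = text.lower()
        # Case-insensitive substring match; a real classifier would be smarter.
        if any(k in haystack for k in lowered):
            selected.append(number)
    return selected


# Hypothetical page texts for a three-page PDF.
pages = [
    "Purchase Order No. 1234 Total amount due",
    "Terms and conditions of sale",
    "Purchase order line items",
]
selected = pages_to_extract(pages, ["purchase order"])
# Pages that would not need to be sent to the extractor at all.
pages_skipped = len(pages) - len(selected)
print(selected, pages_skipped)
```

Only the selected pages would then go to the ML Extractor, so the skipped pages would not count as processed pages there.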

More details on how extractions are charged can be found below:

Camila_Caldas · September 22, 2022, 6:21pm

Nice, @sharon.palawandram! I'm already using the Intelligent Keyword Classifier, but it seems to classify the document as a whole and generate a single confidence level. How can I make it work at the page level? Is that possible?

sharon.palawandram · September 22, 2022, 6:32pm

I see what you’re saying. UiPath gives document-level overall confidence percentages in extraction and classification.

If you need page-level confidence levels, you will have to split the document beforehand. The Intelligent Keyword Classifier classifies a document as you define it in the taxonomy. You do have the option to split it in the Present Classification Station, but if you need page-level metrics, you will have to send the pages individually.
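To illustrate why a document-level score can hide what is happening on individual pages, here is a small Python sketch. The score is just the fraction of keywords found in the text, a hypothetical stand-in and not how the Intelligent Keyword Classifier computes its confidence; `keyword_scores` and the sample texts are made up for the example.

```python
from typing import Dict, Sequence


def keyword_scores(texts: Sequence[str], keywords: Sequence[str]) -> Dict[int, float]:
    """Map each 1-based text index to the fraction of keywords it contains."""
    lowered = [k.lower() for k in keywords]
    scores = {}
    for number, text in enumerate(texts, start=1):
        haystack = text.lower()
        hits = sum(1 for k in lowered if k in haystack)
        scores[number] = hits / len(lowered) if lowered else 0.0
    return scores


# Hypothetical page texts for one three-page document.
pages = ["Invoice number 42 total due", "General terms", "Invoice appendix"]
keywords = ["invoice", "total"]

# Scoring each page separately exposes the page with no relevant content.
per_page = keyword_scores(pages, keywords)

# Scoring the whole document as one text gives a single number.
whole_document = keyword_scores([" ".join(pages)], keywords)[1]
print(per_page, whole_document)
```

With these sample texts the whole document gets a full score while page 2 scores zero, which is the kind of detail you only see when pages are sent individually.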

Camila_Caldas · September 22, 2022, 6:56pm

Very insightful! Splitting the file before the keyword classifier had occurred to me as a possibility. I really don't know how it will influence the machine learning model, but it's something I'm going to test for sure! I'll wait a little for other comments to see if someone has a different perspective.

sharon.palawandram · September 22, 2022, 6:59pm

Of course. How did you train the ML Extractor? Were the training files full documents or split?

Camila_Caldas · September 22, 2022, 7:06pm

I trained with full documents.

Another idea I want to test is the Intelligent Keyword Classifier Trainer. I'm going to check how this tool learns.

https://docs.uipath.com/activities/docs/intelligent-keyword-classifier-trainer

sharon.palawandram · September 22, 2022, 7:07pm

Awesome. If you trained the ML model with full documents, it might not extract well from single-page documents unless you retrain it.
