Hi all. Pretty big ramble here, so please bear with me. We are working on a problem where we are using DU to ultimately ingest files of one type into a system; let's call these cover pages. We will be performing text extraction on these at a later stage (performer), but for now, in the dispatcher, the …

Splitting multi-page PDF document containing multiple document types

sharon.palawandram (Sharon Palawandram) October 4, 2022, 4:56pm 3

The speed of digitization depends on which OCR engine you use. It is expected that digitizing 90+ pages takes a noticeable amount of time, but you do have some room to speed it up through the OCR engine: if it is hosted on a server, you can increase the memory allocated to it, add a GPU, etc.

Another approach to speeding up the digitization process.

I don't see a problem here; as long as you have trained them, they should be fine.

You can improve the accuracy of the model over time by using the Intelligent Keyword Classifier Trainer, which retrains the model with human-in-the-loop validated data.
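
For the splitting part of the question, here is a rough sketch of the idea in plain Python, not UiPath code. It assumes you already have one predicted label per page from whatever classifier you trained; the function name `split_by_label` and the label values are made up for illustration. It groups consecutive pages with the same label into page ranges that you could then split out.

```python
from itertools import groupby


def split_by_label(page_labels):
    """Group consecutive pages that share a label into
    (label, first_page, last_page) ranges, with 1-based page numbers."""
    ranges = []
    page = 1
    for label, run in groupby(page_labels):
        count = len(list(run))  # number of consecutive pages with this label
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges


# Hypothetical per-page classification results for a 7-page PDF.
labels = ["cover_page", "invoice", "invoice", "cover_page",
          "report", "report", "report"]

for label, first, last in split_by_label(labels):
    print(f"{label}: pages {first}-{last}")
```

One caveat with this simple grouping: two separate documents of the same type that sit back to back are merged into one range, so you would need an extra signal (for example, a cover page between them) to tell them apart.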

