Splitting multi-page PDF document containing multiple document types

The speed of digitization depends on which OCR engine you use. Digitizing 90+ pages will take a noticeable amount of time, and there is only limited room to speed it up: you can switch to a faster OCR engine or, if the engine runs on a server, give it more resources (more allocated memory, a GPU, etc.).

Another approach to speeding up the digitization process.

I don't see a problem here. As long as you have trained them, they should be fine.

You can improve the accuracy of the model over time by using an intelligent keyword classifier trainer, which retrains the model with human-in-the-loop validated data.
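To make the retraining idea concrete, here is a rough sketch in plain Python of a keyword classifier that learns from human-validated pages. It is only an illustration of the concept and not how any particular product implements it; the `KeywordClassifier` class, its `train` and `classify` methods, and the document type names are all made up for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class KeywordClassifier:
    """Toy keyword classifier (hypothetical, for illustration only).

    It keeps a word count per document type and scores a page by how
    much of each type's learned vocabulary appears in the page text.
    """

    def __init__(self):
        # document type -> Counter of words seen in validated pages
        self.word_counts = defaultdict(Counter)

    def train(self, text, validated_type):
        """Add one human-validated page to the keyword statistics."""
        self.word_counts[validated_type].update(tokenize(text))

    def classify(self, text):
        """Return the best-scoring document type, or None if untrained."""
        words = tokenize(text)
        scores = {}
        for doc_type, counts in self.word_counts.items():
            total = sum(counts.values())
            hits = sum(counts[w] for w in words)
            scores[doc_type] = hits / total if total else 0.0
        return max(scores, key=scores.get) if scores else None


# Initial training data (assumed sample text, not real documents).
clf = KeywordClassifier()
clf.train("invoice number total amount due", "invoice")
clf.train("purchase order ship to quantity", "purchase_order")

# A page of an unseen type gets a wrong guess at first.
page = "receipt total amount paid"
guess = clf.classify(page)

# A human corrects the label, and the corrected page is fed back in.
clf.train(page, "receipt")
corrected = clf.classify(page)
print(guess, "->", corrected)
```

The point is the feedback loop: every page a person confirms or corrects goes back into the training data, so the keyword statistics for each document type get better the more documents you process.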
