Document Understanding: Document Splitting and Other Wonderful Stories :)

Hello @brasheva,

We currently do not support processing documents with out-of-order pages. This feature is on our roadmap, though, and we will be working on it. Until then, only documents whose pages are in the correct order are supported.

On skewing and rotation of pages: this should be handled by every classifier and extractor, provided you digitize your file with an OCR engine that can also detect page rotation.

I am working with a healthcare agency that receives applications from foster parents. Each application is sent to them as a large PDF. These PDFs contain dozens of documents associated with the foster care application. A case worker has to manually split these documents into individual files and post each document to the state foster care system. The PDFs are typically over 250 pages long, and we receive 10 of them each month (approximately 2500 total pages per month). There are approximately three dozen document types.

We have built a bot which digitizes the foster care application, classifies and splits the documents contained within the application PDF, then stores the individual files for subsequent posting to the state system. We are using OmniPage to digitize the document and the Intelligent Keyword Classifier to classify the individual documents. Could someone provide an estimate of what this would cost to run? Any recommendations on minimizing costs would also be greatly appreciated.

In other words, is there a per-page cost for digitizing and splitting each page of a non-native PDF? And is there any way to keep these charges to a minimum? Thanks!
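
For readers following along, the classify-then-split step described above can be illustrated with a minimal Python sketch. This is not UiPath's implementation or API; it only assumes that some classifier has already produced one label per page, in the correct page order, and it groups consecutive pages with the same label into page ranges. The function name `split_ranges` and the sample labels are hypothetical.

```python
from itertools import groupby


def split_ranges(page_labels):
    """Group consecutive pages that share a label.

    Returns a list of (label, first_page, last_page) tuples with
    1-based, inclusive page numbers. Assumes pages are in order.
    """
    ranges = []
    page = 1
    for label, run in groupby(page_labels):
        count = len(list(run))
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges


# Hypothetical per-page labels for a six-page packet.
labels = [
    "application", "application",
    "background_check",
    "reference_letter", "reference_letter", "reference_letter",
]
for label, first, last in split_ranges(labels):
    print(f"{label}: pages {first}-{last}")
```

One limitation of this simple grouping: two consecutive documents of the same type would be merged into a single range, so a real splitter needs an additional signal (such as a first-page marker) to separate them.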

Hello @kreigfields,

This is a very interesting use case!

The cost does not depend on whether the documents are native PDFs or scanned PDFs, and there is no cost for digitizing them if you are using a freely available OCR engine like OmniPage OCR.

There is a cost associated with the Intelligent Keyword Classifier, which depends on the length of the document. Please reach out to your UiPath contact for Intelligent Keyword Classifier pricing.
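
To turn the volumes mentioned in this thread into a budget figure once you have a quote, a rough sketch like the following may help. It assumes, purely for illustration, that the classifier is billed at a flat rate per page; the actual pricing model may differ, and `rate_per_page` is a hypothetical value you would need to get from your UiPath contact.

```python
def monthly_pages(pdfs_per_month, pages_per_pdf):
    """Total pages processed per month."""
    return pdfs_per_month * pages_per_pdf


def monthly_cost(pdfs_per_month, pages_per_pdf, rate_per_page):
    """Estimated monthly cost under an ASSUMED flat per-page rate.

    rate_per_page is hypothetical; no real price is implied here.
    """
    return monthly_pages(pdfs_per_month, pages_per_pdf) * rate_per_page


# Volumes from this thread: 10 PDFs a month, about 250 pages each.
print(monthly_pages(10, 250))  # 2500
```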

As you can imagine these adoption agencies tend to be understaffed and underfunded. I’m working on several use cases to try to help. This is one of the simpler use cases. Thanks again for your insights.


I have used both the Intelligent Keyword Classifier and the Machine Learning Classifier, and I want to know whether the ML Classifier supports document splitting.
@Ioana_Gligan please help.
