Trainable document splitter model in Modern Projects

We’re excited to announce that a new trainable document splitter and classifier model is now available in Public Preview for tenants based in Europe and the US.

This new model extends the power of document classification to multi-document packets — enabling you to split and classify documents within a single workflow.

:key: Key details

  • Availability: Currently, the feature is only available in Europe and the US. Availability for new regions will be confirmed at a later date.
  • Pricing: The new splitter and classifier model falls under the existing Modern Projects pricing model, where each page is charged 1 AI Unit regardless of the number of operations on that page. Please note that final pricing for General Availability (GA) may be subject to change.
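
To make the per-page pricing concrete, here is a minimal sketch of the arithmetic. The function name `estimate_ai_units` is hypothetical and not part of any UiPath SDK; the rate is the preview rate quoted above and may change at GA.

```python
# Hypothetical helper for illustration; not a UiPath API.
AI_UNITS_PER_PAGE = 1  # preview rate stated above; GA pricing may differ


def estimate_ai_units(page_counts):
    """Return the total AI Units for a batch, given each packet's page count."""
    if any(n < 0 for n in page_counts):
        raise ValueError("page counts must be non-negative")
    # Each page is charged once, however many operations run on it.
    return sum(page_counts) * AI_UNITS_PER_PAGE


print(estimate_ai_units([10, 4, 25]))  # 39
```

A 10-page packet that is both split and classified would therefore count as 10 AI Units, not 20.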

:puzzle_piece: What it does

The new model can:

  • Classify entire documents (e.g., identify whether a file is an ID, invoice, or application form).
  • Split and classify multi-document packets — such as mortgage or loan application files containing multiple sub-documents.

For example:
A 10-page mortgage application PDF might contain:

  • Pages 1–2 → ID
  • Pages 3–6 → Application form
  • Pages 7–10 → Bank statement

The new model can be trained to automatically detect these boundaries and assign the correct document type to each section.
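
As a rough illustration of what a split result looks like as data, the mortgage example above can be sketched as a list of page ranges. The `Section` type and `check_coverage` function are assumptions made for this sketch and do not reflect the model's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive
    doc_type: str


def check_coverage(sections, total_pages):
    """Raise ValueError unless the sections tile pages 1..total_pages exactly."""
    next_page = 1
    for s in sorted(sections, key=lambda s: s.first_page):
        if s.first_page != next_page or s.last_page < s.first_page:
            raise ValueError(f"gap, overlap, or empty range at page {s.first_page}")
        next_page = s.last_page + 1
    if next_page != total_pages + 1:
        raise ValueError("sections do not reach the last page")


# The 10-page mortgage packet from the example above.
packet = [
    Section(1, 2, "ID"),
    Section(3, 6, "Application form"),
    Section(7, 10, "Bank statement"),
]
check_coverage(packet, total_pages=10)

# Per-page lookup: which document type does each page belong to?
page_types = {p: s.doc_type for s in packet for p in range(s.first_page, s.last_page + 1)}
print(page_types[5])  # Application form
```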


:gear: Getting Started

:one: Create a project

:two: Upload your documents

  • Go to Classify and Split Documents and upload your document packets.
  • Once processed, select the uploaded files and click Split to open the annotation interface.
  • If your project already has a trained model, the documents will be pre-annotated automatically — saving time and showing predictions.

:three: Define your classification taxonomy

  • Click New document type to define each document type in your taxonomy.
  • Choose from predefined types or create custom ones with:

:four: Annotate and confirm splits

  • Mark where each document starts and ends, and assign a document type to each range.
  • Click Confirm to process and generate sub-documents.
  • Each sub-document appears under its document type in the Build section and gets pre-annotated with the schema of that type.
  • You can skip non-relevant pages by labeling them as “–Unknown–.”
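
The effect of confirming a split can be pictured with a small sketch: annotated ranges are grouped under their document type, and ranges labeled “–Unknown–” are left out. This is only an illustration of the behavior described above; `group_sub_documents` and the tuple layout are hypothetical, not how the product stores annotations.

```python
from collections import defaultdict

UNKNOWN = "–Unknown–"  # label text as quoted in the step above


def group_sub_documents(annotated_ranges):
    """Group (first_page, last_page, doc_type) ranges by type, skipping Unknown."""
    grouped = defaultdict(list)
    for first_page, last_page, doc_type in annotated_ranges:
        if doc_type == UNKNOWN:
            continue  # non-relevant pages produce no sub-document
        grouped[doc_type].append((first_page, last_page))
    return dict(grouped)


ranges = [
    (1, 2, "ID"),
    (3, 3, UNKNOWN),
    (4, 7, "Application form"),
    (8, 9, "ID"),
]
print(group_sub_documents(ranges))
# {'ID': [(1, 2), (8, 9)], 'Application form': [(4, 7)]}
```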

:five: Train your model

  • Training begins automatically once you have at least five annotated sub-documents.
  • Training status is visible in the Classification pane.

:six: Review metrics

  • Navigate to the Measure page and review the model metrics.

:seven: Publish the Model :rocket:

  • Once training is complete, publish your model in the Publish section to make it available for use in your automations.
  • Published models can be versioned, managed, and reused across projects.
  • The version of the new splitter and classifier model is 25.9.

:eight: Consume the Model in Your Workflow or via APIs :link:

  • Use the published model directly in UiPath workflows to automatically classify or split incoming documents.
  • Currently, the new splitter and classifier model can be consumed through IntelligentOCR.Activities 6.27.0.
  • You can also access the model through APIs to integrate document processing into other systems.

:magnifying_glass_tilted_left: Reviewing Predictions

After training, all project documents are updated with predictions from the model.
You can review results by:

  • Comparing Ground Truth (Type) vs Predicted Type in the Classification table.
  • Viewing sub-documents by enabling “Include sub-documents” in the View menu.
  • Enabling “Show Prediction” in the annotation interface to see how the model performed.
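
If you copy the Classification table into a script, the Ground Truth vs Predicted comparison can be sketched like this. The row keys (`name`, `type`, `predicted_type`) and the sample rows are made up for illustration; the product's own metrics on the Measure page remain the reference.

```python
def review(rows):
    """Return (share of rows where prediction matches ground truth, mismatched rows)."""
    mismatches = [r for r in rows if r["type"] != r["predicted_type"]]
    agreement = (len(rows) - len(mismatches)) / len(rows) if rows else 0.0
    return agreement, mismatches


# Made-up rows standing in for the Classification table.
rows = [
    {"name": "packet1_p1-2", "type": "ID", "predicted_type": "ID"},
    {"name": "packet1_p3-6", "type": "Application form", "predicted_type": "Application form"},
    {"name": "packet1_p7-10", "type": "Bank statement", "predicted_type": "Application form"},
    {"name": "packet2_p1-2", "type": "ID", "predicted_type": "ID"},
]

agreement, mismatches = review(rows)
print(agreement)                        # 0.75
print([r["name"] for r in mismatches])  # ['packet1_p7-10']
```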

:light_bulb: Classification-Only Option

If you only need classification (not splitting), simply disable the “Enable splitting” toggle.
The model will then classify whole documents as before.


:warning: Current Limitations (Public Preview)

Some limitations apply during the preview phase:

:books: Dataset

  • Minimum document types: 1
  • Minimum samples:
    • Single document type → at least 5 samples
    • Multiple types → at least 5 total documents (1 per type minimum)
  • Maximum document size: 160 MB or 500 pages
  • Training triggered after 5 annotation changes
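
A quick pre-flight check against these dataset limits might look like the sketch below. The function names are hypothetical, and reading "160 MB or 500 pages" as two limits that both apply is an assumption.

```python
MIN_TOTAL_SAMPLES = 5  # at least 5 samples overall
MAX_SIZE_MB = 160
MAX_PAGES = 500


def dataset_problems(samples_per_type):
    """Return a list of problems for a {document type: sample count} mapping."""
    problems = []
    if not samples_per_type:
        problems.append("define at least 1 document type")
    missing = [t for t, n in samples_per_type.items() if n < 1]
    if missing:
        problems.append(f"no samples for: {', '.join(sorted(missing))}")
    total = sum(samples_per_type.values())
    if total < MIN_TOTAL_SAMPLES:
        problems.append(f"{total} samples in total, need at least {MIN_TOTAL_SAMPLES}")
    return problems


def document_within_limits(size_mb, pages):
    """Assumption: a document must satisfy both the size and the page limit."""
    return size_mb <= MAX_SIZE_MB and pages <= MAX_PAGES


print(dataset_problems({"ID": 3, "Invoice": 2}))  # []
print(dataset_problems({"ID": 4}))                # ['4 samples in total, need at least 5']
print(document_within_limits(size_mb=12, pages=620))  # False
```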

:pen: Annotation

  • Pages cannot be reordered or deleted

:brick: Features not yet available

  • Splitting info in the Monitor page
  • Retraining for splitting/classification
  • Splitting support in the cross-platform activities (DU.Activities)
  • Migrating splitting and classification data sets across environments

These limitations will evolve as we move toward General Availability (GA).


:speech_balloon: Share Your Feedback!

We’d love to hear your thoughts as you try the new splitter and classifier model:

  • How well does it handle your document packets?
  • What improvements would help before GA?
  • Any issues or surprises you encountered?

Your feedback will help us make the feature even better before full release.

Will this be available in non-modern as well?

Hi David, no, it won’t be available in classic projects. Do you have any blockers/issues with using Modern Projects that we should be aware of?

I’m not David, but I can say the main issue with Modern Projects is how expensive they are.

From what I have seen, Modern projects cost more.

Hi, could you clarify what you mean by saying the main issue with Modern Projects is how expensive they are?

Are you referring specifically to splitting use cases, or to Modern Projects in general?

It’s true that some use cases can be more expensive, but many are not. I’m not trying to argue for either approach — both Modern and Classic (AI Center) Projects have their own advantages and disadvantages in terms of costs.

For example, in AI Center/Classic Projects, users pay for infrastructure costs — both for training and for model serving. Classification costs are also additive, meaning that if every document needs to be classified and extracted, the total cost will exceed that of Modern Projects (>1 AI Unit per page).

In Modern Projects, however, there are no infrastructure costs, and both classification and extraction together cost 1 AI Unit per page. So, if every document requires both classification and extraction, Modern Projects will actually be cheaper. On the other hand, if only a portion of documents need extraction, then Modern Projects can end up being more expensive.
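
To show the shape of that trade-off, here is a rough sketch. The Classic per-page rates below are made-up placeholders, not actual prices, and infrastructure cost is left at zero for simplicity; only the Modern rate of 1 AI Unit per page comes from the text above.

```python
def modern_units(pages_total):
    """Modern Projects: 1 AI Unit per page, classification and extraction included."""
    return pages_total * 1


def classic_units(pages_total, pages_extracted, classify_rate, extract_rate, infra_units=0.0):
    """Classic: classification and extraction are additive, plus infrastructure."""
    return pages_total * classify_rate + pages_extracted * extract_rate + infra_units


# Placeholder rates for illustration only; substitute your own figures.
CLASSIFY_RATE, EXTRACT_RATE = 0.5, 1.0

print(modern_units(100))                                     # 100
print(classic_units(100, 100, CLASSIFY_RATE, EXTRACT_RATE))  # 150.0
print(classic_units(100, 20, CLASSIFY_RATE, EXTRACT_RATE))   # 70.0
```

With these placeholder rates, Classic costs more when every page is extracted and less when only 20 of 100 pages are, which is the pattern described above.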

For splitting use cases, we’re currently exploring alternative pricing models and would really appreciate your input.

Is there a video explaining the classification process?

When I use the project after classification, I still get a single document type instead of multiple document types.

Is this designed to work like the Intelligent Keyword Classifier, which automatically returns different document types from a single document?