Automation Cloud Document Understanding page based classification

goksu.avci · January 7, 2025, 10:26pm

I am planning to integrate a model developed in Automation Cloud into a process using a REST API. However, for multi-page documents, the pages should be assigned to different classes, but the classification result only returns a single class. There is an “out-of-the-box” ML model called Document Splitter, but I can’t use it since it seems to be in preview mode. Could you provide any suggestions?

tomasz.wierzbicki · January 8, 2025, 9:01am

Hi @goksu.avci,

Welcome to the UiPath Forum.

I have not tried the DU API yet, but take a look at the Intelligent Keyword Classifier (it is definitely available as a Studio activity). For now, it’s the only one that can split a single document (file) by pages, and it can indeed assign different classes to different page ranges within that single document.

When it comes to Document Splitter, I’ve been told by UiPath that this solution will not be supported and that we can expect it to be discontinued.

Cheers,
Tom

goksu.avci · January 8, 2025, 9:32pm

@Monica_Secelean Hi Monica,
I’ve seen your posts and replies about similar topics on the forum. I would appreciate any information you could share on this one.

Monica_Secelean · January 19, 2025, 5:53pm

@goksu.avci we’re in the process of adding splitting capabilities to our APIs by enabling splitting with the classification operation. Would you mind telling me more about your use case? For example: what the input PDF contains, why you need to split it, what you do with the splitting results, what the business process is, what you are automating, and which document types you work with.

goksu.avci · January 20, 2025, 7:31am

Hi Monica,

Thank you for your response. The process I have includes customer instructions and their attachments. There can be multiple types of instructions, and they might be sent within a single PDF. For instance, a 4-page PDF could have the following structure: ordertype1, ordertype2, attachment1, attachment2. In this case, our classification output should be as follows:

Page 1: type1

Page 2: type2

Page 3: other

Page 4: other

In the next steps, if there’s a need for an integrated system, the process will split out the relevant pages and call the data extraction step according to each page’s classification. However, because the existing classification result returns only a single class per document, we are forced to limit the integrated system to single-page PDFs. I would like to perform the operation in this example without using Studio.
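
For illustration, once per-page labels like the ones above are available from some classifier, turning them into page ranges to split on is a small step. The sketch below is plain Python and does not call any UiPath API; the function name `group_pages` and the label strings are assumptions taken from the example in this post.

```python
from itertools import groupby


def group_pages(page_labels):
    """Collapse per-page labels into (label, first_page, last_page) ranges.

    page_labels is a list with one label per page, in page order.
    Page numbers in the result are 1-based and inclusive.
    """
    ranges = []
    page = 1
    for label, run in groupby(page_labels):
        count = len(list(run))  # number of consecutive pages with this label
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges


# The 4-page example from the post: two order types followed by two attachments.
print(group_pages(["type1", "type2", "other", "other"]))
# prints [('type1', 1, 1), ('type2', 2, 2), ('other', 3, 4)]
```

Note that adjacent pages with the same label are merged into one range. If two separate documents of the same type can follow each other, page labels alone are not enough to find the boundary between them.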

Monica_Secelean · January 20, 2025, 9:28pm

@goksu.avci your use case makes sense to me, thanks for sharing it! A sample document would be even better, if you can share one. Otherwise, based on your description, I believe you would be able to leverage our feature by:

providing a sample document as input
having a classifier with a splitter trained as part of a Modern Project (details still in the works)
calling the classification APIs, which also perform splitting
optionally, validating the results
iterating over the classification results and performing the extraction based on the identified document type

Let me know if the above makes sense to you, or if you have any other feedback!
Monica
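
The last step in the list above (iterating over the classification results and extracting per document type) could look roughly like the sketch below. The details of the splitting API were still being worked out at the time of this thread, so the shape of `results` (a list of dicts with `doc_type` and `pages` keys) and the extractor functions are purely hypothetical placeholders, not the actual UiPath response format or API.

```python
def extract_type1(pages):
    # Placeholder: a real integration would call the extraction step for type1 here.
    return {"doc_type": "type1", "pages": pages}


def extract_type2(pages):
    # Placeholder: a real integration would call the extraction step for type2 here.
    return {"doc_type": "type2", "pages": pages}


# Map each document type to the extraction routine that should handle it.
# Types without an entry (for example "other" attachments) are skipped.
EXTRACTORS = {
    "type1": extract_type1,
    "type2": extract_type2,
}


def process(results):
    """Run the matching extractor for every classified page range."""
    outputs = []
    for item in results:
        handler = EXTRACTORS.get(item["doc_type"])
        if handler is None:
            continue  # no extraction configured for this type
        outputs.append(handler(item["pages"]))
    return outputs


# Hypothetical classification-with-splitting output for the 4-page example.
results = [
    {"doc_type": "type1", "pages": [1]},
    {"doc_type": "type2", "pages": [2]},
    {"doc_type": "other", "pages": [3, 4]},
]
print(process(results))
```

Keeping the type-to-extractor mapping in a dictionary means that supporting a new order type only requires adding one entry, while unrecognized types fall through without extraction.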
