How to use the Splitting option of the Intelligent Keyword Classifier activity

Smitesh_Aher1 · August 24, 2023, 1:55pm

Hi Team,

I am a beginner in UiPath. I have one large PDF that contains multiple PDF documents. I want to split out the inner documents based on unique text, not by page number, through UiPath automation.

The Intelligent Keyword Classifier currently has a splitting option, but I don't know how to use it. If anyone has an idea, please let me know.

postwick · August 24, 2023, 2:11pm

https://docs.uipath.com/activities/other/latest/document-understanding/intelligent-keyword-classifier

Smitesh_Aher1 · August 24, 2023, 2:18pm

Hi @postwick

I have already seen this. Where is the splitting option mentioned for the Intelligent Keyword Classifier? Can you please show me?

postwick · August 24, 2023, 2:23pm

It works the same way as if you use a Classification Action in Action Center. The output of the automatic classification activity gives you StartPage and PageCount in the results which you then use to split the document up.

DocumentBounds - Information on what part of the document the classification pertains to, with StartPage (Int32, 0-based), PageCount (Int32), TextStartIndex (Int32, 0-based), TextLength (Int32).
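As a rough illustration of how those four fields relate to the digitized document, here is a minimal Python sketch. It is not UiPath code: the DocumentBounds dataclass, the helper functions, and the sample values are hypothetical stand-ins, and only the field names and their 0-based meaning come from the documentation quoted above.

```python
from dataclasses import dataclass


@dataclass
class DocumentBounds:
    """Hypothetical stand-in that mirrors the fields listed above."""
    start_page: int        # StartPage, 0-based
    page_count: int        # PageCount
    text_start_index: int  # TextStartIndex, 0-based
    text_length: int       # TextLength


def pages_covered(bounds: DocumentBounds) -> list[int]:
    """Return the 0-based page indexes that belong to this classified part."""
    return list(range(bounds.start_page, bounds.start_page + bounds.page_count))


def text_covered(full_text: str, bounds: DocumentBounds) -> str:
    """Return the slice of the digitized text that belongs to this part."""
    end = bounds.text_start_index + bounds.text_length
    return full_text[bounds.text_start_index:end]
```

With StartPage = 2 and PageCount = 3, the part covers the 0-based pages 2, 3 and 4, which a PDF viewer would show as pages 3 to 5.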

https://docs.uipath.com/activities/other/latest/document-understanding/classify-document-scope

Smitesh_Aher1 · August 24, 2023, 2:36pm

I don't understand. Can you please share a sample flow?

postwick · August 24, 2023, 2:43pm

Have you built a taxonomy? Added the Document Classification activity along with the Intelligent Keyword Classifier and gone through the training steps? If so, then when you run your process the classification activity will output an object you loop through and use the data in it to split your file.

Read Taxonomy File (or just use Load Taxonomy if you’ve built it using the Taxonomy Manager in Studio)

Digitize Document:

Classify Document:

Use the Manage Learning and Configure Classifiers links to train it.

Loop through the results of the classification:

Split the file (for us it’s PDF):

Range: (item.DocumentBounds.StartPage + 1).ToString + "-" + (item.DocumentBounds.StartPage + item.DocumentBounds.PageCount).ToString
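To make the arithmetic in that Range expression easier to follow, here is the same calculation as a small Python sketch. The function name page_range is made up for illustration. The only facts it relies on are that StartPage is 0-based and that the expression above produces a 1-based "first-last" range.

```python
def page_range(start_page: int, page_count: int) -> str:
    """Convert a 0-based start page and a page count into a 1-based 'first-last' string."""
    if start_page < 0 or page_count < 1:
        raise ValueError("start_page must be >= 0 and page_count must be >= 1")
    first = start_page + 1          # 0-based index -> 1-based page number
    last = start_page + page_count  # last page of the part, 1-based
    return f"{first}-{last}"


print(page_range(0, 3))  # prints 1-3
print(page_range(3, 2))  # prints 4-5
```

A single-page part gives a range such as "6-6", which is what the original expression produces as well.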

Smitesh_Aher1 · August 24, 2023, 2:56pm

@postwick I have created a flow like the one below:

1) Taxonomy
2) Digitize Document
3) Classify Document

You used validatedClassificationResult in the For Each loop. I am not using that, so can I use the output of Classify Document Scope instead?

postwick · August 24, 2023, 2:57pm

Yes it’s the same object.

Smitesh_Aher1 · August 24, 2023, 3:02pm

[Screenshot: split]

The screenshot above shows the Split property in the Intelligent Keyword Classifier.

Can you please elaborate on this?

Smitesh_Aher1 · August 24, 2023, 3:08pm

One more point, @postwick: I want to split the PDF based on unique text, but you have not mentioned that in your flow. How will the PDF be split, then?

Can we use an If condition inside the For Each loop?

postwick · August 24, 2023, 3:09pm

I don’t know, I’ve never used the Intelligent Keyword Classifier, only the Keyword Classifier. I suspect that option just tells it to output the start page and related data in the object.

postwick · August 24, 2023, 3:10pm

The For Each is where I split it, using the StartPage and PageCount values. I create the filename based on the information in the classification object (taxonomy).
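Here is a minimal Python sketch of that loop, only to show the idea. It is not UiPath code: the dictionary keys (document_type, start_page, page_count) and the file naming scheme are assumptions made up for this example, standing in for the values read from each classification result.

```python
def plan_splits(results):
    """Build a (filename, page range) pair for each classified part.

    `results` is a hypothetical list of dicts, one per classification result,
    with a 0-based start_page as in DocumentBounds.StartPage.
    """
    plan = []
    for number, item in enumerate(results, start=1):
        first = item["start_page"] + 1                  # 1-based first page
        last = item["start_page"] + item["page_count"]  # 1-based last page
        # Assumed naming scheme: running number plus the document type.
        filename = f"{number:02d}_{item['document_type']}.pdf"
        plan.append((filename, f"{first}-{last}"))
    return plan


sample = [
    {"document_type": "Invoice", "start_page": 0, "page_count": 2},
    {"document_type": "Receipt", "start_page": 2, "page_count": 1},
]
for filename, pages in plan_splits(sample):
    print(filename, pages)
```

In the real workflow, each pair would feed the output file name and the Range property of the PDF split activity inside the For Each.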

Smitesh_Aher1 · August 24, 2023, 3:15pm

Okay. I will try splitting the PDF using only the Keyword Classifier.

Smitesh_Aher1 · August 24, 2023, 3:21pm

"I create the filename based on the information in the classification object (taxonomy)."
So, in my case, do I have to store that unique text inside the ‘Read text file - taxonomy’?
Am I right?

postwick · August 24, 2023, 3:23pm

Have you built your taxonomy using the Taxonomy Manager?

It’s a big button at the top of the Studio window near where Table Extraction is.

After building the taxonomy we are copying it to a folder external to the project so we can update it without republishing. You don’t have to do that, you can just create your taxonomy in Taxonomy manager and it stores it in taxonomy.json in your project folder, which you can just load with the Load Taxonomy activity.

Smitesh_Aher1 · August 24, 2023, 3:43pm

Yes, I have already created the taxonomy.

This is the last question from my side.
@postwick please clear up one doubt for me. Take my case: I want to split the PDF based on a unique text header. Do we have to train the bot, or do we need to do something else?

postwick · August 24, 2023, 3:45pm

You do that in the Configure Classifiers and Manage Learning sections of the classifier activity. Configure Classifiers is where you choose which document types (from the taxonomy) get automatic classification, and Manage Learning is where you enter the keywords to look for in each document type.

Smitesh_Aher1 · August 25, 2023, 2:58pm

Hey @postwick

I created the flow, but when I run it, the Extract PDF Range activity shows the error below:

Extract pdf range: The range activity does not have valid argument.

Why is it showing this? Please let me know.

postwick · August 25, 2023, 3:05pm

Post a screenshot of the Extract PDF Range and also post the expressions you have in each property.

Smitesh_Aher1 · August 25, 2023, 3:21pm

Okay.
[Screenshots: range, Extract_pdf]
