How to use the Splitting option in the Intelligent Keyword Classifier activity

Hi Team,

I am a beginner in UiPath. I have one large PDF that contains multiple PDF documents. I want to split out the inner documents based on unique text, not by page number, using UiPath automation.

Currently, the Intelligent Keyword Classifier has a splitting option, but I don’t know how to use it. If anyone has an idea, please let me know.

https://docs.uipath.com/activities/other/latest/document-understanding/intelligent-keyword-classifier

Hi @postwick

I already saw this. Where is the splitting option mentioned for the Intelligent Keyword Classifier? Can you please show me?

It works the same way as if you use a Classification Action in Action Center. The output of the automatic classification activity gives you StartPage and PageCount in the results which you then use to split the document up.

  • DocumentBounds - Information on what part of the document the classification pertains to, with StartPage (Int32, 0-based), PageCount (Int32), TextStartIndex (Int32, 0-based), TextLength (Int32).

https://docs.uipath.com/activities/other/latest/document-understanding/classify-document-scope

I didn’t understand. Can you please share a sample flow?

Have you built a taxonomy? Have you added the Document Classification activity along with the Intelligent Keyword Classifier and gone through the training steps? If so, then when you run your process the classification activity will output an object that you loop through, using the data in it to split your file.

Read Taxonomy File (or just use Load Taxonomy if you’ve built it using the Taxonomy Manager in Studio)

Digitize Document:

Classify Document:

Use the Manage Learning and Configure Classifiers links to train it.

Loop through the results of the classification:

Split the file (for us it’s PDF):
Range: (item.DocumentBounds.StartPage + 1).ToString + "-" + (item.DocumentBounds.StartPage + item.DocumentBounds.PageCount).ToString
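
To make the arithmetic in that Range expression easier to follow, here is a minimal sketch of the same calculation in Python. It is only an illustration, not UiPath code: the `DocumentBounds` class and the `page_range` helper are hypothetical stand-ins, and the sample segments are made up. The point is that StartPage is 0-based while the PDF range is 1-based and inclusive, so the first page is StartPage + 1 and the last page is StartPage + PageCount.

```python
from dataclasses import dataclass


@dataclass
class DocumentBounds:
    """Hypothetical stand-in for the two page fields of DocumentBounds."""
    start_page: int  # 0-based index of the first page of the segment
    page_count: int  # number of pages in the segment


def page_range(bounds: DocumentBounds) -> str:
    """Return a 1-based, inclusive "first-last" range string."""
    first = bounds.start_page + 1                 # 0-based -> 1-based
    last = bounds.start_page + bounds.page_count  # last page, 1-based
    return f"{first}-{last}"


# Made-up example: a 6-page file classified into three documents.
segments = [DocumentBounds(0, 3), DocumentBounds(3, 1), DocumentBounds(4, 2)]
ranges = [page_range(b) for b in segments]
print(ranges)
```

A one-page document gives a range such as 4-4, where both ends are the same page.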

@postwick I have created a flow like below:

1) Taxonomy
2) Digitize Document
3) Classify Document

In the For Each loop you used validatedClassificationResult. I am not using that, so can we use the output of Classify Document Scope instead?

Yes it’s the same object.

[Screenshot: split]

In the above image, the Split property is present in the Intelligent Keyword Classifier.

So can you please elaborate on this?

One more point, @postwick: I want to split the PDF by unique text, but you have not mentioned that in your flow. How will the PDF be split then?

Can we use an If condition inside the For Each loop?

I don’t know, I’ve never used the intelligent keyword classifier, only the keyword classifier. I suspect that just tells it to output the start page etc data in the object.

The For Each is where I split it, using the StartPage and PageCount values. I create the filename based on the information in the classification object (taxonomy).

Okay. I will try splitting the PDF using only the Keyword Classifier.

"I create the filename based on the information in the classification object (taxonomy)."
----> So, in my case, do I have to store that unique text inside the ‘Read text file - taxonomy’?
Am I right?

Have you built your taxonomy using the Taxonomy Manager?

It’s a big button at the top of the Studio window near where Table Extraction is.

After building the taxonomy we are copying it to a folder external to the project so we can update it without republishing. You don’t have to do that, you can just create your taxonomy in Taxonomy manager and it stores it in taxonomy.json in your project folder, which you can just load with the Load Taxonomy activity.

Yes, I have already created the taxonomy.

This is the last question from my side.
@postwick, please clear up one doubt. Take my case: I want to split the PDF on the basis of a unique text header. Do we have to train the bot, or do we need to do anything else?

You do that in the Configure Classifiers and Manage Learning sections of the classifier activity. Configure Classifiers is where you tell it which document types (from the taxonomy) to turn on automatic classification for, and Manage Learning is where you enter the keywords to look for in each document type.
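
To picture how keywords per document type can turn into StartPage and PageCount values, here is a rough sketch in Python. It is not how the Intelligent Keyword Classifier works internally, which this thread does not describe; it is a simplified assumption where a page containing a header keyword starts a new document and every following page without one belongs to that document. The function `split_by_header`, the document type names, and the sample pages are all made up for illustration.

```python
def split_by_header(pages, keywords):
    """Group pages into documents by header keywords.

    pages:    list of page texts, in order
    keywords: dict mapping a document type to its header keywords
    Returns a list of (doc_type, start_page, page_count) tuples,
    where start_page is 0-based.
    """
    segments = []
    for index, text in enumerate(pages):
        lowered = text.lower()
        # First document type with a keyword on this page, if any.
        match = next(
            (doc_type for doc_type, words in keywords.items()
             if any(word.lower() in lowered for word in words)),
            None,
        )
        if match is not None:
            segments.append([match, index, 1])  # header found: new document
        elif segments:
            segments[-1][2] += 1                # continuation page
        # Pages before the first header are skipped in this sketch.
    return [tuple(segment) for segment in segments]


# Made-up sample: five pages containing three documents.
pages = [
    "INVOICE No. 1001 ...",
    "line items continued",
    "PURCHASE ORDER 77 ...",
    "INVOICE No. 1002 ...",
    "terms and conditions",
]
keywords = {"Invoice": ["invoice no"], "PurchaseOrder": ["purchase order"]}
segments = split_by_header(pages, keywords)
print(segments)
```

Each resulting tuple carries the same kind of information as the classification results discussed above: a document type plus a 0-based start page and a page count that can be turned into a page range for splitting.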

Hey @postwick

I created the flow, but when I run it I get the error below in the Extract PDF Range activity:

Extract pdf range: The range activity does not have valid argument.

Why is it showing this? Please let me know.

Post a screenshot of the Extract PDF Range and also post the expressions you have in each property.

Okay.
[Screenshot: range]
[Screenshot: Extract_pdf]