I am a beginner in UiPath. I have one large PDF that contains multiple documents. I want to split out the inner documents based on unique text, not by page number, using UiPath automation.
The Intelligent Keyword Classifier has a splitting option, but I don't know how to use it. If anyone knows how it works, please let me know.
It works the same way as if you use a Classification Action in Action Center. The output of the automatic classification activity gives you StartPage and PageCount in the results which you then use to split the document up.
DocumentBounds - Information on what part of the document the classification pertains to, with StartPage (Int32, 0-based), PageCount (Int32), TextStartIndex (Int32, 0-based), TextLength (Int32).
Have you built a taxonomy? Added the Document Classification activity along with the Intelligent Keyword Classifier and gone through the training steps? If so, then when you run your process the classification activity will output a set of results that you loop through, using the data in each one to split your file.
Read Taxonomy File (or just use Load Taxonomy if you’ve built it using the Taxonomy Manager in Studio)
Digitize Document:
Classify Document:
Use the Manage Learning and Configure Classifiers links to train it.
Loop through the results of the classification:
Split the file (for us it’s PDF):
Range: (item.DocumentBounds.StartPage + 1).ToString + "-" + (item.DocumentBounds.StartPage + item.DocumentBounds.PageCount).ToString
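To make the arithmetic in that Range expression easier to check, here is a minimal sketch in Python (not UiPath code). It assumes only what the documentation quote above states: StartPage is 0-based and PageCount is the number of pages. The function name page_range is made up for illustration.

```python
def page_range(start_page: int, page_count: int) -> str:
    """Turn a 0-based start page and a page count into a 1-based,
    inclusive "first-last" range string, as a PDF page-range field expects."""
    if start_page < 0 or page_count < 1:
        raise ValueError("start_page must be >= 0 and page_count >= 1")
    first = start_page + 1          # convert 0-based index to 1-based page number
    last = start_page + page_count  # last page of the section, 1-based and inclusive
    return f"{first}-{last}"


if __name__ == "__main__":
    print(page_range(0, 3))  # first document: pages 1 to 3
    print(page_range(3, 2))  # next document: pages 4 to 5
```

The "+ 1" appears only on the first page because StartPage + PageCount already lands on the last page once you count from 1.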
In the For Each loop you used validatedClassificationResult. I am not using validation, so can I use the output of the Classify Document Scope instead?
I don't know; I've never used the Intelligent Keyword Classifier, only the Keyword Classifier. I suspect the splitting option just tells it to output the start page and related data in the object.
The For Each is where I split it, using the StartPage and PageCount values. I create the filename based on the information in the classification object (taxonomy).
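The post does not say what the filenames look like, so the following is only a sketch of one possible naming scheme, written in Python rather than as UiPath activities. The function name split_filename, the pattern (source name, document type, first page) and the sample document type ID are all assumptions made for illustration.

```python
from pathlib import Path


def split_filename(source_pdf: str, document_type_id: str, start_page: int) -> str:
    """Build an output file name for one classified section.

    Hypothetical scheme: <source name>_<document type>_p<first page>.pdf
    start_page is 0-based, matching DocumentBounds.StartPage.
    """
    stem = Path(source_pdf).stem
    # Replace anything that is not a letter or digit so the name is file-system safe.
    safe_type = "".join(c if c.isalnum() else "_" for c in document_type_id)
    return f"{stem}_{safe_type}_p{start_page + 1}.pdf"


if __name__ == "__main__":
    print(split_filename("input/batch.pdf", "Finance.Invoices.Invoice", 3))
```

Including the first page number in the name keeps the files unique when the same document type appears more than once in the large PDF.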
I create the filename based on the information in the classification object (taxonomy).
So, in my case, do I have to store that unique text in the file that 'Read text file - taxonomy' reads? Am I right?
Have you built your taxonomy using the Taxonomy Manager?
It’s a big button at the top of the Studio window near where Table Extraction is.
After building the taxonomy we copy it to a folder external to the project so we can update it without republishing. You don't have to do that. You can just create your taxonomy in Taxonomy Manager, which stores it in taxonomy.json in your project folder, and then load it with the Load Taxonomy activity.
@postwick, this is my last question, so please clear up one doubt. In my case, I want to split the PDF based on a unique text header. Do I have to train the bot, or do I need to do something else?
You do that in the Configure Classifiers and Manage Learning sections of the classifier activity. Configure Classifiers is where you tell it which document types (from the taxonomy) should be classified automatically, and Manage Learning is where you enter the keywords to look for in each document type.
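To picture what splitting on a unique header text produces, here is a rough Python sketch. It is not how the Intelligent Keyword Classifier works internally, only an illustration of the idea: each page whose text contains a configured keyword starts a new document, and the result is a start page (0-based) and a page count, like the DocumentBounds values quoted earlier. The function name, keyword lists and sample pages are invented.

```python
from typing import Dict, List, Tuple


def split_by_header_keywords(
    pages: List[str], keywords: Dict[str, List[str]]
) -> List[Tuple[str, int, int]]:
    """Return (document_type, start_page, page_count) for each detected document.

    pages holds the text of each page; start_page is 0-based.
    Pages before the first matching header are ignored.
    """
    results: List[Tuple[str, int, int]] = []
    current_type = None
    current_start = 0
    for index, text in enumerate(pages):
        lowered = text.lower()
        # First document type with any keyword on this page, or None.
        matched = next(
            (doc_type for doc_type, words in keywords.items()
             if any(word.lower() in lowered for word in words)),
            None,
        )
        if matched is not None:
            if current_type is not None:
                # A new header closes the document that was open.
                results.append((current_type, current_start, index - current_start))
            current_type, current_start = matched, index
    if current_type is not None:
        # Close the last document at the end of the file.
        results.append((current_type, current_start, len(pages) - current_start))
    return results


if __name__ == "__main__":
    sample_pages = ["INVOICE No. 1", "line items", "PURCHASE ORDER 7",
                    "INVOICE No. 2", "totals", "notes"]
    sample_keywords = {"Invoice": ["invoice no"], "PurchaseOrder": ["purchase order"]}
    for row in split_by_header_keywords(sample_pages, sample_keywords):
        print(row)
```

One limitation of this simple version: if the header text repeats on every page of a document, each page is treated as a new document, so the keyword should be something that only appears on the first page.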