How to use the splitting option of the Intelligent Keyword Classifier activity

Document_type: path of the original PDF file.

itemFileName: path.extension(Document_type)

Output_PdfPath: path of the PDF after splitting.

Put a breakpoint on the Extract PDF Page Range activity and run in Debug mode. When it stops on that activity, look in the panel on the left. Look at the output of your classification activity and see whether it properly found the classifications, StartPage, etc.

Please check the screenshot below.

[screenshot: output]

Again, put a breakpoint on the PDF activity and run in Debug mode.

When it stops, look at your classification output object (the one you’re looping through) and see what’s in it.

"When it stops, look at your classification output object (the one you’re looping through) and see what’s in it." I did not understand this point. Can you please elaborate?

"Look at your classification output object (the one you’re looping through)": should I use a Write Line for the classification output object? Am I right?

Hi @Smitesh_Aher1

I have a doubt: why are you splitting the PDF in the classifier step? If, as you say, it is a large PDF, then you need to split it before sending it to Digitize, because the digitize step will take a lot of time to read it.

I have done it this way myself, so I was just suggesting that you split the PDF before it goes to the digitize step.

If you need to know more about splitting, let me know.

Cheers…!

Yes, you are right @Praveen_Mudhiraj.

My requirement is to split out the internal PDFs from the large PDF on the basis of unique text.

So can you please share the flow with me for better understanding?

Good then. If possible, can you share some dummy data so that I can prepare a sample flow?

@Smitesh_Aher1

It’s a confidential document.

@Praveen_Mudhiraj can you please tell me one thing: how did you split the PDF, by using a classifier or by using another activity?

OK, I will describe the process, then you can do it the same way.

@Smitesh_Aher1

Yes please, @Praveen_Mudhiraj

No. Right click the PDF activity and select Toggle breakpoint. Now run your process with the DEBUG button, not the Run button. It will pause at the breakpoint and you will see a panel on the left showing all your variables. Expand the variable you used in the output of the classification activity and you can look at the variable’s contents.

You can’t split it before you digitize it. You have to digitize it and pass that info to the classifier activity.

"No. Right click the PDF activity and select Toggle breakpoint. Now run your process with the DEBUG button, not the Run button. It will pause at the breakpoint and you will see a panel on the left showing all your variables." Please check the left side of the image below. Is this right?
[screenshot: output]

"Expand the variable you used in the output of the classification activity and you can look at the variable’s contents." You mean I should check the output of the Classify Document Scope, right?

1. Take a Get PDF Page Count activity and create a variable, say pdfpagecount.

2. Take an Assign activity and create a variable startpage with value 0.
3. Take an Assign activity and create a variable endpage with value 0.
4. Take a For Each activity and pass the collection like this: Enumerable.Range(1, pdfpagecount)

In the loop:
Take a Read PDF Text activity and create a variable readpdf_Text.
In the loop, take an Else If activity and pass the condition like this:
readpdf_Text.Contains("startpage matching text")
Then section:
Assign activity:
startpage = the For Each current item
Break
And again an Else If, and pass the condition:

readpdf_Text.Contains("endpage matching text")
Assign activity:
endpage = the For Each current item
Break

5. Outside the loop, take an Else If activity and pass the condition
Not startpage = 0
Then section:
Take an Assign activity:
startpage = startpage + 3 (you can pass any number here; for example, if the text matches on the first page and you need to skip 3 pages past it, give 3)

In the same way, take an Else If again and assign
endpage = endpage - 3 (you can pass any number here; for example, if the text matches on the end page and you need to stop 3 pages before it, give 3)

Outside the loop:
At the end you can take the Extract PDF Page Range activity, give it the input file path, and provide the output folder path.

In the range you can give it like this: startpage.ToString + "-" + endpage.ToString

It will split the pages and add them to one folder; then you can pass them to Digitize in a loop.
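For readers who find the steps easier to follow as code, here is a minimal Python sketch of the same scan. It is only an illustration, not a UiPath workflow: the per-page text is assumed to be available as a plain list of strings, and the names `find_page_range`, `to_range_string`, and the marker texts are hypothetical. Unlike a Break placed right after the start match, this version keeps scanning until the end marker is found.

```python
from typing import List, Optional, Tuple


def find_page_range(
    page_texts: List[str], start_marker: str, end_marker: str
) -> Optional[Tuple[int, int]]:
    """Return 1-based (start_page, end_page), or None if a marker is missing."""
    start_page = 0  # 0 means "not found yet"
    end_page = 0
    for page_number, text in enumerate(page_texts, start=1):
        if start_page == 0 and start_marker in text:
            start_page = page_number
        if start_page != 0 and end_marker in text:
            end_page = page_number
            break  # both boundaries are known, stop scanning
    if start_page == 0 or end_page == 0:
        return None
    return start_page, end_page


def to_range_string(start_page: int, end_page: int) -> str:
    """Build a "start-end" range string such as "2-4"."""
    return f"{start_page}-{end_page}"


# Hypothetical page texts standing in for the text read from each PDF page.
pages = ["cover", "INVOICE START", "line items", "INVOICE END", "appendix"]
found = find_page_range(pages, "INVOICE START", "INVOICE END")
range_string = to_range_string(*found) if found else ""
print(range_string)  # prints 2-4
```

Returning None when a marker is missing plays the same role as the startpage = 0 check in step 5: no range is built unless both boundaries were actually found.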
@postwick

It’s possible to split the pages based on conditions before the PDF goes to digitization.

@postwick

Thanks @Praveen_Mudhiraj for providing a solution.

But I don’t know the page range in advance, that is, the last page of each split. So how can I use this number (the 3 in your example)?
"startpage = startpage + 3 (you can pass any number here; for example, if the text matches on the first page and you need to skip 3 pages past it, give 3)"

How?

You are not following what I’m telling you to do. You are either clicking Run instead of Debug, or you are not setting a breakpoint on the PDF activity.

He can’t do it this way; he is using Document Understanding to figure out the page ranges because they aren’t always the same.
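To make the "page ranges aren’t always the same" point concrete, here is a small Python sketch of one way to derive variable-length ranges when only the start of each internal document is known. It assumes (this is not stated in the thread) that every internal document begins on a page containing the same unique text, and that each one runs up to the page before the next match. The function name `split_ranges` and the sample marker are hypothetical, and the page text is again modelled as a plain list of strings rather than real classifier output.

```python
from typing import List


def split_ranges(page_texts: List[str], marker: str) -> List[str]:
    """Return "start-end" strings, one per internal document (1-based pages)."""
    # Every page containing the marker starts a new internal document.
    starts = [n for n, text in enumerate(page_texts, start=1) if marker in text]
    ranges = []
    for i, start in enumerate(starts):
        # A document ends just before the next start, or on the last page.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else len(page_texts)
        ranges.append(f"{start}-{end}")
    return ranges


# Hypothetical page texts: three internal documents of different lengths.
pages = ["Policy No: A1", "terms", "terms", "Policy No: B2", "terms", "Policy No: C3"]
ranges = split_ranges(pages, "Policy No:")
print(ranges)  # prints ['1-3', '4-5', '6-6']
```

Pages before the first match are ignored in this sketch, and no fixed offset such as 3 is needed because each end page is derived from the next start page.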