Auto-Assign "Not classified" documents to MISC

I just ran a test and found the same as you: the blank page isn’t classified. It isn’t even included in the classification results. My 3-page test document, where one page is blank, has only 2 classification results.

I came up with a fairly simple way to use the classification results to remove the blank pages from the PDF. The only catch is that you’d then have to re-digitize and re-classify the document, or the classification results will no longer match the document structure.

The trick is to use the classification results to build a page range of the pages you want to keep, to pass to the Extract PDF Pages activity.

Start by looping through the classification results:

Inside the loop, check whether the current result’s page count is 1:

If it is 1, we just add the page number to our list:

(We add 1 to the StartPage property because the classification results are 0-indexed but Extract PDF Pages is 1-indexed, so page 0 in the classification results is page 1 to Extract PDF Pages.)

Otherwise (i.e. the page count is greater than 1), build a page range:

Now we Join the list into a string:

This gives us a comma-separated string of pages and page ranges (for example, something like 1,3-4).
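Since the screenshots don't come through here, the loop above can be sketched roughly as follows. This is Python rather than the actual workflow, and `ClassifiedSpan`, `start_page`, `page_count`, and `build_page_range` are hypothetical names standing in for the classification result's StartPage and page count, so treat it as an illustration of the logic only.

```python
from dataclasses import dataclass


@dataclass
class ClassifiedSpan:
    """Hypothetical stand-in for one classification result."""
    start_page: int  # 0-indexed first page of the classified document
    page_count: int  # number of pages the classified document covers


def build_page_range(results):
    """Build a 1-indexed page-range string covering only classified pages."""
    parts = []
    for r in results:
        first = r.start_page + 1  # classification results are 0-indexed
        if r.page_count == 1:
            # Single page: just the page number
            parts.append(str(first))
        else:
            # Multiple pages: "first-last"
            last = first + r.page_count - 1
            parts.append(f"{first}-{last}")
    # Join the list into a single comma-separated string
    return ",".join(parts)


# Assumed example: page 1 is one document, page 2 is blank, pages 3-4 are another
example = [ClassifiedSpan(0, 1), ClassifiedSpan(2, 2)]
print(build_page_range(example))  # prints 1,3-4
```

Any page that never shows up in a classification result (the blank one, in this case) simply never makes it into the string.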

Then we pass that to Extract PDF Pages:


As for the re-classification, here’s how I would do it.

Put everything in a Repeat Number of Times activity, set to repeat 2 times. Use Get PDF Page Count to store the PDF’s page count in a variable. Digitize and classify. Loop through the classification results to total the number of pages that were classified. If the PDF page count equals the classified page count, we are fine and can break out of the repeat. Otherwise, build the page range, extract the PDF pages to a new document, and let the repeat run again.
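As a rough sketch of that control flow (again in Python, not the workflow itself): `get_page_count`, `classify`, and `extract_pages` are hypothetical callables standing in for Get PDF Page Count, the digitize-and-classify step, and Extract PDF Pages, and each result is assumed to be a `(start_page, page_count)` tuple with a 0-indexed start page.

```python
def page_range(results):
    """Turn (start_page, page_count) tuples into a 1-indexed range string."""
    parts = []
    for start_page, page_count in results:
        first = start_page + 1  # results are 0-indexed, the PDF range is not
        last = first + page_count - 1
        parts.append(str(first) if page_count == 1 else f"{first}-{last}")
    return ",".join(parts)


def remove_unclassified_pages(pdf_path, get_page_count, classify,
                              extract_pages, max_passes=2):
    """Classify, and if some pages were skipped, trim them and classify again.

    All three callables are assumptions for illustration:
      get_page_count(path) -> int
      classify(path)       -> list of (start_page, page_count) tuples
      extract_pages(path, range_string) -> path of the new, trimmed PDF
    """
    current = pdf_path
    results = []
    for _ in range(max_passes):  # the "Repeat Number of Times" part
        total_pages = get_page_count(current)
        results = classify(current)
        classified_pages = sum(count for _, count in results)
        if classified_pages == total_pages:
            break  # every page was classified, results match the document
        # Some pages were skipped: keep only the classified ones and go again
        current = extract_pages(current, page_range(results))
    return current, results
```

If the counts still don't match on the last pass, the returned results describe the document before the final trim, so you'd want to check for that case before relying on them.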

In the end, the whole thing looks like…

DU Remove Blank PDF Pages.xaml (30.0 KB)

I used…

UiPath.IntelligentOCR.Activities 6.22.1
UiPath.PDF.Activities 3.20.2
UiPath.Persistence.Activities 1.5.11
UiPath.System.Activities 24.10.6
UiPath.UIAutomation.Activities 24.10.10
