BLOG: My name is Anders and I’m allergic to manual work

When we face a UiPath problem that we can’t directly solve, our go-to approach should be to search either the UiPath Forum or Stack Overflow (VB/Python) for a solution. That reduces work, since we don’t need to “invent” anything new.

Today I needed to split a PDF by dynamic page range and since I couldn’t find a solution, I had to create it myself (bummer :grinning:).

Do it yourself:
Even if you don’t need to split PDFs, I can recommend doing the case if you want a basic understanding of loops and working with files.

Case:
We have a PDF which consists of 3 invoices. Problem: One or more of the invoices will be a 2-page invoice and we don’t know which. Sample PDF: InvoicesXYZ.pdf (59.2 KB). What we know is that the pages of our PDFs are numbered, meaning that if an invoice is a one-pager, we can see a “Page 1” on the page, and if it’s a two-pager, we will have a “Page 1” on the first page and a “Page 2” on the second page.

Solution Step by Step:

  1. We create an outer For Each, where we look in our project folder for merged invoices. In our case there is only one: InvoicesXYZ.pdf. Hint: Use the .NET method Directory.GetFiles(strYourProjectPath).

  2. Get PDF Page Count. In order to know when to stop, we find the total page count of our PDF and store it as an integer, intTotalPageNumber.

  3. While loop. This loop will iterate through each page of our PDF, using an index variable called intCurrentPage that starts at 1. At the end of the loop we place an Assign that adds one to intCurrentPage. The condition of the loop is then to run as long as intCurrentPage is less than or equal to intTotalPageNumber.

  4. Read PDF Text. We read the current page into a string variable (strTextInvoice). The range should therefore be set to intCurrentPage.

  5. Matches. We use the Matches activity with a simple pattern (“Page 2”) on our string variable. The result is stored in an IEnumerable of Match called ienMatches (you can think of it as a collection). What we did here was a Regex search for “Page 2” (Steven are you watching? :grinning:). The IEnumerable will contain either 0 or 1 element: zero if we are on a “Page 1” page, one if we are on a “Page 2” page.

  6. If. We simply check whether ienMatches contains more than 0 elements. If no, we extract the page “normally”, meaning our Range will be intCurrentPage.ToString. We use the output path strProjectPath + "\Result" + path.GetFileNameWithoutExtension(item) + intCurrentPage.ToString + ".pdf", which gives each file a unique name. If yes, we know that this is the second page of a two-pager, so we have to append it to the previously extracted page. We do this by setting the Range to go from the previous page to the current page ("" + (intCurrentPage-1).ToString + "-" + (intCurrentPage).ToString) and overwriting the previously extracted PDF. Again we use the output path strProjectPath + "\Result" + path.GetFileNameWithoutExtension(item) + (intCurrentPage-1).ToString + ".pdf". Note that we use the previous page number in the file name, so that the previously extracted file is overwritten.

  7. Do your invoices run to more than 2 pages? Easy: Create a nested While loop and solve it trivially.
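
If you want to check the logic of steps 3 to 7 outside UiPath, here is a minimal sketch in Python. It is not the workflow from Main.xaml: it assumes the text of each page has already been read into a list, and the function names (group_pages, to_range_string) are made up for illustration. Instead of only looking for “Page 2”, it reads the number after “Page”, so invoices with more than 2 pages are handled as well.

```python
import re
from typing import List, Tuple

# Hypothetical helper: not part of the UiPath workflow, just the same idea.
PAGE_MARKER = re.compile(r"\bPage\s+(\d+)\b")


def group_pages(page_texts: List[str]) -> List[Tuple[int, int]]:
    """Return one (first_page, last_page) tuple per invoice, 1-based."""
    ranges: List[Tuple[int, int]] = []
    for page_number, text in enumerate(page_texts, start=1):
        match = PAGE_MARKER.search(text)
        marker = int(match.group(1)) if match else 1
        if marker > 1 and ranges:
            # Continuation page: extend the range of the previous invoice.
            first, _ = ranges[-1]
            ranges[-1] = (first, page_number)
        else:
            # "Page 1" (or no marker at all): a new invoice starts here.
            ranges.append((page_number, page_number))
    return ranges


def to_range_string(first: int, last: int) -> str:
    """Build a range string such as "2" or "2-3"."""
    return str(first) if first == last else f"{first}-{last}"


if __name__ == "__main__":
    # Made-up page texts standing in for the output of Read PDF Text.
    pages = ["Invoice A Page 1", "Invoice B Page 1", "Invoice B Page 2", "Invoice C Page 1"]
    for first, last in group_pages(pages):
        # As in step 6, the output file is named after the invoice's first page.
        print(f"InvoicesXYZ{first}.pdf <- pages {to_range_string(first, last)}")
```

Each range string could then be used as the Range when extracting the pages, and the first page number gives the file a unique name, as in step 6.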

Screenshot and file:

Main.xaml (8.6 KB)

Now we can post the solution to an unsolved topic on the UiPath Forum, so others won’t have to redo the work: Split Pdf into multiple ones
