How to split a PDF document into separate pages

Hi forum,

I have a requirement where I have many PDF files, and I want to check whether each one has multiple pages. If it does, I want to split its pages into separate files.

How do I do this in UiPath? Please help…

@Akshaya89,

Follow this solution. Just change the page threshold as per your requirement.
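The linked solution is not reproduced here, but the "page threshold" idea can be sketched as a small VB.NET helper that turns a page count into one Range string per output file. The module and function names (PageRangeHelper, BuildPageRanges) are hypothetical, and the sketch assumes the Extract PDF Page Range activity accepts ranges written as "first-last" (or a single page number). How you obtain the page count is left to your workflow.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper: builds Range strings such as "1-2", "3-4", "5",
' assumed to be usable as the Range of Extract PDF Page Range.
Public Module PageRangeHelper
    ' pagesPerFile is the "threshold": how many pages go into each output file.
    Public Function BuildPageRanges(pageCount As Integer, pagesPerFile As Integer) As List(Of String)
        If pagesPerFile < 1 Then
            Throw New ArgumentOutOfRangeException(NameOf(pagesPerFile))
        End If

        Dim ranges As New List(Of String)
        Dim first As Integer = 1
        While first <= pageCount
            ' The last chunk may be shorter than pagesPerFile
            Dim last As Integer = Math.Min(first + pagesPerFile - 1, pageCount)
            ' A single page is written as "5", a span as "3-4"
            ranges.Add(If(first = last, first.ToString(), first.ToString() & "-" & last.ToString()))
            first = last + 1
        End While
        Return ranges
    End Function
End Module
```

For example, BuildPageRanges(5, 2) returns "1-2", "3-4" and "5". Setting pagesPerFile to 1 gives one range per page, which matches the original requirement of one file per page.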

Hi @Akshaya89

Can you try the workflow below?

Sequence21.xaml (17.8 KB)

Regards,

@lrtetala
How can I split the PDFs based on a specific keyword?

@Akshaya89 ,

You can follow this:

  1. Read the PDF File: Use the Read PDF Text or Read PDF with OCR activity to extract the text from the PDF document.

  2. Identify the Keyword: Determine the keyword based on which you want to split the PDF.

  3. Process the Text: Search the extracted text for occurrences of the keyword, using string methods such as IndexOf or regular expressions to find the position of each occurrence.

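  4. Create Separate PDFs: Determine which pages each keyword section spans (for example, by reading one page at a time with the Range property of Read PDF Text), then use the Extract PDF Page Range activity to copy those pages into a new PDF document.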

  5. Save the New PDFs: Finally, output each split PDF to your desired directory.
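Because the split happens at page level, the keyword search can be done per page. The sketch below assumes you have already collected the text of each page into a list (pageTexts(0) is page 1, pageTexts(1) is page 2, and so on), for example by running Read PDF Text once per page with Range set to "1", "2", and so on. The module and function names are hypothetical. It returns the pages where the keyword appears and one range per section, which could then be passed to Extract PDF Page Range.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helpers for splitting a PDF by keyword at page level.
' Assumes pageTexts(i) holds the text of page i + 1.
Public Module KeywordSplitHelper
    ' Returns the 1-based numbers of the pages that contain the keyword
    ' (case-insensitive, ordinal comparison).
    Public Function FindKeywordPages(pageTexts As IList(Of String), keyword As String) As List(Of Integer)
        Dim pages As New List(Of Integer)
        For i As Integer = 0 To pageTexts.Count - 1
            If pageTexts(i).IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0 Then
                pages.Add(i + 1)
            End If
        Next
        Return pages
    End Function

    ' Each section starts on a keyword page and ends just before the next
    ' keyword page (or on the last page). Pages before the first keyword
    ' page are not included in any section.
    Public Function BuildSectionRanges(startPages As IList(Of Integer), pageCount As Integer) As List(Of String)
        Dim ranges As New List(Of String)
        For i As Integer = 0 To startPages.Count - 1
            Dim first As Integer = startPages(i)
            Dim last As Integer = If(i + 1 < startPages.Count, startPages(i + 1) - 1, pageCount)
            ranges.Add(If(first = last, first.ToString(), first.ToString() & "-" & last.ToString()))
        Next
        Return ranges
    End Function
End Module
```

For a 5-page file where "Invoice" appears on pages 1, 3 and 4, this yields the ranges "1-2", "3" and "4-5". If the keyword can appear in the middle of a page, note that this approach still splits on page boundaries.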

Example Implementation

Here’s a step-by-step implementation example with some basic UiPath activities:

Prerequisites

  1. Ensure you have the UiPath PDF package installed (go to Manage Packages and search for UiPath.PDF.Activities).

Sample Workflow

  1. Read PDF Text Activity:

    • Use the Read PDF Text activity to read the PDF file and output the text to a string variable (e.g., pdfText).
  2. Find Keyword Positions:

    • Use the IndexOf method to find occurrences of your specific keyword in the pdfText.
    • Store the index positions of the keyword in a list or array.
    Dim keyword As String = "YourKeyword"
    Dim positions As New List(Of Integer)
    
    Dim index As Integer = pdfText.IndexOf(keyword)
    While index <> -1
        positions.Add(index)
        index = pdfText.IndexOf(keyword, index + keyword.Length)
    End While
    
  3. Split the Text into Sections:

    • Use the positions from the previous step to split the pdfText into sections.
    • Depending on how you want to split, you can take the text between the keywords, or you can take entire sections including the keywords.
  4. Create PDF Documents:

    • The UiPath.PDF.Activities package has no activity for writing plain text into a new PDF, so work out which pages each section covers and use the Extract PDF Page Range activity to copy those pages from the original file.
  5. Save the PDFs:

    • Set the output file name of each Extract PDF Page Range call to a path in your desired folder, for example one numbered file per section.
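The loop from step 2 can also be wrapped in a reusable function. The sketch below is one possible version; the module name and the ignoreCase parameter are hypothetical. It uses ordinal comparison, which avoids the culture-specific matching rules of the default String.IndexOf(String) overload, and it rejects an empty keyword, which would otherwise make the loop run forever.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper: the step 2 keyword search as a reusable function.
Public Module KeywordPositionHelper
    Public Function GetKeywordPositions(text As String, keyword As String, Optional ignoreCase As Boolean = False) As List(Of Integer)
        ' IndexOf("") always matches, so an empty keyword would loop forever
        If String.IsNullOrEmpty(keyword) Then
            Throw New ArgumentException("keyword must not be empty", NameOf(keyword))
        End If

        Dim comparison As StringComparison = If(ignoreCase, StringComparison.OrdinalIgnoreCase, StringComparison.Ordinal)
        Dim positions As New List(Of Integer)
        Dim index As Integer = text.IndexOf(keyword, comparison)
        While index <> -1
            positions.Add(index)
            ' Resume after this match, so overlapping matches are not reported
            index = text.IndexOf(keyword, index + keyword.Length, comparison)
        End While
        Return positions
    End Function
End Module
```

For the text "Page A KEY x key y Key" and the keyword "key", this returns 13 by default and 7, 13 and 19 when ignoreCase is True.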

Example Code Snippet

Here’s the splitting logic from step 3 as VB.NET code (for example, inside an Invoke Code activity). You may need to adapt it to your specific requirements:

' Assumes pdfText holds the text read in step 1 and positions holds the
' keyword start indexes collected in step 2
Dim sections As New List(Of String)

' Each section runs from one keyword occurrence up to the next one (or to the
' end of the text); any text before the first keyword is not included
For i As Integer = 0 To positions.Count - 1
    Dim startPos As Integer = positions(i)
    Dim endPos As Integer = If(i + 1 < positions.Count, positions(i + 1), pdfText.Length)
    sections.Add(pdfText.Substring(startPos, endPos - startPos))
Next

' Now handle each section
For Each section As String In sections
    ' Map this section back to its pages and copy them with Extract PDF Page Range
Next