How to extract all pages of a PDF based on a specific Text?

msan · May 11, 2020, 6:33am

Hello,

One approach sticking with UiPath activities would be to loop over the pages, read the text of each one, and keep the numbers of the pages that match.

Below you’ll find a skeleton with emphasis on some key points.

Assign (Int32)
Initialise a counter (the page number)
page = 0

Assign (List of String)
A list of strings where we’ll stack the matching page numbers
pages = New List(Of String)

Get PDF Page Count
Get the number of pages in the document
lastPage

While (page < lastPage)

Assign
Increment the counter
page = page + 1
Read PDF Text
With Range set to the current page (page.ToString), you get the text of that page as a string
pageText
If System.Text.RegularExpressions.Regex.IsMatch(pageText, "\bBlock \d+\b")
If the text contains “Block (No)”, do the following:
- Add To Collection (page.ToString to pages)
  We’re adding the page number to the list of retained pages

// We’ve reached the last page of the document; it’s time to generate the new document

Extract PDF Page Range
Create a new document from the original, passing the retained pages as the range argument
range = String.Join(", ", pages)
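
To check the logic outside of Studio, here is a rough Python sketch of the same loop. It is only an illustration, not UiPath code: the function name matching_page_range and the sample texts are made up, and a plain list of strings stands in for what Get PDF Page Count and Read PDF Text would return.

```python
import re

# Same pattern as in the skeleton: the word "Block", a space, then one or more digits.
BLOCK_PATTERN = re.compile(r"\bBlock \d+\b")


def matching_page_range(page_texts):
    """Return a range string such as "1, 3" for the pages whose text matches.

    page_texts is a hypothetical stand-in: one string per PDF page, in order.
    """
    pages = []                     # retained page numbers, kept as strings
    page = 0                       # counter (the page number)
    last_page = len(page_texts)    # stands in for Get PDF Page Count
    while page < last_page:
        page += 1
        page_text = page_texts[page - 1]   # stands in for Read PDF Text on one page
        if BLOCK_PATTERN.search(page_text):
            pages.append(str(page))
    return ", ".join(pages)        # same idea as String.Join(", ", pages)


if __name__ == "__main__":
    sample = ["Intro", "Block 12 details", "Blocks overview", "See Block 7"]
    print(matching_page_range(sample))
```

With the sample above, only pages 2 and 4 are retained: “Blocks overview” does not match because the pattern expects a space and a number right after “Block”. If no page matches, the resulting string is empty, so it is probably worth checking for that before calling Extract PDF Page Range.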
