PDF automation for separating the usable pages

I have PDF documents with more than 100 pages. The first pages are cover pages and the last pages are unwanted, so I only want the pages in between. Can you please help me get the usable pages out of a PDF document separately? Please suggest the technical logic or steps to follow to complete the task efficiently.

Thanks in advance.

How do you identify the required pages manually?

If you don't specify the basis on which you want to separate the pages from the total, it will be difficult to provide the logic.

It would be helpful if you provided a sample PDF with the expected output.

Regards

Get the PDF page count using the UiPath PDF activities. Then loop through the pages: read each page and use regex or string manipulation to check whether it is a valid page. If the page is valid, split that page out and save it to a folder. Once the iteration is complete, the folder will contain only the valid pages. Now read the folder (using Directory.GetFiles, sorting the file names first because Directory.GetFiles does not guarantee any order) and merge all the files using the PDF activities.
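A minimal Python sketch of the page-selection part of this approach. It assumes the text of every page has already been read into a list, one string per page (in UiPath that text would come from reading each page); the sample pages and the regex pattern are hypothetical placeholders for whatever marks a valid page in your documents.

```python
import re


def find_valid_pages(page_texts, pattern):
    """Return the 1-based numbers of pages whose text matches `pattern`.

    `page_texts` is a hypothetical stand-in for the per-page text output
    of a PDF reader: page_texts[0] is page 1, page_texts[1] is page 2, etc.
    """
    regex = re.compile(pattern)
    valid = []
    for page_number, text in enumerate(page_texts, start=1):
        if regex.search(text):  # page counts as valid if the pattern occurs
            valid.append(page_number)
    return valid


# Hypothetical sample: two cover pages, three content pages, one trailing page.
pages = [
    "Cover page",
    "Table of contents",
    "Report section 1",
    "Report section 2",
    "Report section 3",
    "Appendix / disclaimer",
]
print(find_valid_pages(pages, r"^Report section \d+"))  # [3, 4, 5]
```

The returned page numbers are the pages you would then split out and merge.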

I have a key value for the required pages. The key value is "LIFE TECHNOLOGIES", and this text appears at the top of every required page.
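With a fixed key, a plain substring check is enough, and because the key sits at the top of each required page the check can be restricted to the first few lines, so a page that only mentions the key further down is not picked up. A Python sketch, assuming per-page text is available as strings; `has_key_at_top` is a hypothetical helper, `top_lines=3` is an arbitrary cut-off, and PDF text extraction does not always preserve visual line order, so verify it against your real output.

```python
KEY = "LIFE TECHNOLOGIES"


def has_key_at_top(page_text, key=KEY, top_lines=3):
    """True if `key` appears in the first `top_lines` non-empty lines.

    `top_lines` is an assumed cut-off; tune it to where the header sits.
    """
    lines = [line for line in page_text.splitlines() if line.strip()]
    return any(key in line for line in lines[:top_lines])


# Hypothetical per-page text for a 4-page document.
page_texts = [
    "ACME Corp\nCover sheet",
    "LIFE TECHNOLOGIES\nInvoice 001\n...",
    "  \nLIFE TECHNOLOGIES\nInvoice 002",
    "Terms and conditions\n...\n...\n...\nmentions LIFE TECHNOLOGIES in the body",
]
required = [n for n, text in enumerate(page_texts, start=1) if has_key_at_top(text)]
print(required)  # [2, 3]
```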

Then follow the approach described above by @Anas-p-v.

Regards

@Iswarya_P1

A small change to the approach mentioned by @Anas-p-v:

The Extract PDF Page Range activity supports multiple pages at the same time, so there is no need to extract each page and then join them again.

Instead, get all the required page numbers, concatenate them with commas, and pass the result to the Extract PDF Page Range activity.
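In Python the concatenation is a single join. The sketch below also sorts and de-duplicates the numbers, which is an extra assumption (pages wanted once each, in ascending order); `to_range_string` is a hypothetical name. In a UiPath expression the equivalent join would be String.Join(",", pageList), assuming the page numbers were collected in a list named pageList (also a hypothetical name).

```python
def to_range_string(page_numbers):
    """Join page numbers into one comma-separated range string, e.g. "3,4,5".

    Sorting and de-duplicating is an assumption: each page is wanted once,
    in ascending order.
    """
    return ",".join(str(n) for n in sorted(set(page_numbers)))


print(to_range_string([3, 4, 5, 9]))  # 3,4,5,9
```

Building the whole range first means the extract step runs once, instead of once per page plus a merge.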

Hope this helps

Cheers

Can you give me step-by-step guidance on which activities to use to complete this?

@Iswarya_P1

Follow these steps:

Initialize a variable str of type String with String.Empty.

  1. Get the PDF page count and store it in a variable (PageCount).
  2. Use a For Each loop over Enumerable.Range(1,PageCount).ToArray and change the type argument to Integer.
  3. Inside the loop, use Read PDF Text and give the page number currentitem (as currentitem.ToString) as the range.
  4. Add an If condition with outputofpdfread.Contains("TextTosearch").
  5. On the Then side, use str = If(str.Equals(string.Empty),"",",") + currentitem.ToString
  6. After the loop, use Extract PDF Page Range and pass str as the range; it will extract all the required pages into one PDF.
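The same six steps expressed as a Python sketch, for checking the logic outside UiPath. `select_pages` and `read_page_text` are hypothetical names (the latter stands in for Read PDF Text on a single page), the dictionary is fake sample data, and the appending line keeps the previously collected page numbers, which is the version corrected later in this thread.

```python
def select_pages(read_page_text, page_count, text_to_search):
    """Mirror of steps 1-6; returns the comma-separated range string.

    read_page_text(page_number) is a hypothetical stand-in for
    Read PDF Text with a single-page range.
    """
    selected = ""                                   # str initialized to String.Empty
    for current_item in range(1, page_count + 1):   # Enumerable.Range(1, PageCount)
        page_text = read_page_text(current_item)    # Read PDF Text for this page
        if text_to_search in page_text:             # If ...Contains("TextTosearch")
            # Add a comma only once something has been collected,
            # and keep the page numbers collected so far.
            selected = ("" if selected == "" else selected + ",") + str(current_item)
    return selected                                 # would be passed as the Range


# Hypothetical 6-page document: pages 3-5 carry the key text.
fake_pdf = {
    1: "Cover",
    2: "Index",
    3: "LIFE TECHNOLOGIES ...",
    4: "LIFE TECHNOLOGIES ...",
    5: "LIFE TECHNOLOGIES ...",
    6: "Back cover",
}
print(select_pages(fake_pdf.get, len(fake_pdf), "LIFE TECHNOLOGIES"))  # 3,4,5
```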

Cheers

I got an error in the Assign activity str = If(str.Equals(string.Empty),"",",") + currentitem.ToString. Can you explain that line?

@Iswarya_P1

This line appends the page numbers separated by commas. For the first page we should not add a comma, so it checks whether the string is empty: if it is, only the page number is appended; otherwise a comma and the page number are appended.

Small change: I forgot to include str itself in the else branch, so the page numbers collected earlier were being dropped.

If(str.Equals(string.Empty),"",str + ",") + currentitem.ToString
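To see why the first version needed the change, here is a Python transcription of both expressions. The function names are hypothetical, and Python's conditional expression stands in for VB.NET's If(condition, a, b).

```python
def append_buggy(s, page):
    # First version: the else branch yields only "," so earlier pages are lost.
    return ("" if s == "" else ",") + str(page)


def append_fixed(s, page):
    # Corrected version: keep s and add a comma before the new page number.
    return ("" if s == "" else s + ",") + str(page)


buggy = fixed = ""  # both start empty, like str initialized to String.Empty
for page in [3, 4, 5]:
    buggy = append_buggy(buggy, page)
    fixed = append_fixed(fixed, page)

print(buggy)  # ,5
print(fixed)  # 3,4,5
```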

Cheers

@Iswarya_P1

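Please check this. Your whole workflow works perfectly; the only step you missed was initializing str (an uninitialized String variable is Nothing, so str.Equals(...) fails with an object-reference error).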


Happy Automation

BlankProcess30 (2).zip (1.5 MB)

cheers

Thank you for your support. @Anil_G


@Iswarya_P1

Happy Automation

Cheers

For example, let's assume a PDF document has 100 pages and the last 30 pages are struck out (the number of struck-out pages may vary from document to document). In this case, how can I identify the struck-out pages in the PDF document and get the remaining pages for PDF automation? @Anil_G

@Iswarya_P1

Please open a separate topic for this, as it is not related to the current one.

This helps keep each issue in its own topic.

You can close this topic if the current issue is resolved.

It would be a good idea to attach a sample to the new thread you create as well.

cheers

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.