Splitting a pdf document

I have a PDF document of, for example, 10 pages.
Every page contains a certain text string, and typically 2 pages share the same string.
The task at hand: split the PDF into several PDF documents, where the 2 pages with the same string need to end up in the same file.
As an extra: the strings to look for are in an Excel sheet (one per row), and pages whose string is not in that Excel sheet don’t need to be saved as a separate document.
I found an example (Need to read a word in PDF file and if that word exists should remove that page and save the other pages - #7 by prasath17) and installed BalaReva.Pdf.Activities. But I don’t see a “For Each” loop action to add (I have StudioX, not Studio).

Hi @sraar.jans-beken - I recently (a few weeks back) helped a member with a similar request, where the text to look for in the PDF was, say, “invoice”. If it was found on pages 3, 5 and 7, I split the PDF into 4 parts:

Pages 1-2, 3-4, 5-6 and 7-10. I built a string with this value (1-2, 3-4, 5-6, 7-10) and then passed it to the PDF splitter (BalaReva) …
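As a rough illustration of how such a range string could be built outside UiPath, here is a minimal Python sketch. It assumes that every page on which the text is found starts a new part and that page 1 always starts the first part, which is one reading of the example above. The function name `build_page_ranges` and its parameters are hypothetical, not part of BalaReva or UiPath.

```python
def build_page_ranges(hit_pages, total_pages):
    """Return a range string such as '1-2, 3-4, 5-6, 7-10'.

    Assumption: each page in hit_pages starts a new part, and
    page 1 always starts the first part.
    """
    # Start page of every part, sorted and de-duplicated.
    valid_hits = [p for p in hit_pages if 1 <= p <= total_pages]
    starts = sorted({1, *valid_hits})

    ranges = []
    for i, start in enumerate(starts):
        # A part ends just before the next start, or on the last page.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append(f"{start}-{end}")
    return ", ".join(ranges)


# Text found on pages 3, 5 and 7 of a 10-page document.
print(build_page_ranges([3, 5, 7], 10))
```

For the example above this yields the same four parts. Whether a given splitter accepts exactly this format is something to check against its documentation.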

But I am confused about your case. Could you please explain it briefly with an example and, if possible, share a screenshot of the Excel file?

I am not sure how to do this in StudioX, but we can try…

First of all thanks for your answer.

For now, please ignore the Excel part of my question. I added an example PDF. As you will notice, pages 1 & 2 both have the same text string (ABC123). The same goes for pages 3 & 4 (DEF456), and so on.

Task at hand: go through the PDF pages and create a new PDF from every page with the text string DEF456 (being pages 3 & 4).

When done, the original document can be deleted, and a 2-page document should be saved.

Example.pdf (57.5 KB)
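To make the grouping step concrete, here is a minimal Python sketch, not a UiPath workflow. It assumes the text of each page has already been extracted into a list, and that the strings to look for (for instance, the values from the Excel sheet) are available as a list. The function name `group_pages_by_key` and the sample page texts are hypothetical.

```python
def group_pages_by_key(page_texts, keys):
    """Map each key to the 1-based page numbers whose text contains it.

    Pages that contain none of the keys are left out, so no
    document would be created for them.
    """
    groups = {}
    for page_number, text in enumerate(page_texts, start=1):
        for key in keys:
            if key in text:
                groups.setdefault(key, []).append(page_number)
                break  # assign each page to at most one key
    return groups


# Hypothetical page texts mirroring the example: ABC123 on pages 1-2,
# DEF456 on pages 3-4, and a page with some other string.
pages = [
    "Order ABC123 page A",
    "Order ABC123 page B",
    "Order DEF456 page A",
    "Order DEF456 page B",
    "Order GHI789 page A",
]

print(group_pages_by_key(pages, ["DEF456"]))
```

Each list of page numbers in the result corresponds to one output document. The actual extraction and saving of those pages is left to whichever PDF tool is in use.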

@sraar.jans-beken - Please check this workflow…Split_PDF.zip (361.9 KB)

You can delete the files from the Extracted and Merged folders and then try running the workflow. You will see the PDF pages with DEF456 split out first and then merged.

Note: the only downside of this approach is the file size. As you will notice, the merged PDF is larger than the original PDF.

Hope this helps…

The issue is that I don’t have a “for each item” control in StudioX

@sraar.jans-beken - Which version of UiPath are you using?

Version:

StudioX 2020.10.4

Enterprise License

Windows Installer

@sraar.jans-beken - See if you can update …

Or, if an update is not possible, I will have to think about how to do this without “for each item”…

I just tried downgrading the System activities to 20.10.4.

I could see the Repeat No of Items in the Common tab… please check.