PDF - Merge and Extract Pages (Split) activities

Here are activities for Merge PDFs and Extract Pages from PDF; both the NuGet package and the source code are attached. A client asked about them and I found them easy enough to implement. I am using the iText library to handle the PDF files.

For Merge PDFs, the input is an array of strings (the path to each PDF); you can add as many as you want.

For Extract Pages, the input is the PDF path as a string, an optional password if the file is protected, and a range string in the form “1,4-5,3,6-7” (groups are separated by commas, and a dash marks a range of pages). The output is one file per group, named “inputFileName_pageGroup.pdf”, so it is easy to recognize which pages each output file contains.
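To make the range format concrete, here is a minimal Python sketch of how a string like “1,4-5,3,6-7” can be split into page groups and mapped to output file names. It is not the activity's actual code (that uses iText); `parse_page_groups` and `output_names` are hypothetical helpers, and the exact name format is an assumption based on the “inputFileName_pageGroup.pdf” description.

```python
from pathlib import PureWindowsPath


def parse_page_groups(range_spec: str) -> list[list[int]]:
    """Split a spec like "1,4-5,3,6-7" into page groups: [[1], [4, 5], [3], [6, 7]]."""
    groups = []
    for token in range_spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas, e.g. "1,,2"
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(token)
        if start < 1 or start > end:
            raise ValueError(f"invalid page group: {token!r}")
        groups.append(list(range(start, end + 1)))
    return groups


def output_names(pdf_path: str, range_spec: str) -> list[str]:
    """Assumed naming: one output per group, "<input stem>_<group>.pdf"."""
    # PureWindowsPath accepts both backslash and slash separators (robots run on Windows).
    stem = PureWindowsPath(pdf_path).stem
    tokens = [t.strip() for t in range_spec.split(",") if t.strip()]
    return [f"{stem}_{token}.pdf" for token in tokens]


print(parse_page_groups("1,4-5,3,6-7"))  # [[1], [4, 5], [3], [6, 7]]
print(output_names(r"C:\Docs\invoice.pdf", "1,4-5,3,6-7"))
```

Groups keep the order given in the spec, so “3” after “4-5” still produces its own file rather than being merged into a neighboring range.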

UiPath.PDF.Extensions.Activities.zip (3.5 MB)
UiPath.PDF.Extensions.Activities.1.0.0.7.nupkg (1.3 MB)

Hi Mihai-san,
I like this feature! I believe this will help customers dealing with multiple PDF files.
BTW, is it possible to rotate page(s) and output the result as a new PDF? One customer showed us a multi-page PDF that they want to OCR, using the extracted text as input to an automation. The problem is that portrait pages and rotated landscape pages (90 or 270 degrees) are mixed in one file, so OCR always fails on the rotated pages. Some scanning apps, like the one shown below, have an automatic rotation feature, but OCR engines like ABBYY itself don’t have such a useful feature. If we could specify the rotation for each page as a pre-processing step before OCR, we could get correct text out of the PDF.
Could you please share your thoughts on my question?

Thank you,
Jay

Automatic rotation of pages via ABBYY FineReader Engine

I will have a look at that. As far as I know, the ABBYY Engine is not free, but the iText library should offer some options.

I can do it similarly to Extract Pages: you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information, as we can’t really identify the correct orientation, at least not with the tools I am using right now.
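One detail worth getting right with this approach: a page may already carry a rotation, so the requested amount has to be combined with the existing value. In a PDF, the page's /Rotate entry is a clockwise angle that must be a multiple of 90. Below is a minimal sketch of that arithmetic only, assuming the activity takes a list of pages plus a clockwise amount in degrees; `combined_rotation` and `plan_rotation` are hypothetical helpers, and reading and writing /Rotate with iText is not shown.

```python
def combined_rotation(current_rotate: int, extra_degrees: int) -> int:
    """New /Rotate value after turning a page clockwise by extra_degrees."""
    # The PDF /Rotate entry must be a multiple of 90 (clockwise when displayed).
    if current_rotate % 90 or extra_degrees % 90:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return (current_rotate + extra_degrees) % 360  # Python's % keeps the result in 0..359


def plan_rotation(current: dict[int, int], pages: list[int], extra_degrees: int) -> dict[int, int]:
    """Compute new /Rotate values for the selected 1-based pages only."""
    plan = {}
    for page in pages:
        if page not in current:
            raise ValueError(f"page {page} does not exist")
        plan[page] = combined_rotation(current[page], extra_degrees)
    return plan


# Hypothetical existing /Rotate values of a 3-page file; turn page 2 by 270 degrees.
existing = {1: 0, 2: 90, 3: 0}
print(plan_rotation(existing, [2], 270))  # {2: 0}
```

Negative amounts work as counter-clockwise turns, for example `combined_rotation(0, -90)` gives 270.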

Mihai-san,

Thanks for your comment.

you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information

That’s correct. The customer will have to input the amount of rotation for each page. It sounds like tedious work, but I guess it is acceptable. The PDF the client showed me is a series of application forms, each made up of, let’s say, 3 pages whose orientations are 1. portrait, 2. rotated portrait, 3. portrait. The client has to deal with various types of application forms coming from many companies, but they said the page order and layout, including rotation, are the same for all forms from a given company.
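Because the layout is fixed per company, the per-page input could be generated from a small template instead of being typed for every file. A minimal sketch, assuming one template per company listing the clockwise degrees for each page of a single form, and assuming the file is a whole number of forms; `COMPANY_A_TEMPLATE` and `rotation_inputs` are hypothetical, and 90 for the rotated page is an assumed value. The output groups pages by angle as comma-separated strings, the same style as the Extract Pages range.

```python
from collections import defaultdict

# Hypothetical template for one company's form: clockwise degrees to apply to each page.
# Page 2 is the "rotated portrait" page; 90 is an assumed amount to be checked per company.
COMPANY_A_TEMPLATE = [0, 90, 0]


def rotation_inputs(template: list[int], page_count: int) -> dict[int, str]:
    """Map each rotation amount to a page list string, e.g. {90: "2,5,8"}."""
    if not template or page_count % len(template):
        raise ValueError("page count is not a whole number of forms")
    pages_by_angle = defaultdict(list)
    for page in range(1, page_count + 1):
        angle = template[(page - 1) % len(template)]  # position of this page inside its form
        if angle:  # pages that need no rotation are left out
            pages_by_angle[angle].append(page)
    return {angle: ",".join(map(str, pages)) for angle, pages in pages_by_angle.items()}


print(rotation_inputs(COMPANY_A_TEMPLATE, 9))  # {90: '2,5,8'}
```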

I am visiting the Bucharest office right now, so I will make a sample rotated PDF document once I am back at the Tokyo office.

iText should be able to check the orientation. See:
https://stackoverflow.com/questions/6570654/getting-the-orientation-of-a-pdf-that-has-been-read-in-through-itext

Not sure it can distinguish between 90 and 270 rotated portrait, though.
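A check of this kind typically compares the page's width and height, taking the page's /Rotate entry into account. Here is a minimal sketch of that logic with plain numbers, assuming the page size (in points) and rotation have already been read from the PDF; `displayed_orientation` is a hypothetical name. It also shows the 90 vs 270 problem: both give the same answer. Since it only looks at page geometry and metadata, a page scanned sideways onto a portrait-sized page with no /Rotate entry would still be reported as portrait.

```python
def displayed_orientation(width: float, height: float, rotate: int) -> str:
    """Classify how a page appears, from its page-box size and its /Rotate value."""
    if rotate % 90:
        raise ValueError("/Rotate must be a multiple of 90")
    if rotate % 180:  # 90 or 270: width and height swap on screen
        width, height = height, width
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


# An A4 portrait page box is roughly 595 x 842 points.
print(displayed_orientation(595, 842, 0))    # portrait
print(displayed_orientation(595, 842, 90))   # landscape
print(displayed_orientation(595, 842, 270))  # landscape, same answer as 90
```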

@andrzej.kniola,

Hi. Thanks for the information. I am not sure whether iText will work for this, but I will check. I would like to know whether it determines the orientation from an OCR result or from the metadata of the PDF page. This is the kind of PDF I have in mind.

Hi, I am using this custom activity. It works in my local Studio without any issues, but when I run the bot from Orchestrator it does not work.

@cv88480vc

I’m assuming you are getting a “Cannot create activity type” error, or something along those lines.
I have seen this, and you will probably need to edit the NuGet.config file located in “C:\Program Files (x86)\UiPath\Studio”.

And add the Community Feed.

Do this on the machine that will be running the job from Orchestrator, which will allow the activities to be seen on the machine.
[Screenshot]

Hope that helps solve it.

Regards.

Oh, and install the activity package from the Community Feed rather than from your own custom feed. The Community path is shown in the previous post. First, uninstall the current version. Then, on your development machine, go to Manage Packages, right-click Available, choose Configure Sources, and add the path (if it isn’t already there). Install PDF.Extensions from there, then publish your workflow. You still need to set up NuGet.config on the Robot machine as described in my previous post (you may need your server admin if you don’t have access to edit the file).

Additionally, changing NuGet.config may require you to restart the UiRobotSvc service, which will also stop any jobs currently running, so be careful.

Regards.

Bro, it really helped me a lot.

I got the error below when using the Merge PDF activity. I used 2 source PDF files and want to merge them into 1 PDF file.

Please help.

[Screenshot of the error]

Hi,
I wanted to know how exactly to import these files into Studio.
Regards,
Rashi

hi @rashi.bajpai,

Give these a try with the Invoke PowerShell activity for merging the PDFs.

Regards
Sanjay

Thanks, man. It was helpful for merging a large set of PDFs…

@mihai.pricochi
What if I need to extract all pages? Do I still need to know the range? What will my input be in that case?

You can build the range dynamically with a loop: first generate a string like ‘1,2,3,4…,n’ by incrementing a counter in the loop, then pass that string variable to the Split PDF activity as the Input Range.

But how will we know when to stop the loop?

I used the PDF activities by Nitin Safaya (v1.0.2) to get the total number of pages in the PDF, and then ran the loop up to that page number.
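Once the page count is known, the loop described in the earlier reply comes down to joining the numbers 1 to n with commas. A minimal Python sketch, assuming `page_count` comes from a page-count activity such as the one mentioned above; `all_pages_range` is a hypothetical helper. Going by the description at the top of the thread, each comma-separated group becomes its own output file, so this range should produce one file per page, while a single group such as “1-n” would keep all pages together in one file.

```python
def all_pages_range(page_count: int) -> str:
    """Build "1,2,...,n" so that every page is its own comma-separated group."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # range() stops at the page count reported by whatever activity read the PDF.
    return ",".join(str(page) for page in range(1, page_count + 1))


print(all_pages_range(4))  # 1,2,3,4
```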

Thanks. It worked for me. 🙂