PDF - Merge and Extract Pages (Split) activities

mihai.pricochi · August 8, 2017, 1:45pm

Here are activities for merging PDFs and extracting pages from a PDF, as both the NuGet package and the source code. A client asked about them and I found them easy enough to implement; I am using the iText library to handle the PDF files.

For Merge PDFs, the input should be an array of strings (the path to each PDF); you can add as many as you want.

For Extract Pages, the input should be the PDF path as a string (you can add a password if necessary) and a range string in the form “1,4-5.3.6-7”: groups are separated by commas, and ranges are written with a dash. The output is one file per group, named “inputFileName_pageGroup.pdf”, so it is easy to recognize which set of pages each output file contains.
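To make the range format concrete, here is a small Python sketch. It is not the activity's code (the package uses iText on .NET); the function name `plan_extraction` is hypothetical, and it assumes that commas separate groups and that "a-b" means pages a to b inclusive. It only works out which pages go into which output file and what that file would be called; the actual page copying would be done by a PDF library.

```python
import os


def plan_extraction(input_path, spec):
    """Hypothetical helper: map a range string such as "1,4-5,6-7" to
    (output file name, page list) pairs, one pair per comma-separated group."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    plan = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            # "a-b" is treated as an inclusive page range
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {token!r}")
            pages = list(range(start, end + 1))
        else:
            page = int(token)  # an empty token raises ValueError here
            if page < 1:
                raise ValueError(f"invalid page: {token!r}")
            pages = [page]
        # Output name follows the "inputFileName_pageGroup.pdf" pattern
        plan.append((f"{base}_{token}.pdf", pages))
    return plan


for name, pages in plan_extraction("docs/invoice.pdf", "1,4-5,6-7"):
    print(name, pages)
# invoice_1.pdf [1]
# invoice_4-5.pdf [4, 5]
# invoice_6-7.pdf [6, 7]
```

Keeping the group token in the file name is what makes each output easy to trace back to the pages it came from.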

UiPath.PDF.Extensions.Activities.zip (3.5 MB)
UiPath.PDF.Extensions.Activities.1.0.0.7.nupkg (1.3 MB)

Kingfisher · August 10, 2017, 4:17pm

Hi Mihai-san,
I like this feature! I believe it will help customers who deal with multiple PDF files.
BTW, is it possible to rotate page(s) and output the result as a new PDF? One of our customers showed us a multi-page PDF which they want to OCR, using the extracted text as input for an automation. The problem is that portrait pages and rotated landscape (90 or 270 degree) pages are mixed in one file, so OCR always fails on the rotated pages. Some scanning apps, like the one shown below, have an auto-rotation feature, but an OCR engine like ABBYY does not have such a useful feature by itself. If we could specify a rotation for each page as pre-processing before OCR, we could get correct text out of the PDF.
Could you please share your thoughts on my question?

Thank you,
Jay

Kingfisher · August 10, 2017, 4:22pm

Automatic rotation of pages via ABBYY FineReader Engine

mihai.pricochi · August 11, 2017, 4:29am

I will have a look at that. As far as I know, the ABBYY Engine is not free, but the iText library should offer some options.

I can do it similarly to Extract Pages: you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information, as we can’t really identify the correct orientation, at least not with the tools I am using right now.
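A rotate activity along these lines could turn "pages plus amount of rotation" into a per-page rotation value. The sketch below is hypothetical (the names `expand_pages` and `rotation_plan` are made up, and it assumes the same comma/dash range syntax as Extract Pages). It relies on one fact from the PDF format: a page's rotation is stored as clockwise degrees that must be a multiple of 90, so the new amount is added to any existing rotation modulo 360.

```python
def expand_pages(spec):
    """Hypothetical helper: "2,4-5" -> [2, 4, 5] (dash = inclusive range)."""
    pages = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            start, end = (int(part) for part in token.split("-", 1))
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(token))
    return pages


def rotation_plan(spec, angle, existing=None):
    """Hypothetical helper: return {page: new rotation in degrees}.

    `existing` holds the pages' current rotation, if known; PDF page
    rotation must be a multiple of 90, so other angles are rejected.
    """
    if angle % 90 != 0:
        raise ValueError("PDF page rotation must be a multiple of 90 degrees")
    existing = existing or {}
    return {page: (existing.get(page, 0) + angle) % 360
            for page in expand_pages(spec)}


print(rotation_plan("2,4-5", 90))  # {2: 90, 4: 90, 5: 90}
```

Normalizing with `% 360` also means a negative amount such as -90 ends up as the equivalent 270.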

Kingfisher · August 11, 2017, 7:10am

Mihai-san,

Thanks for your comment.

you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information

That is correct. The customer will have to input the amount of rotation for each page. It sounds like tedious work, but I guess it is acceptable. The PDF the client showed me is a series of application forms, each composed of, let’s say, 3 pages whose orientations are 1. portrait, 2. rotated portrait, 3. portrait. The client needs to deal with various types of application forms coming from many companies, but they said the page order and layout, including rotation, are the same for each company.
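Because the layout is fixed per company, the per-page input could be generated from a short per-company template instead of being typed for every page. This is a hypothetical sketch: `rotation_ranges` is a made-up name, it assumes every form in the file has exactly the template's page count, and it assumes 90 degrees for the "rotated portrait" page (the real value could just as well be 270). It groups the pages needing rotation into comma-separated strings, the same shape as the range input discussed above.

```python
from collections import defaultdict


def rotation_ranges(template, total_pages):
    """Hypothetical helper: repeat a per-form rotation template over the
    whole file and return {angle: "p1,p2,..."}, skipping pages with 0."""
    if not template or total_pages % len(template) != 0:
        raise ValueError("page count is not a whole number of forms")
    by_angle = defaultdict(list)
    for page in range(1, total_pages + 1):
        angle = template[(page - 1) % len(template)] % 360
        if angle:
            by_angle[angle].append(str(page))
    return {angle: ",".join(pages) for angle, pages in by_angle.items()}


# Assumed template: 1. portrait, 2. rotated portrait (90), 3. portrait
print(rotation_ranges([0, 90, 0], 6))  # {90: '2,5'}
```

The divisibility check is a cheap guard against a file that contains an incomplete or extra form.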

I am now visiting the Bucharest office, so I will make a sample rotated PDF document when I go back to the Tokyo office.

andrzej.kniola · August 11, 2017, 7:17am

iText should be able to check the orientation. See:

I’m not sure it can distinguish between a portrait page rotated by 90 and one rotated by 270, though.
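A metadata-only orientation check works roughly like the hypothetical sketch below (`displayed_orientation` is a made-up name, not an iText call). It combines the page's width and height with its rotation entry (clockwise degrees, a multiple of 90) to say whether the page is displayed as portrait or landscape. Neither value says which way the scanned content inside the page is facing, so a scan that was placed sideways without a rotation entry looks the same whether it is turned 90 or 270 degrees; telling those apart needs the content itself, for example via OCR.

```python
def displayed_orientation(width, height, rotate=0):
    """Hypothetical helper: orientation of a page as displayed,
    from its size and its rotation value in degrees."""
    if rotate % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    if rotate % 180 == 90:
        # A quarter turn swaps the displayed width and height
        width, height = height, width
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


print(displayed_orientation(612, 792))      # portrait (US Letter in points)
print(displayed_orientation(612, 792, 90))  # landscape
```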

Kingfisher · August 11, 2017, 8:36am

@andrzej.kniola,

Hi. Thanks for the information. I am not sure whether iText will work, but I will check. I would like to know whether it retrieves the orientation from an OCR result or from the PDF page metadata. I am assuming this kind of PDF.

cv88480vc · November 13, 2018, 7:26pm

Hi, I am using this custom activity. It works in my local Studio without any issues, but when I run the bot from Orchestrator it does not work.

ClaytonM · November 13, 2018, 7:39pm

@cv88480vc

I’m assuming you are getting the “Cannot create activity type” error, or something similar.
I have seen this, and you will probably need to edit the NuGet.config file located in “C:\Program Files (x86)\UiPath\Studio”

and add the Community Feed to it.

Do this on the machine that will run the job from Orchestrator, so that the activities can be found on that machine.

Hope that helps solve it.

Regards.

ClaytonM · November 13, 2018, 7:52pm

Oh, and install the activity package from the Community Feed rather than from your own custom feed. The Community path is shown in the previous post… First uninstall the current version. Then, on your development machine, go to Manage Packages, right-click on Available, choose Configure Sources, and add the path (if it isn’t already there). Install PDF.Extensions from there, then publish your workflow. You still need to set up the NuGet.config on the Robot machine as described in my previous post (you may need your server admin if you don’t have access to edit the file).

Additionally, changing NuGet.config may require you to restart the UiRobotSvc which will also stop any jobs currently running, so be careful.

Regards.

cv88480vc · November 14, 2018, 5:35am

Bro, it really helped me a lot!

Saurabh112 · December 5, 2018, 8:13am

I got an error when using the Merge PDF activity.
I used 2 source PDF files and want to merge them into 1 PDF file.

Please help.


rashi.bajpai · December 21, 2018, 7:24am

Hi,
I would like to know how exactly to import these files into Studio.
Regards,
Rashi

sanjay21051990 · March 18, 2019, 10:18am

Hi @rashi.bajpai,

Give these a try, using the Invoke PowerShell activity to merge the PDFs.

Regards
Sanjay

noufalahammed · March 29, 2019, 7:07am

Thanks, man. It was helpful for merging a large set of PDFs.

mohammedamaan · July 15, 2019, 11:18am

@mihai.pricochi
What if I need to extract all pages? Do I still need to know the range? What will my input be in that case?

Paramita.Pradhan · July 15, 2019, 1:42pm

You can create the range dynamically with a loop. First generate the range as a string like ‘1,2,3,4…,n’ by incrementing through a loop, and then, in the Split PDF activity, use that string variable as the Input Range.

mohammedamaan · July 15, 2019, 1:45pm

But how will we know when to stop the loop?

Paramita.Pradhan · July 15, 2019, 2:17pm

I used the PDF activities by Nitin Safaya (v1.0.2) to get the total page count of the PDF, and then ran the loop up to that page number.
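Once the page count is known, the range string can be built in one step. The sketch below is hypothetical (`all_pages_range` is a made-up name, and `page_count` is assumed to come from a page-count activity as described above). Going by the format described at the top of the thread, where each comma-separated group becomes its own output file, "1,2,…,n" should give one file per page, while a single group "1-n" should put all pages into one file.

```python
def all_pages_range(page_count, one_file_per_page=True):
    """Hypothetical helper: build the Input Range string for every page."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if one_file_per_page:
        # "1,2,...,n": one group (and so one output file) per page
        return ",".join(str(page) for page in range(1, page_count + 1))
    # A single group covering the whole document
    return "1" if page_count == 1 else f"1-{page_count}"


print(all_pages_range(4))         # 1,2,3,4
print(all_pages_range(4, False))  # 1-4
```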

mohammedamaan · July 15, 2019, 4:09pm

Thanks. It worked for me.
