Activities for Merge PDFs and Extract Pages from PDF, both the NuGet package and the source code. A client asked about them and I found them easy enough to implement. I am using the iText library to handle PDF files.
For Merge PDFs, the input should be an array of strings (the path to each PDF). You can add as many as you want.
For Extract Pages, the input should be the PDF path as a string, and you can add a password if necessary. The range is a string in the form “1,4-5,3,6-7” (groups separated by commas, and the first and last page of a range separated by a dash). The output will be one file for each group, named “inputFileName_pageGroup.pdf”, so that it is easy to recognize which set of pages each output file contains.
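To make the range format more concrete, here is a rough Python sketch of how a string such as “1,4-5” could be split into groups and turned into output file names. It is only an illustration and not the actual source of the activity: the function names parse_page_groups and output_names are made up, and using the group text itself as the “pageGroup” part of the file name is an assumption.

```python
from pathlib import Path


def parse_page_groups(spec):
    """Split a range string such as "1,4-5" into (label, pages) pairs."""
    groups = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"invalid page group: {part!r}")
        groups.append((part, list(range(first, last + 1))))
    return groups


def output_names(pdf_path, spec):
    """Build one file name per group (assumed pattern: inputFileName_pageGroup.pdf)."""
    stem = Path(pdf_path).stem
    return [f"{stem}_{label}.pdf" for label, _ in parse_page_groups(spec)]


if __name__ == "__main__":
    for name in output_names("invoice.pdf", "1,4-5"):
        print(name)
```

Each group keeps its own page list, so a real implementation could copy exactly those pages into the matching output file.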
Hi Mihai-san,
I like this feature! I believe this will help customers dealing with multiple PDF files.
BTW, is it possible to rotate page(s) and output them as a new PDF? One of our customers showed us a multi-page PDF that they want to OCR, using the extracted text as input for automation. The problem is that portrait pages and rotated landscape pages (90 or 270 degrees) are mixed in one file, which means OCR always fails on the rotated pages. Some scanning apps, like the one shown below, have an auto-rotation feature, but an OCR engine like ABBYY doesn’t have such a useful thing by itself. If we could specify the rotation for each page as pre-processing before OCR, we could get the correct text out of the PDF.
Could you please share your thoughts on my question?
I will have a look at that. As far as I know, the ABBYY engine is not free, but the iText library should offer some options.
I can do it similarly to Extract Pages: you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information, as we can’t really identify the correct orientation, at least not with the tools I am using right now.
you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information
That is correct. The customer will have to input the amount of rotation for each page. It sounds like dirty work, but I guess it is acceptable. The PDF that the client showed me is a series of application forms, each composed of, let’s say, 3 pages whose orientations are 1. portrait, 2. rotated portrait, 3. portrait. The client needs to deal with various types of application forms that come from many companies, but they said the page order and layout, including rotation, are the same within each company.
I am visiting the Bucharest office now, so I will make a sample rotated PDF document when I get back to the Tokyo office.
Hi. Thanks for the information. I am not sure iText works, but I will check. I would like to know whether it retrieves the orientation based on the OCR result or on the metadata of the PDF page. I am assuming this kind of PDF.
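Since the rotation pattern repeats for every form from the same company, the per-page input could be generated instead of typed by hand. Below is a small Python sketch of that idea. It is purely hypothetical: the function name rotation_plan and the way the pattern is written (one value in degrees per page of a form) are my assumptions, not the input format of any existing activity.

```python
def rotation_plan(page_count, pattern):
    """Return {page_number: degrees} for the pages that need rotating.

    `pattern` lists the correction for each page of one form, in order,
    and is repeated until `page_count` pages are covered.
    """
    if not pattern:
        raise ValueError("pattern must contain at least one entry")
    for degrees in pattern:
        if degrees not in (0, 90, 180, 270):
            raise ValueError(f"unsupported rotation: {degrees}")
    plan = {}
    for page in range(1, page_count + 1):
        degrees = pattern[(page - 1) % len(pattern)]
        if degrees:  # skip pages that are already upright
            plan[page] = degrees
    return plan


if __name__ == "__main__":
    # Two 3-page forms where only the second page of each form is rotated.
    print(rotation_plan(6, [0, 90, 0]))
```

With one pattern stored per company, the customer would only have to work out the rotations once for each form layout.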
I’m assuming you are getting the “Cannot create activity type” or whatever the error says.
I have seen this, and you will probably need to edit the NuGet.config file located in “C:\Program Files (x86)\UiPath\Studio” and add the Community Feed.
Do this on the machine that will be running the job from Orchestrator, which will allow the activities to be seen on the machine.
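If you have to do this on several Robot machines, the edit can be scripted. Here is a minimal Python sketch that adds a feed entry to the packageSources section of a NuGet.config text. The function name add_package_source is made up for illustration, and the key and URL in the example are placeholders, not the real Community Feed path, so replace them with the actual values.

```python
import xml.etree.ElementTree as ET


def add_package_source(config_xml, key, value):
    """Return the config text with <add key=... value=.../> under packageSources."""
    root = ET.fromstring(config_xml)
    sources = root.find("packageSources")
    if sources is None:
        sources = ET.SubElement(root, "packageSources")
    for entry in sources.findall("add"):
        if entry.get("key") == key:
            entry.set("value", value)  # feed already listed: update its path
            break
    else:
        ET.SubElement(sources, "add", {"key": key, "value": value})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = "<configuration><packageSources /></configuration>"
    # Placeholder key and URL: use the real Community Feed path instead.
    print(add_package_source(original, "Community", "https://example.invalid/feed"))
```

Writing the result back to the file in Program Files will usually need administrator rights.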
Oh, and install the activity package from the Community Feed rather than from your own Custom Feed. The Community path is shown in the previous post… First uninstall the current version. Then, on your development machine, go to Manage Packages, right-click on Available, choose Configure Sources, and add the path (if it isn’t already there). Install the PDF.Extensions package from there, then publish your workflow. You still need to set up the NuGet.config on the Robot machine as described in my previous post. (You may need to ask your Server Admin if you don’t have access to edit the file.)
Additionally, changing NuGet.config may require you to restart the UiRobotSvc which will also stop any jobs currently running, so be careful.
You can create your dynamic range via a loop. First, generate the range as a string like ‘1,2,3,4…,n’ by incrementing through a loop. Then, in the split PDF activity, use the string variable as the Input Range.
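As a rough illustration of the loop, here it is in Python. In a workflow you would build the same string with a loop activity and a string variable; the function name build_range is just for this example.

```python
def build_range(page_count):
    """Return "1,2,...,n" so that every page becomes its own group."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    pages = []
    for page in range(1, page_count + 1):
        pages.append(str(page))
    return ",".join(pages)


if __name__ == "__main__":
    print(build_range(4))
```

The page count has to be read from the PDF first, and the resulting string is what goes into the Input Range.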