PDF - Merge and Extract Pages (Split) activities

pdf
activities
i_considering

#1

Here are activities for Merge PDFs and Extract Pages from PDF, as both the NuGet package and the source code. A client asked about them and I found them easy enough to implement. I am using the iText library to handle the PDF files.

For Merge PDFs, the input should be an array of strings (the path to each PDF); you can add as many as you want.

For Extract Pages, the input should be the PDF path as a string, and you can add a password if necessary. The range is a string in the form “1,4-5,3,6-7” (groups separated by commas, ranges written with a dash). The output is one file per group, named “inputFileName_pageGroup.pdf”, so that it is easy to recognize which set of pages each output file contains.
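To make the range format concrete, here is a minimal Python sketch of how such a string could be parsed and turned into output file names. It is not the activity's actual code (that is in the attached source); the function names `parse_page_groups` and `output_names` are made up for illustration, and it assumes the groups are comma-separated as in "1,4-5,3,6-7" and that the group label in the file name is written the same way as in the input.

```python
from pathlib import Path


def parse_page_groups(spec: str) -> list[tuple[int, int]]:
    """Parse a spec such as "1,4-5,3,6-7" into (first, last) page pairs."""
    groups = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # a single page is a one-page range
        if start < 1 or end < start:
            raise ValueError(f"invalid page group: {part!r}")
        groups.append((start, end))
    return groups


def output_names(input_path: str, spec: str) -> list[str]:
    """Build one "inputFileName_pageGroup.pdf" name per group (assumed naming)."""
    stem = Path(input_path).stem
    names = []
    for start, end in parse_page_groups(spec):
        label = str(start) if start == end else f"{start}-{end}"
        names.append(f"{stem}_{label}.pdf")
    return names
```

With an input file called report.pdf and the range above, this would give four names: report_1.pdf, report_4-5.pdf, report_3.pdf and report_6-7.pdf.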

UiPath.PDF.Extensions.Activities.zip (3.5 MB)
UiPath.PDF.Extensions.Activities.1.0.0.7.nupkg (1.3 MB)


#2

Hi Mihai-san,
I like this feature! I believe it will help customers who deal with multiple PDF files.
BTW, is it possible to rotate page(s) and output the result as a new PDF? One of our customers showed us a multi-page PDF that they want to OCR, using the extracted text as input for automation. The problem is that portrait pages and rotated landscape pages (90 or 270 degrees) are mixed in one file, which means OCR always fails on the rotated pages. Some scanning apps have an auto-rotation feature, but an OCR engine like ABBYY does not seem to have such a useful thing by itself. If we could specify the rotation for each page as pre-processing before OCR, we could get correct text out of the PDF.
Could you please share your thoughts on my question?

Thank you,
Jay


#3

Automatic rotation of pages via ABBYY FineReader Engine
http://forum.ocrsdk.com/thread/automatic-rotation-of-pages-via-abbyy-finereader-engine/


#4

I will have a look at that. As far as I know, the ABBYY engine is not free, but the iText library should offer some options.

I can do it similarly to Extract Pages: you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information, as we can’t really identify the correct orientation, at least not with the tools I am using right now.
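As a rough illustration of what that input could look like, the Python sketch below turns a list of (page range, rotation) pairs into a per-page rotation plan. Nothing here is the planned activity's interface; `expand_pages` and `rotation_plan` are hypothetical names, and the range syntax is assumed to match the one used for Extract Pages. The one hard constraint it encodes is that a PDF page rotation has to be a multiple of 90 degrees.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec such as "2,5-6" into individual page numbers."""
    pages = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if sep else start
        pages.extend(range(start, end + 1))
    return pages


def rotation_plan(rules: list[tuple[str, int]]) -> dict[int, int]:
    """Map each listed page to a clockwise rotation in degrees (0-359)."""
    plan = {}
    for spec, degrees in rules:
        if degrees % 90 != 0:
            raise ValueError("page rotation must be a multiple of 90 degrees")
        for page in expand_pages(spec):
            # Rules touching the same page accumulate; -90 becomes 270.
            plan[page] = (plan.get(page, 0) + degrees) % 360
    return plan
```

For example, `rotation_plan([("2", 90), ("5-6", 270)])` would mark page 2 for a 90 degree turn and pages 5 and 6 for 270 degrees; applying those values to the file would then be the PDF library's job.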


#5

Mihai-san,

Thank you for your comment.

“you will have to enter the page or range of pages to rotate and the amount of rotation. So the client will have to gather this information”

That is correct. The customer will have to input the amount of rotation for each page. It sounds like dirty work, but I guess it is acceptable. The PDF the client showed me is a series of application forms, each made up of, let’s say, 3 pages whose orientations are 1. portrait, 2. rotated portrait, 3. portrait. The client needs to deal with various types of application forms that come from many companies, but they said the page order and layout, including rotation, are the same for each company.
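Since the layout is fixed per company, the rotation values would only need to be collected once per form type. The Python sketch below shows one way that could be expressed, assuming a file contains forms of the same type stacked one after another. The company name, the 90 degree value and the function name `rotations_for_document` are all hypothetical placeholders, not data from the client.

```python
# Hypothetical per-company patterns: rotation needed for each page of one form.
COMPANY_PATTERNS = {
    "company_a": [0, 90, 0],  # page 2 of every 3-page form is rotated
}


def rotations_for_document(pattern: list[int], page_count: int) -> dict[int, int]:
    """Repeat a per-form rotation pattern over all pages of a document.

    Returns only the pages that actually need rotating, keyed by
    1-based page number.
    """
    if not pattern:
        raise ValueError("pattern must contain at least one page")
    plan = {}
    for page in range(1, page_count + 1):
        degrees = pattern[(page - 1) % len(pattern)] % 360
        if degrees:
            plan[page] = degrees
    return plan
```

A 6-page file holding two such forms would then need pages 2 and 5 rotated, and nothing else.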

I am now visiting the Bucharest office, so I will make a sample rotated PDF document when I go back to the Tokyo office.


#6

iText should be able to check the orientation. See:

Not sure it can distinguish between 90 and 270 rotated portrait, though.
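For what it is worth, here is a small Python sketch of what a purely metadata-based orientation check amounts to. It is an assumption that the check works this way; it is not iText code, and `displayed_orientation` is a made-up name. It takes the page width and height plus the page's rotation entry (a multiple of 90 in a PDF) and reports how the page is displayed.

```python
def displayed_orientation(width: float, height: float, rotate: int = 0) -> str:
    """Classify a page as "portrait" or "landscape" from its metadata only.

    width/height are the page box dimensions and rotate is the page's
    rotation entry in degrees. The scanned image itself is never inspected.
    """
    if rotate % 90 != 0:
        raise ValueError("rotate must be a multiple of 90 degrees")
    if rotate % 180 == 90:
        # A quarter turn in either direction swaps the displayed sides.
        width, height = height, width
    return "landscape" if width > height else "portrait"
```

A check like this gives the same answer for 90 and 270, and it says nothing about a scan whose image was stored sideways on an unrotated page. In that case the direction would have to come from the content, for example from OCR.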


#7

@andrzej.kniola,

Hi. Thanks for the information. I am not sure whether iText will work, but I will check. I would like to know whether it retrieves the orientation based on the OCR result or on the metadata of the PDF page. I am assuming this kind of PDF.

