How to reduce the file size of PDF files larger than 5 MB?

Hi @Murli_Manohar,

Welcome to the UiPath community!

On Linux, I use Ghostscript to reduce the size of PDFs, and Ghostscript is available for Windows too.

  1. Ghostscript for Windows: Ghostscript Downloads

  2. Then you can use the /ebook setting from this link: Compress PDF files with ghostscript · GitHub.

  3. You can then use the Invoke PowerShell activity in UiPath, provide the input and output file paths to the script, and the file should be reduced to a smaller size.

There are some limitations to using Ghostscript. You can read about them here: High Level Output Devices (ghostscript.com)


Hi @jeevith ,

Thanks for the reply. I downloaded Ghostscript from the link you provided, installed it on my system, and ran the command with the /ebook setting in PowerShell.

But I am facing an error.

My command is : "ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf"

Also, where in this command should I pass the paths of the PDF files?

Can you help me understand why this error is raised? @jeevith

Hi @Murli_Manohar,

The error means that PowerShell does not recognize the command you used. In UiPath's Invoke PowerShell activity, the IsScript option needs to be checked.

Point 1: In the Invoke PowerShell activity, check the IsScript box in the Properties panel.

Point 2: Is input.pdf the correct path to your file?
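
On Point 2, a bare name like input.pdf is resolved against whatever the current working directory happens to be, so a full path is safer. As a rough illustration outside UiPath, here is a small Python sketch of that check. The function name `resolve_input` is made up for this example.

```python
from pathlib import Path


def resolve_input(path_text):
    """Return the absolute path of an existing file, or raise FileNotFoundError."""
    # A relative path is resolved against the current working directory.
    path = Path(path_text).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Input PDF not found: {path}")
    return path
```

If this raises for your input file, the Ghostscript command would fail for the same reason.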

Hi @jeevith ,
I have not worked with this activity before. Could you share a workflow that uses it? That would be helpful.

Hi @Murli_Manohar,

I don't have a solution for you right now, but you can see how to use Invoke PowerShell in the workflow I attached in this thread: Need help opening jupyter notebook from uipath - Help / Activities - UiPath Community Forum

The script text there is a little different, but you can just replace its contents. Before running any of this in UiPath, it is worth checking that your command runs in PowerShell itself. That way you know you are closer to the solution.

You can modify the variables to match your use case.

Hi @Murli_Manohar,

After about 2 hours of trying to get the string formatting right for the script, I have a working example for you and for others who are interested in reducing the size of PDFs.

External dependency: Ghostscript for Windows (I used version gs9.54.0)

In PowerShell, this syntax can be used:

Start-Process "C:\Program Files\gs\gs9.54.0\bin\gswin64c.exe" "-sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -sOutputFile=in_OutputPDFFullPath in_InputPDFFullPath" -Wait
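
For comparison, here is a minimal Python sketch of the same Ghostscript call. It is not part of the attached workflow. The names `build_gs_args` and `reduce_pdf` are made up for illustration, and the executable path is an assumption copied from the command above, so yours may differ.

```python
import subprocess
from pathlib import Path

# Assumed install location; adjust to the local Ghostscript version.
GS_EXE = r"C:\Program Files\gs\gs9.54.0\bin\gswin64c.exe"


def build_gs_args(input_pdf, output_pdf, pdf_setting="ebook", gs_exe=GS_EXE):
    """Return the Ghostscript argument list for a pdfwrite size reduction."""
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{pdf_setting}",
        "-dCompatibilityLevel=1.4",
        "-dNOPAUSE",
        "-dBATCH",
        f"-sOutputFile={output_pdf}",
        str(input_pdf),
    ]


def reduce_pdf(input_pdf, output_pdf, pdf_setting="ebook"):
    """Run Ghostscript and raise CalledProcessError on a non-zero exit code."""
    args = build_gs_args(Path(input_pdf), Path(output_pdf), pdf_setting)
    subprocess.run(args, check=True)
```

Passing the arguments as a list rather than one long string means paths with spaces do not need extra quoting.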

In UiPath this is not that straightforward because of the string formatting requirements.
The workflow takes the following arguments, including the PDFSettings option.


I have added annotations for them in the workflow, which you can refer to. The most important argument is the location of Ghostscript (gswin64c.exe); yours may differ, so change it before you run the attached workflow.

Since we are sending a string to PowerShell in script format, we check the IsScript option.

I added some FileInfo details of the input and output files, which you can use to verify whether the reduction was successful. The output would look like this:

[image]
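
For anyone who wants the same kind of check outside UiPath, here is a minimal Python sketch of the size comparison. The function name `size_report` and the dictionary keys are made up for this example and do not come from the attached workflow.

```python
from pathlib import Path


def size_report(input_pdf, output_pdf):
    """Compare file sizes in bytes and report whether the output is smaller."""
    in_size = Path(input_pdf).stat().st_size
    out_path = Path(output_pdf)
    out_size = out_path.stat().st_size if out_path.exists() else 0
    return {
        "input_bytes": in_size,
        "output_bytes": out_size,
        # A missing or empty output counts as a failure, not as a reduction.
        "reduced": 0 < out_size < in_size,
    }
```

Note that a smaller file only shows that Ghostscript wrote something. It does not show that the pages still have their content.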

I have not included any exception handling in the PowerShell script or in the workflow, but I assume you can add it. I have a write-up on creating error-proof workflows, which you can refer to.

Here is a sample workflow:
ReducePDFSize_Ghostscript.zip (248.7 KB)

I really hope you and others get good use out of this workflow :) Getting it to work was certainly a challenge!


Hi @jeevith ,

Thank you very much for the support. It solved my problem: a 10 MB file is reduced to 2 MB. Once again, thank you very much.

Hi @jeevith ,

I have a question. In the arguments section you have used in_PDFSetting, with a default value of “screen”. Can you explain what this argument is for and why it is needed?

Hi @Murli_Manohar,

Ghostscript provides this option so that the user can choose the quality level of the output file.


High Level Output Devices (ghostscript.com)

It is up to you to choose the one that suits your use case best. If your PDF is used further by other robot or human processes, you may want a higher quality.

Examples:

  1. If you use the output file with OCR engines, the OCR results will be better with the printer or prepress options than with screen, as they retain better quality (contrast in this case).

  2. If the PDF will just be archived, use screen. That gives you good enough file quality and saves space in the long run.
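
To make the two examples above concrete, here is a small Python sketch that picks a -dPDFSETTINGS value from a use case. The preset names are the ones Ghostscript documents, and the resolutions in the comments are approximate. The `USE_CASE_PRESET` mapping and the function name are assumptions made up for illustration, so adapt them to your own process.

```python
# Presets accepted by -dPDFSETTINGS (image resolutions are approximate).
PDF_SETTINGS = {
    "screen": "lowest quality, smallest files (roughly 72 dpi images)",
    "ebook": "medium quality (roughly 150 dpi images)",
    "printer": "high quality (roughly 300 dpi images)",
    "prepress": "high quality, color preserving (roughly 300 dpi images)",
    "default": "general-purpose output, possibly larger files",
}

# Hypothetical mapping from use case to preset, following the two examples above.
USE_CASE_PRESET = {"archive": "screen", "ocr": "printer"}


def pdfsettings_flag(use_case, fallback="ebook"):
    """Return the -dPDFSETTINGS flag for a use case, or the fallback preset."""
    preset = USE_CASE_PRESET.get(use_case, fallback)
    if preset not in PDF_SETTINGS:
        raise ValueError(f"Unknown PDFSETTINGS preset: {preset}")
    return f"-dPDFSETTINGS=/{preset}"
```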

Hi @jeevith ,

I also have an issue: if my file is smaller than 1 MB, the output file is generated with a blank white page. Why is that?

Hi @Murli_Manohar,

That is strange. I used a PDF smaller than 1 MB and it worked. If this happens, I think you need to read the PDF content and add some exception handling so that the robot does not conclude that the job was successful.

Are you sure the input file was set correctly and has content in it?

The command goes through and processes each page, so I am not sure why this behaviour occurs. My guess is that the PDF contains images and not rich text. Just a wild guess.

Hi @jeevith ,

Yes. If my file is around 1.6 MB, I get the required output without any issues. But if the file is about 91 KB or 313 KB, or if it contains only a single page, I get an output file with a blank page. I have tried changing the Ghostscript settings and still get the same result; I only get proper output when the file is larger than 1 MB. The pages in my PDF files are scanned images, yet files larger than 1 MB are processed correctly while smaller ones come out blank.

I have also attached a screenshot of the issue.

Hi @Murli_Manohar,

I think this is an issue you need to explore on your own. The forum can only point you toward a solution and cannot offer a 100% working one (at least not for complex automations).

  1. Your use case in the question was for PDFs over 5 MB, right? Why don't you filter out PDFs under 5 MB first, so that you do not need to reduce their file size? You will save robot execution time and avoid this failure altogether.

  2. If you continue to use the suggested approach on smaller files, you will also need to add a sequence that checks whether the output PDF has no characters.
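
As an illustration of point 1, here is a minimal Python sketch of the size filter. The function name `pdfs_to_reduce` is made up for this example, and it treats 5 MB as 5 × 1024 × 1024 bytes, which is an assumption. In UiPath the equivalent would be a condition on the file's FileInfo length inside the loop.

```python
from pathlib import Path

MAX_BYTES = 5 * 1024 * 1024  # 5 MB threshold from the original question


def pdfs_to_reduce(folder, max_bytes=MAX_BYTES):
    """Return the PDFs in `folder` larger than `max_bytes`, sorted by name."""
    return sorted(
        p for p in Path(folder).glob("*.pdf")
        if p.is_file() and p.stat().st_size > max_bytes
    )
```

Only the files returned here would be passed on to Ghostscript, and smaller files are left untouched.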

As for the reason behind the blank PDFs, there are a bunch of similar questions on Stack Overflow and the Ghostscript forum, dating from 2008 to 2019, which you can study.

Hi @jeevith ,

I’m facing an issue when I invoke this workflow for multiple files in a For Each loop. When the input files are in different locations and the output file needs to be generated in a different location, the output file is not generated. If both are in the same location, it works fine.

Can you help me resolve this issue?

Hi @jeevith ,

I am waiting for your reply. When I loop through the files, I do not get the desired output. I have tried to sort it out in many ways but have not resolved it yet.

Kindly look into my issue.

@Murli_Manohar,

As far as I know, both the input and output file paths can be anywhere.

I also checked that the code runs when the file paths have spaces in them.

I would ask you to check the command that is generated.
Copy the string passed to the Invoke PowerShell activity and show it in a Message Box or Log Message while debugging. Inspect the command and try running it in PowerShell. If it does not work there, you will see why it does not work.

So, debug inside the For Each loop with a Message Box or Log Message of the command string.
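
To show what that kind of debugging can look like, here is a small Python sketch that pairs each input file with an output path in a separate folder and prints both before anything is run. The function names and the `_reduced` suffix are made up for this example. Checking that the output folder exists is only one possible cause worth ruling out, not a confirmed diagnosis.

```python
from pathlib import Path


def plan_outputs(input_pdfs, output_dir, suffix="_reduced"):
    """Pair each input PDF with an absolute output path inside `output_dir`."""
    out_dir = Path(output_dir).resolve()
    plan = []
    for pdf in input_pdfs:
        src = Path(pdf).resolve()
        # Keep the original file name and add a suffix before the extension.
        plan.append((src, out_dir / f"{src.stem}{suffix}{src.suffix}"))
    return plan


def log_plan(plan):
    """Print each pair so the exact paths can be inspected before running."""
    for src, dst in plan:
        print(f"input={src} output={dst} output_dir_exists={dst.parent.is_dir()}")
```

If the printed output path is not a full path, or the folder does not exist, that is the first thing I would fix.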

Hi @jeevith ,

Kindly check the screenshots I have provided.

  1. The sample file that you included in the workflow.

The sample output file is below.

This is the one I ran through a For Each loop, getting the folder path, taking only the PDFs, and processing them further.

  2. The same sample input file, but without a loop. I give the file name directly and write the output to a different location, and I get the desired output.

Likewise, if I try multiple PDF files as in the first image, I get an output with a blank page, or sometimes the output is not generated at all.

I also tried using Log Message and found that only this message is printed in the Output panel: “Failure : The size reduction seem to have not worked as required.”

Kindly check whether you have any idea why I am getting this issue!

Thanks in advance,
@Murli_Manohar

Hi @Murli_Manohar,

What we know from your images:

  1. It works on the sample PDF without a For Each loop.
  2. It works on files smaller than 1000 KB, for example the sample PDF.
  3. It fails when invoked in a For Each loop.

Logically, the error lies somewhere in the For Each loop.
Please upload an image of the loop where you assign your output file full path to the script.

If you are invoking the workflow, show the Invoke Workflow arguments and the variables you are passing to it from the parent workflow.

Hi @jeevith ,

Here are my workflow screenshots.
Image - 1

Image - 2

Image - 3

Image - 4

I have even tried passing only the files larger than 5 MB into the loop.

The process of the workflow:

  1. Get the file directory.
  2. Assign the maximum file size.
  3. Iterate through the directory.
  4. Check whether the file size is less than the maximum file size.
  5. If false, the file is larger than 5 MB and needs to be compressed, so the Invoke PowerShell activity runs.

But in the end I get the log message “Failure : The size reduction seem to have not worked as required.”
and the output file is not generated, since it throws an error.

Kindly check whether there is any mistake in the logic.

Thanks in advance,
@Murli_Manohar

Hi @Murli_Manohar,

Referring to your Image 1: what is on the right-hand side of this Assign? I suspect the issue may well be here, as this value is later used as the in_OutputFileFullPath and OutputFile variables.

image

The rest of the logic looks OK to me.
