How to extract Excel and PDF files attached to another PDF file? A similar problem is extracting a PDF file embedded in a doc file

I have a document file with a PDF file embedded in it, and I want to extract that PDF. Is there a way to do it?
I also have a PDF document with an Excel file attached to it. Is there a way to extract that as well?

Hi

Please share the steps you follow to save the attachments manually, so that we can replicate them with UiPath activities.

Cheers @amarjeet.kumar

When I open the PDF file manually, the attached files are listed in the left pane of Adobe Reader under the Attachments section. To save an attachment, I right-click it and choose the Save Attachment option.
I know this can be achieved with the Click activity, but I am looking for a function- or formula-based solution.

For now I couldn’t find any built-in or custom activity that performs this action.
It can be done easily with UI-based activities, like this:

  1. Use a Start Process activity and pass the file path as input, which opens the PDF file on screen
  2. Use a Click activity with the mouse button option changed to right-click, to open the context menu on the attachment
  3. Use another Click activity to choose the save option, and a Type Into activity to enter the file path where the file should be saved

Cheers @amarjeet.kumar

Do you have experience in C# coding?

I found a method that requires C#. It is almost done, but I need some support with it.

If anyone else has experience with this, that could help solve the issue.

I have found this code on stackoverflow:
(c# - Reading PDF File Attachment Annotations with iTextSharp - Stack Overflow)

/**
 * Extracts document level attachments
 * @param PDF from which document level attachments will be extracted
 * @param zip the ZipFile object to add the extracted images
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
  PdfReader reader = new PdfReader(pdf);
  PdfDictionary root = reader.Catalog;
  PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
  PdfDictionary embeddedfiles = 
      documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
  PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
  for (int i = 0; i < filespecs.Size; ) {
    filespecs.GetAsString(i++);
    PdfDictionary filespec = filespecs.GetAsDict(i++);
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
    foreach (PdfName key in refs.Keys) {
      PRStream stream = (PRStream) PdfReader.GetPdfObject(
        refs.GetAsIndirectObject(key)
      );
      zip.AddEntry(
        filespec.GetAsString(key).ToString(), 
        PdfReader.GetStreamBytes(stream)
      );
    }
  }
}

I adjusted it to this inside UiPath:

iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdf);
iTextSharp.text.pdf.PdfDictionary root = reader.Catalog;
iTextSharp.text.pdf.PdfDictionary documentnames = root.GetAsDict(iTextSharp.text.pdf.PdfName.NAMES);

iTextSharp.text.pdf.PdfDictionary embeddedfiles = documentnames.GetAsDict(iTextSharp.text.pdf.PdfName.EMBEDDEDFILES);
iTextSharp.text.pdf.PdfArray filespecs = embeddedfiles.GetAsArray(iTextSharp.text.pdf.PdfName.NAMES);

for (int i = 0; i < filespecs.Size; ) {
    filespecs.GetAsString(i++);
    iTextSharp.text.pdf.PdfDictionary filespec = filespecs.GetAsDict(i++);
    iTextSharp.text.pdf.PdfDictionary refs = filespec.GetAsDict(iTextSharp.text.pdf.PdfName.EF);
    foreach (iTextSharp.text.pdf.PdfName key in refs.Keys) {
      iTextSharp.text.pdf.PRStream stream = (iTextSharp.text.pdf.PRStream) iTextSharp.text.pdf.PdfReader.GetPdfObject(
        refs.GetAsIndirectObject(key)
      );
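      byte[] bytes = iTextSharp.text.pdf.PdfReader.GetStreamBytes(stream);
      // Write the whole attachment; a fixed count of 100000 fails for shorter arrays and truncates longer ones
      zip.Write(bytes, 0, bytes.Length);
    }
}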

(I also installed the iText NuGet package and imported its namespace.)
I’m getting no compiler errors, but I still can’t work out which variable types should be assigned to the arguments (pdf and zip) or how to create them in UiPath.
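
One possible way to think about the two arguments, as a minimal sketch rather than a tested solution: pdf is just the content of the PDF file as a byte[] (for example from File.ReadAllBytes), and the zip argument can be avoided altogether by writing each attachment straight to a folder. The class name PdfAttachmentSketch, the method name SaveDocLevelAttachments and the parameters pdfPath and outputDir are made up for illustration. The iTextSharp calls are the same ones used in the Stack Overflow code. The sketch assumes iTextSharp 5.x and only handles document-level attachments whose names are stored directly in the EmbeddedFiles /Names array.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class PdfAttachmentSketch
{
    // Hypothetical helper (not part of iTextSharp or UiPath): saves each
    // document-level attachment of pdfPath into outputDir and returns the saved paths.
    public static List<string> SaveDocLevelAttachments(string pdfPath, string outputDir)
    {
        var saved = new List<string>();
        Directory.CreateDirectory(outputDir);

        // The "pdf" argument is simply the file content as a byte array.
        byte[] pdf = File.ReadAllBytes(pdfPath);
        PdfReader reader = new PdfReader(pdf);
        try
        {
            PdfDictionary names = reader.Catalog.GetAsDict(PdfName.NAMES);
            PdfDictionary embedded = names == null ? null : names.GetAsDict(PdfName.EMBEDDEDFILES);
            // Assumption: the entries sit directly in /Names; a name tree split into /Kids is not walked.
            PdfArray filespecs = embedded == null ? null : embedded.GetAsArray(PdfName.NAMES);
            if (filespecs == null) return saved; // no document-level attachments found

            // The array alternates: name string, file specification dictionary.
            for (int i = 0; i + 1 < filespecs.Size; i += 2)
            {
                PdfDictionary filespec = filespecs.GetAsDict(i + 1);
                PdfDictionary refs = filespec == null ? null : filespec.GetAsDict(PdfName.EF);
                if (refs == null) continue;

                foreach (PdfName key in refs.Keys)
                {
                    PdfString name = filespec.GetAsString(key);
                    PRStream stream = PdfReader.GetPdfObject(refs.GetAsIndirectObject(key)) as PRStream;
                    if (name == null || stream == null) continue;

                    // Keep only the file name part so an attachment cannot be written outside outputDir.
                    string target = Path.Combine(outputDir, Path.GetFileName(name.ToString()));
                    File.WriteAllBytes(target, PdfReader.GetStreamBytes(stream));
                    if (!saved.Contains(target)) saved.Add(target);
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return saved;
    }
}
```

In UiPath, the body of this method could presumably go into an Invoke Code activity with the language set to C#, with pdfPath and outputDir passed as String in-arguments. That wiring is an assumption on my side and untested.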

Main.xaml (13.1 KB)

Best,
Charbel

Thanks Charbel, but I don’t have any experience with C#.
