Read PDF content, create a folder, download attachments and store them in the created folder


I have a task to read content from a PDF (company name, client name, project name) and, combining it with the date and time of the email, create a folder with that name.

E.g. 01-02-2020 10:23 Companyname_ABCclient_AbcProject (this should be the folder name)

Inside that folder I have to create two separate subfolders, UNION and NON UNION, where the PDFs will be stored. The UNION / NON UNION keyword is also captured from each PDF, and each file is stored in the matching folder.

i.e. a Union PDF goes in the UNION folder and a Non Union PDF goes in the NON UNION folder.
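A minimal sketch of the naming and sorting rules, written in Python rather than UiPath so it can run on its own. Assumptions: the date is day-month-year, and because ":" is not allowed in Windows folder names, the time is written as 10.23 instead of 10:23. `build_folder_name`, `classify_union` and `make_folder_tree` are hypothetical helper names. NON UNION is checked before UNION, since every "NON UNION" also contains "UNION".

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_folder_name(received: datetime, company: str, client: str, project: str) -> str:
    # Assumption: day-month-year order. ":" is not allowed in Windows folder
    # names, so the time is written as HH.MM instead of HH:MM.
    stamp = received.strftime("%d-%m-%Y %H.%M")
    return f"{stamp} {company}_{client}_{project}"


def classify_union(text: str) -> Optional[str]:
    # Check NON UNION first: every "NON UNION" also contains "UNION".
    # \b word boundaries stop words such as "reunion" from matching.
    if re.search(r"\bNON[\s-]*UNION\b", text, re.IGNORECASE):
        return "NON UNION"
    if re.search(r"\bUNION\b", text, re.IGNORECASE):
        return "UNION"
    return None  # neither keyword found


def make_folder_tree(root: Path, folder_name: str) -> Path:
    # Creates <root>/<folder_name>/UNION and <root>/<folder_name>/NON UNION
    base = root / folder_name
    for sub in ("UNION", "NON UNION"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base
```

For example, `build_folder_name(datetime(2020, 2, 1, 10, 23), "Companyname", "ABCclient", "AbcProject")` gives `01-02-2020 10.23 Companyname_ABCclient_AbcProject`.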



Hi @heena_shaikh

You can try the Read PDF Text activity. If the PDF contains plain text, write the output to a Notepad file and check it. If the text comes through correctly, extract the values into variables with Regex / string operations, then pass those variables to the Create Folder activity.
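A sketch of the Regex step, assuming the PDF text contains labels such as "Company Name:", "Client Name:" and "Project Name:" (hypothetical labels; use the exact wording in your PDFs). Because each value is located by its label rather than by its position, it does not matter where on the page the field appears. It is shown in Python; in UiPath the same patterns could be passed to `System.Text.RegularExpressions.Regex.Match` on the Read PDF output and read from `.Groups(1).Value`.

```python
import re
from typing import Dict, Optional

# Hypothetical labels: adjust them to the wording used in your PDFs.
# "[ \t]*" keeps the match on one line; "(.+)" stops at the line break
# because "." does not match "\n".
FIELD_PATTERNS = {
    "company": r"Company\s*Name[ \t]*[:\-][ \t]*(.+)",
    "client": r"Client\s*Name[ \t]*[:\-][ \t]*(.+)",
    "project": r"Project\s*Name[ \t]*[:\-][ \t]*(.+)",
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    result: Dict[str, Optional[str]] = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        # strip() also removes a trailing "\r" from Windows line endings
        result[key] = match.group(1).strip() if match else None
    return result
```

A field that is missing or empty comes back as `None`, which makes it easy to spot PDFs that need a manual check.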

If it is an image (scanned) PDF, you have to use Read PDF With OCR and then apply the same operations as above to its output.

Hope this helps



Hope these steps help you resolve this:
- First, get the list of mails from Outlook (or any other email provider) with the mail activities, and store the output in a variable of type List<MailMessage>.
- Pass that list to a For Each activity and set its TypeArgument to System.Net.Mail.MailMessage.

- Inside the loop, use a Save Attachments activity to save the files to a folder of your choice.
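Outside UiPath, the Save Attachments step can be sketched with Python's standard `email` package, assuming each mail is available as raw bytes (for example an exported .eml file). The sketch keeps only .pdf attachments and also returns the mail's Date header, which supplies the date and time needed for the folder name. `save_pdf_attachments` is a hypothetical name.

```python
from datetime import datetime
from email import policy
from email.parser import BytesParser
from pathlib import Path
from typing import List, Optional, Tuple


def save_pdf_attachments(raw_message: bytes, target_dir: Path) -> Tuple[Optional[datetime], List[Path]]:
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    # With policy.default the Date header is parsed and exposes .datetime
    date_header = msg["Date"]
    received = date_header.datetime if date_header is not None else None

    target_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for part in msg.iter_attachments():
        filename = part.get_filename()
        if not filename or not filename.lower().endswith(".pdf"):
            continue  # skip non-PDF attachments
        path = target_dir / Path(filename).name  # drop any directory parts from the name
        path.write_bytes(part.get_content())  # non-text parts decode to bytes
        saved.append(path)
    return received, saved
```

Attachments with the same file name in different mails would overwrite each other here, so a real run may want a per-mail subfolder.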
- After this For Each loop, use an Assign activity like this:
arr_filepath = Directory.GetFiles("yourfolderpath", "*.pdf")

where arr_filepath is a variable of type Array of String.

- Then use another For Each loop, pass the above variable as input and set its TypeArgument to String.
- Inside the loop, use the Read PDF Text activity on item (the current file path) and store its output in a String variable named stroutput.

- Now use string manipulation to get the value of each term you want and assign them to variables.
- Then use the Create Directory activity with those variables to build the folder structure you want.
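The file loop above, sketched in Python. Reading the PDF, building the folder name and deciding UNION / NON UNION are passed in as functions (`read_text`, `folder_name_for` and `classify`, all hypothetical), because the standard library has no PDF reader and those rules depend on your documents. Files whose category cannot be determined are left in the source folder.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def file_pdfs(source_dir: Path, output_root: Path,
              read_text: Callable[[Path], str],
              folder_name_for: Callable[[str], str],
              classify: Callable[[str], Optional[str]]) -> List[Path]:
    moved: List[Path] = []
    # Like Directory.GetFiles(folder, "*.pdf"); note that on Linux this match
    # is case-sensitive. sorted() builds the full list before any file moves.
    for pdf in sorted(source_dir.glob("*.pdf")):
        text = read_text(pdf)          # stands in for Read PDF Text / Read PDF With OCR
        category = classify(text)      # "UNION", "NON UNION" or None
        if category is None:
            continue                   # leave unclassified files where they are
        target = output_root / folder_name_for(text) / category
        target.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
        moved.append(Path(shutil.move(str(pdf), str(target / pdf.name))))
    return moved
```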

Cheers @heena_shaikh


@Palaniyappan Somehow I am not able to get the string manipulation right. Can you please help me with it?

I also have scanned PDFs, and sometimes they are handwritten and then scanned. Can I read all these types of PDF (handwritten and typed) in one process to extract the data?
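For a mix of typed and scanned PDFs, one common pattern is to try the text layer first and fall back to OCR only when it yields (almost) nothing. The sketch below assumes two reader functions that you supply (standing in for Read PDF Text and Read PDF With OCR) and an arbitrary threshold of 20 non-whitespace characters. It only decides which reader to call; whether handwritten content is recognised depends entirely on the OCR engine.

```python
from pathlib import Path
from typing import Callable, Tuple


def read_any_pdf(path: Path,
                 read_text_layer: Callable[[Path], str],
                 read_with_ocr: Callable[[Path], str],
                 min_chars: int = 20) -> Tuple[str, str]:
    """Return (text, method), where method is "text" or "ocr"."""
    text = read_text_layer(path)
    # A scanned PDF usually has no text layer, so count real characters only
    if len("".join(text.split())) >= min_chars:
        return text, "text"
    return read_with_ocr(path), "ocr"
```

Recording which method was used makes it easier to route OCR results, which are more error-prone, to an extra check.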


Oh, you mean the input is not standard?
Is that so, buddy?


Yes, it's all mixed, and sometimes handwritten too.

@Palaniyappan Can you please help with some ideas? It is still pending and I am not able to do it. I also want to know whether it is practically possible to achieve.

The task: download different types of PDF (scanned typed, handwritten) and capture specific data from them, where the data is not at a fixed location in the PDF.

Also, PDFs of different types can arrive in the same mail.