One of the steps in a new business process we are about to automate is the conversion of a non-searchable PDF to a searchable one. Is there a way to do this without using a third-party application (like Adobe Acrobat Reader DC)? My first guess was to use the OCR activity, but this returns a string, which I cannot export to a PDF. We already experimented with Acrobat DC, but it does not work well in combination with UiPath (same issues as already described on this forum).
To make your PDF searchable using UiPath, please follow the steps below:
1. Read the PDF with the Read PDF With OCR activity.
2. Save the text extracted by this activity in a variable.
3. Add an Invoke Code activity.
4. In it, use the C# code below to write the extracted text into the PDF's "Keywords" metadata field. Once done, the PDF is searchable by the keywords stored in that field (the page content itself stays an image).
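string path = "";
// Input PDF: path + file name.
PdfReader reader = new PdfReader(path + "");
// Output PDF: use a different file name than the input.
PdfStamper stamper = new PdfStamper(reader, new FileStream(path + "", FileMode.Create));
var info = reader.Info;
// pdfText is the variable that holds the text extracted in step 1.
info["Keywords"] = pdfText;
stamper.MoreInfo = info;
stamper.FormFlattening = true;
// Closing the stamper finalizes and writes the output PDF.
stamper.Close();
reader.Close();
// Length is the number of characters stored in "Keywords".
insertedWordCount = info["Keywords"].Length;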
You will also need to import the namespaces iTextSharp.text.pdf and iTextSharp.text.xml.xmp, plus System.IO for FileStream if it is not already imported.
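To check that the keywords actually ended up in the output file, they can be read back after the Invoke Code step. The following is only a minimal sketch, assuming iTextSharp 5.x (where `PdfReader.Info` is a `Dictionary<string, string>`). The names `KeywordCheck`, `GetKeywords`, `ReadKeywords` and `pdfPath` are made up for illustration and are not part of UiPath or iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of UiPath or iTextSharp.
public static class KeywordCheck
{
    // Returns the "Keywords" entry of a PDF info dictionary,
    // or an empty string when the entry is missing.
    public static string GetKeywords(IDictionary<string, string> info)
    {
        string keywords;
        return info.TryGetValue("Keywords", out keywords) ? keywords : string.Empty;
    }

    // Opens the stamped PDF (pdfPath is an assumed parameter name)
    // and reads its "Keywords" metadata back.
    public static string ReadKeywords(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            return GetKeywords(reader.Info);
        }
        finally
        {
            // Release the file handle even if reading fails.
            reader.Close();
        }
    }
}
```

If `ReadKeywords` returns an empty string for the output file, the stamper was most likely not closed, or the code wrote to a different path than the one being checked.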