Unable to extract data from a scanned PDF using OCR

Hi All,

I am trying to extract some data from scanned PDFs, but I get the error below.

error: Read PDF With OCR: Could not find file ‘C:\Users.…\Documents\UiPath\FirstAutomation_OCR\C’.

What I have done:

  • Made a config file and read the PDF file path from it.
  • Looped through each PDF file using a For Each activity.
  • Used the Read PDF With OCR activity with both Tesseract OCR and Microsoft OCR to read the text, but I am not getting the desired output.

Note: some of the PDF files have more than 6 pages, so I am trying to read only 1 page.

Kindly suggest some solutions.

Happy automation

Hey!

Before reading the PDF, check whether the file actually exists in the folder.

Like this:

Assign strFolderPath = "C:\Users\Name\Documents\UiPath\FirstAutomation_OCR\"
Assign ArrFiles = Directory.GetFiles(strFolderPath,"*.pdf")

Take a For Each activity and pass ArrFiles to it.

You’ll get all the PDF files one by one…
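
To show the idea outside of Studio, here is a rough sketch of the same steps in Python (only to illustrate the logic; in UiPath these stay as Assign and For Each activities). The function name list_pdf_files and the placeholder path are made up for the example. It also shows one possible reason for the trailing "C" in the error: if the loop runs over a single path string instead of a list of files, each item is just one character. That is only a guess from the message, so please check what your For Each is really looping over.

```python
import tempfile
from pathlib import Path


def list_pdf_files(folder_path):
    """Return the PDF files directly inside folder_path, sorted by name."""
    folder = Path(folder_path)
    if not folder.is_dir():
        # Fail early with a clear message instead of a later "Could not find file".
        raise FileNotFoundError(f"Folder not found: {folder}")
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Looping over a path *string* yields single characters, not files.
config_value = r"C:\Users\Name\Documents"  # placeholder path, not a real one
first_item = next(iter(config_value))       # the first "item" is just "C"

# Demo on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_folder:
    (Path(demo_folder) / "a.pdf").write_bytes(b"")
    (Path(demo_folder) / "notes.txt").write_text("not a pdf")
    for pdf_path in list_pdf_files(demo_folder):
        # Each item is a full path to a file that exists.
        print(pdf_path.name, pdf_path.exists())
```

The point is that every item handed to the read step should be a full file path that exists, not a piece of the folder path.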

Inside the For Each, add a Read PDF activity (Read PDF With OCR, since your files are scanned), pass item as the file path, and store the output in strPdfOutput.

Now use string manipulation on strPdfOutput to get the desired output.
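
As an example of what that string manipulation could look like, here is a small Python sketch that pulls the value after a label out of the OCR text. The helper extract_field, the sample text, and the labels "Invoice No" and "Total" are all invented for illustration, since I don't know what your PDFs contain. The same pattern idea should carry over to a regex in UiPath.

```python
import re


def extract_field(ocr_text, label):
    """Return the text after 'label:' (or 'label -') on the same line, or None."""
    # re.escape keeps special characters in the label from acting as regex syntax.
    pattern = re.compile(rf"{re.escape(label)}\s*[:\-]\s*(.+)", re.IGNORECASE)
    match = pattern.search(ocr_text)
    return match.group(1).strip() if match else None


# Made-up OCR output, standing in for strPdfOutput.
sample_text = "ACME Ltd\nInvoice No: INV-1001\nTotal : 250.00\n"

print(extract_field(sample_text, "Invoice No"))
print(extract_field(sample_text, "Total"))
print(extract_field(sample_text, "Due Date"))
```

OCR output is often noisy, so it helps to keep the pattern tolerant of extra spaces and to handle the case where the label is not found at all.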

Regards,
NaNi