Can we read a PDF document that contains scanned images? The PDF will have a scanned image attached as the last page, and the other pages will be native text.
Hi @Sachin_001,
You can use the PDF activities to extract the native text. For non-native (scanned) text, check the Apply OCR option.
If you just want to extract images from the PDF, use the activity below.
- Use “Read PDF With OCR”:
  - PDFPath: “path/to/your/pdf/file.pdf”
  - OCR Engine: Tesseract OCR
- Use “Read PDF Text”:
  - PDFPath: “path/to/your/pdf/file.pdf”
  - Page Range: “1” (or the range of pages containing native text)
- Use “Read PDF With OCR” for the last page (“Read PDF Text” has no OCR engine):
  - PDFPath: “path/to/your/pdf/file.pdf”
  - Page Range: “Last” (or the page number of the last page)
  - OCR Engine: Tesseract OCR
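If you do not know in advance which pages are scanned, the split between native pages and OCR pages can be worked out from the native text itself. The following is only a rough Python sketch of that decision, not UiPath code: it assumes you already have the native text of each page as a list of strings, and the function names, the `min_chars` threshold, and the comma-separated range format are all illustrative assumptions.

```python
from typing import List


def pages_needing_ocr(native_text_per_page: List[str], min_chars: int = 20) -> List[int]:
    """Return 1-based page numbers whose native text layer is (almost) empty.

    min_chars is an assumed cut-off: a scanned page usually yields no
    native text at all, so anything shorter is treated as an image page.
    """
    return [
        page_number
        for page_number, text in enumerate(native_text_per_page, start=1)
        if len(text.strip()) < min_chars
    ]


def to_range_string(pages: List[int]) -> str:
    """Collapse page numbers into a compact range string such as "1-3,7"."""
    if not pages:
        return ""
    ordered = sorted(set(pages))
    ranges = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            prev = page  # still inside the current consecutive run
            continue
        ranges.append((start, prev))
        start = prev = page
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Example: two native pages followed by a scanned last page.
texts = ["Native text on page one, long enough.", "Native text on page two, long enough.", ""]
ocr_pages = pages_needing_ocr(texts)
native_pages = [p for p in range(1, len(texts) + 1) if p not in ocr_pages]
print("Native range:", to_range_string(native_pages))
print("OCR range:", to_range_string(ocr_pages))
```

The two range strings could then be passed to the native-text activity and the OCR activity respectively, provided the activity accepts that range format.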
Yes, you can read a PDF document that contains scanned images in UiPath. However, extracting text from scanned images (images that are not text-selectable) typically involves OCR (Optical Character Recognition) technology. UiPath provides activities to work with OCR engines for extracting text from images.
You have to use the Read PDF With OCR activity with the Tesseract OCR engine. In my opinion Tesseract is the best choice, and it is free.
If you have any questions about OCR, I will be happy to answer them.
Thank you
Can we measure whether the quality of a scanned document is GOOD or BAD?
We receive small, low-size scanned logistics documents that we need to extract data from for reporting and data entry.
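One possible way to approach this, offered only as a sketch: many OCR engines, Tesseract included, report a confidence value for each recognised word, and the average of those values can serve as a rough quality signal. The Python example below assumes you have already collected the per-word confidences (0 to 100) for a document; the function name, the threshold of 80, and the minimum word count are illustrative assumptions that would need tuning on your own documents.

```python
from statistics import mean
from typing import Sequence


def scan_quality(
    word_confidences: Sequence[float],
    good_threshold: float = 80.0,  # assumed cut-off, tune on real samples
    min_words: int = 5,            # assumed minimum to trust the average
) -> str:
    """Classify a scanned document as "GOOD" or "BAD" from OCR confidences."""
    # Assumption: negative values mark entries that are not real words,
    # so they are ignored when computing the average.
    usable = [c for c in word_confidences if c >= 0]
    if len(usable) < min_words:
        # Too little recognised text to judge: treat the scan as bad.
        return "BAD"
    return "GOOD" if mean(usable) >= good_threshold else "BAD"


print(scan_quality([95, 90, 88, 92, 97]))
print(scan_quality([40, 55, -1, 60, 35, 50]))
```

Documents classified as BAD could be routed to manual review instead of automatic data entry.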