I am trying to read 3 PDF files using the Read PDF Text activity. For one specific file, it is sometimes unable to read the contents and throws an error saying the file is corrupted.
I am on UiPath.PDF.Activities v3.19.1. Upgrading the package version did not change the behavior.
If I use the Read PDF With OCR activity, the success rate is better, but it is still not 100%.
What could be the reason? Are there any alternative activities?
You should add a condition that checks whether the string returned by Read PDF Text is null or empty. If it is, fall back to an OCR activity for that specific PDF or image.
This is important because the conventional activities that convert PDFs to text rely on native text being present in the PDF. If the PDF lacks native text, they cannot extract any information. In such cases OCR helps, because it can extract text from both native and non-native (scanned) PDFs as well as images.
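To make the check concrete, here is a minimal sketch of the fallback logic in Python. It is not UiPath code: `read_text` and `read_ocr` are hypothetical stand-ins for the Read PDF Text and Read PDF With OCR activities, and the function name is made up for illustration. Treating a read error the same as an empty result is my assumption, based on the "file corrupted" error described in the question.

```python
from typing import Callable, Tuple


def read_pdf_with_fallback(
    path: str,
    read_text: Callable[[str], str],  # hypothetical stand-in for Read PDF Text
    read_ocr: Callable[[str], str],   # hypothetical stand-in for Read PDF With OCR
) -> Tuple[str, str]:
    """Return (text, method), where method is "native" or "ocr"."""
    try:
        text = read_text(path)
    except Exception:
        # Assumption: a failed native read (e.g. a "file corrupted" error)
        # is handled like an empty result, so OCR is tried next.
        text = ""

    # Null, empty, or whitespace-only output means no usable native text.
    if text and text.strip():
        return text, "native"

    # Fall back to OCR only for the file that needs it.
    return read_ocr(path), "ocr"
```

In Studio, the same shape would roughly be a Try Catch around Read PDF Text, followed by an If on String.IsNullOrWhiteSpace(text) that runs Read PDF With OCR in the Then branch. Running OCR only on the files that need it keeps the faster native read for the rest.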
Advantages of Using OCR:
Good Format Output: Depending on the OCR engine you use, you can achieve well-formatted text output. Google Vision OCR and UiPath Document OCR are particularly good at maintaining format integrity.
Enhanced Data Extraction: You can leverage Document Understanding Activities to better extract data from the output text, which is a valuable addition.
You can use the following OCR engines:
UiPath Document OCR
Google Vision OCR (note that you’ll need an API key for this option).
Note: these are not the only OCR engines available, but they are among the best ones you can use in UiPath.