PDF File Reading

Hi Team,

I am trying to read 3 PDF files using the Read PDF Text activity. For one specific file, it sometimes fails to read the contents and throws an error saying the file is corrupted.

UiPath.PDF.Activities - v3.19.1. Even after upgrading the version, the behavior is the same.
If I use the Read PDF With OCR activity, the success rate is better, but it is still not 100%.

What could be the reason? Are there any alternative activities?

@Balan

  1. The format or encoding of the PDF may be the cause.
  2. If the PDF is scanned, Read PDF Text might fail.
  3. With OCR, try different OCR engines and compare the results.

cheers

@Balan,

The success rate of the OCR activity mainly depends on the quality of the PDF.

An alternative solution would be Document Understanding, which can use AI integration to get more accurate data.

Hello @Balan,

You should add a condition to check whether the string returned by Read PDF Text is null or empty. If it is, consider using an OCR activity for that specific PDF or image.

This is important because the conventional activities that convert PDFs to text rely on the presence of native text within the PDF. If the PDF lacks native text, they cannot extract any information. In such cases, OCR can be beneficial, as it can extract text from both native and non-native PDFs or images.

Advantages of Using OCR:

  1. Good Format Output: Depending on the OCR engine you use, you can achieve well-formatted text output. Google Vision OCR and UiPath Document OCR are particularly good at maintaining format integrity.
  2. Enhanced Data Extraction: You can leverage Document Understanding Activities to better extract data from the output text, which is a valuable addition.

You can use the following OCR engines:

  1. UiPath Document OCR
  2. Google Vision OCR (note that you’ll need an API key for this option).

// These are not the only options available when it comes to OCR engines, but they are among the best ones you can get in UiPath.

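Make sure to implement this condition, passing your string variable rather than a quoted literal (a non-empty literal would always evaluate to False):

String.IsNullOrEmpty(yourString)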
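
The fallback logic could be sketched roughly like this. Python is used here only as language-neutral pseudocode for the workflow, not as UiPath code. `read_native` and `ocr_readers` are hypothetical stand-ins for the Read PDF Text activity and for Read PDF With OCR configured with different engines. Treating an exception (such as the "file corrupted" error) as a reason to fall back is an assumption on top of the null-or-empty check.

```python
from typing import Callable, Iterable, Optional

# Hypothetical reader type: takes a file path, returns extracted text (or None).
Reader = Callable[[str], Optional[str]]


def read_pdf_text_with_fallback(
    path: str,
    read_native: Reader,
    ocr_readers: Iterable[Reader],
) -> str:
    """Try native text extraction first, then each OCR reader in order.

    A reader counts as failed if it raises or returns null/empty/whitespace text.
    """
    errors = []
    for reader in [read_native, *ocr_readers]:
        try:
            text = reader(path)
        except Exception as exc:  # e.g. a "file corrupted" error
            errors.append(exc)
            continue
        # Equivalent in spirit to Not String.IsNullOrEmpty(text),
        # but also rejects whitespace-only output.
        if text and text.strip():
            return text
    raise RuntimeError(
        f"No reader produced text for {path!r}; {len(errors)} reader(s) raised errors"
    )
```

In a real workflow, the same shape would be a Try Catch around Read PDF Text, followed by an If on the output string, with the OCR activity in the fallback branch.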