PDF File Reading

Balan · September 27, 2024, 11:03am

Hi Team,

I am trying to read 3 PDF files using the Read PDF Text activity. For one specific file, it sometimes fails to read the contents and throws an error saying the file is corrupted.

UiPath.PDF.Activities - v3.19.1. Even after upgrading the version, the result is the same.
If I use the Read PDF With OCR activity, the success rate is better, but it is still not 100%.

What could be the reason? Are there any alternative activities?

Anil_G · September 27, 2024, 12:54pm

@Balan

It could be the format or encoding of the PDF.
If the PDF is scanned, it might fail.
With OCR, try different OCR engines and check.

cheers

ashokkarale · September 27, 2024, 12:58pm

@Balan,

OCR activity success rate mainly depends on the quality of the pdf.

An alternative solution would be Document Understanding, which can use AI integration to get more accurate data.

indiedev91 · September 27, 2024, 1:14pm

Hello @Balan,

You should add a condition to check if the string is null or empty. If it is, consider using an OCR activity for that specific PDF or image.

This is important because the conventional activities that convert PDFs to text rely on the presence of native text within the PDF. If the PDF lacks native text, they cannot extract any information. In such cases, OCR can be beneficial, as it can extract text from both native and non-native PDFs or images.

Advantages of Using OCR:

Good Format Output: Depending on the OCR engine you use, you can achieve well-formatted text output. Google Vision OCR and UiPath Document OCR are particularly good at maintaining format integrity.
Enhanced Data Extraction: You can leverage Document Understanding Activities to better extract data from the output text, which is a valuable addition.

You can use the following OCR engines:

UiPath Document OCR
Google Vision OCR (note that you’ll need an API key for this option).

// These are not the only OCR engines available, but they are among the best ones you can get in UiPath.

Make sure to implement this condition, passing your output variable rather than a quoted literal:

String.IsNullOrEmpty(yourString)
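
To make the fallback idea concrete, here is a rough sketch of the logic in Python rather than as a UiPath workflow. It is only an illustration: `read_native` and `read_ocr` are hypothetical stand-ins for the Read PDF Text and Read PDF With OCR activities, not real library calls, and the whitespace check is an assumption that goes slightly beyond a plain null-or-empty test.

```python
from typing import Callable, Optional


def is_blank(text: Optional[str]) -> bool:
    """True when the text is None, empty, or only whitespace."""
    return text is None or text.strip() == ""


def read_with_fallback(
    path: str,
    read_native: Callable[[str], str],  # hypothetical stand-in for Read PDF Text
    read_ocr: Callable[[str], str],     # hypothetical stand-in for Read PDF With OCR
) -> str:
    """Try native text extraction first; fall back to OCR if it fails or returns nothing."""
    try:
        text = read_native(path)
    except Exception:
        # Treat a read error (for example, "file corrupted") like an empty result.
        text = None

    if is_blank(text):
        text = read_ocr(path)

    return text or ""
```

In a workflow, the same shape would be a Try Catch around Read PDF Text, followed by an If on the null-or-empty condition that runs the OCR activity only for the files that need it.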
