Read PDF with OCR not extracting the first few lines

ayyagari.sudhakar · May 11, 2020, 12:53pm

Hi,

I am trying to read a PDF through OCR and write the result to a text file. The PDF contains both searchable text and text embedded in images. When I use the ‘Read PDF with OCR’ activity, it ignores the first line, which is the header of the document; the output starts from the second line. Note that both the first and second lines are searchable text in the PDF.

I tried the ‘Read PDF Text’ activity and got similar results. Unfortunately, I can't share the PDF.
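One way to narrow this down is to check whether the header is present in the PDF's own text layer when extracted by a different tool. Below is a minimal Python sketch, assuming the third-party `pypdf` package is installed; `normalize`, `header_in_text` and `first_page_text` are hypothetical helper names. It searches the whole first page rather than only its first line, because text extractors follow the PDF's internal content order, so a header can appear later in the output than it does visually.

```python
import sys


def normalize(text: str) -> str:
    # Collapse all runs of whitespace (including line breaks) into single spaces.
    return " ".join(text.split())


def header_in_text(page_text: str, header: str) -> bool:
    """Return True if `header` occurs anywhere in the extracted page text."""
    return normalize(header) in normalize(page_text)


def first_page_text(pdf_path: str) -> str:
    # Imported lazily: pypdf is a third-party package (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return reader.pages[0].extract_text() or ""


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_header.py document.pdf "Expected header text"
    text = first_page_text(sys.argv[1])
    print("Header found in text layer:", header_in_text(text, sys.argv[2]))
    print(text[:500])
```

If the header is found here but not in the activity output, the text layer is intact and the problem is more likely in how the activity extracts or orders text. If it is missing here too, the header may not be stored as ordinary page text.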

Has anyone faced a similar issue before?

Thanks

j.shuller · June 28, 2023, 12:47pm

I also have a similar problem. Read PDF With OCR in C# returns only a single line when using the UiPath Document OCR activity. Tesseract OCR returns multiple lines, but its accuracy doesn't look as good.

Any advice would be appreciated.
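Whether OCR output comes back as one line or many can depend on how word positions are turned into line breaks. If you can obtain word-level bounding boxes from your OCR engine (Tesseract, for example, can report them in its TSV and hOCR output), a minimal sketch of regrouping words into lines by vertical position is shown below. `Word`, `group_into_lines` and the default `tolerance` are hypothetical choices for illustration, not part of any UiPath API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: float    # x coordinate of the word's left edge
    top: float     # y coordinate of the word's top edge
    height: float  # height of the word's bounding box

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def group_into_lines(words: list[Word], tolerance: float = 0.5) -> list[str]:
    """Group OCR words into text lines by vertical position.

    A word joins the current line when its vertical centre lies within
    `tolerance * height` of the centre of that line's first word;
    otherwise it starts a new line. Words in each line are then ordered
    left to right.
    """
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.center_y, w.left)):
        if lines:
            anchor = lines[-1][0]
            if abs(word.center_y - anchor.center_y) <= tolerance * anchor.height:
                lines[-1].append(word)
                continue
        lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.left)) for line in lines]


words = [
    Word("Invoice", 10, 10, 12),
    Word("No.", 80, 11, 12),
    Word("Date:", 10, 40, 12),
    Word("2020-05-11", 60, 39, 12),
]
print(group_into_lines(words))  # ['Invoice No.', 'Date: 2020-05-11']
```

A larger `tolerance` merges words that sit slightly off the baseline; a smaller one keeps tightly spaced lines apart. Comparing each word against the line's first word, rather than the most recent one, avoids chaining words from adjacent lines together, but it handles strongly skewed scans poorly.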

Related topics

Topic	Category / tags	Replies	Views	Activity
'Read PDF With OCR' is not reading line by line from PDF	Studio / uiautomation	6	92	November 13, 2024
Not able to extract some data from scanned PDF using OCR	Studio / studio, question, activities_panel	1	780	July 27, 2022
Read PDF with OCR only reads couple of pages or last page	Activities / pdf, activities, question	6	2323	July 29, 2021
Using UIPath OCR , bubt its giving me result in two line only	Documentation / docs, question	7	566	October 25, 2023
Read PDFwith OCR is not extracting all the pages using Tesseract OCR Engine	Help / activities	5	2096	June 29, 2019