UiPath Document OCR Recognize Line Breaks

tina.zhao · March 14, 2024, 5:57pm

I used UiPath Document OCR to read text from PDF files. The result is pretty accurate, but there are no line breaks in the output. All the text is combined into one big chunk. How can I make sure the OCR also captures line breaks?

I have also tried OmniPage OCR, Screen OCR, and Tesseract OCR, but either the result is not accurate enough or it doesn’t read line breaks.

I have also tried using “Split Text” afterwards, but the text couldn’t be split using the “new line” separator.
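
For reference, the check behind this can be sketched in Python rather than as a UiPath activity (the language and the function names `count_line_breaks` and `split_lines` are assumptions made for illustration). It counts the line-break characters in the OCR output and splits on any of them. If every count is zero, no separator can split the text, because the breaks were never captured in the first place.

```python
import re


def count_line_breaks(text: str) -> dict:
    """Count each kind of line-break sequence present in the text."""
    crlf = text.count("\r\n")
    return {
        "CRLF": crlf,
        "LF": text.count("\n") - crlf,  # lone line feeds only
        "CR": text.count("\r") - crlf,  # lone carriage returns only
    }


def split_lines(text: str) -> list:
    """Split on CRLF, lone LF, or lone CR, whichever the text contains."""
    return re.split(r"\r\n|\n|\r", text)


if __name__ == "__main__":
    sample = "first line\r\nsecond line\nthird line"
    print(count_line_breaks(sample))
    print(split_lines(sample))
```

A result with all three counts at zero would confirm that the problem is in the OCR output itself and not in the split step.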

Garret · March 14, 2024, 8:28pm

Is your end goal to extract data from the document or to provide the document to an end user?

tina.zhao · March 14, 2024, 8:29pm

My end goal is to provide a text file of the text extracted from the PDF. I have obtained the text file, but I would like it to follow the line breaks in the PDF.

Garret · March 14, 2024, 8:40pm

Someone may know more than me, but from my experience, I haven’t had much luck preserving format with OCR.

If the document you are getting has a standard layout, then you could use either text manipulation (e.g. Left, Right, InStr) or regex to extract the data and then rebuild the file.

If this is an option and you aren’t familiar with these techniques, I would be happy to help.
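
As a starting point, here is a minimal sketch of the regex approach, written in Python rather than as UiPath activities. It assumes the document always contains the same field labels; the labels in `LABELS` and the function name `rebuild_lines` are hypothetical and would need to be replaced with whatever actually appears in the PDF. The idea is to put a line break back in front of each known label.

```python
import re

# Hypothetical labels; replace with the fixed labels found in the real document.
LABELS = ["Invoice No:", "Date:", "Total:"]


def rebuild_lines(flat_text: str, labels=LABELS) -> str:
    """Insert a line break before each known label in flattened OCR text."""
    if not labels:
        return flat_text
    # Escape the labels so characters such as "." or "(" are matched literally.
    alternatives = "|".join(re.escape(label) for label in labels)
    # Replace the whitespace that directly precedes a label with a newline.
    pattern = r"\s+(?=(?:" + alternatives + r"))"
    return re.sub(pattern, "\n", flat_text.strip())


if __name__ == "__main__":
    flat = "Invoice No: 123 Date: 2024-03-14 Total: 99.00"
    print(rebuild_lines(flat))
```

This only works when the labels are predictable. For free-form text with no fixed markers, there is nothing for the regex to anchor on.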

pramod_kumar2 · March 15, 2024, 7:18am

Could you try UiPath OCR? An additional requirement for this OCR engine is that it needs an API key to work.
