UiPath Document OCR Recognize Line Breaks

I used UiPath Document OCR to read text from PDF files. The result is pretty accurate, but there are no line breaks in the output: all the text is combined into one big chunk. How can I make sure the OCR also captures line breaks?

I have also tried OmniPage OCR, Screen OCR, and Tesseract OCR, but either the result is not accurate enough or the line breaks are not read.

I have also tried using “Split Text” afterwards, but the text couldn’t be split using the “new line” separator.
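One thing that may be worth checking: a split on “new line” can fail when the text uses a different line-ending style than the separator (for example a lone carriage return), or when it contains no line breaks at all. Below is a minimal Python sketch of that check. It is an illustration only, not a UiPath activity; the sample string `ocr_text` and the helper names are made up.

```python
def describe_line_endings(text: str) -> dict[str, int]:
    """Count each line-ending style so you can see what the text contains."""
    crlf = text.count("\r\n")
    return {
        "CRLF": crlf,                   # Windows style: carriage return + line feed
        "CR": text.count("\r") - crlf,  # lone carriage returns
        "LF": text.count("\n") - crlf,  # lone line feeds
    }


def split_ocr_lines(text: str) -> list[str]:
    """Split on any common line ending instead of one fixed separator."""
    return text.splitlines()


# Made-up sample that mixes all three line-ending styles.
ocr_text = "Invoice 123\rDate: 2020-01-01\r\nTotal: 42.00\nThank you"

print(describe_line_endings(ocr_text))
print(split_ocr_lines(ocr_text))
```

If every count comes back as zero, the OCR output really has no line breaks, and no choice of separator will recover them.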


Is your end goal to extract data from the document or to provide the document to an end user?

My end goal is to provide a text file of the text extracted from the PDF. I have already obtained the text file, but I would like it to follow the line breaks in the PDF.

Someone may know more than me, but in my experience, I haven’t had much luck preserving formatting with OCR.

If the documents you receive follow a standard layout, you could use either text manipulation (e.g. Left, Right, InStr) or regex to extract the data and then rebuild the file.
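As a rough illustration of the regex route, here is a minimal Python sketch that puts line breaks back into flat OCR text by starting a new line before each known field label. The labels in `LABELS`, the sample text, and the function name `rebuild_lines` are all hypothetical; it assumes the documents share a fixed set of labels, which may not hold for yours.

```python
import re

# Hypothetical field labels; replace them with the labels in your own documents.
LABELS = ["Invoice No:", "Date:", "Total:"]


def rebuild_lines(flat_text: str, labels: list[str]) -> str:
    """Replace the whitespace in front of each known label with a line break."""
    alternation = "|".join(re.escape(label) for label in labels)
    # The lookahead matches whitespace only when a label follows, so the
    # label itself stays in the text.
    pattern = re.compile(rf"\s+(?=(?:{alternation}))")
    return pattern.sub("\n", flat_text.strip())


# Made-up example of OCR output with no line breaks.
flat = "Invoice No: 123 Date: 2020-01-01 Total: 42.00"
print(rebuild_lines(flat, LABELS))
```

This only works when the labels are stable across documents and do not also appear inside field values; free-form text would need a different approach.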

If this is an option and you aren’t familiar with these techniques, I would be happy to help.

Could you try UiPath OCR? Note that this OCR engine has an additional requirement: it needs an API key to work.