PDF OCR Problem Extracting a Single Numeric Character

dominador.nazareno · May 5, 2021, 7:43am

Hello guys, newbie here.
I tried to extract data from a PDF using OCR, but a single numeric character cannot be read properly. Has anyone had the same experience, and what do you normally do to handle this kind of challenging automation task?
Activities I have tried so far to read the data in the PDF:
1. Native text extraction (Get Visible Text) → works only partially, on selected PDFs.
2. Read PDF Text → cannot read the file; returns an empty string.
3. Read PDF With OCR → some characters are recognized as different characters.
> Microsoft OCR → does not recognize the characters correctly.
> Tesseract OCR → does not recognize the characters correctly.
4. Native Citrix → same result.

Your input is highly appreciated.
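One thing commonly tried with Tesseract when it misreads a lone digit is telling it to expect exactly one character and restricting it to digits. The sketch below is not something proposed in this thread and runs outside UiPath: it assumes the `pytesseract` wrapper, Pillow and the `tesseract` binary are installed, and the helper names (`digit_config`, `ocr_digit`) and the 3x upscale factor are hypothetical choices for illustration. `--psm 10` means "single character" and `--psm 7` "single text line"; note that some older Tesseract 4.0 builds reportedly ignored `tessedit_char_whitelist` with the LSTM engine.

```python
"""Sketch: OCR a single digit with Tesseract via pytesseract (assumed installed)."""

DIGITS = "0123456789"


def digit_config(single_char: bool = True) -> str:
    """Build a hypothetical Tesseract config string restricted to digits.

    --psm 10 treats the image as a single character; --psm 7 as one text line.
    """
    psm = 10 if single_char else 7
    return f"--psm {psm} -c tessedit_char_whitelist={DIGITS}"


def ocr_digit(image_path: str, scale: int = 3) -> str:
    """OCR a cropped image of one digit (needs Pillow, pytesseract, tesseract)."""
    # Imported lazily so digit_config() works even without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        gray = img.convert("L")  # grayscale usually helps binarization
        # Upscale small crops: a lone character only a few pixels tall
        # gives Tesseract very little to work with.
        big = gray.resize((gray.width * scale, gray.height * scale))
        text = pytesseract.image_to_string(big, config=digit_config())
    return text.strip()


if __name__ == "__main__":
    print(digit_config())
```

For example, `ocr_digit("digit_crop.png")` (a hypothetical file containing one cropped digit) would return whatever Tesseract reads under these settings; whether that fixes a given PDF depends on the scan quality, so treat it as an experiment rather than a guaranteed fix.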

Monica_Secelean · June 29, 2021, 11:44am

Hi Dominador,

Would you mind sharing with us the documents you are trying to extract data from? I cannot tell what is wrong from the description alone (this use case should be supported), so I would need more details to investigate.

Thanks,
Monica
