Read PDF With OCR (Tesseract OCR)

Hi everyone,
I have a problem: when I read a PDF file using Tesseract OCR, the numbers I get are not the same as the ones in the PDF.
Are there any solutions?

Regards,
Temuka

In particular, it doesn’t recognize “8” or “9”.

Try UiPath Document OCR; it may give you the required value.

Hey,
Do I only have to set a String variable in the Text property?

I think it’s not working.

Can you please share the sample data?

Sorry, I can’t upload it for some reason.
And I already solved it.
Thank you!

@Temuulen_Buyangerel

Please change the Scale of the Tesseract OCR activity in the Properties panel: start from 1 and gradually increase it by 0.5.
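
To illustrate the idea of stepping the scale up from 1 in increments of 0.5, here is a minimal Python sketch. It is not UiPath code: `run_ocr` and `is_valid` are hypothetical callables you would supply yourself (for example, a wrapper around whatever OCR engine you use and a check that the result looks like a number), and the upper limit of 4.0 is an assumption, not a recommended value.

```python
from typing import Callable, Iterator, Optional, Tuple


def scale_steps(start: float = 1.0, step: float = 0.5,
                stop: float = 4.0) -> Iterator[float]:
    """Yield start, start + step, ... up to and including stop."""
    count = int(round((stop - start) / step))
    for i in range(count + 1):
        yield start + i * step


def first_good_scale(run_ocr: Callable[[float], str],
                     is_valid: Callable[[str], bool],
                     start: float = 1.0, step: float = 0.5,
                     stop: float = 4.0) -> Optional[Tuple[float, str]]:
    """Try each scale in order and return the first (scale, text)
    whose OCR output passes is_valid, or None if none does.

    run_ocr and is_valid are hypothetical, caller-supplied functions.
    """
    for scale in scale_steps(start, step, stop):
        text = run_ocr(scale)
        if is_valid(text):
            return scale, text
    return None
```

In practice you would do the same thing by hand in the Properties panel: change Scale, rerun, and compare the output with the PDF.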

I will try it
Thank you!
Regards

Have you tried OmniPage OCR?

Hey,
No, I haven’t tried it.
I still can’t get the exact number I want.

Hi @Temuulen_Buyangerel,
Can you share your file?
I will try to read it.
Regards,
LNV

Hey,
I can’t upload it, sorry.

Oh, I see.
You can try Read PDF Text, because OCR sometimes cannot recognize numbers.
Hope it helps.

Hi,
Here are 2 ways to read the PDF.
My file:
Sample.pdf (805.1 KB)

1. By OCR (Tesseract OCR)

   Result: it’s not good, and it’s too slow.

2. By Read PDF Text

   Result: it’s better and faster.

You can try it:
pdf _excel.xaml (15.7 KB)
fr.xlsx (11.8 KB)

Hope it helps.
Regards,
LNV
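
The comparison above suggests a simple rule: use the PDF's text layer when it has one, and only fall back to OCR when it doesn't. A rough Python sketch of that decision follows. `read_text_layer` and `read_with_ocr` are hypothetical functions standing in for whatever text extraction and OCR steps you have, and `min_chars` is an assumed threshold.

```python
from typing import Callable, Tuple


def read_pdf(path: str,
             read_text_layer: Callable[[str], str],
             read_with_ocr: Callable[[str], str],
             min_chars: int = 1) -> Tuple[str, str]:
    """Return (method, text), preferring the embedded text layer.

    read_text_layer and read_with_ocr are hypothetical, caller-supplied
    functions. OCR runs only if the text layer yields fewer than
    min_chars non-whitespace-trimmed characters (e.g. a scanned PDF).
    """
    text = read_text_layer(path)
    if len(text.strip()) >= min_chars:
        return "text", text
    return "ocr", read_with_ocr(path)
```

This keeps the fast, exact path as the default and reserves the slower OCR path for scanned documents.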

Thank you!
It works.
Regards,
Temuulen

Sometimes, simple is better than complicated :smile:
Cheers

Some of the text was not in English, so I was using Tesseract with the traineddata for another language.
Now I use the other activity, Read PDF Text, to get the numbers.
Thanks for the solution!
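
If you keep both outputs (OCR for the non-English text, the text layer for the numbers), you can also cross-check them to spot digits that OCR misread, such as an “8” read as something else. The Python sketch below is only an illustration: the function names and the number pattern are my own assumptions, and the pattern only covers plain digits with `,` or `.` separators.

```python
import re
from typing import List

# Assumed pattern: a run of digits that may contain ',' or '.' inside it.
NUMBER_RE = re.compile(r"\d[\d.,]*\d|\d")


def extract_numbers(text: str) -> List[str]:
    """Return all numeric tokens found in text, in order."""
    return NUMBER_RE.findall(text)


def suspicious_numbers(ocr_text: str, layer_text: str) -> List[str]:
    """Return numbers from the OCR output that do not appear
    anywhere in the text-layer output (possible misreads)."""
    trusted = set(extract_numbers(layer_text))
    return [n for n in extract_numbers(ocr_text) if n not in trusted]
```

For example, `suspicious_numbers("Invoice 1239", "Invoice 1289")` flags `"1239"` as a number that OCR produced but the text layer does not contain.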

Yes, for some pictographic languages like Chinese, Japanese, … OCR is better.
Happy automation!
