Read PDF With OCR (Tesseract OCR)

Temuulen_Buyangerel · August 10, 2023, 10:12am

Hi everyone,
I have a problem: when I read a PDF file using Tesseract OCR, the numbers I get are not the same as the ones in the PDF.
Is there any solution?

Regards,
Temuka

Temuulen_Buyangerel · August 10, 2023, 10:13am

In particular, it doesn’t recognize “8” or “9”.

Usha_Jyothi · August 10, 2023, 10:16am

Try UiPath Document OCR; you may get the required value with it.

Temuulen_Buyangerel · August 10, 2023, 10:20am

Hey,
Do I only have to set a String variable as the Text output in the Properties panel?

Temuulen_Buyangerel · August 10, 2023, 10:25am

I think it’s not working.

Usha_Jyothi · August 10, 2023, 10:32am

Can you please share the sample data?

Temuulen_Buyangerel · August 10, 2023, 10:36am

Sorry, I can’t upload it for some reason.
And I have already solved it.
Thank you!

rlgandu · August 10, 2023, 10:43am

@Temuulen_Buyangerel

Please change the Scale property of Tesseract OCR in the Properties panel: start from 1 and gradually increase it by 0.5.
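A minimal Python sketch of the scale sweep suggested above, for anyone who wants to see the loop spelled out. It assumes a hypothetical `ocr_at_scale(scale)` callable that stands in for running Tesseract OCR with a given Scale value and returning the recognized text; it is not a UiPath API. The upper bound of 5.0 and the number format are also assumptions.

```python
import re
from typing import Callable, Iterator, Optional, Tuple

# Assumed number format: plain digits or comma thousands separators, optional decimals.
NUMBER_PATTERN = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def scale_values(start: float = 1.0, step: float = 0.5, maximum: float = 5.0) -> Iterator[float]:
    """Yield 1.0, 1.5, 2.0, ... up to and including `maximum`."""
    scale = start
    while scale <= maximum:
        yield scale
        scale += step


def find_working_scale(
    ocr_at_scale: Callable[[float], str],  # hypothetical hook: run OCR at this Scale, return the text
    expected: Optional[str] = None,        # known value from a sample page, if you have one
    maximum: float = 5.0,                  # assumed upper bound for the sweep
) -> Optional[Tuple[float, str]]:
    """Return the first (scale, text) that looks right, or None if no scale works."""
    for scale in scale_values(maximum=maximum):
        text = ocr_at_scale(scale).strip()
        if expected is not None:
            ok = text == expected  # exact match also catches 8/9 swaps
        else:
            ok = NUMBER_PATTERN.fullmatch(text) is not None  # format check only
        if ok:
            return scale, text
    return None


# Usage with made-up OCR outputs standing in for real runs:
fake_runs = {1.0: "12B4", 1.5: "1Z84", 2.0: "1284"}
print(find_working_scale(lambda s: fake_runs.get(s, "")))  # (2.0, '1284')
```

A format check alone cannot tell a misread 8 from a 9, since both are valid digits; that is why the sketch accepts an optional known value to tune against on a sample page.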

Temuulen_Buyangerel · August 10, 2023, 10:50am

I will try it
Thank you!
Regards

Veera_Raj · August 10, 2023, 10:59am

Have you tried OmniPage OCR?

Temuulen_Buyangerel · August 11, 2023, 2:49am

Hey,
No, I haven’t tried it.
I still can’t get the exact number I want.

Nguyen_Van_Luong1 · August 11, 2023, 2:54am

Hi @Temuulen_Buyangerel ,
Can you share your file?
I will try to read it.
Regards,
LNV

Temuulen_Buyangerel · August 11, 2023, 3:08am

Hey,
I can’t upload it, sorry.

Nguyen_Van_Luong1 · August 11, 2023, 3:16am

Oh, I see.
You can try Read PDF Text, because OCR sometimes cannot recognize numbers correctly.
Hope it helps.
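When the PDF has a real text layer, Read PDF Text usually returns the characters stored in the file, so there is no image recognition step that could turn an 8 into a 9; what remains is pulling the numbers out of the returned string. A minimal Python sketch, assuming the text has already been extracted into a string (the `pdf_text` sample below is made up) and that numbers use comma thousands separators and a dot for decimals:

```python
import re
from decimal import Decimal
from typing import List

# Assumed format: "1,289.50", "1289.50" or "9" (no signs, no other separators).
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_numbers(pdf_text: str) -> List[Decimal]:
    """Return every number found in the extracted text, in reading order."""
    # Decimal keeps values like 1289.50 exact instead of rounding through float.
    return [Decimal(m.group().replace(",", "")) for m in NUMBER_RE.finditer(pdf_text)]


pdf_text = "Invoice total: 1,289.50\nItems: 9\nRef 8891"
print(extract_numbers(pdf_text))  # [Decimal('1289.50'), Decimal('9'), Decimal('8891')]
```

If your documents use a different number format (for example a comma as the decimal separator), the pattern and the `replace` call need to change accordingly.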

Nguyen_Van_Luong1 · August 11, 2023, 3:34am

Hi,
Here are 2 ways to read a PDF.
My file: Sample.pdf (805.1 KB)

1. By OCR (Tesseract OCR)

[Screenshot]

Result:

[Screenshot of the result]

It’s not good, and it’s too slow.

2. By Read PDF Text

Result: it’s better and faster.
You can try it:
pdf _excel.xaml (15.7 KB)
fr.xlsx (11.8 KB)

Hope it helps.
Regards,
LNV

Temuulen_Buyangerel · August 11, 2023, 3:36am

Thank you!
It works.
Regards,
Temuulen

Nguyen_Van_Luong1 · August 11, 2023, 3:40am

Sometimes simple is better than complicated.
Cheers!

Temuulen_Buyangerel · August 11, 2023, 3:48am

Some of the text was not in English, so I was using Tesseract with the traineddata for another language.
Now I use the other activity, Read PDF Text, to get the numbers.
Thanks for the solution!
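The combination described here (Tesseract with another language's traineddata for the non-English text, Read PDF Text for the numbers) can be sketched as a per-field choice between two sources. The field names and the two dictionaries are hypothetical placeholders for whatever each activity's output has been parsed into; this is a sketch of the idea in Python, not a UiPath workflow.

```python
from typing import Dict, Optional, Set


def merge_fields(
    text_layer: Dict[str, str],  # values parsed from Read PDF Text output (hypothetical)
    ocr: Dict[str, str],         # values parsed from Tesseract OCR output (hypothetical)
    numeric_fields: Set[str],    # which fields hold numbers
) -> Dict[str, Optional[str]]:
    """Pick one value per field: text layer for numbers, OCR for everything else."""
    merged: Dict[str, Optional[str]] = {}
    for field in set(text_layer) | set(ocr):
        text_value = text_layer.get(field, "").strip()
        ocr_value = ocr.get(field, "").strip()
        if field in numeric_fields:
            # Numbers: trust the text layer; use OCR only if the text layer is empty.
            merged[field] = text_value or ocr_value or None
        else:
            # Non-English text: prefer OCR with the right traineddata, text layer as fallback.
            merged[field] = ocr_value or text_value or None
    return merged


print(merge_fields({"total": "1289"}, {"total": "1299", "name": "Бат"}, {"total"}))
```

The design choice mirrors the thread: the text layer cannot misread digits, while OCR with the appropriate language data is kept for text the text layer does not deliver.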

Nguyen_Van_Luong1 · August 11, 2023, 3:52am

Yes, for some pictographic languages like Chinese, Japanese, …, OCR is better.
Happy automation!

system · August 14, 2023, 3:52am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
