OCR is not working with Japanese language

Hi,
I'm not able to extract data from a scanned PDF file using Google OCR (Japanese language).
I'm getting the following error:


Did you try the Microsoft OCR engine option?

Yes, it's not working either.

Finally, I found a solution.

Here is the language pack. This one works:

https://github.com/tesseract-ocr/tessdata/blob/bf82613055ebc6e63d9e3b438a5c234bfd638c93/jpn.traineddata
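For anyone calling Tesseract directly instead of through an OCR activity, here is a minimal Python sketch of using the `jpn.traineddata` file linked above. It assumes Tesseract 4 or later is installed and on PATH, and that the file was saved into a local folder named `tessdata` (a hypothetical path; adjust it). The helper names are illustrative, not part of any library; only the standard Tesseract CLI options `-l` and `--tessdata-dir` are used.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical folder where jpn.traineddata was saved; adjust to your setup.
TESSDATA_DIR = Path("tessdata")


def has_language_pack(tessdata_dir: Path, lang: str = "jpn") -> bool:
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return (tessdata_dir / f"{lang}.traineddata").is_file()


def build_tesseract_command(image: str, tessdata_dir: Path, lang: str = "jpn") -> list:
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    return [
        "tesseract",
        image,
        "stdout",  # "stdout" as the output base prints text instead of writing a file
        "--tessdata-dir", str(tessdata_dir),  # where to look for *.traineddata
        "-l", lang,  # "jpn" selects the Japanese language pack
    ]


def ocr_japanese(image: str, tessdata_dir: Path = TESSDATA_DIR) -> str:
    """Run Tesseract with the Japanese pack on one image and return the text."""
    if not has_language_pack(tessdata_dir):
        raise FileNotFoundError(f"jpn.traineddata not found in {tessdata_dir}")
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image, tessdata_dir),
        capture_output=True,
        check=True,  # raise CalledProcessError if Tesseract fails
        encoding="utf-8",  # Japanese output must be decoded as UTF-8
    )
    return result.stdout
```

Usage would look like `print(ocr_japanese("page1.png"))`, where `page1.png` is a hypothetical file name. Tesseract does not read PDF input directly, so each page of a scanned PDF would first need to be rendered to an image.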


Thank you @suresh_polinati.
I tried jpn.traineddata and it fixed the same issue for me.
However, many characters still seem to be recognized incorrectly,
e.g. 100万円 ==> 1。。円

OCR isn't perfect. Try the Scale option or Microsoft OCR.

Using Microsoft OCR, I'm not able to read Japanese data.

Hi.
Installing a Japanese language pack might be the solution. Hope this helps 🙂

https://social.msdn.microsoft.com/Forums/en-US/a9c81d92-b044-485a-8cc9-59a5b9862236/ocr-method-of-modi?forum=vblanguage
https://social.technet.microsoft.com/Forums/exchange/en-US/727bf275-2bdd-4e5a-805d-a91fc93aa263/japanese-language-ocr-in-modi?forum=excel