OCR is not working with Japanese language

ocr
studio

#1

Hi,
I’m not able to extract data from a scanned PDF file using Google OCR (Japanese language).
I’m getting the following error.


Fail to get OCR japanese text in Citrix environment
#2

Did you try Microsoft OCR from the OCR engine options?


#3

Yes, it’s not working.


#4

I finally found a solution.


#5

Here is the language pack. This one works:

https://github.com/tesseract-ocr/tessdata/blob/bf82613055ebc6e63d9e3b438a5c234bfd638c93/jpn.traineddata
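For anyone checking the setup outside Studio, here is a minimal sketch of how the language file could be verified and used. It assumes the Tesseract command-line tool is installed and that the file is saved as jpn.traineddata in a tessdata folder. The helper names and the folder and image paths are hypothetical, not something from this thread, and the `-l` and `--tessdata-dir` options may need adjusting for your Tesseract version.

```python
from pathlib import Path


def missing_language_files(tessdata_dir, languages):
    """Return the language codes whose <code>.traineddata file is absent."""
    folder = Path(tessdata_dir)
    return [
        lang for lang in languages
        if not (folder / f"{lang}.traineddata").is_file()
    ]


def build_tesseract_command(image_path, tessdata_dir, lang="jpn"):
    """Build a Tesseract CLI call that prints the recognized text to stdout."""
    return [
        "tesseract", str(image_path), "stdout",
        "-l", lang,                            # language code, e.g. jpn
        "--tessdata-dir", str(tessdata_dir),   # folder holding jpn.traineddata
    ]


if __name__ == "__main__":
    # Hypothetical paths: replace with your own tessdata folder and image.
    tessdata = "tessdata"
    print("Missing language files:", missing_language_files(tessdata, ["jpn"]))
    print("Command:", " ".join(build_tesseract_command("scan.png", tessdata)))
```

If `missing_language_files` reports `jpn`, the file is not where the engine expects it, which would explain the OCR failing for Japanese.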


#6

Thank you @suresh_polinati
I tried jpn.traineddata and it fixed the same issue for me.
However, the output contains many incorrectly recognized characters,
e.g. 100万円 ==> 1。。円
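
To see exactly which characters were misread, the expected and recognized strings can be compared character by character. This is only a rough sketch using Python's standard difflib module. The function name is made up for illustration and it is not part of any OCR tool.

```python
import difflib


def ocr_differences(expected, actual):
    """List the spans where the OCR output differs from the expected text."""
    matcher = difflib.SequenceMatcher(None, expected, actual)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the mismatched spans
    ]


if __name__ == "__main__":
    for tag, want, got in ocr_differences("100万円", "1。。円"):
        print(f"{tag}: expected {want!r}, got {got!r}")
```

Running this over a few sample values makes it easier to tell whether the errors are concentrated in digits, kanji, or punctuation.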


#7

OCR isn’t perfect. Try adjusting the scale option, or try Microsoft OCR.


#8

Using Microsoft OCR, I’m not able to read Japanese data either.


#9

Hi.
Installing a language pack might be the solution. Hope this helps.

https://social.msdn.microsoft.com/Forums/en-US/a9c81d92-b044-485a-8cc9-59a5b9862236/ocr-method-of-modi?forum=vblanguage
https://social.technet.microsoft.com/Forums/exchange/en-US/727bf275-2bdd-4e5a-805d-a91fc93aa263/japanese-language-ocr-in-modi?forum=excel