Read two OCR languages at once

Hi all,

I am currently trying to read a PDF using intelligent OCR. The problem is that the document contains both Chinese and English. Does anybody have a way to read the whole document in one go? Thanks!

You would need to read the file in two passes, one per language, to get the data. There is no activity that does both simultaneously.
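
For anyone who wants to see the two-pass idea outside of a workflow, here is a minimal Python sketch. It is not a UiPath activity: `read_in_passes` and the `ocr` callable are hypothetical names, and `ocr` stands in for whatever engine you call with a file path and a language code.

```python
from typing import Callable, Dict, Iterable


def read_in_passes(
    path: str,
    languages: Iterable[str],
    ocr: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run one OCR pass per language and return the text keyed by language code.

    `ocr` is a placeholder for the real engine call: it takes a file path
    and a language code and returns the recognized text.
    """
    results: Dict[str, str] = {}
    for lang in languages:
        # Each pass reads the same file with a single language configured.
        results[lang] = ocr(path, lang)
    return results
```

You would then pick or merge the text from each pass yourself, since every pass only recognizes its own language reliably.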

Hi @kahoyim

First, check whether your taxonomy file, located at DocumentProcessing\taxonomy.json, references the two supported languages.

After that, I would recommend specifying in your OCR engine that you are trying to recognize two languages. You can set this in the Language field of the Properties panel.

Here is an example of how to add more supported languages to the Tesseract OCR engine.
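
As a rough illustration of how Tesseract itself takes more than one language, here is a short Python sketch using the pytesseract wrapper. Tesseract joins language codes with `+` (for example `eng+chi_sim`), and the matching traineddata files must be installed. The helper names `tesseract_lang` and `ocr_image` are made up for this example, and whether the UiPath Language field accepts the same `+` syntax is an assumption you should verify in your own project.

```python
from typing import Sequence


def tesseract_lang(codes: Sequence[str]) -> str:
    """Join Tesseract language codes with '+', the separator Tesseract expects."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_image(image_path: str, codes: Sequence[str]) -> str:
    """OCR one image with several languages enabled in a single pass."""
    # Imported here so tesseract_lang() works even without the OCR stack installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(
        Image.open(image_path),
        lang=tesseract_lang(codes),
    )
```

For an English plus Simplified Chinese page you would call something like `ocr_image("page.png", ["eng", "chi_sim"])`, assuming both language packs are installed for Tesseract.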

Hi Andres,

Thanks for your reply. How can I add two languages in the Properties panel? It keeps giving me an error.

Thanks

Did you manage to resolve the issue of handling two languages?