Read two OCR languages at once

kahoyim · April 3, 2020, 6:59am

Hi all,

I am currently trying to read a PDF using Intelligent OCR. The problem is that the document contains both Chinese and English text. Does anybody have a way to read the whole document in one go? Thanks!

Anthony_Humphries · April 3, 2020, 11:26am

You would need to read the file in two passes, one per language, to get all the data. There is no activity that does both simultaneously.
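Here is a minimal Python sketch of one way to combine two passes: run OCR once per language, then keep whichever result the engine was more confident about for each text region. The `OcrWord` structure, the 0 to 1 confidence scale, and the assumption that both passes return the same regions in the same order are all hypothetical. Real engines use their own output formats; Tesseract, for instance, reports word confidences on a 0 to 100 scale.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One recognized region from a single OCR pass (hypothetical structure)."""
    text: str
    confidence: float  # assumed to be on a 0..1 scale


def merge_two_passes(pass_a, pass_b):
    """For each region, keep the result from whichever pass was more confident.

    Assumes both passes produced results for the same regions, in the same order.
    Ties go to pass_a.
    """
    if len(pass_a) != len(pass_b):
        raise ValueError("both passes must cover the same regions")
    return [a if a.confidence >= b.confidence else b for a, b in zip(pass_a, pass_b)]


# Illustrative data: the Chinese pass misreads the English word, and vice versa.
chinese_pass = [OcrWord("发票", 0.93), OcrWord("lnvoice", 0.41)]
english_pass = [OcrWord("???", 0.12), OcrWord("Invoice", 0.96)]

merged_text = " ".join(w.text for w in merge_two_passes(chinese_pass, english_pass))
# merged_text == "发票 Invoice"
```

Choosing per region keeps the merge simple. It will not help when a single region mixes both languages, which is where a single multi-language pass (if your engine supports one) is the better fit.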

AndresTarazona · April 3, 2020, 2:23pm

Hi @kahoyim

First, check your taxonomy file, located at DocumentProcessing\taxonomy.json, to confirm that both languages are referenced as supported languages.
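Here is a rough Python sketch for checking the taxonomy from a script rather than by eye. ASSUMPTION: the languages appear under a top-level `SupportedLanguages` array whose entries carry a `Code` field. Open your own taxonomy.json to confirm the actual key names and code values before relying on this. The function names and the sample JSON are hypothetical.

```python
import json


def supported_language_codes(taxonomy_json: str) -> set:
    """Return the language codes listed in a taxonomy document.

    ASSUMPTION: languages live in a top-level "SupportedLanguages" array
    whose entries have a "Code" field. Adjust to match your taxonomy.json.
    """
    taxonomy = json.loads(taxonomy_json)
    return {lang.get("Code") for lang in taxonomy.get("SupportedLanguages", [])}


def missing_languages(taxonomy_json: str, required) -> list:
    """List the required codes that the taxonomy does not reference."""
    present = supported_language_codes(taxonomy_json)
    return [code for code in required if code not in present]


# Hypothetical sample: only English is referenced, so the Chinese code is reported.
sample = '{"SupportedLanguages": [{"Name": "English", "Code": "eng"}]}'
# missing_languages(sample, ["eng", "chi_sim"]) -> ["chi_sim"]
```

To check a real file, pass its contents in, e.g. `missing_languages(open(r"DocumentProcessing\taxonomy.json", encoding="utf-8").read(), ["eng", "chi_sim"])`.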

After that, I would recommend specifying both languages in your OCR engine. You can define them in the Language field of the Properties panel.
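Tesseract itself accepts several languages as codes joined with `+`, as in `tesseract page.png out -l chi_sim+eng`. Assuming the Language property passes its value through to Tesseract unchanged, the same form should apply there. Below is a minimal Python sketch that builds and sanity-checks such a string. The function name and the character rule for codes are assumptions for illustration. Separators such as commas are rejected because Tesseract does not treat them as a language list, which is one plausible source of errors when two languages are entered.

```python
import re

# Tesseract language codes look like "eng", "chi_sim", "chi_tra":
# letters, digits and underscores (an assumption used for validation here).
_CODE = re.compile(r"^[A-Za-z0-9_]+$")


def tesseract_lang_arg(*codes: str) -> str:
    """Combine language codes into Tesseract's multi-language form, e.g. "chi_sim+eng".

    Raises ValueError for empty input or for codes containing other separators
    (such as "chi_sim,eng"), which Tesseract does not read as two languages.
    """
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    for code in cleaned:
        if not _CODE.match(code):
            raise ValueError(f"invalid language code: {code!r}")
    # dict.fromkeys drops duplicates while keeping the original order
    return "+".join(dict.fromkeys(cleaned))


lang = tesseract_lang_arg("chi_sim", "eng")  # "chi_sim+eng"
```

With pytesseract the result goes straight into `pytesseract.image_to_string(image, lang=lang)`. In the Studio Properties panel it would presumably be entered as a string expression, i.e. `"chi_sim+eng"` with the quotes.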

Here is an example of how to add more supported languages to the Tesseract OCR engine.
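Tesseract loads each language from a `<code>.traineddata` file in its tessdata folder, so a language that was never added will fail even if the language string is correct. Here is a small Python sketch that reports which codes in a `+`-joined language string have no matching file. Where the tessdata folder lives for your particular installation is an assumption you need to fill in, and the function name is hypothetical.

```python
from pathlib import Path


def missing_traineddata(tessdata_dir, lang_arg: str) -> list:
    """Return the codes in a '+'-joined Tesseract language string that have
    no matching <code>.traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    return [
        code
        for code in lang_arg.split("+")
        if not (folder / f"{code}.traineddata").is_file()
    ]
```

For example, `missing_traineddata(r"C:\path\to\tessdata", "chi_sim+eng")` returns `["chi_sim"]` if only `eng.traineddata` is present. In that case you would download `chi_sim.traineddata` into that folder before retrying.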

kahoyim · April 6, 2020, 6:32am

Hi Andres,

Thanks for your reply. How can I add two languages in the Properties panel? It keeps giving me an error.

Thanks

hgaber105 · January 31, 2024, 5:42pm

Did you manage to resolve the issue of handling two languages?
