I am working on a use case where I have to translate documents from various languages (Chinese, Japanese, Thai, etc.) into English and then process them.

I tried using Document Understanding and AI Center, but it failed miserably. Can anyone please provide some guidance?

Additionally, would using third-party products like … be useful?


Could you let us know which ML packages or methods you have tried?

  1. We tried Hugging Face; however, it expects training on the localized language first, and only then would extraction and translation happen.
  2. We used different OCR engines for digitization, but they convert Chinese into junk characters.


Did you try using OCR and specifying the language the document is in?

With Tesseract, you can provide the language as well.
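For example, here is a minimal Python sketch, assuming the Tesseract CLI is installed together with the `chi_sim`, `chi_tra`, `jpn`, `tha` and `eng` language data files. It passes the language explicitly with `-l` (several languages are joined with `+`) and reads the output file back as UTF-8. The `LANG_CODES` mapping and the helper names are hypothetical; only the `tesseract <image> <outputbase> -l <langs>` invocation is Tesseract's own.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping from readable names to Tesseract traineddata codes.
# Assumption: the matching .traineddata files are installed locally.
LANG_CODES = {
    "chinese_simplified": "chi_sim",
    "chinese_traditional": "chi_tra",
    "japanese": "jpn",
    "thai": "tha",
    "english": "eng",
}


def build_tesseract_cmd(image_path, output_base, languages):
    """Build a tesseract CLI command with an explicit -l language list."""
    codes = [LANG_CODES[lang] for lang in languages]
    # Multiple languages are joined with '+', e.g. "chi_sim+eng".
    return ["tesseract", str(image_path), str(output_base), "-l", "+".join(codes)]


def run_ocr(image_path, languages):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # Write the output next to the image, e.g. scan.png -> scan.txt.
    output_base = Path(image_path).with_suffix("")
    cmd = build_tesseract_cmd(image_path, output_base, languages)
    subprocess.run(cmd, check=True)
    # tesseract appends ".txt" to the output base name.
    txt_path = output_base.parent / (output_base.name + ".txt")
    # Read explicitly as UTF-8 so Chinese/Japanese/Thai text is not mis-decoded.
    return txt_path.read_text(encoding="utf-8")
```

Usage would be something like `run_ocr("invoice_cn.png", ["chinese_simplified", "english"])`. If the language data is missing or the wrong language is passed, the recognized text for CJK or Thai scripts tends to come out as garbage, so checking the `-l` value is a good first step when debugging the junk-character issue.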


Could you also let us know which document types you will receive as input in this process?

Is the data structured, unstructured, or semi-structured?

Also, which data points need to be extracted from the documents?