Can you share your workflow? I did a project for a less popular language using Document Understanding with Form Extractor, and the outcome was quite decent.
Form Extractor focuses on extracting by the layout and positions specified in training, so it should work even though the language is challenging to decode.
In my experience, if you go around trying the OCR engines, Google OCR and Tesseract will give you some output, but with very poor accuracy.
This comes down to the limitations of OCR extraction and how clear the input file is. You can try Omni OCR, which should capture this well. I tried it for Arabic and Mandarin and it worked well.
As I mentioned, Form Extractor extracts data according to predefined elements. You can try capturing those values in some container.
If you can confirm that the structure will remain exactly the same for this document, then you may use regex and split the text on the newline variable.
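To illustrate the regex-and-split idea, here is a minimal sketch in Python rather than in a workflow. It assumes the OCR output is plain text with one `label: value` pair per line. The function name `extract_fields`, the labels, and the sample text are all hypothetical, so adjust them to your document.

```python
import re

def extract_fields(ocr_text, labels):
    """Return {label: value} for lines shaped like 'label: value'."""
    fields = {}
    # Split the OCR output on newlines and inspect each line in turn.
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for label in labels:
            # Accept both the ASCII colon and the full-width colon.
            match = re.match(rf"{re.escape(label)}\s*[:：]\s*(.+)", line)
            if match:
                fields[label] = match.group(1).strip()
                break
    return fields

# Hypothetical sample text standing in for the OCR result.
sample = "姓名: 王伟\n日期: 2021-03-04\n\n金额：1500\n"
result = extract_fields(sample, ["姓名", "日期", "金额"])
```

This only holds up while the layout stays fixed. If the labels or the line order change, the patterns need to be updated.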
ML Extractor will require some serious training and, in my experience with foreign-language documents, will have low accuracy, so go for it only if you don’t get anything from the two approaches above.