Non-pretrained languages

I have a question about Document Understanding: for languages other than the pre-trained ones, is it hard to train a model? I have been looking through the documentation and this forum and am not sure where to start.


You only need to label the documents to show what the data looks like, so that the ML model can learn to identify the characters you need…

Labeling a good number of documents is recommended (at least 10 documents per type).
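
If it helps, here is a minimal Python sketch of how you might check that rule of thumb before training. It only counts labeled documents per type and flags any type below 10. The function name, the file names and the (file name, document type) pair format are hypothetical, made up for illustration, and are not part of any Document Understanding API.

```python
from collections import Counter

# Threshold taken from the "at least 10 documents per type" recommendation.
MIN_DOCS_PER_TYPE = 10


def under_labeled_types(labeled_docs, minimum=MIN_DOCS_PER_TYPE):
    """Return {doc_type: count} for types with fewer than `minimum` labeled documents."""
    counts = Counter(doc_type for _, doc_type in labeled_docs)
    return {doc_type: n for doc_type, n in counts.items() if n < minimum}


# Hypothetical labeled set: (file name, document type) pairs.
labeled_docs = [(f"invoice_{i}.pdf", "invoice") for i in range(12)]
labeled_docs += [(f"receipt_{i}.pdf", "receipt") for i in range(4)]

# Types listed here still need more labeled examples.
print(under_labeled_types(labeled_docs))
```

With the sample data above, only the receipt type is flagged, since it has 4 labeled documents while invoices have 12.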