Local language in date format

Hello,

What are your experiences with local-language month names in date fields? We have a model (or models) that is multi-language (German/Dutch/Swedish/English/Italian), and we have provided enough examples for each of those languages and templates that the model works correctly. The only issue I'm trying to understand is fields that are not of string type, such as dates. Until recently, the only words that appeared in fields labeled as date type were English ones, such as

11 Jun 2024 - extracted as 2024-06-11.

But recently we had a situation with

11 giu 2024 (giugno is Italian for June), which was sometimes extracted correctly and sometimes not. That doesn't give me a definitive answer on how to increase the extraction rate: is it enough to increase the number of samples, or should I create a separate field for the Italian samples, treat it as a string, and adjust it in post-processing?
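To make the post-processing option concrete, this is roughly what I have in mind: extract the value as a plain string and normalize it to ISO format afterwards. It is only a sketch in Python. The month table is hand-written and certainly incomplete, the names `MONTHS` and `normalize_date` are my own, and it only handles the "day month-word year" layout shown above.

```python
import re
import unicodedata
from datetime import date

# Hand-written month spellings (abbreviations and full names) for
# English, German, Dutch, Swedish and Italian. Assumption: this list is
# illustrative and would need extending for real documents.
MONTHS = {
    1: ["jan", "january", "januar", "januari", "gen", "gennaio"],
    2: ["feb", "february", "februar", "februari", "febbraio"],
    3: ["mar", "march", "mär", "mrz", "märz", "mrt", "maart", "mars", "marzo"],
    4: ["apr", "april", "aprile"],
    5: ["may", "mai", "mei", "maj", "mag", "maggio"],
    6: ["jun", "june", "juni", "giu", "giugno"],
    7: ["jul", "july", "juli", "lug", "luglio"],
    8: ["aug", "august", "augustus", "augusti", "ago", "agosto"],
    9: ["sep", "sept", "september", "set", "settembre"],
    10: ["oct", "october", "okt", "oktober", "ott", "ottobre"],
    11: ["nov", "november", "novembre"],
    12: ["dec", "december", "dez", "dezember", "dic", "dicembre"],
}

# Flatten to a lookup: month word -> month number.
LOOKUP = {name: num for num, names in MONTHS.items() for name in names}

# Day, month word (letters only, optional trailing dot), four-digit year.
DATE_RE = re.compile(r"^\s*(\d{1,2})[.\s]+([^\W\d_]+)\.?\s+(\d{4})\s*$")


def normalize_date(raw):
    """Return an ISO date string, or None if the text is not recognized."""
    # Compose accented characters so "ä" is a single code point.
    text = unicodedata.normalize("NFC", raw)
    match = DATE_RE.match(text)
    if match is None:
        return None
    day, month_word, year = match.groups()
    month = LOOKUP.get(month_word.casefold())
    if month is None:
        return None
    try:
        return date(int(year), month, int(day)).isoformat()
    except ValueError:
        # Impossible calendar date, e.g. 31 February.
        return None


print(normalize_date("11 giu 2024"))
print(normalize_date("11 Jun 2024"))
```

Anything that returns `None` could then be routed to manual validation instead of being silently accepted.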

Any help appreciated here, thanks!

@salladinne

  1. First, check that the resolution of the PDF is adequate.
  2. Try extracting the data manually using Read PDF with OCR and see whether the text comes through correctly.
  3. If you have different types of formats, it's better to train with more samples.

cheers