We have implemented an invoice processing project using ML and the form extractor. While it has brought revolutionary changes to the finance department, we are having some issues parsing dates from PDF copies.
The ML extractor returns the date in yyyy-mm-dd format in some instances and in yyyy-dd-mm format in others. For example, if the invoice date printed on the PDF is 06/08/20 (dd/mm/yy), it is read as 8th Jun 2020 in some cases and as 6th Aug 2020 in others.
The form extractor has the same issue, and in some cases the year is also read incorrectly, e.g. 19/08/20 is scanned as 20th Aug 2019.
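To illustrate the ambiguity outside of the extractors, here is a minimal Python sketch using only the standard library. It is not what the ML or form extractor does internally (we do not know that); it only shows how the same printed text gives different dates depending on which field order is assumed.

```python
from datetime import datetime

# The date as printed on the invoice (intended as dd/mm/yy).
raw = "06/08/20"

# Same text, two assumed field orders, two different dates.
as_day_first = datetime.strptime(raw, "%d/%m/%y").date()
as_month_first = datetime.strptime(raw, "%m/%d/%y").date()
print(as_day_first.isoformat())    # 2020-08-06 (6th Aug 2020, correct)
print(as_month_first.isoformat())  # 2020-06-08 (8th Jun 2020, wrong)

# The second example: the first field is taken as the year.
raw_year = "19/08/20"
as_year_first = datetime.strptime(raw_year, "%y/%m/%d").date()
print(as_year_first.isoformat())   # 2019-08-20 (20th Aug 2019, wrong)
```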
We are using endpoints for both the ML and the position-based extractor.
Does anyone know a potential solution for this problem? Is it possible to create a new variable for the form extractor and read the value as a string, rather than having it interpreted as a date automatically?
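For context, this is roughly the post-processing we have in mind if the raw text can be returned as a string. It is only a sketch in plain Python: `parse_invoice_date` and `INVOICE_DATE_FORMAT` are hypothetical names of our own, not part of the extractor, and it assumes every invoice prints the date as dd/mm/yy.

```python
from datetime import date, datetime

# Assumption: all invoices print the date as dd/mm/yy.
INVOICE_DATE_FORMAT = "%d/%m/%y"


def parse_invoice_date(raw: str, fmt: str = INVOICE_DATE_FORMAT) -> date:
    """Parse the raw extracted text with one explicit, fixed format.

    Raises ValueError if the text does not match the format, so bad
    reads are flagged instead of being silently reinterpreted.
    """
    return datetime.strptime(raw.strip(), fmt).date()


print(parse_invoice_date("06/08/20").isoformat())  # 2020-08-06
print(parse_invoice_date("19/08/20").isoformat())  # 2020-08-19
```

With a fixed format the day, month, and year can no longer be swapped, and a value that does not fit (for example a day of 31 in February) raises an error that can be routed to manual review.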
Happy to share more information if required.