Handling multiple date formats after OCR

Dear Community,
We’re using OCR to process paper transactions from multiple sources. One of the fields we are trying to find is the date the transaction happened. Depending on the source, the date can take many forms. For example, April 17th, 2019 can come up in the following way:

190417
2019-04-17
2019-04-17 6:00
APR 17 2019
17 AVR 2019 (French version)
170419

Any idea how I can process this without creating an elaborate sequence with extensive string manipulation? Thanks!

Have a nice day,
Phil

May i know what you would to process with these date value
Cheers @Phil

Hi Palaniyappan,
Very good question!

I need to standardise them all to be in the 2019-04-17 format. Thanks!

Have a nice day,
Phil

@Phil

Four of these are easily solvable, two are not.

Your first step should be to remove the spaces and replace them with periods. This is necessary because in the case of the French DateTime, the month must have a period to be parsed. As far as .NET is concerned, “avr.” is a valid month and “avr” is not.

So, if your date is in a string called my_datetime_string, then the following code will store it as a DateTime:
my_datetime = DateTime.Parse(my_datetime_string.Replace(" ",".")

This approach will work for:
2019-04-17
2019-04-17 6:00
APR 17 2019

If you try this and it fails then perhaps the date is in French. To parse the date in french (17 AVR 2019) you need to add the culture info parameter:
my_datetime = DateTime.Parse(stringVar,new system.Globalization.CultureInfo("fr-FR"))

To print either of these DateTime objects as “2019-04-07” you can do my_datetime.ToString("yyyy-MM-dd").

As for the other two formats, you may need to do some manual processing. The computer has no way to automatically determine that “190417” and “170419” are the same date, since either could mean April 17, 2019 or April 19, 2017. If you know the position of the data you can use DateTime.ParseExact to tell the computer which data to take as the day/month/year:
DateTime.ParseExact(stringVar,"yyMMdd",new system.Globalization.CultureInfo("en-US")).ToString("yyyy-MM-dd") will parse “190417” as April 17, 2019 and “170419” as April 19. 2017.

2 Likes

Thanks, Daniel, this is amazing!

The transactions can’t be more than 90 days old (if they are, they need to go through another process that won’t be automated). Is there a way I could anchor the comp so it would pick whichever of the two models that would give a date within a 90 day timeframe? Thanks!

Have a nice day,
Phil

@Phil
Once you have the DateTime saved in a variable you can compare it to the current date to see if it is too old.You could use an If with the condition set to DateTime.Today.AddDays(-90) > my_datetime which would be True if the scraped date is older than 90 days ago and False otherwise.

1 Like

This works remarkably well! Thank you so much for the help!

Have a nice day,
Phil

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.