Identifying and extracting a date from a Word document

I have a requirement to extract some data from an approval letter in a Word document into an Excel document. I was able to use Word Application Scope to read the content and extract some of the data, but I am having trouble extracting a date.
Date requirements:

  • May or may not be present
  • May be in any format: mm/dd/yy, Month DD, YYYY, DD Month YYYY, etc.
  • May be anywhere above the body of the letter

It would be great if anyone can provide any insights.

Hello @arunasan,

This looks like a job for Regex, using the Matches activity.
You can create your specific pattern for dates.

For example, a pattern for US dates already exists in the RegEx builder.

I hope it helps.

Vasile.


Hi @wasea

Thank you. I am trying your suggestion. I am getting this error when I run it.

Assign: Object reference not set to an instance of an object.

Could you please let me know if I am missing something?

Hi @arunasan, paste your text and the regex pattern into any regex tool and check whether you get a result. I prefer .NET Regex Tester - Regex Storm, which works well with UiPath.


@wasea
Also, the date in one of the documents is December 23, 2020.
I am not sure if the US Dates pattern covers this format. Could you please clarify?

@prasath17
Thank you, Prasath. I tried the tester and it did not return any match for December 23, 2020. When I removed the comma, it returned one match.

Hello @arunasan,

In my previous example I only showed what exists by default in the UiPath Studio Matches activity.

For the date that you have there, the pattern below might work.
It can match:

  • Month dd, yyyy (for example, December 23, 2020)
  • mm/dd/yy

January\s?\d+?,\s?\d{4}|February\s?\d+?,\s?\d{4}|March\s?\d+?,\s?\d{4}|April\s?\d+?,\s?\d{4}|May\s?\d+?,\s?\d{4}|June\s?\d+?,\s?\d{4}|July\s?\d+?,\s?\d{4}|August\s?\d+?,\s?\d{4}|September\s?\d+?,\s?\d{4}|October\s?\d+?,\s?\d{4}|November\s?\d+?,\s?\d{4}|December\s?\d+?,\s?\d{4}|\d{2}/\d{2}/\d{2}
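As a rough sketch of the same idea outside UiPath, the Python snippet below builds a more compact pattern from a shared list of month names. It is only an illustration under assumptions: the names MONTHS, DATE_PATTERN and find_first_date are made up for this example, the comma is treated as optional, and a "DD Month YYYY" branch is added because the original question mentions that format. It returns None when no date is found, since the date may not be present at all.

```python
import re

# English month names shared by the first two branches of the pattern.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

DATE_PATTERN = re.compile(
    rf"\b(?:{MONTHS})\s+\d{{1,2}},?\s+\d{{4}}\b"      # Month DD, YYYY (comma optional)
    rf"|\b\d{{1,2}}\s+(?:{MONTHS})\s+\d{{4}}\b"       # DD Month YYYY
    rf"|\b\d{{1,2}}/\d{{1,2}}/(?:\d{{4}}|\d{{2}})\b"  # mm/dd/yy or mm/dd/yyyy
)

def find_first_date(text):
    """Return the first date-like substring, or None if the text has no date."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None

print(find_first_date("Date: December 23, 2020\nDear Sir,"))
print(find_first_date("This letter carries no date."))
```

The three branches use only constructs that the .NET regex engine also supports, so the assembled pattern should be usable in the Matches activity, but test it there first.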

You just need to experiment with the patterns in the Matches activity or on sites like the one recommended by prasath17.

Also, to study Regex further, you can check this post. It is very helpful.

I hope it helps.

Vasile.


@wasea
Thank you so much. I was going to remove the commas from the text to get the date, as I didn't realize I could add more patterns.
Is it possible to convert dates in any format to mm/dd/yyyy, especially a format like December 23, 2020?

@arunasan,
Questions like this have usually been answered already in this forum.
For example: How to convert Month(Text) into number?

Anyway, in your case, an example can be:

  1. Assign: OldDate = "December 20, 2020"
  2. Assign: NewDate = DateTime.ParseExact(OldDate, "MMMM dd, yyyy", System.Globalization.CultureInfo.InvariantCulture).ToString("MM/dd/yyyy")

Knowing this, you can try other format strings if the date format varies between documents.
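To illustrate trying several formats in turn, here is a minimal Python sketch, not UiPath code. The list KNOWN_FORMATS and the function normalize_date are hypothetical names for this example, the formats listed are only the ones mentioned in this thread, and month names are assumed to be in English (the default C locale).

```python
from datetime import datetime

# Candidate input layouts, tried in order; extend the list as new formats appear.
KNOWN_FORMATS = ["%B %d, %Y", "%B %d %Y", "%d %B %Y", "%m/%d/%Y", "%m/%d/%y"]

def normalize_date(raw):
    """Return raw reformatted as mm/dd/yyyy, or None if no known layout fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # this layout did not fit, try the next one
    return None

print(normalize_date("December 23, 2020"))
print(normalize_date("23 December 2020"))
```

In .NET, the overload of DateTime.ParseExact that accepts an array of format strings plays a similar role, so the same list-of-formats approach should carry over to an Assign activity.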

Happy Automation!

Vasile.

