Finding holiday dates using regular expressions

Hi Team

I am finding holiday dates using regular expressions. I have an Excel file as shown in Fig.

Input:-

“APR 202314 FriDr.Baba Saheb Ambedkar Jayanti”
“MAY 202301 MonMaharashtra Day”
“JUN 2023No Holidays”
“JUL 2023No Holidays”
“AUG 202315 TueIndependence Day30 WedRaksha Bandhan”
“SEP 202328 ThuGanesh Visarjan/Eid-E-Milad”
“OCT 202302 MonMahatma Gandhi Jayanti24 TueDashera”
“NOV 202315 WedBhaubij”
“DEC 2023No Holidays”
“JAN 2024No Holidays”
“FEB 2024No Holidays”
“MAR 2024No Holidays”

Expected output:-
APR 2023 14
MAY 2023 01
AUG 2023 15
AUG 2023 30
SEP 2023 28
OCT 2023 02
OCT 2023 24
NOV 2023 15

How can I find it?

Thanks
Minal P

1 Like

@minal.patil

Try this pattern:

\b[A-Z]{3}\s+\d{4}\s*\d{2}\b

image

1 Like

A quick pattern, maybe too tolerant as AAA would also be accepted as Month Code
grafik
And can use the groups for the output splits

grafik

[CheatSheet] - System.Text.RegularExpressions | RegEx - News / Tutorials - UiPath Community Forum

1 Like

Hope the below steps would help you resolve this

  1. Use a read range activity and get the output as dt

  2. Then use a assign activity like this to get the no holdiays list first

dt = dt.AsEnumerable().Where(Function(a) NOT a.ToString.Contains(“No Holidays”)).Select(Function(b) Datetime.ParseExact(b.ToString.SubString(0,10).Trim, “MMM yyyydd”, System.Globalization.CultureInfo.InvariantCulture).ToString(“MMM yyyy dd”))

This give rhe converted date format u want as a output

Then u can write back to the excel with WRITE RANGE activity

Cheers @minal.patil

Hi @minal.patil
Try this:

Input= "APR 202314 FriDr.Baba Saheb Ambedkar Jayanti
        MAY 202301 MonMaharashtra Day
        JUN 2023No Holidays
        JUL 2023No Holidays
        AUG 202315 TueIndependence Day30 WedRaksha Bandhan
        SEP 202328 ThuGanesh Visarjan/Eid-E-Milad
        OCT 202302 MonMahatma Gandhi Jayanti24 TueDashera
        NOV 202315 WedBhaubij
        DEC 2023No Holidays
        JAN 2024No Holidays
        FEB 2024No Holidays
        MAR 2024No Holidays"

Output= System.Text.RegularExpressions.Regex.Match(Input,"\b[A-Z]+\s+\d+\b").Value

Str_out= DateTime.ParseExact(Output.ToString,"MMM yyyydd",System.Globalization.CultureInfo.InvariantCulture).ToString("MMM yyyy dd")

Hope it helps!!

Hi @minal.patil

Please find the below xaml for your reference

BlankProcess14.zip (78.1 KB)

O/P:

TestingSheetdate.xlsx (12.8 KB)

Hope this helps!!

and also how to capture 'Independence Day30 ’ and 'Mahatma Gandhi Jayanti24 ’

also how to capture 'Independence Day30 ’ and ‘Mahatma Gandhi Jayanti24’.

from the above screenshot, we do see it caught:

Also find this two dates

AUG 2023 30
OCT 2023 24

just work carefully. We didn’t use a space between year and day. Also it is not present within your text pattern. Same also for the part after the day

Sorry for my mistake.

Also, find these two dates

AUG 2023 30 means 30
OCT 2023 24 means 24

Note:- In Aug month there are two dates 15 and 30 or more than two. I am finding both dates. same as Oct.

As long you will have no match within the preview it is different
Whenever a unmatched parenthese (also idicated as red) is reported, then correct the pattern as shown above

we do not recommend to change again and again the text pattern on which we orient to find out the regex pattern. Just name the requirements right at the beginning completely

Maybe following helps, but then trim the groups

@minal.patil

Try this pattern:

[A-Z]{3}\s*\d{4}\s*\d{2}.*\d\b|[A-Z]{3}\s*\d{4}+\s*\d{2}

Note that this returns the complete string “OCT 202302 MonMahatma Gandhi Jayanti24” and you’d have to do some string manipulation to get the text off.

image

I am highlight the date ,month and year.

Not working this pattern:

[A-Z]{3}\s*\d{4}\s*\d{2}.\d\b|[A-Z]{3}\s\d{4}+\s*\d{2}

Hi @minal.patil ,

Could you maybe check with the below workflow :
Regex_FindHolidayDates.zip (9.8 KB)

we could extend the Regex Pattern to:
grafik

And referering / checking the groups:

Still we do feel that the input has a certain risk of unreliability.
Depending on the input (single string / full text with line breaks) we would suggest to combine it with some other techniques to control the complexity of the regex part

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.