Should I use Document Understanding to extract data fields from email bodies?

I am trying to extract data from the body of an email and was wondering if Document Understanding is required? I am able to read the email from an Outlook folder. The body is well structured (although certain elements may need to be parsed into multiple fields). At first I thought “document understanding” would apply, but this is just text and maybe some simple string commands would be simpler and cheaper? Please see example below.

(Screenshot of the example email body)

Hello @kreigfields,

The screenshot is not visible. For emails, I prefer to use string operations (Substring) or RegEx.

You can test regex here:
https://regex101.com/

Vasile.

Case Information
Case Name: CaseNameLast, CaseNameFirst
Case ID: ######

Child(ren) Reunified
Child’s Name/DOB/FSFN Child ID
ChildNameLast, ChildNameFirst, 2011-10-12, FSFNID
ChildNameLast, ChildNameFirst, 2013-02-23, FSFNID

Living Arrangement Information
Reason for Placement Change: Reunification With Parent
Date/Time Child(ren) Placed: 07/25/2020 04:00 PM
Parent Name: ParentFirstName ParentLastName
Address: 1000 Some St.
Lauderhill, Fl 33319
Parent (1) Cell #: (###) ###-####

This email was sent from an unmonitored email acount - Please DO NOT REPLY to this email. Instead, contact the staff shown above.

Hello @kreigfields and welcome to the community

We can use Regex to collect the information from the email.

Thanks for the sample. What is the output? Tell us about the pattern of the text also.

Once provided you will have a response soon :slight_smile:

Cheers

Steve

I’m sorry, let me provide an example of the input and additional information on the text patterns.

Hi Steven,

You had asked what the output would look like. I am envisioning storing the data elements in an xlsx table. This table will then be used to automate data entry into a corresponding website. Data fields are indicated in most cases by “:” (for example, “Case ID:”). The Child’s Name/DOB/FSFN Child ID rows will be delimited by “,”, and Parent Name: will be split on " ".
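
The delimiters described above can be sketched outside UiPath in a few lines of Python. This is only an illustration of the string-splitting idea, not a UiPath workflow: the function name parse_labelled_fields and the trimmed-down sample body are made up for the example, and it assumes every labelled field sits on one line as "Label: value".

```python
# Trimmed copy of the sample email; only the single-line labelled fields are kept.
SAMPLE_BODY = """\
Case Information
Case Name: CaseNameLast, CaseNameFirst
Case ID: ######

Living Arrangement Information
Reason for Placement Change: Reunification With Parent
Date/Time Child(ren) Placed: 07/25/2020 04:00 PM
Parent Name: ParentFirstName ParentLastName
Parent (1) Cell #: (###) ###-####
"""


def parse_labelled_fields(body: str) -> dict[str, str]:
    """Collect every 'Label: value' line into a dict keyed by label (hypothetical helper)."""
    fields = {}
    for line in body.splitlines():
        # Split on the first ": " only, so the colon in "04:00 PM" stays in the value.
        label, sep, value = line.partition(": ")
        if sep:  # lines without ": " (section headers, blank lines) are skipped
            fields[label.strip()] = value.strip()
    return fields


fields = parse_labelled_fields(SAMPLE_BODY)

# Case Name is written "Last, First"; Parent Name is written "First Last".
case_last, case_first = [part.strip() for part in fields["Case Name"].split(",", 1)]
parent_first, parent_last = fields["Parent Name"].split(" ", 1)

print(fields["Case ID"])
print(case_last, case_first)
print(parent_first, parent_last)
```

The second line of the address in the full email has no label of its own, so a line-by-line split like this would skip it; that field would need separate handling.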

Hello

Hopefully these patterns will help. Insert the Regex patterns into a ‘Matches’ activity.

Regex Pattern:
(?<=Child’s Name\/DOB\/FSFN Child ID\n)([^,]+),\s([^,]+),\s((19|20)\d{2}-\d{2}-\d{2}),\s(.*)

Then use the following to convert your results to string:

How to get Group 1 (Child's Last Name) results:

INSERTVARIABLE(0).Groups(1).ToString

How to get Group 2 (Child's First Name) results:

INSERTVARIABLE(0).Groups(2).ToString

How to get Group 3 (DOB) results:

INSERTVARIABLE(0).Groups(3).ToString

How to get Group 5 (FSFN) results:

INSERTVARIABLE(0).Groups(5).ToString

After the Matches activity, use a Write Line activity (or an Assign activity) and replace INSERTVARIABLE above with the variable that holds the Result of the Matches activity.

(0) = the 1st match. If you have multiple matches, increase this number. Use a “For Each” activity to write out all results.
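
For anyone who wants to try the looping idea outside UiPath, here is a rough Python sketch. It is a variation, not the exact pattern above: the lookbehind ties a match to the header line, so as far as I can tell it only picks up the row directly under the header. The sketch drops the lookbehind and matches any line shaped like a child row, keeping the same group numbers. The names CHILD_ROW and children are made up for the example.

```python
import re

# Excerpt of the sample email, including a non-child line that contains a comma.
SAMPLE_BODY = """\
Child(ren) Reunified
Child’s Name/DOB/FSFN Child ID
ChildNameLast, ChildNameFirst, 2011-10-12, FSFNID
ChildNameLast, ChildNameFirst, 2013-02-23, FSFNID

Living Arrangement Information
Address: 1000 Some St.
Lauderhill, Fl 33319
"""

# Same groups as the thread's pattern, minus the lookbehind:
# 1 = last name, 2 = first name, 3 = DOB, 4 = century (19 or 20), 5 = FSFN ID.
CHILD_ROW = re.compile(
    r"^([^,\n]+),\s([^,\n]+),\s((19|20)\d{2}-\d{2}-\d{2}),\s(.*)$",
    re.MULTILINE,
)

children = []
for match in CHILD_ROW.finditer(SAMPLE_BODY):  # plays the role of the "For Each" loop
    children.append(
        {
            "last_name": match.group(1).strip(),
            "first_name": match.group(2).strip(),
            "dob": match.group(3),
            "fsfn_id": match.group(5).strip(),  # strip guards against a stray "\r"
        }
    )

for child in children:
    print(child)
```

Only lines containing a YYYY-MM-DD date in the third position match, so the address line is ignored even though it contains a comma.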

Hopefully this helps.

Cheers

Steve