Need Help with Email Extraction

UPDATED: 06/16/2019
I have learned that, given the whole text, I can find the phone number and email fairly easily with regex patterns.

Phone:
\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}

Email:
\w+[+,.]?\w+[+,.]?\w+@\w+\.\w+?\w+\.?\w+

Do I put this after my Mail.headers(“html.body”)? Or do I have to use another activity? Is Matches what I need to be using?

Also, for anyone who understands this: I have no clue how I am going to find the client’s name, as the name is so plain in the layout and is always different. How do I capture just that info?


Hello to whoever can help me,

I am extracting an email from a G Suite Gmail account. After long battles over why I could not connect, I figured out it was because the admin had not allowed less secure apps; it did not matter that I had less secure apps enabled on my own account. I am only including this because it took me several days and a lot of forum searching to find anything on it, and I believe the answer was on a Google forum, not here. So I figured I would throw that out there for anyone else having issues with email automation using a team domain.

On to my question.

I am trying to scrape an email for some data:

  • It needs to be specific fields, not just the whole HTML body.
  • It is laid out by Formstack, so it looks fairly consistent; it might even be possible to get a table out of it?
  • The MEMO field does not always have something in it. I want to make sure the bot does not flip out if it does not see anything there, but I would still like the info if it is there, as it typically is a good note to put into the CRM.
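One way to get "a table out of it" without the bot tripping on an empty Memo is to pair each label cell with the value cell that follows it. The snippet below is only a sketch in Python (UiPath itself evaluates .NET regex, so treat this as an illustration of the idea, not workflow code). `SAMPLE_HTML`, `ROW`, and `extract_fields` are made-up names, and the sample is a trimmed-down imitation of the Formstack rows in this post.

```python
import re
from html import unescape

# Hypothetical, trimmed-down sample modelled on the Formstack rows in this post.
SAMPLE_HTML = """
<tr><td class="fs-submission-table--label" style="margin: 0">
        Name:
    </td>
<td class="fs-submission-table--value" style="margin: 0">
        Brian Wharton
    </td></tr>
<tr><td class="fs-submission-table--label" style="margin: 0">
        Memo:
    </td>
<td class="fs-submission-table--value" style="margin: 0">

    </td></tr>
"""

# One label cell immediately followed by one value cell.
# DOTALL lets "." span the line breaks inside a cell.
ROW = re.compile(
    r'<td class="fs-submission-table--label"[^>]*>\s*(.*?)\s*</td>\s*'
    r'<td class="fs-submission-table--value"[^>]*>\s*(.*?)\s*</td>',
    re.DOTALL,
)


def extract_fields(html_body):
    """Return {label: value}; an empty cell becomes an empty string."""
    fields = {}
    for label, value in ROW.findall(html_body):
        key = label.strip().rstrip(":")
        fields[key] = unescape(value).strip()
    return fields


fields = extract_fields(SAMPLE_HTML)
memo = fields.get("Memo", "")  # empty string when the Memo cell is blank or missing
```

Because every value is a string and a blank Memo is just `""`, later steps can check `if memo:` before writing the note to the CRM. Repeated labels, such as the two Services rows in the real email, would overwrite each other in a plain dict, so they would need separate handling.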

Here is an example of what the email looks like when viewed in Gmail

I labeled each variable I need. I believe they will all need to be strings by the time they are scraped.

I need to do a couple of things with this data later on:

  • First, I need to download the file at that URL link to our team Dropbox. Can I simply use a save-file step with a wildcard at the point where the URL starts changing?
  • This is all going to be part of a huge workflow that will also extract data from that PDF,
    so these variables will stay consistent throughout the workflow for that client.
  • After looking through several forums and several videos, I could not find specific details on exactly what I need to do here for the email extraction.
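On the wildcard idea: since the whole link sits in the `href` attribute, it may be simpler to capture the complete URL and take the file name from its last path segment, rather than guessing where the changing part starts. A rough Python sketch follows, assuming the link always ends in `.pdf`; `PDF_LINK`, `find_pdf_link`, and `file_name_from_url` are illustrative names, and the actual download and Dropbox steps are left out.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The "View File" anchor from the sample email, with the style attribute shortened.
SAMPLE = (
    '<a href="https://s3.amazonaws.com/files.formstack.com/uploads/1687006/'
    '24079710/511738606/24079710_construction_plan_wharton_6-11.pdf" '
    'style="color: #0197EC">View File</a>'
)

# Capture everything between the quotes of an href that ends in .pdf.
PDF_LINK = re.compile(r'href="(https://[^"]+\.pdf)"', re.IGNORECASE)


def find_pdf_link(html_body):
    """Return the first PDF URL in the body, or None when there is no link."""
    match = PDF_LINK.search(html_body)
    return match.group(1) if match else None


def file_name_from_url(url):
    """Return the last path segment of the URL, to use as the saved file name."""
    return PurePosixPath(urlparse(url).path).name


url = find_pdf_link(SAMPLE)
file_name = file_name_from_url(url)
```

Returning `None` for a missing link lets the workflow skip the download step instead of failing on a submission without an upload.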

Below are pictures that might help with answers on my Email Extraction issue

This is how I receive the HTML body from the email. It seems to have a consistent layout, so this should not be too hard… I hope :slight_smile:

BTW, I’m not aware of how to group my code into the little grey box on this forum. Is the Blockquote used somehow? If anyone could let me know how to do that, I would appreciate it, because the forum seems to have rendered some of the code =P from the text file.

I am extremely thankful to anyone who has advice here. I believe I want to start a career as an RPA Developer, and I am just trying to get a hold of the basics ATM.

Regards,
Mr. Joints

BYOP Portal .fs-btn-create:hover { background: #1d697b; } .fs-btn-special:hover { background: #7c438c; } .fs-btn-edit-light:hover { background: #fafafa; } .fs-btn-edit-dark:hover { background: #dfe3e4; } .fs-btn-delete:hover { background: #dc1818; } .fs-btn-link:hover { background: #0187d3; } .fs-btn-confirm:hover { background: #1d9f65; } body { padding: 19px; background-color: #F7F7F8; } body * { -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } img { max-width: 100%; }
Formstack Logo
  </td>
  <td style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; margin: 0; padding: 0"></td>
</tr>

Formstack Submission For: BYOP Portal
Submitted at 06/11/19 3:48 PM

      <div class="fs-submission-table-wrapper" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; margin: 0; padding: 19px 31px 12px; text-align: left" align="left">
          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #CFD4D8; margin: 0; padding: 0" bgcolor="#CFD4D8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Services:
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                      Excavation<br style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; margin: 0; padding: 0">

Plumbing

Rebar

Electrical

Shotcrete

Start Up

Fountains, Tables, Benches

Concrete/Acrylic Decking

Paver’s- Deck/Patio

Pool Tile

Office

          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #F7F7F8; margin: 0; padding: 0" bgcolor="#F7F7F8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Services:
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                      Interior Surfacing<br style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; margin: 0; padding: 0">

Construction Clean-up

In-Floor Cleaning Head Set

Door/Gate Closures - Windows/Alarms

Natural / Propane Gas

Masonry Walls, BBQs, Fireplaces

Layouts

Artificial Grass

Landscapes

Pool Fill

          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #CFD4D8; margin: 0; padding: 0" bgcolor="#CFD4D8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Upload Pool Plan:
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                                      <a href="https://s3.amazonaws.com/files.formstack.com/uploads/1687006/24079710/511738606/24079710_construction_plan_wharton_6-11.pdf" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #0197EC; margin: 0; padding: 0; text-decoration: none">View File</a>
                                </td>
    </tr>
                          
          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #F7F7F8; margin: 0; padding: 0" bgcolor="#F7F7F8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Sent to::
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                      
                  </td>
    </tr>
                          
                          
          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #CFD4D8; margin: 0; padding: 0" bgcolor="#CFD4D8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Name:
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                      Brian Wharton
                  </td>
    </tr>
                          
          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #F7F7F8; margin: 0; padding: 0" bgcolor="#F7F7F8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Phone:
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                      (602) 330-3149
                  </td>
    </tr>
                          
          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #CFD4D8; margin: 0; padding: 0" bgcolor="#CFD4D8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Memo:
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                      
                  </td>
    </tr>
                          
          <tr style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; background-color: #F7F7F8; margin: 0; padding: 0" bgcolor="#F7F7F8">
      <td class="fs-submission-table--label" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #6C7A85; font-weight: 700; margin: 0; padding: 12px 19px">
                      Email:
                  </td>
      <td class="fs-submission-table--value" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; color: #1C2F3A; margin: 0; padding: 12px 19px">
                      BrianWhartonRealtor@gmail.com
                  </td>
    </tr>
        </table>
<p class="last" style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; font-size: 16px; font-weight: 400; line-height: 21px; margin: 0; padding: 19px 31px 12px">
  
  
</p>

Copyright © 2019 Formstack, LLC. All rights reserved. This is a customer service email.

Formstack, 11671 Lantern Road, Suite 300, Fishers, IN 46038

    </td>
    <td style="-moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; margin: 0; padding: 0"></td>
</tr>

Getting the name…

Try something like (?<=Name: ).*


To use variables, especially from the “Matches” activity:

  1. Create a Matches activity
  2. Put your regex in
  3. Create a variable for that match
  4. Output variable(0).ToString

You can then use that expression in an Assign (to assign the match to a string), or in a Message Box, Type Into, etc.

Hey Cameron, thank you for the help. I will give this a shot in just a moment when I can get to the computer. Quick question, as I just did a whole study on regex basics: what is the < character doing? Is it just a normal less-than? My actual name value, when pulled from the HTML, is on a different line, so I am not 100% sure this will work, but I can’t wait to try!

Also, in the string:

What is that 0 representing? I understand the rest of that variable.

Hey Cam,

I tried this out and it did not work. As I said previously, I believe it is because the name is on a different line.
Here is what happened when I tested my HTML body output with that pattern in a regex tester.

Wanted to leave an update so that anyone who stumbles across this gets some answers.

I used some regex patterns to find all the required fields; the name was definitely the most difficult.
Here are the patterns for anyone’s reference:

Phone Numbers
\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}

Email
\w+[+,.]?\w+[+,.]?\w+@\w+\.\w+?\w+\.?\w+

Name
Name:\s*</td>\s*<td[^>]*>\s*([^<]+?)\s*</td>

Download Link (deleted a t so it does not become a link ;P)
“htps://s3.amazonaws.com/\w+.\w+.\w+/\w+/\d+/\d+/\d+/\d+.*.pdf”
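For anyone who wants to sanity-check the phone and email patterns outside of Studio, here is a small Python sketch. It assumes the literal parentheses and dots are backslash-escaped (the forum tends to swallow backslashes in front of punctuation), and the `text` string is just the sample values from the email above flattened onto one line.

```python
import re

# Phone: optional parentheses around the area code, optional space or dash separators.
PHONE = re.compile(r"\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}")

# Email: the pattern from this post, with the literal dots escaped.
EMAIL = re.compile(r"\w+[+,.]?\w+[+,.]?\w+@\w+\.\w+?\w+\.?\w+")

# Sample values taken from the email in this thread.
text = "Phone: (602) 330-3149 Email: BrianWhartonRealtor@gmail.com"

phone = PHONE.search(text).group(0)
email = EMAIL.search(text).group(0)
```

Note that this email pattern needs at least three word characters before the @ and three after the first dot, so very short addresses or two-letter domains would need a looser pattern.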

To pull the values using regex, I first used a
Matches activity.
I output that to an
Assign activity,
where I did need to use Variable(0).Groups(1).ToString.
The Groups part was not always needed, only for the Name field.
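To spell out what the 0 and the group number mean: the Matches output is a collection of every match found, so index 0 is the first match, and group 1 is the first parenthesised part of the pattern. The same idea in Python, as a sketch only (`html_body` here is a shortened imitation of the Name row, and the pattern assumes the `*` quantifiers after `[^>]` and `\s`):

```python
import re

# Name pattern: the value sits in the <td> after the "Name:" label cell.
NAME = re.compile(r"Name:\s*</td>\s*<td[^>]*>\s*([^<]+?)\s*</td>")

# Shortened imitation of the Name row from the email body.
html_body = """<td class="fs-submission-table--label" style="margin: 0">
        Name:
    </td>
<td class="fs-submission-table--value" style="margin: 0">
        Brian Wharton
    </td>"""

matches = list(NAME.finditer(html_body))  # every match, like the Matches output
first = matches[0]                        # index 0 = the first match found
whole = first.group(0)                    # the whole matched text, tags included
name = first.group(1)                     # only the part inside the parentheses
```

That is why the Name field needs the group: the whole match includes the label and the `<td>` tags, while group 1 holds just the name. The phone and email patterns have no surrounding markup in the match, so the plain match is enough.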

Sometimes my names come through as Mary & Tester McTesterson.
After HTML encoding, the & sign looks like this in the string:
Mary &amp; Tester McTesterson
To remove this from any name that could possibly contain it, I used a
Replace activity,
replacing “&amp;” with “and”, as this will look better when entered into my CRM and Dropbox.
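The same clean-up can be sketched in Python, assuming the encoded form in the HTML body is the `&amp;` entity. Decoding first and replacing second also covers a raw `&` that arrives unencoded.

```python
from html import unescape

# Name as it appears in the HTML body, with the ampersand encoded.
raw = "Mary &amp; Tester McTesterson"

decoded = unescape(raw)              # turns the entity back into a plain "&"
clean = decoded.replace("&", "and")  # CRM-friendly version of the name
```

Here `clean` ends up as "Mary and Tester McTesterson".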

Thank you to those who replied; it always helps! Using Regex101.com and going to the Chat section is a great way to get super fast help on regex, BTW, for anyone who needs it.
Thanks lots!

Mr. Joints

Awesome job, mate. Glad you figured it all out :slight_smile:

edit:
And to answer your question, the (?<=) in the regex string is a positive lookbehind: it looks for that string and matches only what comes after it. I find it the best way if you’ve got a stable template, as it’s very reliable and makes it easy to capture what you need.
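A quick illustration of the `(?<=...)` lookbehind, and of why it missed on the HTML body where the name sits on a different line from its label. This is a Python sketch with made-up sample strings; the pattern itself is the one suggested earlier in the thread.

```python
import re

# Match whatever follows "Name: " on the same line, without including the label.
LOOKBEHIND = re.compile(r"(?<=Name: ).*")

same_line = "Name: Brian Wharton"
split_lines = "Name:\n</td>\n<td>\nBrian Wharton\n</td>"

hit = LOOKBEHIND.search(same_line)     # matches "Brian Wharton"
miss = LOOKBEHIND.search(split_lines)  # None: "Name:" is followed by a line break, not a space
```

So the lookbehind is a good fit for plain text where the label and value share a line, while the tag-aware Name pattern with a capture group suits the raw HTML body.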
