Adding new rows to spreadsheet only if the mail is new

I have a task of using a getting emails from a mailbox and getting their received date, sender, subject, category, etc. My problem is the datatable overwrites all the data every time. In a perfect world I need the bot to read range of the sheet and look at sender + subject to determine if the email is already in the sheet. If it is already in the sheet then do not add another row, but if the email is new then add a row.

For example: if this is the spreadsheet during the last bot run

|Email Received|Sender|Subject|Category|
|5/14/2024|John|Hello world|Pending|
|5/14/2024|Sally|Fix report|Completed|

And then I ran the bot again and it looked at all emails it would see those same emails and see the same sender and subject so it would not duplicate those values, but if another email came in it would append to look like this

|Email Received|Sender|Subject|Category|
|5/14/2024|John |Hello world|Pending|
|5/14/2024|Sally|Fix report|Completed|

Hopefully this makes sense.

A better process would be to move the email to a “Processed” folder after adding it to the spreadsheet, so that it won’t be found in the Inbox next time you run the job.

I had that thought too, but unfortunately because of the nature of the mailbox I can’t move the emails to other folders.

Another option is to have it only process unread emails. But that becomes a problem if other automations or people are using the same mailbox.

What I would suggest is to Read Range the existing Excel file to a datatable. Then after collecting the emails, add them all. Then use the Remove Duplicate Rows activity. Then Write Range to Excel. Yes you are rewriting the entire Excel each time but at least you’ll get the unique results you want.

Thank you for the input. I have attempted this as well, but the issue that arises is the remove duplicate rows is not applicable in my case since per my example the only columns that will be unique identifiers is the sender and subject why the duplicate rows looks for the entire row.

HI @srobey888

You can use the linq query for removing the duplicates rows only which have duplicate values in sender & subject column. Replace the dt_output datatable with your datatable.

dt_Output.AsEnumerable() _
    .GroupBy(Function(i) New With { Key .Sender = i.Field(Of String)("Sender"), Key .Subject = i.Field(Of String)("Subject") }) _
    .Select(Function(g) g.First) _

For writing it back to excel
Hope this helps :slight_smile:

Thank you AJ, this ended up working perfectly.

@srobey888 Awesome glad it worked. You can mark it as solution so that it can help others also.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.