Delete duplicate and keep uniques

This problem is pretty similar to removing duplicates, except that I want to delete everything except for the unique strings that remain.
For example, I have two excel files where there are several URL’s, and when I remove the duplicate rows the result is like this:

Excel1:
www.123.com


www.321.com
www.cba.com

Excel2:
www.123.com
www.321.com
www.cba.com
www.helloworld.com

Result:
www.123.com


www.321.com
www.cba.com
www.helloworld.com

What I aim to achieve:
www.helloworld.com

So I want to pretty much get rid of all the rows and data, except for the part that is unique and only has one of.
Is there any way to achieve that through Datatable filter or something?

Best regards
Robert

@Robert_Wennberg So you want to Achieve the Unique Items from both the Excel Files?

Sorry, poorly explained.
I want to use Excel2 to check inside Excel1 if there are any duplicates, and if there are, delete those in Excel2 and just keep the Unique Items.
Excel1 is used as a reference with more URL’s than Excel2, if that makes any more sense

@Robert_Wennberg I think you want to find the items from Excel2 which are not present in Excel1. Is that right?

Yeah that’s exactly what I am trying to do! Sorry for explaining it so badly

@Robert_Wennberg Can you follow these Steps :

  1. Read the Excel1 file using Read Range Activity. Get the Output as Datatable, say DT1. Read the Exce2 file using Read Range Activity. Get the Output as Datatable, say DT2.

  2. Using an Assign Activity with the Expression below :
    DT1 = DT2.AsEnumerable().Where(function(row) Not DT1.AsEnumerable().Select(function® r(“ColName”).ToString).Any(function(x) x = row(“ColName”).ToString)).CopyToDataTable()

  3. Write the DT1 datatable to an Excel File using Write Range and check if that is the Output you needed.

1 Like

I will see if I can make it work and get back to you with my results!
Thank you

1 Like

The assign activity is giving me an error saying that ‘(’ is missing. However, I am not sure where…
This is what the activity I am pasting is looking like so far:

DT2.AsEnumerable().Where(function(row) Not DT1.AsEnumerable().Select(function) r(“URL”).ToString).Any(function(x) x = row(“URL”).ToString)).CopyToDataTable()

@Robert_Wennberg Sorry, My bad :sweat_smile: I missed a Bracket, Can you try this :
DT2.AsEnumerable.Where(function(row) Not DT1.AsEnumerable.Select(function( r ) r(“URL”).ToString).Any(function(x) x = row(“URL”).ToString)).CopyToDataTable()

2 Likes

Yeah I couldn’t see where it was missing haha!
After pasting it, I get an error once again which I have seen before but not sure how to fix or what it means.
AsEnumerable is not a member of ‘System.Data.Datatable’
How would you fix that?

@Robert_Wennberg Can you check this post for that , I don’t really know when exactly those error occurs, but in this post I have Suggested a Solution. It Should work :

1 Like

I managed to solve the issue and make the automation work. However, the result is not really what I wanted.
I did get fewer results, like I wanted, but the results I got are duplicates from the other Excel…

@Robert_Wennberg Can you provide the Excel Files?

1 Like

I was just about to do that.
20200402.xlsx (11.6 KB) RemoveDupes.xaml (9.1 KB) SearchedApartments.xlsx (73.3 KB)

The expected result would be the row with the url:
http://www.suumo.jp/chintai/jnc_000056694237/?bc=100191048279

That is the only row that I would like to be transferred to the new workbook/DT, seeing as all the other rows are duplicates.

@Robert_Wennberg Correct me if I’m wrong, you want to remove all the rows in 20200402 excel which have URL’s present in SearchedApartments excel , Am I right ?

That is correct

Sorry, I was being stupid and compared the wrong files… I do believe the solution you sent me is working as intended. Thank you so much for helping and I’m sorry for wasting your time!

1 Like

I am trying out the solution you gave me and it’s working fine for the Datarows that has data in it, but for the ones that return null, the Assign activity throws an error saying that “The source contains no DataRows”. I tried using an IF statement to bypass it but with no luck. Any ideas on how to fix it?

@Robert_Wennberg You can use a Try Catch on it or there is a Different method that needs to use an array of Datarows variable. You can use Try Catch and in the Catch Section just put this Assign Statement :

DT = DT1.Clone

I tried what you asked but it is still throwing the same error when it gets to the assign activity you taught me.