Compare 2 excel files with the same columns and remove all duplicats

Hi,

I have 2 excel files, each with 4 columns (email, A, B, C). The email could be a unique identifier. I need to compare the 2 excels and if I found duplicates to remove both of them, after that to put in a new excel file the other rows which are different.

@andreeabilboreanu follow mr @Palaniyappan 's words to compare the files


then use remove duplicate row into the then part =>which removes the duplicate rows that are obtained by the condition.
and use write range activity into the else part =>which writes the remainings into the new excel…

1 Like

Hei, I managed to do it like that: DT1.AsEnumerable.Except(DT2.AsEnumerable,system.Data.DataRowComparer.Default).CopyToDataTable , but it works only if I have the same columns. If I remove 1 column from the DT1, and then run the code, even if the data in the rest of the columns is the same, and just the third column is missing I will have as output all of the columns. Can you please give me an idea about how to fix it?

The thing you used is like that by default, always will compare all columns, have an idea looking at this:

Dim qry1 = datatable1.AsEnumerable().[Select](Function(a) New With {
        .MobileNo = a("email").ToString()
    })
    Dim qry2 = datatable2.AsEnumerable().[Select](Function(b) New With {
        .MobileNo = b("email").ToString()
    })
    Dim exceptAB = qry1.Except(qry2)
    Dim dtMisMatch As DataTable = (From a In datatable1.AsEnumerable() Join ab In exceptAB On a("ID").ToString() Equals ab.MobileNo Select a).CopyToDataTable()

Could you please explain it to me? I don’t really understand it

As you said you have email column as ID, then you have first two queries that will return only the email column from both excel files (as you already have done read range on both and store as datatable 1 and 2), after we have an Except expression that will get only the different ‘ID’ and with that id will generate the datatable with all columns back.