How to delete duplicate rows in an Excel file based on multiple columns, with a specific check on one column


I know questions like this are asked frequently, but the existing answers did not solve my problem.

I have an Excel file that contains about half a million rows. An example picture follows:


First, I want to check whether there are any duplicate rows based on the first six columns (Customer, Document, Item, Material, Date, Origin). If there are, the entire duplicate rows should be deleted, up to and including the column “Status”.

The second check is on the column “Datum Satzerzeugung”. If rows are duplicates based on the six columns above but have different dates in “Datum Satzerzeugung”, delete the row with the latest date and keep the row with the oldest date. For example, see the figure below:

Rows 7 and 8 are duplicates based on the first six columns, but their dates in “Datum Satzerzeugung” differ. So the entire row 8 should be deleted and row 7 kept.

Please provide a complete solution, as my knowledge of LINQ expressions is limited. I managed to delete rows based on a single column, but I could not get it to work for multiple columns.


Try this



Please check this video: UiPath | Remove Duplicate Rows from Excel / DataTable using two columns | Delete Duplicate Rows LINQ - YouTube

Best regards


Can you share a sample Excel file?

I have designed a workflow on dummy data. You can tailor it to your requirements.



The LINQ used is

(From row In inputDT.AsEnumerable()
Group row By k1 = row("ID").ToString,
k2 = row("DocNum").ToString
Into grp = Group
Let fr = grp.OrderBy(Function(x) CDate(x("Date").ToString)).First()
Select outDT.Rows.Add({k1, k2, fr("Date"), fr("Status")})).CopyToDataTable

For your reference

LINQ For Group By Multiple Columns.xaml (11.9 KB)
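
For the six key columns in the original question, the same grouping idea might look roughly like the sketch below. This is only a sketch under assumptions, not the content of the attached workflow: the column names are taken from the question text, the module and function names (DedupSketch, KeepOldestPerKey) are made up for illustration, and it assumes every value in “Datum Satzerzeugung” is non-empty and can be parsed by CDate under the robot's culture. Instead of adding values to a separate output table, it keeps the whole original row, so all columns up to “Status” survive.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module DedupSketch
    ' Hypothetical helper: returns one row per combination of the six key columns,
    ' namely the row with the earliest "Datum Satzerzeugung" in that group.
    Function KeepOldestPerKey(inputDT As DataTable) As DataTable
        Dim kept = From row In inputDT.AsEnumerable()
                   Group row By k1 = row("Customer").ToString.Trim,
                                k2 = row("Document").ToString.Trim,
                                k3 = row("Item").ToString.Trim,
                                k4 = row("Material").ToString.Trim,
                                k5 = row("Date").ToString.Trim,
                                k6 = row("Origin").ToString.Trim
                   Into grp = Group
                   Select grp.OrderBy(Function(x) CDate(x("Datum Satzerzeugung").ToString)).First()

        ' CopyToDataTable throws an InvalidOperationException when there are no rows,
        ' so an empty input table needs to be handled by the caller.
        Return kept.CopyToDataTable()
    End Function
End Module
```

In a UiPath Assign activity, the query part alone (wrapped in parentheses and followed by .CopyToDataTable) could be used as the expression. Because OrderBy is a stable sort, rows with identical dates keep their original order, so the first of them in the sheet is the one retained.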

@kumar.varun2 I need your help. The data is now in a SQL table on SQL Server, and I want to do the same thing there. Could you please modify your solution above so that I can apply it to the SQL table?