How to delete duplicate rows in an Excel file based on multiple columns, with a specific check on one column

Hi,

I know questions like this are asked frequently, but the existing answers did not solve my problem.

I have an Excel file that contains half a million rows. An example is shown in the picture below:

First, I want to check whether there are any duplicate rows based on the first six columns (Customer, Document, Item, Material, Date, Origin) and, if there are, delete the entire duplicate rows up to the column “Status”.

The second check is on the column “Datum Satzerzeugung” (record creation date). If rows are duplicates based on the six columns above but the dates in “Datum Satzerzeugung” differ, delete the row with the later date and keep the row with the earlier date. For example, see the figure below:

Rows 7 and 8 are duplicates based on the first six columns, but the date in “Datum Satzerzeugung” differs between them. So row 8 should be deleted entirely and row 7 kept.

Please provide a complete solution, as my knowledge of LINQ expressions is limited. I managed to delete rows based on a single column, but I could not get it to work for multiple columns.

Hi @WASEEM_KHAN

Try this

Thanks
Ashwin.S

Hi @WASEEM_KHAN

Please check this video: UiPath | Remove Duplicate Rows from Excel / DataTable using two columns | Delete Duplicate Rows LINQ - YouTube

Best regards
Mahmoud

@WASEEM_KHAN

Can you share a sample Excel file?

I have designed a workflow on dummy data. You can tailor it to your requirements.

The LINQ used is


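(From row In inputDT.AsEnumerable()
Group row By
k1 = row("ID").ToString,
k2 = row("DocNum").ToString
Into grp = Group
Let fr = grp.OrderBy(Function(x) CDate(x("Date").ToString)).First()
Select outDT.Rows.Add({k1, k2, fr("Date"), fr("Status")})).CopyToDataTable

This groups the rows by ID and DocNum and keeps the row with the earliest Date from each group. Picking the row through OrderBy(...).First() avoids comparing date strings, which can fail when the text in the cell is formatted differently from the parsed date. outDT must already exist with columns matching the four values added per group.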

For your reference

LINQ For Group By Multiple Columns.xaml (11.9 KB)
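
Adapted to the column names in the question, the same idea could look roughly like the sketch below. It is a standalone VB.NET console module rather than a UiPath workflow, and it is not taken from the attached .xaml. The names DedupSketch and KeepOldest and the sample rows are made up for illustration. It assumes the headers are spelled exactly as in the question and that “Datum Satzerzeugung” holds values CDate can parse (culture-dependent if the cells are text).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module DedupSketch
    ' Hypothetical helper: one row per combination of the six key columns,
    ' keeping the row with the earliest "Datum Satzerzeugung".
    Function KeepOldest(inputDT As DataTable) As DataTable
        Dim kept As IEnumerable(Of DataRow) =
            From row In inputDT.AsEnumerable()
            Group row By
                k1 = row("Customer").ToString,
                k2 = row("Document").ToString,
                k3 = row("Item").ToString,
                k4 = row("Material").ToString,
                k5 = row("Date").ToString,
                k6 = row("Origin").ToString
            Into grp = Group
            Select grp.OrderBy(Function(x) CDate(x("Datum Satzerzeugung"))).First()

        ' Clone copies the schema only; ImportRow then copies each kept row.
        ' Unlike CopyToDataTable, this does not throw when there are no rows.
        Dim outDT As DataTable = inputDT.Clone()
        For Each r As DataRow In kept
            outDT.ImportRow(r)
        Next
        Return outDT
    End Function

    Sub Main()
        ' Made-up sample data: the first two rows share all six key values.
        Dim dt As New DataTable()
        For Each c As String In {"Customer", "Document", "Item", "Material", "Date", "Origin"}
            dt.Columns.Add(c, GetType(String))
        Next
        dt.Columns.Add("Datum Satzerzeugung", GetType(DateTime))
        dt.Columns.Add("Status", GetType(String))

        dt.Rows.Add("C1", "D1", "10", "M1", "2020-01-01", "DE", New DateTime(2020, 3, 5), "Open")
        dt.Rows.Add("C1", "D1", "10", "M1", "2020-01-01", "DE", New DateTime(2020, 3, 9), "Closed")
        dt.Rows.Add("C2", "D7", "20", "M4", "2020-01-02", "FR", New DateTime(2020, 3, 6), "Open")

        Dim result As DataTable = KeepOldest(dt)
        ' Two rows should remain: the "Open" row of C1 (older date) and the C2 row.
        Console.WriteLine(result.Rows.Count)
        For Each r As DataRow In result.Rows
            Console.WriteLine(r("Customer").ToString & " " & r("Status").ToString)
        Next
    End Sub
End Module
```

Rows that are identical in every column, including the date, also collapse to a single row, because OrderBy is a stable sort and First() then returns the first of the tied rows. In UiPath, the query part could be placed in an Assign activity and the result written back with Write Range, though with half a million rows it is worth testing on a smaller extract first.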