Removing Duplicate Rows Based on Column Value

I have a data table which includes a combination of rows which have been merged through 2 different data tables.

I need to remove All duplicate rows based on the values in a specific column.
Here is a simplified example of what I have:

Col1, Col2, Col3
A, 12, Today
A, 10, Today
A, 15, Yesterday
A, 17, Yesterday
A, 17, Today

I need to remove all duplicate rows where column B appears more than once and DO NOT want to leave a single iteration of the duplicated row

I then need to filter the results to take anything out with ‘Yesterday’ in Column 3

Hi @elliot.barling

refer below.. It has LINQ query, that should work.. Otherwise you can also try few other solutions mentioned in there..

dt.AsEnumerable().GroupBy(Function(r) r.Field(of String)("<Col Name>")).Select(Function(s) s.First()).CopyToDataTable - This can be used to remove all dups with in the specified column.

dt.DefaultView.ToTable(True) - It will create a new Datatable with unique rows with respect to every column.

Both return a new Data table with no dups.

Thanks @sonaliaggarwal47
I have put in the formula suggested for the LINQ query but it still seems to be returning the first result it finds and removing the rest
I need the query to remove all rows which match and leave me with the truly unique rows based on the column value

Thanks

@elliot.barling

Try this

dtFiltered = dtInput.AsEnumerable().GroupBy(Function(row) row(“ColumnName”).ToString). Where(Function(g) g.Count() = 1).Select(Function(g) g.First()).CopyToDataTable()

Sorry @sonaliaggarwal47

Are these going to be separate activities or is this what goes into a single assign?
Thanks

No, no, this is one command like the earlier one, just a little tweaked to check the count as well

@elliot.barling

Please try this linq in assign

dt = dt.AsEnumerable.GroupBy(function(x) x("Column B Name").ToString).Where(function(x) x.Count=1).Select(function(x) x.First).Where(function(x) Not x("Column 3 name").ToString.Equals("Yesterday")).CopyToDataTable

Cheers

Hello @elliot.barling

Forum_FilterationTask.zip (8.3 KB)

Please try above flow it will work

Regards,
Rajesh Rane

Thanks @Anil_G & @sonaliaggarwal47
I am using the following in my assign, it was working but I am not getting the error:
Assign: The source contains no DataRows.

My assign is as follows:
dt_Filtered = dt_Details.AsEnumerable().GroupBy(Function(r) r(“TrimmedError”).ToString).Where(Function(g) g.Count() = 1).Select(Function(g) g.First()).CopyToDataTable()

1 Like

@elliot.barling

the error says that after filtering for unique records..there are no unique records found at all..all are duplicate

as a proactive measure add a if condition to check count and then on then use assign else side means there are no rows which satisfy the uniqueness

dt_Details.AsEnumerable().GroupBy(Function(r) r(“TrimmedError”).ToString).Where(Function(g) g.Count() = 1).Count>0 to be used in if to check any row exists

cheers

@elliot.barling

Are you getting this as error?

Which column are you filtering on? Col 2? or Col 1?

when you say

does this mean you dont want the rows where col3 have ‘Yesterday’ value?
if yes then filter out these rows first then use LINQ to group and take the first row from group.

As for your error, I think you getting this error because in your actual data there is no unique record (e.g. in your sample case, row 1, 2 and 3 are unique but 4 and 5 are not)

In other answers LINQ, first group the rows based on Col2 and take only group where group count is 1 row. The other LINQs for your given example will exclude the 4th and 5th rows because in here group count is more than 1.

Try below LINQ once and let us know the result
Case1: You dont want rows where col3 = ‘Yesterday’
dt_temp = (from rw in dt_temp.AsEnumerable where rw("Col3").ToString <> "Yesterday" Group rw By rw1=rw("Col2").tostring Into grp = Group select grp.First).CopyToDataTable

Case2: You want rows where col3 = ‘Yesterday’
dt_temp = (from rw in dt_temp.AsEnumerable where rw("Col3").ToString = "Yesterday" Group rw By rw1=rw("TrimmedError").tostring Into grp = Group select grp.First).CopyToDataTable