[HowTo] - Find Duplicate / NonDuplicate rows - checking all columns - without explicit listing the colnames/index

Introduction

In the majority of scenarios duplicated / Non duplicated rows can be identified by a few columns grace to the key column concept. In order to retrieve it, the group by approach helps
It can be done e.g. with some nice components from Connect or LINQ


https://www.tutorialsteacher.com/linq/linq-tutorials

In some scenarios all duplicated / non duplicated rows should be detected by checking all columns.
Setting up an approach without explicit listing all column names or column index would speed up the implementation.

Implementation

For this task the SequenceEqual method can be used
grafik

Lets assume following datatable (only having 2 columns, just to make it small and faster to read)
grafik

Finding the Duplicated Rows

With this Linq we retrieve the rows:

(From d In dtData.AsEnumerable()
Let cnt = dtData.AsEnumerable.Where(Function (r) d.ItemArray.SequenceEqual(r.ItemArray)).Count
Where cnt > 1
Select d).toList
  • It iterates over all rows
  • the current row can be referenced with d
  • for each d it is checked for all rows in dtData (referenced with r) if d.ItemArray SequenceEquals r.ItemArray and the true results will be counted, referenced by cnt
  • if the count is > 1 duplicates are found for this d and the d is passing the filter
  • all found d are added to a list of Datarows that will be returned by this LINQ

Finding the non Duplicats Rows

(From d In dtData.AsEnumerable()
Let cnt = dtData.AsEnumerable.Where(Function (r) d.ItemArray.SequenceEqual(r.ItemArray)).Count
Where cnt = 1
Select d).toList

Same approach but the count has to be excatly 1

With a check on the ListCounts it can be decided if rows are returned or not and can be copied to the corresponding datatable

Result

Finaly we got the resulting DataTables for Duplicates:
grafik

the resulting DataTables for Non Duplicates:
grafik

Demo XAML Download

FindDupsNonDupsByAllCols.xaml (9.8 KB)

11 Likes

Helpful, BookMarked for reference :slight_smile:

Thanks,
Mukesh

1 Like

Hey @ppr,

Nice post, What if I have different scenario’s

First scenario is I want to have Output of distinct column1 Values, Count of based on Column1 and Column2 =‘1’ in Second column,Column1 and Column2=‘2’, Third column consist of total count, Fourth Column consists Average, Fifth column consists any activity based on Second output column and third column, Sixth Column of Average percentage.

Output :-
output1 output2 output3 output4 output5 output6
A 2 1 3 ouput2+ouput3/InputDT.Rows.count value%
B 2 0 2 ouput2+ouput3/InputDT.Rows.count value%
C 1 0 0 ouput2+ouput3/InputDT.Rows.count value%

Thanks for the feedback.
I would suggest that you open a new topic for your case, so we will work on it individually.
Lets keep this track here specific to the HowTo topic and about the SequenceEqual approach.