I have a datatable which has around 10 columns and around 400K ~ rows.
I want to get the count of duplicates from the datatable.
I tried some linq queries consisting of sequenceequals, itemarray to itemarray comparison but did not get the expected output also the execution time is huge due to may be number of rows.
I cannot attach the exact data so I have tried to prepare some mock.
I want the count of duplicate records from the datatable.
I have 10 columns and 400k to 500k rows, I want the number of duplicates which exist in this data i.e. exact matching records. sample.xlsx (34.1 KB)
Hi @Dhruvi_Arumugam
*Read the excel using read range.
*Take row count count1=dt_input.Rows.Count
*dt_input=dt_input.DefaultView.ToTable(True)
It will remove the all the duplicate rows
*Take row count after deleting duplicate count2=dt_input.Rows.Count
*Count1-Count2. It will give you the duplicate record count.
Nonduplicatedt = dt.DefaultView.ToTable(True,"Column1","Column2") will give you non duplicate rows based on the specified two column…if you want on all columsn then dont specify any column then it would consider all columns for duplicates
now countofduplicates = dt.rowcount - nonduplicatedt.rowcount will gvie you the duplicates count
All the columns currently, and on the expectation point its that I should get the count of duplicates if out of 10 records only 4 are identical then I should get 4 i.e. the duplicate count similar like we have in excel
so it looks like the case of a subset of columns are used for the duplication identification. Just to start with a first step - Key by ColValue Concatenation Approach
arrDupliColFilter = new String(){“ColName1”, “ColName2”,…“ColName10” }
(From d in dtColFiltered.AsEnumerable
Let ck = String.Join(“_”, d.ItemArray)
Group d by k=ck into grp=Group
Let ra = grp.First().ItemArray.Append(grp.Count).toArray
Select r = dtReport.Rows.Add(ra)).CopyToDataTable
When execution time will fall in a too long running range, then in a next step we would check to optimize the 10 col key construction, used for grouping the data
Apologies for giving sample file with extra columns. I have revised the file with only 10 limited columns please find the updated file. Please let me know your valuable inputs sample.xlsx (32.4 KB)
(From d in dtOrigDataTableVar.AsEnumerable
Let ck = String.Join(“_”, d.ItemArray)
Group d by k=ck into grp=Group
Let ra = grp.First().ItemArray.Append(grp.Count).toArray
Select r = dtReport.Rows.Add(ra)).CopyToDataTable
I will test and let you know,
whereas in the below query how will I get the count?
(From d in dtOrigDataTableVar.AsEnumerable
Let ck = String.Join(“_”, d.ItemArray)
Group d by k=ck into grp=Group
Let ra = grp.First().ItemArray.Append(grp.Count).toArray
Select r = dtReport.Rows.Add(ra)).CopyToDataTable