Find the duplicate rows based on one column name and move them to a different DataTable

Hi Team,

Need help with the scenario below. I have gone through all the help topics and could not find an exact solution for it.

My scenario is: I need to find the duplicate rows in the main DataTable based on the “Subject” column, keep the first occurrence and all non-duplicates in the main DataTable, and move all the duplicates to another new DataTable.

Main DataTable:
(screenshot)

The new duplicate DataTable should look like this:
(screenshot)

Once we have moved the duplicates to the new DataTable, the main DataTable should contain the rows shown below:
(screenshot)

Alternatively, I could add a “Duplicate” column and mark the duplicate rows as “Dup”, keeping the first occurrence unmarked:
(screenshot)

Anyone, please help.

Thanks in Advance.

Regards,
Anoop

Here's a solution:
mainTable.xlsx (8.4 KB) Sequence.xaml (9.8 KB)
Sample excel (mainTable.xlsx)

Steps

  1. First, read mainTable.xlsx into a mainDt variable, then declare newMainDt (the table after removing duplicates) and duplicateDt (the table for duplicates)

  2. Declare and initialize a listOfReadSubjects list to store the subjects that have already been seen

  3. For Each row in mainDt

  • If listOfReadSubjects already contains the current row's subject, add the row to duplicateDt; otherwise add the row to newMainDt and add the subject to the list
  4. Write the new tables back to Excel
    (screenshot)
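The steps above can be sketched in plain Python for illustration. In UiPath this would be a For Each Row loop with Add Data Row activities; the Subject values and the Id column here are made-up sample data, not from the attached workbook:

```python
# Split rows into "first occurrence / unique" vs "duplicate"
# based on the Subject column, keeping the first occurrence.
main_rows = [
    {"Subject": "AAA", "Id": 1},
    {"Subject": "BBB", "Id": 2},
    {"Subject": "AAA", "Id": 3},
    {"Subject": "CCC", "Id": 4},
]

seen_subjects = set()   # plays the role of listOfReadSubjects
new_main_rows = []      # newMainDt: first occurrences and uniques
duplicate_rows = []     # duplicateDt: second and later occurrences

for row in main_rows:
    subject = row["Subject"].strip()
    if subject in seen_subjects:
        duplicate_rows.append(row)   # subject already seen -> duplicate table
    else:
        seen_subjects.add(subject)   # remember the subject
        new_main_rows.append(row)    # first occurrence -> keep in main table
```

A set is used instead of a list purely for faster lookups; a list, as in the UiPath sequence, works the same way.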

RESULT:


Hi @jack.chan, when I tried it, the list variable shows ‘empty’. What could be the possible reason?

Can you upload your sequence and Excel file?

@anoop.mohandas

find starter help here:
FindDupsUniquesFirstFromGroup_By1Col.xaml (9.3 KB)


@ppr

I was just going through your LINQ and could not get it;
could you please help me understand it?

What is this k? I know d is an alias.
(From d In dtData.AsEnumerable
Group d By k=d(0).toString.Trim Into grp=Group — could not get this…
Where grp.Count = 1 — ???
Select grp.First()).CopyToDatatable

I want to know what this query is doing,
as I too have something similar to implement.

Regards,
Seema

@Seema_S

(From d In dtData.AsEnumerable

  • Iterate over dtData, referencing the looped row as d

Group d By k=d(0).toString.Trim Into grp=Group

  • Group the data by the first column's values; the grouping key is referenced by k

Where grp.Count =1

  • Filter for groups whose count is 1, i.e. the unique rows; duplicated values produce groups with a count > 1

Select grp.First()).CopyToDatatable

  • Take the unique row; we use First() because each group is a collection, then copy the rows to the result DataTable
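The same group-and-filter idea can be sketched outside UiPath; here is a rough Python equivalent of the query's logic (the rows are made-up sample data):

```python
from collections import defaultdict

# Sample rows: (first column value, some payload)
rows = [("AAA", 1), ("BBB", 2), ("AAA", 3), ("CCC", 4)]

# Group rows by the trimmed first-column value
# (the role of k in "Group d By k=d(0).toString.Trim").
groups = defaultdict(list)
for row in rows:
    groups[row[0].strip()].append(row)

# Keep only groups of size 1 (the unique values) and take their single row,
# mirroring "Where grp.Count = 1 ... Select grp.First()".
uniques = [grp[0] for grp in groups.values() if len(grp) == 1]
```

Here `uniques` ends up holding only the BBB and CCC rows, since AAA appears twice.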

also have a look here:

In case of further open questions, just open a new topic, so that the answers here stay focused on Anoop's question.

BTW: @anoop.mohandas, did you check the starter help? Any feedback? Thanks.


@ppr
Thank you, it's a good explanation.
I have yet to execute this code.

Hi @ppr, thanks a lot, your solution has partially worked. But I am facing one issue with the duplicate DataTable: it contains all the rows whose values are duplicated in the main DataTable, including the first occurrences.
My main DataTable:
(screenshot)

Currently, after running the program as you provided I am getting like this:

Non-duplicate DataTable (which is correct as per my requirement):
(screenshot)

Duplicate DataTable:
(screenshot)

My requirement for the duplicate DataTable is shown below (the rows of the duplicate values after the first occurrence in the main DataTable):
(screenshot)

For example, in my main DataTable AAA is repeated 3 times, so my non-duplicate DataTable should contain only AAA - 1, and my duplicate DataTable should contain AAA - 3 and AAA - 7.

Hope you understand my requirement. Thanks in advance and much appreciated.

Hi @anoop.mohandas ,

For retrieving a List of Duplicates that doesn’t include the first values, you can use the following Linq Query Statement:

(From d In dtData.AsEnumerable
Group d By k=d(0).toString.Trim Into grp=Group
Where grp.Count >1
Select grp.Skip(1).ToList).SelectMany(Function(x) x).ToList
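The Skip(1) idea can be sketched in Python as well (illustrative only; the rows below are made up to match the AAA example from earlier in the thread):

```python
from collections import defaultdict

# Sample rows: (first column value, row marker)
rows = [("AAA", 1), ("BBB", 2), ("AAA", 3), ("CCC", 4), ("AAA", 7)]

# Group rows by the trimmed first-column value, as in the LINQ query.
groups = defaultdict(list)
for row in rows:
    groups[row[0].strip()].append(row)

# For groups with more than one row, skip the first occurrence and
# flatten the rest, mirroring
# "Where grp.Count > 1 ... Select grp.Skip(1) ... SelectMany".
duplicates = [row for grp in groups.values() if len(grp) > 1 for row in grp[1:]]
```

With this data, `duplicates` contains the AAA - 3 and AAA - 7 rows while AAA - 1 stays out, which is exactly the requirement described above.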

I took a cue from @ppr's brilliant solution. By the way, don't forget to mark @ppr's answer as the solution.

-Ashwin A.K


Hi @ashwin.ashok and @ppr,
It worked. Thanks for your support.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.