Find the duplicate rows based on one column name and move them to a different DataTable

Hi Team,

Need help with the scenario below. I have gone through all the help topics and could not find an exact solution for it.

My scenario is: I need to find the duplicate rows in the main DataTable based on the “Subject” column, keep the first occurrence and all non-duplicates in the main DataTable, and move all the duplicates to another new DataTable.

Main DataTable:
(screenshot)

The new duplicate DataTable should look like this:
(screenshot)

Once we have moved the duplicates to the new DataTable, the main DataTable should contain the rows shown below:
(screenshot)

Alternatively, I could add a “Duplicate” column and mark the duplicate rows as “Dup”, keeping the first occurrence unmarked:
(screenshot)

Anyone, please help.

Thanks in Advance.

Regards,
Anoop

Here's a solution:
mainTable.xlsx (8.4 KB) Sequence.xaml (9.8 KB)
Sample excel (mainTable.xlsx)

Steps

  1. First, read mainTable.xlsx into a mainDt variable, then declare newMainDt (the table after removing duplicates) and duplicateDt (the table for duplicates)

  2. Declare and initialize a listOfReadSubjects list to store the subjects that have already been seen

  3. For Each row in mainDt

  • If listOfReadSubjects already contains the current row's subject, add the row to duplicateDt; otherwise add the row to newMainDt and add the subject to the list
  4. Write the new tables back to Excel
    (screenshot)
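The steps above can be sketched in plain Python for illustration. In UiPath this would be a For Each Row loop with Add Data Row activities; the Subject values and the Id column here are made-up sample data, not from the attached workbook:

```python
# Split rows into "first occurrence / unique" vs "duplicate"
# based on the Subject column, keeping the first occurrence.
main_rows = [
    {"Subject": "AAA", "Id": 1},
    {"Subject": "BBB", "Id": 2},
    {"Subject": "AAA", "Id": 3},
    {"Subject": "CCC", "Id": 4},
]

seen_subjects = set()   # plays the role of listOfReadSubjects
new_main_rows = []      # newMainDt: first occurrences and uniques
duplicate_rows = []     # duplicateDt: second and later occurrences

for row in main_rows:
    subject = row["Subject"].strip()
    if subject in seen_subjects:
        duplicate_rows.append(row)   # subject already seen -> duplicate table
    else:
        seen_subjects.add(subject)   # remember the subject
        new_main_rows.append(row)    # first occurrence -> keep in main table
```

A set is used instead of a list purely for faster lookups; a list, as in the UiPath sequence, works the same way.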

RESULT:


Hi @jack.chan, when I tried it, the list variable shows ‘empty’. What could be the possible reason?

Can you upload your sequence and Excel file?

@anoop.mohandas

find starter help here:
FindDupsUniquesFirstFromGroup_By1Col.xaml (9.3 KB)


@ppr

I was just going through your LINQ and could not get it;
could you please help me understand it?

What is this k? I know d is an alias.
(From d In dtData.AsEnumerable
Group d By k=d(0).toString.Trim Into grp=Group — could not get this…
Where grp.Count = 1 — ???
Select grp.First()).CopyToDatatable

I want to know what this query is doing,
as I too have something similar to implement.

Regards,
Seema

@Seema_S

(From d In dtData.AsEnumerable

  • Iterate over dtData, referencing the looped row as d

Group d By k=d(0).toString.Trim Into grp=Group

  • Group the data by the first column's values; the grouping key is referenced by k

Where grp.Count =1

  • Filter for groups whose count is 1, i.e. the unique rows; duplicated values produce groups with a count > 1

Select grp.First()).CopyToDatatable

  • Take the unique row; we use First() because each group is a collection, then copy the rows to the result DataTable
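The same group-and-filter idea can be sketched outside UiPath; here is a rough Python equivalent of the query's logic (the rows are made-up sample data):

```python
from collections import defaultdict

# Sample rows: (first column value, some payload)
rows = [("AAA", 1), ("BBB", 2), ("AAA", 3), ("CCC", 4)]

# Group rows by the trimmed first-column value
# (the role of k in "Group d By k=d(0).toString.Trim").
groups = defaultdict(list)
for row in rows:
    groups[row[0].strip()].append(row)

# Keep only groups of size 1 (the unique values) and take their single row,
# mirroring "Where grp.Count = 1 ... Select grp.First()".
uniques = [grp[0] for grp in groups.values() if len(grp) == 1]
```

Here `uniques` ends up holding only the BBB and CCC rows, since AAA appears twice.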

also have a look here:

In case of further open questions, just open a new topic, so that the answers here stay focused on Anoop's question.

BTW: @anoop.mohandas, did you check the starter help? Any feedback? Thanks.


@ppr
Thank you, it's a good explanation.
I have yet to execute this code.

Hi @ppr, thanks a lot, your solution has partially worked. But I am facing one issue with the duplicate DataTable: it contains all the rows whose values are duplicated in the main DataTable, including the first occurrences.
My main DataTable:
(screenshot)

Currently, after running the program as you provided I am getting like this:

Non-duplicate DataTable (which is correct as per my requirement):
(screenshot)

Duplicate DataTable:
(screenshot)

My requirement for the duplicate DataTable is shown below (the rows of the duplicate values after the first occurrence in the main DataTable):
(screenshot)

For example, in my main DataTable AAA is repeated 3 times, so my non-duplicate DataTable should contain only AAA - 1, and my duplicate DataTable should contain AAA - 3 and AAA - 7.

Hope you understand my requirement. Thanks in advance and much appreciated.

Hi @anoop.mohandas ,

For retrieving a List of Duplicates that doesn’t include the first values, you can use the following Linq Query Statement:

(From d In dtData.AsEnumerable
Group d By k=d(0).toString.Trim Into grp=Group
Where grp.Count >1
Select grp.Skip(1).ToList).SelectMany(Function(x) x).ToList
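The Skip(1) idea can be sketched in Python as well (illustrative only; the rows below are made up to match the AAA example from earlier in the thread):

```python
from collections import defaultdict

# Sample rows: (first column value, row marker)
rows = [("AAA", 1), ("BBB", 2), ("AAA", 3), ("CCC", 4), ("AAA", 7)]

# Group rows by the trimmed first-column value, as in the LINQ query.
groups = defaultdict(list)
for row in rows:
    groups[row[0].strip()].append(row)

# For groups with more than one row, skip the first occurrence and
# flatten the rest, mirroring
# "Where grp.Count > 1 ... Select grp.Skip(1) ... SelectMany".
duplicates = [row for grp in groups.values() if len(grp) > 1 for row in grp[1:]]
```

With this data, `duplicates` contains the AAA - 3 and AAA - 7 rows while AAA - 1 stays out, which is exactly the requirement described above.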

I took a cue from @ppr's brilliant solution. By the way, don't forget to mark @ppr's answer as the solution.

-Ashwin A.K


Hi @ashwin.ashok and @ppr,
It worked. Thanks for your support.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.