Is there an efficient way to get all the duplicate rows from a DataTable and then delete them without using For Each?

How can I extract the duplicate rows from Dt_A into a DataTable (Dt_C), and remove all of those duplicate rows to produce Dt_B, without using For Each?

※ Because in the real world, Dt_A has billions of rows…

The image is as follows.
Column "Value1" has duplicate values in row2 and row3. I want to cut out row2 and row3 to create Dt_C, and remove all the duplicate rows from Dt_A.
※ I don’t want to keep the first occurrence, so I can’t just use the Remove Duplicate Rows activity to solve this.

Thank you for your reply. In this case, I can’t just use the Remove Duplicate Rows activity, because I don’t want to keep the first occurrence.
I want to output all the duplicate rows to Dt_C, and then remove all the duplicate rows from Dt_A to produce Dt_B.

Hi,

Can you try the following expression?

dtB=dtA.AsEnumerable.GroupBy(Function(r) r("value1").ToString).Where(Function(g) g.Count=1).SelectMany(Function(g) g).CopyToDataTable()

dtC=dtA.AsEnumerable.GroupBy(Function(r) r("value1").ToString).Where(Function(g) g.Count>1).SelectMany(Function(g) g).CopyToDataTable()
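
One caveat with the two expressions above: `CopyToDataTable()` throws an `InvalidOperationException` when the filtered sequence contains no rows, for example when dtA has no duplicates at all. Below is a minimal sketch of a guarded version. The module name `DuplicateSplit`, the function `SplitByDuplicates`, and its parameters are hypothetical names for illustration only, not part of UiPath or of the original answer.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module DuplicateSplit
    ' Hypothetical helper: returns (rows whose key occurs once, rows whose key occurs more than once).
    Function SplitByDuplicates(source As DataTable, keyColumn As String) As Tuple(Of DataTable, DataTable)
        ' Group once and reuse the groups for both outputs.
        Dim groups = source.AsEnumerable().GroupBy(Function(r) r(keyColumn).ToString()).ToList()

        Dim uniqueRows = groups.Where(Function(g) g.Count() = 1).SelectMany(Function(g) g).ToList()
        Dim duplicateRows = groups.Where(Function(g) g.Count() > 1).SelectMany(Function(g) g).ToList()

        ' CopyToDataTable throws on an empty sequence, so fall back to an
        ' empty table with the same columns via Clone().
        Dim dtB = If(uniqueRows.Any(), uniqueRows.CopyToDataTable(), source.Clone())
        Dim dtC = If(duplicateRows.Any(), duplicateRows.CopyToDataTable(), source.Clone())

        Return Tuple.Create(dtB, dtC)
    End Function
End Module
```

Inside a workflow, the same guard can presumably be written inline in an Assign by wrapping each expression in `If(rows.Any(), rows.CopyToDataTable(), dtA.Clone())`.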

Regards,

Thank you for your reply.
This is a very smart solution. 🙏 That’s very helpful.
Thank you very much!!!

Just so you know, that expression still loops: GroupBy and Where iterate over every row internally. There is no way to avoid looping when doing things like this. Everyone is so obsessed with avoiding For Each, but there’s no reason to be. A For Each is very efficient.
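
To make the point concrete, here is a rough sketch of the same split written with explicit For Each loops. It is not what LINQ literally does internally, just an equivalent two-pass approach: count each key, then route each row. The names `DuplicateSplitLoop`, `SplitWithForEach`, and `keyColumn` are made up for this example.

```vb
Imports System.Collections.Generic
Imports System.Data

Module DuplicateSplitLoop
    ' Hypothetical helper: fills dtB with rows whose key occurs once
    ' and dtC with rows whose key occurs more than once.
    Sub SplitWithForEach(dtA As DataTable, keyColumn As String,
                         ByRef dtB As DataTable, ByRef dtC As DataTable)
        ' Pass 1: count how many times each key value occurs.
        Dim counts As New Dictionary(Of String, Integer)
        For Each row As DataRow In dtA.Rows
            Dim key As String = row(keyColumn).ToString()
            Dim seen As Integer = 0
            counts.TryGetValue(key, seen)
            counts(key) = seen + 1
        Next

        ' Pass 2: copy each row into the matching output table.
        dtB = dtA.Clone()
        dtC = dtA.Clone()
        For Each row As DataRow In dtA.Rows
            If counts(row(keyColumn).ToString()) = 1 Then
                dtB.ImportRow(row)
            Else
                dtC.ImportRow(row)
            End If
        Next
    End Sub
End Module
```

One difference worth noting: this version keeps the rows in their original order, while the GroupBy expressions return the duplicate rows grouped by key.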

Couldn’t agree more, @postwick.

For a resilient solution, we need to take care of many things. One of them is readable code that can be supported and extended in the later stages of the process life cycle.
We are in a race of a thousand miles; maintaining a steady pace is necessary.
