Removing duplicates based on one Column

Hello,

I need to remove duplicates from a DT, base on one single Column, I’ve tried the given below, but seems, I’m doing something wrong…

PS : Still learning LINQ…

dtTest.AsEnumerable().GroupBy(Function(i) i.Field(Of String)(“Col B”)).Select(Function(g) g.First).CopyToDataTable

As input, the content of the columns :

Col A Col B
123 XXXXXX 21 H
123 XXXXXX 21 H
123 XXXXXX 21 H
125 XXXXXX 8H
125 XXXXXX 8H
125 XXXXXX 8H
125 XXXXXX 8H
125 XXXXXX 8H
555 250h - 24 XX
555 250h - 24 XX
555 500h - 24 XX
555 500h - 24 XX
555 100h - 24 XX
555 100h - 24 XX

What excpted as result :

Col A Col B
123 XXXXXX 21 H
125 XXXXXX 8H
555 250h - 24 XX
555 500h - 24 XX
555 100h - 24 XX

What I get after using my query :

Col A Col B
123 XXXXXX 21 H
125 XXXXXX 8H
555 250h - 24 XX

Thanks in advance !

Regards,
Mohssine

Hi @mz3bel

How about this expression ?

DtOutput = DtBuild.Clone

(From d In DtBuild.AsEnumerable
Group d By k=d("Col A").toString.Trim, k2=d("Col B").ToString.Trim Into grp = Group
Let ra = New Object(){k,k2}
Select r = DtOutput.Rows.Add(ra)).CopyToDataTable

Regards
Gokul

4 Likes

Hi @mz3bel

Check out the XAML file

HigestValue.xaml (11.1 KB)

image

Output

image

Regards
Gokul

1 Like

with looking only to column B the Duplicates are too wide clustered:

vs

As your sample data shows 2 cols and duplicate detection now affects all columns essential deduplications can be done like Remove Duplicates Activity or

dtFiltered = dtTest.DefaultView.ToTable(True)

So, the LINQ acrobatics can be avoided

1 Like

Hello @Gokul001 ,

I’m sorry, after trying on the real excel file, it doesn’t work quite as expected, since I have multiple other columns, that I didn’t mention, didn’t think it will matter.

Regards,
Mohssine

1 Like

Hello @ppr ,

I have tried that activity before, it didn’t quite work for me, because I have other columns that I didn’t mention, I didn’t think it would matter, both solution are not working.

Regards,
Mohssine

Hello @mz3bel, :smiley:

Try this LINQ expression, it’ll return you all duplicate items:

dtTest.AsEnumerable.GroupBy(function(r) r("Col B")).Where(function(r) r.Count() > 1).SelectMany(function(l) l).ToArray().CopyToDatatable

Regards,

Bruna Bruno.

Hi @brunabruno ,

Thank you, but it’s not what I’m looking for.

Regards,
Mohssine

Ohhh, I understood now. :smiley: Try this @mz3bel:

dtTest.AsEnumerable.GroupBy(Function(r) r("ColB")).Select(Function(r) r.First).CopyToDataTable

Now my workflow brings this result:

Col A Col B
123 XXXXXX 21 H
125 XXXXXX 8H
555 250h - 24 XX
555 500h - 24 XX
555 100h - 24 XX

:point_right: Check here: RemoveDuplicateItems.xaml (6.0 KB)

Regards,

Bruna Bruno.

That’s your fault you didn’t mention it in the Query. You can Use : grp.First()(Column index)

image

HI @mz3bel

You Linq expression is also working. Can you check the XAML file

DtBuild.AsEnumerable().GroupBy(Function(i) i.Field(Of String)("Col B")).Select(Function(g) g.First).CopyToDataTable

Workflow

HigestValue.xaml (12.7 KB)

image

image

Regards
Gokul

Hello @Gokul001

After multiple tries, It failed to work, but I got an idea thanks to ours last discussion, I removed alll the unnecessary columns, which at the end left me only with 4 columns, so than later i do the linq query to remove the duplicates, it works at last :tada:

Thank you everyone for your kind support !

Regards,
Mohssine

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.