Remove Duplicates with LinQ

Hi there,

I need to remove duplicates from a DataTable using LINQ. Specifically, I need to filter out all duplicates for a single column, maintaining the initial Datatable strucutre. Other methods, such as DefaultView, do not work as they print out only one column (the one containing duplicates).

Any suggestion would be highly appreciated, code examples would be terrific.

Thanks in advance for your help!
Nicolò

4 Likes

Hello!

I’m not good with LINQ but try something like this

dataTable = dataTable.AsEnumerable()
        .GroupBy(r => new { columnWithDuplic= r.Field<string>("columnWithDuplic"))})
        .Select(g => g.First())
        .CopyToDataTable();

If doesn’t work, let’s wait for someone with more experience hahaha :smiley:

Regards,

3 Likes

Yup, this one is not working :slight_smile:

Try this (similar to @Lucas.Pimenta’s answer but in VB.NET):

dataTable = dataTable.AsEnumerable()
	.GroupBy(Function(i) i.Field(Of String)("columnWithDuplic"))
	.Select(Function(g) g.First)
	.CopyToDataTable
10 Likes

Works fine! Thanks a lot!

Hey @ncompa
@Lucas.Pimenta answer will also work he just posted a c# version of Lambda Expressions and workflow supported Vb version so to use that in studion you have to follow the way mentioned by @florin.stan

@Lucas.Pimenta answet will also work just some modification will be needed (VB version) so studion will take it :slight_smile:

Regards…!!
Aksh

3 Likes

what uipath activity are you putting this in? i am trying to do the same thing getting rid of duplicates but am unsure where you put this code. i have used Assign with LINQ but it gives me compile errors with this snippet you have.

The code is in an Assign activity. What errors are you getting?

i have this in the Value textbox of Assign…

(From x In dtNew.AsEnumerable()
.GroupBy(Function(i) i.Field(Of String)(“Employee ID”))
.Select(Function(g) g.First))
.CopyToDataTable()
Error is : ‘)’ expecteed but i dont see where. I added paran’s at the end and before From and after First

If i just use it the way you had it earlier without my additional paran’s it tells me “End of expression expected”

Here’s a working example: Main.xaml (12.7 KB) (right click and click Save link as…).

If you still can’t get it to work, please upload a .xaml file with a sample DataTable and the Assign activity with the error.

1 Like

Awesome, that works, thank you! Additionally i need to add a second “ProjectID” column to the Select and GroupBy. I’m trying to add that in as well but cant seem to get it right. I’m not sure if i need to add all the Function stuff again and with different letters (i.e. “e” and “g” like you used) What would the syntax be for adding a second field to that query?

Hi there @jereed,
I believe you will need to regroup and select, as shown below:

dtMyDataTable = dtMyDataTable.AsEnumerable.GroupBy(Function(x) x.item("SomeColumnName")).Select(Function(i) i.First).GroupBy(Function(x) x.item("SomeOtherColumnName")).Select(Function(i) i.First).CopyToDataTable

This appears to work, though there may be a more aesthetically pleasing method :slight_smile:

Thanks,
Josh

If you’re trying to first remove all employee id duplicates and then remove all project id duplicates, @Mr_JDavey’s solution is the one you want. If you’re looking to remove all duplicate pairs of employee id and project id, you can make use of anonymous types:

dataTable = dataTable.AsEnumerable()
	.GroupBy(Function(i) New With {
		Key.EmployeeID = i.Field(Of String)("Employee ID"),
		Key.ProjectID = i.Field(Of String)("ProjectID")})
	.Select(Function(g) g.First)
	.CopyToDataTable()
6 Likes

that worked! Is it possible to specify returning only those fields in the select part?
I tried Select(Function(g(“Employee ID”), g(“ProjectID”)) but that didn’t work.

You want to obtain a DataTable containing only those two columns with distinct employee id and project id pairs? Then all you need is this:

newDataTable = dataTable.DefaultView.ToTable(true, {"Employee ID", "ProjectID"})

The topic creator wanted to remove duplicates but keep all columns, that’s why LINQ was needed.

2 Likes

Thanks florin.stan - you’ve been a great help!
If you can help with the syntax on a follow-up question that would be great…

Now I have 2 datatables with the same structure and I want to write a linq to return where there is a record in dtNew but not in dtPrior based on comparing the 2 fields Employee ID and WBS Level3. I have it working with just one field but can’t seem to get it working with both fields. Here is what I have so far that isn’t working:

(From x In dtNew.AsEnumerable() Where not (From y In dtPrior.AsEnumerable() select y(“Employee ID”)).Contains(x(“Employee ID”)), y(“WBS Level3”)).Contains(x(“WBS Level3”)) select x).CopyToDataTable()

I would want the result to again be Employee ID and WBS Level 3

This is C# code Uipath does not support.