Remove duplicates based on two columns and keep the values of all other columns

Hi all,

I need to remove duplicates based on two columns (Customer Code, Jurisdiction Name) in the attached sheet while keeping all the other columns and their values.

If I use dt_Inputsheet.DefaultView.ToTable(True, "Customer Code", "Jurisdiction Name"), the output DataTable contains only these two columns. But I need all the other columns and their corresponding values as well.
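A minimal sketch of the behavior described above, on a made-up table: the "Amount" column, the sample rows, and the module/function names are all hypothetical stand-ins for the real sheet. DataView.ToTable(distinct, columnNames) removes duplicate rows, but it only copies the columns you list, so every other column is dropped.

```vbnet
Imports System
Imports System.Data

Module ToTableDemo
    ' Hypothetical sample data; "Amount" stands in for the rest of the columns.
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Customer Code", GetType(String))
        dt.Columns.Add("Jurisdiction Name", GetType(String))
        dt.Columns.Add("Amount", GetType(Integer))
        dt.Rows.Add("C001", "Texas", 10)
        dt.Rows.Add("C001", "Texas", 20)
        dt.Rows.Add("C002", "Ohio", 30)
        Return dt
    End Function

    Sub Main()
        Dim dt As DataTable = BuildSample()
        ' distinct = True removes duplicate rows, but only the listed columns are copied.
        Dim distinctDt As DataTable = dt.DefaultView.ToTable(True, "Customer Code", "Jurisdiction Name")
        Console.WriteLine(distinctDt.Columns.Count) ' 2: "Amount" is gone
        Console.WriteLine(distinctDt.Rows.Count)    ' 2 distinct key pairs
    End Sub
End Module
```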

Hi,

You could use the LINQ GroupBy method. The following topic might help you.

If you can share your Excel file, we may be able to write an expression for it.

Regards,

Sample.xlsx (6.8 MB)

I need to remove duplicates based on the “Customer Code” and “Jurisdiction Name” columns and keep all the columns.

Hi,

How about the following sample?

dt.AsEnumerable.GroupBy(Function(r) Tuple.Create(r("Customer Code").ToString,r("Jurisdiction Name").ToString)).Select(Function(g) g.First).CopyToDataTable

Sample20210827-1.zip (2.3 KB)
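For readers who want to try the expression outside a workflow, here is a minimal, self-contained sketch wrapping it in a console module. The sample rows, the "Amount" column, and the names GroupByDemo, BuildSample and RemoveDuplicates are hypothetical; it also assumes the DataTable LINQ extensions (AsEnumerable, CopyToDataTable, from System.Data.DataSetExtensions) are referenced, as they must be for the expression above to work. Tuple.Create is used for the key because Tuple compares by value, so rows with the same two strings land in the same group.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module GroupByDemo
    ' Hypothetical sample data; "Amount" stands in for the rest of the columns.
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Customer Code", GetType(String))
        dt.Columns.Add("Jurisdiction Name", GetType(String))
        dt.Columns.Add("Amount", GetType(Integer))
        dt.Rows.Add("C001", "Texas", 10)
        dt.Rows.Add("C001", "Texas", 20)
        dt.Rows.Add("C002", "Ohio", 30)
        Return dt
    End Function

    ' Groups rows by the two key columns and keeps the first row of each group,
    ' so every column of that row is preserved in the result.
    Function RemoveDuplicates(dt As DataTable) As DataTable
        Return dt.AsEnumerable() _
            .GroupBy(Function(r) Tuple.Create(r("Customer Code").ToString(), r("Jurisdiction Name").ToString())) _
            .Select(Function(g) g.First()) _
            .CopyToDataTable()
    End Function

    Sub Main()
        For Each row As DataRow In RemoveDuplicates(BuildSample()).Rows
            Console.WriteLine(String.Join(", ", row.ItemArray))
        Next
    End Sub
End Module
```

With this sample it should print "C001, Texas, 10" and "C002, Ohio, 30", because GroupBy yields groups in order of each key's first appearance. Note that CopyToDataTable throws an InvalidOperationException if the source has no rows, so an empty input sheet needs a separate check.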

Note: strictly speaking, you need to decide which of the duplicated rows to keep. The expression above keeps the first row of each group.
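As a hedged illustration of that choice, the sketch below swaps g.First for two other selectors: keeping the last occurrence, or keeping the row with the largest value in a hypothetical "Amount" column. The data, column and function names are made up for the example, and it assumes the same DataTable LINQ extensions (including Field(Of T)) as the expression above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module KeepWhichRowDemo
    ' Hypothetical sample data with three duplicates of the same key pair.
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Customer Code", GetType(String))
        dt.Columns.Add("Jurisdiction Name", GetType(String))
        dt.Columns.Add("Amount", GetType(Integer))
        dt.Rows.Add("C001", "Texas", 10)
        dt.Rows.Add("C001", "Texas", 30)
        dt.Rows.Add("C001", "Texas", 20)
        dt.Rows.Add("C002", "Ohio", 5)
        Return dt
    End Function

    ' Same grouping key as the expression in the thread.
    Function GroupByKey(dt As DataTable) As IEnumerable(Of IGrouping(Of Tuple(Of String, String), DataRow))
        Return dt.AsEnumerable().GroupBy(Function(r) Tuple.Create(r("Customer Code").ToString(), r("Jurisdiction Name").ToString()))
    End Function

    ' Variant 1: keep the last occurrence in each duplicate group.
    Function KeepLast(dt As DataTable) As DataTable
        Return GroupByKey(dt).Select(Function(g) g.Last()).CopyToDataTable()
    End Function

    ' Variant 2: keep the row with the largest value in the hypothetical "Amount" column.
    Function KeepMaxAmount(dt As DataTable) As DataTable
        Return GroupByKey(dt) _
            .Select(Function(g) g.OrderByDescending(Function(r) r.Field(Of Integer)("Amount")).First()) _
            .CopyToDataTable()
    End Function

    Sub Main()
        Console.WriteLine(KeepLast(BuildSample()).Rows(0)("Amount"))      ' 20
        Console.WriteLine(KeepMaxAmount(BuildSample()).Rows(0)("Amount")) ' 30
    End Sub
End Module
```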

Regards,

@Yoichi Nice solution! I've been seeing a lot of these LINQ statements in the forum, and I'd like to learn more about them. Is there a resource you can recommend?

Hi @Jeroen ,

Unfortunately, I don't know many good LINQ resources in English, because my native language is Japanese. @ppr is currently writing LINQ documentation (such as the following), which I think will help us understand LINQ better.

Regards,

Answered with an alternative approach in the forked/duplicate topic thread: