Remove duplicate rows based on specific columns?

Jon_G · August 28, 2021, 4:29am

Hi all,

I am trying to remove duplicate rows from an input file being processed by my bot.

How can you remove duplicate rows when only specific columns are identical, rather than all of them?

For example, I have to filter out duplicate rows in a data file where only NAME + POLICY NO + START DATE are identical, keeping just 1 row regardless of what’s in the other columns.

For example.

Remove all except 1 row… ???

Sudharsan_Ka · August 28, 2021, 4:36am

Hi @Jon_G

Have you tried the Filter Data Table activity?

Regards
Sudharsan

Jon_G · August 28, 2021, 4:40am

Hey, I have tried the Filter activity but I'm not sure how to fill it in to check for duplicates. I could only see options like column value = …

Sudharsan_Ka · August 28, 2021, 4:45am

Hi @Jon_G

Try these activities

Remove Duplicate Rows (or)
Remove Duplicates Range (where you need to specify the range)


Hope it helps…

Regards
Sudharsan

kumar.varun2 · August 28, 2021, 4:59am

@Jon_G

This can be done as follows

Instead of the Build Data Table activity, use the Read Range activity to read the input Excel file into the data table variable (inputDT).

The LINQ used is given below

(
	From row In inputDT
	Group row By 
	k1 = row("FIRST NAME").ToString.Trim,
	k2 = row("SURNAME").ToString.Trim,
	k3 = row("POLICY NO").ToString.Trim,
	k4 = row("START DATE").ToString.Trim
	Into grp = Group
	Select grp(0)
).CopyToDataTable

Finally, you can write the output to an Excel file using the Write Range activity.

Please refer to the attached XAML file.

Remove duplicate rows based on specific columns.xaml (9.7 KB)
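For anyone who wants to try the same grouping outside Studio, here is a minimal console sketch. The sample rows, the NOTES column, and the module name are made up for illustration, and the empty-table guard is an addition that is not in the attached workflow; it is there because CopyToDataTable throws when the query returns no rows.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module DedupeSketch
    Sub Main()
        ' Hypothetical sample data standing in for the Read Range output.
        Dim inputDT As New DataTable()
        For Each name In {"FIRST NAME", "SURNAME", "POLICY NO", "START DATE", "NOTES"}
            inputDT.Columns.Add(name, GetType(String))
        Next
        inputDT.Rows.Add("Ann", "Lee", "P001", "01/02/2021", "first copy")
        inputDT.Rows.Add("Ann", "Lee", "P001", "01/02/2021", "second copy")
        inputDT.Rows.Add("Bob", "Ray", "P002", "05/03/2021", "only copy")

        ' Same grouping as the query above: one group per key combination,
        ' keeping the first row of each group. NOTES is ignored on purpose.
        Dim firstRows = (
            From row In inputDT.AsEnumerable()
            Group row By k1 = row("FIRST NAME").ToString.Trim,
                k2 = row("SURNAME").ToString.Trim,
                k3 = row("POLICY NO").ToString.Trim,
                k4 = row("START DATE").ToString.Trim
            Into grp = Group
            Select grp.First()
        ).ToList()

        ' CopyToDataTable throws on an empty sequence,
        ' so fall back to an empty table with the same columns.
        Dim outputDT As DataTable =
            If(firstRows.Any(), firstRows.CopyToDataTable(), inputDT.Clone())

        Console.WriteLine(outputDT.Rows.Count)
    End Sub
End Module
```

With these sample rows the two Ann Lee entries share all four key values, so they collapse into one row even though their NOTES differ. In Studio, the equivalent would be to put the query in an Assign activity and check the row count before calling CopyToDataTable.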

Jon_G · August 28, 2021, 6:32am

Wow, you’re a legend! This works flawlessly. Thank you.

I’m not familiar with LINQ, so this is something I might try to learn more about. I’m sure this is a very common need and that others will appreciate your solution.

Jon_G · August 28, 2021, 11:38pm

Adding to this… how can I add something to this LINQ so that I can keep the row that contains the MAX or MIN value in the START DATE field?

kumar.varun2 · August 29, 2021, 4:18am

@Jon_G

I'm a bit confused about your requirement. Since START DATE is one of the columns used to identify duplicates, every row in a group has the same START DATE, so Min or Max has nothing to choose between.

Please provide an input and output example of your requirement.

Also, since this topic is closed, please raise it in a separate topic.

For your reference

There are Min and Max methods that can be used for getting a minimum and maximum value.
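As a rough sketch only: if the intent is to treat rows as duplicates on name and policy number alone, and keep the one with the earliest START DATE, something like the following might work. That reading of the requirement, the sample rows, the dd/MM/yyyy date format, and the module name are all assumptions. Swap OrderBy for OrderByDescending to keep the latest date instead.

```vb
Imports System
Imports System.Data
Imports System.Globalization
Imports System.Linq

Module KeepEarliestSketch
    Sub Main()
        ' Hypothetical sample data; START DATE is stored as text.
        Dim inputDT As New DataTable()
        For Each name In {"FIRST NAME", "SURNAME", "POLICY NO", "START DATE"}
            inputDT.Columns.Add(name, GetType(String))
        Next
        inputDT.Rows.Add("Ann", "Lee", "P001", "15/03/2021")
        inputDT.Rows.Add("Ann", "Lee", "P001", "01/02/2021")
        inputDT.Rows.Add("Bob", "Ray", "P002", "05/03/2021")

        ' Assumed date format (dd/MM/yyyy); adjust to match the real file.
        Dim parseDate As Func(Of DataRow, DateTime) =
            Function(r) DateTime.ParseExact(r("START DATE").ToString.Trim,
                                            "dd/MM/yyyy", CultureInfo.InvariantCulture)

        ' START DATE is left out of the grouping key, so rows in a group
        ' can differ on it; sorting by date and taking the first row keeps the earliest.
        Dim outputDT As DataTable = (
            From row In inputDT.AsEnumerable()
            Group row By k1 = row("FIRST NAME").ToString.Trim,
                k2 = row("SURNAME").ToString.Trim,
                k3 = row("POLICY NO").ToString.Trim
            Into grp = Group
            Select grp.OrderBy(parseDate).First()
        ).CopyToDataTable()

        For Each kept As DataRow In outputDT.Rows
            Console.WriteLine(String.Join(" | ", kept.ItemArray))
        Next
    End Sub
End Module
```

Sorting each group is used here rather than Min or Max directly because those return the date value, not the row that holds it.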

system · September 1, 2021, 4:19am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
