Remove Duplicate rows based on single column and that too if cell starts with specific values

S_Nitin · October 10, 2023, 11:56am

Hi,

I need to remove duplicates from an Excel file/DataTable using LINQ. Specifically, I need to remove second-occurrence duplicates for a single column (Column2) while keeping duplicates in the other columns as they are.
Importantly, in the second column I need to keep the other duplicates and remove only specific ones: those that start with a fixed string (e.g. MN-J).

Remove duplicates: removes all duplicate rows (not useful)
Default view: not useful

This can probably be done using regex or LINQ.

Thanks in advance for your help!

supermanPunch · October 10, 2023, 12:16pm

Hi @S_Nitin,

Could you provide us with sample data and the expected output for that data? That way we can help more quickly and give appropriate suggestions.

Palaniyappan · October 10, 2023, 12:18pm

Use an Assign activity and write it like this:

NewDT  = yourDataTable.AsEnumerable().GroupBy(Function(row) row("Column2")).SelectMany(Function(group) If(group.Key.StartsWith("MN-J"), group.Take(1), group)).CopyToDataTable()

Hope this helps

Cheers @S_Nitin
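A minimal stand-alone VB.NET sketch of what this expression does, assuming columns named Column1/Column2 and hypothetical sample rows (the helper KeepFirstOfPrefixedGroups is also hypothetical, and .ToString() is added so the group key is a String). LINQ's GroupBy yields groups in the order their keys first appear, so rows that share a Column2 value end up next to each other, and each MN-J group is cut down to its first row even when Column1 differs.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module RegroupDemo
    ' Hypothetical sample: Column1 = transaction data, Column2 = transaction ID
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Column1", GetType(String))
        dt.Columns.Add("Column2", GetType(String))
        dt.Rows.Add("Sun", "X-100")
        dt.Rows.Add("Apple", "MN-J.01")
        dt.Rows.Add("Chennai", "X-100")
        dt.Rows.Add("Sea", "MN-J.01")
        Return dt
    End Function

    ' Same shape as the Assign expression: group on Column2, keep only the first
    ' row of groups whose key starts with the prefix, keep all other groups whole
    Function KeepFirstOfPrefixedGroups(dt As DataTable, prefix As String) As DataTable
        Return dt.AsEnumerable().
            GroupBy(Function(row) row("Column2").ToString()).
            SelectMany(Function(g) If(g.Key.StartsWith(prefix), g.Take(1), g.AsEnumerable())).
            CopyToDataTable()
    End Function

    Sub Main()
        Dim result As DataTable = KeepFirstOfPrefixedGroups(BuildSample(), "MN-J")
        For Each r As DataRow In result.Rows
            Console.WriteLine(r("Column1").ToString() & " | " & r("Column2").ToString())
        Next
        ' Prints Sun | X-100, Chennai | X-100, Apple | MN-J.01:
        ' the X-100 rows are pulled together and "Sea" is dropped
    End Sub
End Module
```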

S_Nitin · October 10, 2023, 12:29pm

Data.xlsx (10.1 KB)

supermanPunch · October 10, 2023, 12:43pm

Hi @S_Nitin,

From the explanation and the input/expected output, it seems that you want to group by two columns (first and second).

Could you try the following and check:

OutputDT = DT.AsEnumerable.GroupBy(Function(x)x(0).ToString+x(1).ToString).Select(Function(x)x.First).CopyToDatatable
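One caveat with a key built by plain concatenation: two different pairs can produce the same string, e.g. ("ab", "c") and ("a", "bc") both become "abc". A minimal sketch on hypothetical data, comparing the concatenated key with one that puts a separator between the values. The "|" separator and the KeepFirstPerKey helper are assumptions for illustration; a separator only helps if it never occurs in the data, while a Tuple key avoids the problem entirely.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module ConcatKeyDemo
    ' Hypothetical rows: the first two differ but both concatenate to "abc"
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Column1", GetType(String))
        dt.Columns.Add("Column2", GetType(String))
        dt.Rows.Add("ab", "c")
        dt.Rows.Add("a", "bc")
        dt.Rows.Add("ab", "c")
        Return dt
    End Function

    ' Keep the first row per key, where keyOf builds the key from a row
    Function KeepFirstPerKey(dt As DataTable, keyOf As Func(Of DataRow, String)) As DataTable
        Return dt.AsEnumerable().
            GroupBy(keyOf).
            Select(Function(g) g.First()).
            CopyToDataTable()
    End Function

    Sub Main()
        Dim plain As Func(Of DataRow, String) = Function(x) x(0).ToString() & x(1).ToString()
        Dim separated As Func(Of DataRow, String) = Function(x) x(0).ToString() & "|" & x(1).ToString()
        ' Plain concatenation merges ("ab","c") and ("a","bc") into one group: 1 row left
        Console.WriteLine(KeepFirstPerKey(BuildSample(), plain).Rows.Count)
        ' With a separator the two pairs stay distinct: 2 rows left
        Console.WriteLine(KeepFirstPerKey(BuildSample(), separated).Rows.Count)
    End Sub
End Module
```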

S_Nitin · October 10, 2023, 12:48pm

Hi @Palaniyappan, thanks for the reply.
The query ran with some modification:
NewDT = yourDataTable.AsEnumerable().GroupBy(Function(row) row("Column2").ToString.Trim).SelectMany(Function(group) If(group.Key.StartsWith("MN-J"), group.Take(1), group)).CopyToDataTable()

But it has grouped all similar values in Column2 together, and we are not supposed to move the other values. I am not even sure this is possible.

S_Nitin · October 10, 2023, 1:02pm

Hi @supermanPunch,
It is partially correct, but it is deleting some other rows too.

Let me explain why I want to delete those duplicates: the Column2 values are transaction IDs and Column1 holds the data for each transaction, so if a transaction ID repeats, it might be treated as another transaction. Hence I want to remove those duplicates.

From one ID to the next new ID, all data should belong to one transaction only.

Is there any other way to do it?

Let me restate my issue:
Delete the duplicates in Column2, but also check the corresponding Column1 values while deleting: if the Column1 value is different, don't delete the row; if the Column1 value is the same, delete it.

Gokul001 · October 11, 2023, 7:20am

Hi @S_Nitin

Check out the XAML file

11.10.2023_Forum_3.xaml (11.8 KB)

Regards
Gokul

Dilli_Reddy · October 11, 2023, 7:27am

Hi @S_Nitin,
Please use this LINQ query:

DT.AsEnumerable.GroupBy(Function(a) Tuple.Create(a("Column Name1").ToString,a("C.Name2").ToString)).Select(Function(b) b.First).CopyToDataTable

Cheers…!
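A minimal VB.NET sketch of this Tuple-based grouping on hypothetical data modeled on the values mentioned in this thread (Column1/Column2 stand in for the placeholder column names). GroupBy keeps first-appearance order, and Tuple(Of String, String) compares by value, so each (Column1, Column2) pair keeps only its first row. Note that it removes every repeated pair, including rows without M-IND such as Sun | Chennai, which the later replies want to keep.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module TupleDedupeDemo
    ' Hypothetical sample based on the values mentioned in the thread
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Column1", GetType(String))
        dt.Columns.Add("Column2", GetType(String))
        dt.Rows.Add("Apple", "M-IND.45.VC")
        dt.Rows.Add("Sea", "M-IND.45.VC")
        dt.Rows.Add("Apple", "M-IND.45.VC")
        dt.Rows.Add("Sun", "Chennai")
        dt.Rows.Add("Sun", "Chennai")
        Return dt
    End Function

    ' Keep the first row of every (Column1, Column2) pair;
    ' Tuple(Of String, String) compares by value, so equal pairs share one group
    Function DedupeOnBothColumns(dt As DataTable) As DataTable
        Return dt.AsEnumerable().
            GroupBy(Function(a) Tuple.Create(a("Column1").ToString(), a("Column2").ToString())).
            Select(Function(b) b.First()).
            CopyToDataTable()
    End Function

    Sub Main()
        For Each r As DataRow In DedupeOnBothColumns(BuildSample()).Rows
            Console.WriteLine(r("Column1").ToString() & " | " & r("Column2").ToString())
        Next
        ' Prints Apple, Sea and a single Sun | Chennai row: the repeated
        ' Apple row is removed, but so is the second Sun | Chennai row
    End Sub
End Module
```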

S_Nitin · October 11, 2023, 10:43am

@Dilli_Reddy -
Thanks, I tried this query and it is partially working: along with the Column2 duplicates, it is also deleting exact Column1 duplicates. However, this is useful for me in other scenarios.

Thanks !

S_Nitin · October 11, 2023, 11:36am

Hi @Gokul001,
Thank you for your reply!

I tried it, but it is deleting other cells too, which are unique.
I will try to use similar logic with some modifications.

Gokul001 · October 11, 2023, 11:37am

I created the logic based on the input you provided.

Is it working with the above input file?

S_Nitin · October 11, 2023, 12:51pm

@Gokul001

Yes, it works for this file and for the actual data too; however, it rearranges the rows by the codes in Column2, so I can't tell which Column1 values belong to which Column2 codes.

To explain again why I want to delete those duplicates: the Column2 values are transaction IDs and Column1 holds the data for each transaction, so if a transaction ID repeats, it might be treated as another transaction. Hence I want to remove those duplicates.

From one ID to the next new ID, all data should belong to one transaction only.

Code output:

Expected output:

supermanPunch · October 11, 2023, 1:10pm

@S_Nitin,

Could you provide us with a data scenario where the provided logic does not work, along with the expected output for that data? We will be able to help faster and with accurate logic once we have the data samples.

Gokul001 · October 11, 2023, 1:54pm

Have you reviewed this workflow, @S_Nitin?

Filter Column2 for values containing M-IND and store the result in Dt_Filter_M_IND.
Filter Column2 for values not containing M-IND and store the result in Dt_Filter.
Using a LINQ expression, keep the first row and eliminate the second one → DT_Dupli.
Use Merge Data Table with Source → Dt_Filter and Destination → DT_Dupli.
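A rough VB.NET sketch of these four steps, assuming the LINQ step compares all columns and that Merge Data Table behaves like DataTable.Merge called on the destination (which appends rows when no primary key is set). Table names follow the steps; the sample rows and the ToTable helper are hypothetical. Because the merged result starts with the M-IND rows, the original row order is not preserved, which matches the concern raised in the next reply.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module FilterMergeDemo
    ' Hypothetical sample rows
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Column1", GetType(String))
        dt.Columns.Add("Column2", GetType(String))
        dt.Rows.Add("Sun", "Chennai")
        dt.Rows.Add("Apple", "M-IND.45.VC")
        dt.Rows.Add("Sun", "Chennai")
        dt.Rows.Add("Sea", "M-IND.45.VC")
        dt.Rows.Add("Apple", "M-IND.45.VC")
        Return dt
    End Function

    ' Hypothetical helper: CopyToDataTable throws on an empty sequence,
    ' so return an empty copy of the schema instead
    Function ToTable(rows As IEnumerable(Of DataRow), schema As DataTable) As DataTable
        Dim rowList As List(Of DataRow) = rows.ToList()
        If rowList.Count = 0 Then Return schema.Clone()
        Return rowList.CopyToDataTable()
    End Function

    Function FilterDedupeMerge(dt As DataTable, marker As String) As DataTable
        ' Step 1: rows whose Column2 contains the marker
        Dim Dt_Filter_M_IND As DataTable = ToTable(
            dt.AsEnumerable().Where(Function(r) r("Column2").ToString().Contains(marker)), dt)
        ' Step 2: all remaining rows
        Dim Dt_Filter As DataTable = ToTable(
            dt.AsEnumerable().Where(Function(r) Not r("Column2").ToString().Contains(marker)), dt)
        ' Step 3: keep the first of each identical row (assumed: all columns compared)
        Dim DT_Dupli As DataTable = ToTable(
            Dt_Filter_M_IND.AsEnumerable().
                GroupBy(Function(r) String.Join("|", r.ItemArray)).
                Select(Function(g) g.First()), dt)
        ' Step 4: merge Source (Dt_Filter) into Destination (DT_Dupli);
        ' without a primary key, the source rows are appended at the end
        DT_Dupli.Merge(Dt_Filter)
        Return DT_Dupli
    End Function

    Sub Main()
        For Each r As DataRow In FilterDedupeMerge(BuildSample(), "M-IND").Rows
            Console.WriteLine(r("Column1").ToString() & " | " & r("Column2").ToString())
        Next
        ' Prints Apple, Sea, Sun, Sun: the M-IND rows now come first
    End Sub
End Module
```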

S_Nitin · October 12, 2023, 7:37am

Hi @Gokul001,
Thank you for your time!
I have gone through the .xaml you provided and understood it. It works: it does not delete anything from Column1, and it deletes the specified duplicates from Column2. Agreed!
But it appends Column2's important data rows at the top, so it is difficult to tell which rows belong to which special code in Column2.

Here, as per the picture, Apple, Sea and Monday are the transaction data for M-IND.45.VC (the ID).

The attached picture shows the expected output.

S_Nitin · October 12, 2023, 8:01am

@supermanPunch
Thanks for your time, Arpan.

I have added one more row to my Excel file and explained the issue too.
Data.xlsx (138.1 KB)

The problem is that it deletes rows that have the same values in Column1 and Column2, so crucial data from Column1 gets deleted.

supermanPunch · October 12, 2023, 8:22am

@S_Nitin,

After a bit of analysis of the submitted Excel data, can we conclude that a row needs to be deleted only if all of its column values are repeated in multiple rows?

If we concentrate only on the deletion conditions, we arrive at that same conclusion. The fact that the repeated Sun, Chennai rows are not removed does not seem specific to those values but follows a general condition (you mentioned that they belong to a different code in Column2; I could not quite follow this part).

So, with a slight modification, I do get the expected output (with one row interchanged):

DT.AsEnumerable.GroupBy(Function(x)String.Join(",",x.ItemArray)).Select(Function(x)x.First).CopyToDatatable

Do let us know if this does not work.

S_Nitin · October 12, 2023, 8:56am

@supermanPunch,
Thanks for the reply!

Let me try to explain it better with a new input file (attached):

1) Delete a row (only its second occurrence) if all of its column values are repeated in multiple rows.
2) Additionally, the second column value of that row must contain the unique value M-IND.
If this value is not there, keep the repeated rows as they are, e.g. Sun, Chennai (no M-IND in it).
DataFinal.xlsx (140.0 KB)
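A minimal order-preserving sketch of these two conditions, assuming columns named Column1/Column2 and hypothetical sample rows: a row is dropped only if an identical row (all columns) appeared earlier and its Column2 contains M-IND. This is one way to express the rule as a single LINQ filter, not the code posted below; that code builds its key from all columns except the first (Skip(1)), so adjust the key to match your sheet.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module OrderPreservingDemo
    ' Hypothetical sample based on the values mentioned in the thread
    Function BuildSample() As DataTable
        Dim dt As New DataTable()
        dt.Columns.Add("Column1", GetType(String))
        dt.Columns.Add("Column2", GetType(String))
        dt.Rows.Add("Apple", "M-IND.45.VC")
        dt.Rows.Add("Sea", "M-IND.45.VC")
        dt.Rows.Add("Sun", "Chennai")
        dt.Rows.Add("Apple", "M-IND.45.VC")
        dt.Rows.Add("Sun", "Chennai")
        dt.Rows.Add("Monday", "M-IND.45.VC")
        Return dt
    End Function

    ' Keep a row unless an identical row appeared earlier AND its
    ' Column2 contains the marker; the original row order is kept
    Function RemoveRepeatedMarkedRows(dt As DataTable, marker As String) As DataTable
        Dim seen As New HashSet(Of String)()
        Dim keepRow As Func(Of DataRow, Boolean) = Function(row As DataRow)
            ' Key over every column value, joined with a separator
            Dim key As String = String.Join("|", row.ItemArray)
            ' HashSet.Add returns False if this exact row was already seen
            Dim firstTime As Boolean = seen.Add(key)
            Return firstTime OrElse Not row("Column2").ToString().Contains(marker)
        End Function
        ' Materialize once, because the filter updates "seen" as a side effect
        Dim kept As List(Of DataRow) = dt.AsEnumerable().Where(keepRow).ToList()
        If kept.Count = 0 Then Return dt.Clone()
        Return kept.CopyToDataTable()
    End Function

    Sub Main()
        For Each r As DataRow In RemoveRepeatedMarkedRows(BuildSample(), "M-IND").Rows
            Console.WriteLine(r("Column1").ToString() & " | " & r("Column2").ToString())
        Next
        ' Prints Apple, Sea, Sun, Sun, Monday: only the repeated
        ' Apple | M-IND.45.VC row is removed and the order is unchanged
    End Sub
End Module
```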

S_Nitin · October 12, 2023, 10:07am

Hi,

Finally, I achieved the expected output with VB.NET code, invoked in UiPath.

Still, thank you @supermanPunch @Gokul001 @Palaniyappan @Dilli_Reddy. I learned many LINQ queries from you all and was inspired to upskill.
Maybe I was unable to describe my issue properly here, which is why I did not get a solution; the queries you gave were working for the sample data I provided.

Code below:

' Assumes DT is the DataTable passed to the Invoke Code activity as an In/Out argument
Dim rowsToRemove As New List(Of DataRow)

' Create a Dictionary to track row counts based on column values
Dim rowCounts As New Dictionary(Of String, Integer)

' Iterate through the DataTable
For Each row As DataRow In DT.Rows
    ' Combine values from the columns you want to use to check for repetition
    Dim key As String = String.Join("|", row.ItemArray.Cast(Of Object)().Skip(1)) ' Skip the first column (index 0)

    ' Check if the key is already in the Dictionary
    If rowCounts.ContainsKey(key) Then
        ' This is a repeated row
        If row("ColumnNameForSecondColumn").ToString().Contains("M-IND") Then
            ' If the second column contains "M-IND", mark it for removal
            rowsToRemove.Add(row)
        End If
    Else
        ' This is the first occurrence of this row, so add it to the Dictionary
        rowCounts(key) = 1
    End If
Next

' Remove the marked rows after the loop (DT.Rows cannot be modified while it is being enumerated)
For Each r As DataRow In rowsToRemove
    DT.Rows.Remove(r)
Next
