I need to remove duplicates from a excel/DataTable using LINQ. Specifically, I need to remove 2nd occurrence duplicates for a single column(column2), maintaining other columns duplicates as it is…
Very imp is in 2nd column I need to keep other duplicates but need to remove only some special duplicates, which starts with some fix string (e.g. MN-J).
remove duplicates - removing all duplicate rows (not useful)
default view - not useful
Could you maybe provide us with a Sample data and the Expected Output for that data, this way we will be able to help quicker and provide appropriate suggestions.
HI @Palaniyappan , Thanks for reply,
Query ran with some modification:
NewDT = yourDataTable.AsEnumerable().GroupBy(Function(row) row(“Column2”)ToString.Trim).SelectMany(Function(group) If(group.Key.StartsWith(“MN-J”), group.Take(1), group)).CopyToDataTable()
But it has grouped all similar value in column 2, we are not supposed to move other values actually. I am even not sure it’s possible or no
Hi @supermanPunch,
It is partially correct but it is deleting some other rows too.
actually let me explain why i want to delete those duplicates as it is transaction ID and column A is data for that, so if transaction id is repeating it might be taken as another transaction …hence i thought to remove those duplicate …
from 1 id to other new id all data should be of 1 transaction only
is there any other way for it ?
Let me redefine my issue
Delete the duplicate in column 2 but check the respective column 1 values too while deleting if column 1 value is different then dont delete and if column 1 value is same then delete that too.
@Dilli_Reddy -
Thanks, I tried this query it is partially working; Means, along with column 2 duplicates it is also deleting column 1 exact duplicates. ; However, this is useful for me in other scenarios.
Yes, working for this file and actual data too; however, it is arranging all the codes on collumn2. Hence unable to recognize what column1 values belong to which Col2 codes.
actually let me explain why i want to delete those duplicates as it is transaction ID and column 1 is data for that, so if transaction id is repeating it might be taken as another transaction …hence i thought to remove those duplicate …
from 1 id to other new id all data should be of 1 transaction only
Could you provide us with the Data Scenario where the Logic provided does not work and provide us the Expected Output for that Data. We will be able to help you out faster and with accurate logic once we get the data samples.
Hi @Gokul001 ,
Thank you for your time!
I have gone through the .xaml you have provided and understood too. It works! It is not deleting anything from col1 and deleting mentioned specific duplicates from column 2… Agreed !
But , it is appending col2’a imp data rows at the top, Hence it is difficult to understand the which rows belong to which special code in column 2.
Here, as per pic apple ,sea, Monday is the transaction data for M-IND.45.VC (id)
A Bit of analysing on the Excel sheet data submitted, can we come to a conclusion that if only all the column values are repeated in multiple rows we would require to delete it ?
If we are concentrating on the conditions of Delete part only, we arrive at that same conclusion. Because the condition of Sun, Chennai has repetition and not being removed is not specific to that value but as general condition (mentioned as this belongs to different code in col2 - could not properly get this part).
So with maybe slight modification I do get the Expected Output (with 1 row being interchanged) :
Let me try to explain in better way-with new input file (attached)
Delete row (only 2nd occurrence) if all the column values are repeated in multiple rows.
But add condition like that 2) -( 2nd column value of that row must have unique value M-IND).
If, this value is not there, keep the repeated rows as it is. e.g. Sun Chennai (has no M-IND in it) DataFinal.xlsx (140.0 KB)
Finally, I achieved the expected output with vb.net code.
Invoked in UiPath.
Still, thanks you @supermanPunch@Gokul001@Palaniyappan@Dilli_Reddy - I got to learn many linq queries by you all and inspired to upskill
Maybe I was unable to put my issue properly here hence, I might not get the solution. As your give queries were working for the sample data i gave.
Code Below:
Dim rowsToRemove As New List(Of DataRow)
’ Create a Dictionary to track row counts based on column values
Dim rowCounts As New Dictionary(Of String, Integer)
’ Iterate through the DataTable
For Each row As DataRow In DT.Rows
’ Combine values from columns you want to use to check for repetition
Dim key As String = String.Join(“|”, row.ItemArray.Cast(Of Object)().Skip(1)) ’ Skip the first column (index 0)
' Check if the key is already in the Dictionary
If rowCounts.ContainsKey(key) Then
' This is a repeated row
If row("ColumnNameForSecondColumn").ToString().Contains("M-IND") Then
' If the second column contains "M-IND," mark it for removal
rowsToRemove.Add(row)
End If
Else
' This is the first occurrence of this row, so add it to the Dictionary
rowCounts(key) = 1
End If
Next
’ Remove the marked rows from the DataTable
For Each rowToRemove As DataRow In rowsToRemove
DT.Rows.Remove(rowToRemove)
Next