Having the 50,000 rows in excel file and wants to check Duplicate values in 2 columns using Linq, if having the duplicate then Color the Cell value of Subject in same excel File.
In the Above Screenshot Name and Subject are Duplicate with same Values then color the Subject Value.
I would rather suggest you to use VBA macros since you have cell color to highlight id Duplicate found.
Consider the below sample Input taken:
Used VBA Code:
Sub HighlightSubjectDuplicates()
Dim ws As Worksheet
Dim lastRow As Long
Dim nameRange As Range, subjectRange As Range
Dim cell As Range
Dim dict As Object
' Set the worksheet reference (replace "Sheet1" with your actual sheet name)
Set ws = ThisWorkbook.Sheets("Sheet1")
' Assuming data starts from row 2, adjust if needed
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
' Set the range for the "Name" and "Subject" columns
Set nameRange = ws.Range("A2:A" & lastRow)
Set subjectRange = ws.Range("C2:C" & lastRow)
' Create a dictionary to store unique names
Set dict = CreateObject("Scripting.Dictionary")
' Loop through the "Name" column to find duplicates
For Each cell In nameRange
If dict.Exists(cell.Value) Then
' Duplicate name found, highlight all corresponding "Subject" cells in yellow
For i = 1 To lastRow - 1
If nameRange.Cells(i).Value = cell.Value Then
subjectRange.Cells(i).Interior.Color = RGB(255, 255, 0) ' Yellow color
End If
Next i
Else
' Add the name to the dictionary
dict.Add cell.Value, 1
End If
Next cell
End Sub
Paste the code in a text file and pass the file path in Code file Path and HighlightSubjectDuplicates
is the Entry Method Name.
Output:
Note: Code Text file Path is in the project folder shared below
Workflow:
zip file:
BlankProcess23.zip (47.7 KB)
Note: Make sure to enable Macros.
I’m happy to help if you face any difficulties.
Regards
in general we can do
- groupby and calculate the row indexes of groups with a member count > 1
- loop over the returned indexes and setting the cell color
From the sample data above we cannot derive surely the criteria of duplicates. Currently it looks like Subject or Name & Subject. So just redefine the requirement for this.
Hi @vrdabberu as Subject should also be same with the Name in excel File.
Try the below code with your input excel file and mention if it’s working. I hope it will work if the Name and subject are same it will highlight the subject value.
Regards
sounds like a groupby on Name, Subject
arrIndexes | int32 Array =
(From i In Enumerable.Range(0, dtData.Rows.Count)
Let d = dtData.Rows(i)
Let t = Tuple.Create(i, d(0).toString.ToUpper.Trim, d(2).toString.ToUpper.Trim)
Group t By k1=t.Item2, k2=t.Item3 Into grp=Group
Where grp.Count > 1
From g In grp
Select v=g.Item1).toArray
then loop over arrIndexes
Calculate the Cell Range (index + Offset) - Offset =2, 1 for the 0 based index adaption + 1 for the first Header Line
Set the cell color by using the calculated range
@ppr it’s not working fine, Requirement Name as well Subject should be same then only mark as color.
UPDATE: we did a small rework on the LINQ
this we had defined a the groupby criteria
what was modelled in detail at your end?
(From i in (Enumerable.Range(0, dtData.Rows.Count))
Let d = dtData.Rows(i)
Let t = Tuple.Create(i, d(“Name”).toString.ToUpper.Trim, d(“Subject”).toString.ToUpper.Trim)
Group t by k1=t.Item2, k2=t.Item3 into grp=Group
Where grp.Count > 1
From g in grp
Select v=g.Item1).toArray
So regarding the above query when I iterating arrIndexes in For each loop giving all log results starting from 0 to end of row - @ppr
have a look here where we checked the revised LINQ
A is DUP, so returned index i 0,3
B is not DUP as only first Col is same but not third col
So in this direction we had undestood your case