Having the 50,000 rows in excel file and wants to check Duplicate values in 2 columns using Linq, if having the duplicate then Color the Cell value of Subject in same excel File.
In the Above Screenshot Name and Subject are Duplicate with same Values then color the Subject Value.
Sub HighlightSubjectDuplicates()
Dim ws As Worksheet
Dim lastRow As Long
Dim nameRange As Range, subjectRange As Range
Dim cell As Range
Dim dict As Object
' Set the worksheet reference (replace "Sheet1" with your actual sheet name)
Set ws = ThisWorkbook.Sheets("Sheet1")
' Assuming data starts from row 2, adjust if needed
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
' Set the range for the "Name" and "Subject" columns
Set nameRange = ws.Range("A2:A" & lastRow)
Set subjectRange = ws.Range("C2:C" & lastRow)
' Create a dictionary to store unique names
Set dict = CreateObject("Scripting.Dictionary")
' Loop through the "Name" column to find duplicates
For Each cell In nameRange
If dict.Exists(cell.Value) Then
' Duplicate name found, highlight all corresponding "Subject" cells in yellow
For i = 1 To lastRow - 1
If nameRange.Cells(i).Value = cell.Value Then
subjectRange.Cells(i).Interior.Color = RGB(255, 255, 0) ' Yellow color
End If
Next i
Else
' Add the name to the dictionary
dict.Add cell.Value, 1
End If
Next cell
End Sub
Paste the code in a text file and pass the file path in Code file Path and HighlightSubjectDuplicates is the Entry Method Name.
Output:
groupby and calculate the row indexes of groups with a member count > 1
loop over the returned indexes and setting the cell color
From the sample data above we cannot derive surely the criteria of duplicates. Currently it looks like Subject or Name & Subject. So just redefine the requirement for this.
Try the below code with your input excel file and mention if it’s working. I hope it will work if the Name and subject are same it will highlight the subject value.
(From i In Enumerable.Range(0, dtData.Rows.Count)
Let d = dtData.Rows(i)
Let t = Tuple.Create(i, d(0).toString.ToUpper.Trim, d(2).toString.ToUpper.Trim)
Group t By k1=t.Item2, k2=t.Item3 Into grp=Group
Where grp.Count > 1
From g In grp
Select v=g.Item1).toArray
then loop over arrIndexes
Calculate the Cell Range (index + Offset) - Offset =2, 1 for the 0 based index adaption + 1 for the first Header Line
Set the cell color by using the calculated range
(From i in (Enumerable.Range(0, dtData.Rows.Count))
Let d = dtData.Rows(i)
Let t = Tuple.Create(i, d(“Name”).toString.ToUpper.Trim, d(“Subject”).toString.ToUpper.Trim)
Group t by k1=t.Item2, k2=t.Item3 into grp=Group
Where grp.Count > 1
From g in grp
Select v=g.Item1).toArray
So regarding the above query when I iterating arrIndexes in For each loop giving all log results starting from 0 to end of row - @ppr