I have an issue removing duplicates based on conditions in two columns. if Column A has duplicate values, check the values for the same rows (Grouped) in Column B for if it ends with D and C for cases when 2 duplicate rows are found. remove the column. I attached a sample file for better understanding and the expected output
SSN
AMOUNT
Identifier
Condition
Value1
4000D
A
Remove
Value1
4000C
B
Value2
2500D
C
Remove
Value2
2500C
D
Value3
3000C
F
Keep one in row 3 or 4 (remove row 3 or 4 and row 8)
Value3
3000C
E
Value3
3000D
G
Value4
4500D
gh
Keep one in row 9 or 10 (remove row 9 or 10 and row 11)
Thanks but I need to remove Value1 and Value2 because both have the same SSN and the Amount is Credit Debit.
The second case is when we have 2 or more credits or debits having same SSN, we are to pick just one credit.
Summary
So we first check the SSN for duplicate rows using just the SSN, after grouping that, we check the amount in the groupee Item if taking the both or more is has one SSN as credit C and other SSN as Debit D, if yes, we are to remove such rows.
Once done with that we validate again to see if there is no SSN with more than one row having either Credit or Debit. If there is, we pick just any of the row.