Removing Duplicate Rows based on Columns conditions and rows values

I have an issue removing duplicates based on conditions in two columns. if Column A has duplicate values, check the values for the same rows (Grouped) in Column B for if it ends with D and C for cases when 2 duplicate rows are found. remove the column. I attached a sample file for better understanding and the expected output

SSN AMOUNT Identifier Condition
Value1 4000D A Remove
Value1 4000C B
Value2 2500D C Remove
Value2 2500C D
Value3 3000C F Keep one in row 3 or 4 (remove row 3 or 4 and row 8)
Value3 3000C E
Value3 3000D G
Value4 4500D gh Keep one in row 9 or 10 (remove row 9 or 10 and row 11)
Value4 4500D gf
Value4 4500C GGG
Value5 6544D TTT Keep
Value6 4545C FF Keep
Value7 6544D GGGH Keep
Value8 4545C vv Keep
Value9 7210C HK Keep one
Value9 7210C FGFG
Value10 1234D DDDDD Keep one
Value10 1234D VFD

expected output below

SSN AMOUNT Identifier
Value3 3000C F
Value4 4500D gf
Value5 6544D TTT
Value6 4545C FF
Value7 6544D GGGH
Value8 4545C vv
Value10 1234D VFD
Value9 7210C FGFG

Thanks

Hi,

Is there any rule to remove A or B in Value1 for example? if we can remove either of them, the following will work. Can you try this?

dt = dt.AsEnumerable.GroupBy(Function(r) r("SSN").ToString).Select(Function(g) g.GroupBy(Function(r) r("AMOUNT").ToString).OrderByDescending(Function(g2) g2.Count).First().First()).CopyToDataTable()

Sample20230609-1aL.zip (7.4 KB)

Regards,

Thanks but I need to remove Value1 and Value2 because both have the same SSN and the Amount is Credit Debit.

The second case is when we have 2 or more credits or debits having same SSN, we are to pick just one credit.

Summary
So we first check the SSN for duplicate rows using just the SSN, after grouping that, we check the amount in the groupee Item if taking the both or more is has one SSN as credit C and other SSN as Debit D, if yes, we are to remove such rows.

Once done with that we validate again to see if there is no SSN with more than one row having either Credit or Debit. If there is, we pick just any of the row.

Thank you

Hi,

How about the following?

dt.AsEnumerable.GroupBy(Function(r) r("SSN").ToString).Where(Function(g) g.Count(Function(r) r("Amount").ToString.EndsWith("D"))<>g.Count(Function(r) r("Amount").ToString.EndsWith("C"))).Select(Function(g) g.GroupBy(Function(r) r("AMOUNT").ToString).OrderByDescending(Function(g2) g2.Count).First().First()).CopyToDataTable()

Sample20230609-1aLv2.zip (7.4 KB)

Regards,

Thanks. This produced desirer result. Is there anyway I return the rows removed so I can throw it out as an exception for Business to deal?

HI,

Can you try the following?

Sample20230609-1aLv3.zip (17.7 KB)

Regards,

Thanks
All fine now

1 Like