Hello! I’d like to ask for help in following problem.
I have an Excel file with multiple columns which I need to filter in a specific way. What I want to achieve is to keep only these rows which have the same values in selected columns.
Example table:
Name
Surname
DateBirth
PlaceBirth
Address
City
PostalCode
John
Green
1981
NY
Street1
City1
Code1
Chris
Green
1981
LA
Street2
City2
Code2
George
Green
1980
LA
Street3
City3
Code3
Andy
Green
1981
NY
Street4
City4
Code4
John
Black
1980
NY
Street5
City5
Code5
Andy
Black
1981
NY
Street6
City6
Code6
Andy
Black
1981
NY
Street7
City7
Code7
Andy
Black
1980
NY
Street8
City8
Code8
John
Green
1981
NY
Street9
City9
Code9
From the table above I need to drag out these rows where values for first 4 columns are the same. In this example the proper result would be 2 sets of data:
I tried to create DTs for each column, then clean it from duplicates and then use it in a ‘for each row’ for each of the column which I need to filter. Unfortunately, nesting it one in another doesnt work for more than 2 columns (my code in attachment)
. I’m sure there must be another way - please help
Sorry, i didn’t see that was already included in your response. So if I understand correct, you want to pull ONLY if there are 2+ matches and if the item is unique, you don’t want it. Each of the matches you found should be put into it’s own datatable with the schema the exact same as the original table.
I will @ppr to see if he’ll reply as he’s a master with this. I’m not 100% sure it’s even possible and would have to fiddle around a lot to make it work, but I think you want to use LINQ to return an array of arrays of datarows (not a type, it’s an array of arrays). Then you can use a for each loop to iterate through the array of an array to clone the original datatable structure and add the datarows to the newly created datatable.
Of course a simpler method that is much less effecient would be to create a new column concatenating the first 4 columns, find all unique items (count = 1) to delete, then sort the datatable by your concatenated column. Then use a for each loop to check if the value equals the previous row. If it does, add it to the same datatable. If it doesn’t, add it to a new datatable
As @Dave mentioned there is the concat trick even used within a groupBy or we constructing 4 keys by using the AND Concatenation (I did here,as other tricks are less easy to read/follow, also when it is posible).