Hi, I have a task to remove duplicate rows from a DataTable.
If I do it by brute force, comparing each row against every other row, the algorithm is very slow, especially on large data sets.
How can I use a self-join on the DataTable to remove duplicate rows quickly, processing them in bulk?
Input: a raw file containing duplicate rows
Output: only unique rows
There is an activity for that: Remove Duplicate Rows.
How do I specify the criteria that determine whether two rows refer to the same item? That is, instead of comparing all the fields in a row, how do I supply custom logic as the comparison criteria?
For example: 3 out of 5 important fields are the same, or "first name + last name" matches "last name + first name", etc.
Correct me if I'm wrong. As per my understanding, you want to remove duplicate rows based on a column, right?
For this, use the code below:
dt.AsEnumerable().GroupBy(Function(x) x("Column1")).Select(Function(g) g.First()).CopyToDataTable()
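If the duplicate check spans several columns rather than one, the same GroupBy pattern works with a composite key. A minimal sketch, assuming placeholder column names Column1 and Column2:

```vbnet
' Keep the first row for each distinct (Column1, Column2) combination.
' Tuple.Create gives value-based equality, so GroupBy compares both fields.
Dim deduped As DataTable = dt.AsEnumerable() _
    .GroupBy(Function(r) Tuple.Create(r("Column1").ToString(), r("Column2").ToString())) _
    .Select(Function(g) g.First()) _
    .CopyToDataTable()
```

Note that CopyToDataTable throws an InvalidOperationException if the query returns no rows, so guard for the empty case if that can happen.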
I need to remove duplicate rows, where "duplicate row" is defined by multiple custom criteria.
Please tell me where I am supposed to key in these custom criteria/functions. Thanks.
Can you tell me the condition (i.e., the custom comparison criteria)? Without knowing the condition we can't provide the correct solution.
The Remove Duplicate Rows activity only removes fully identical rows; you can't specify any condition there.
For your problem we have to write a query, and for that we need the condition (the logic).
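As an illustration of what such a query could look like for one of the criteria mentioned above (first and last name matching in either order), here is a sketch; the column names FirstName and LastName are assumptions:

```vbnet
' Normalize each row into a canonical key: trim, lower-case, and sort the
' two name fields, so "John Smith" and "Smith John" produce the same key.
Dim keyOf As Func(Of DataRow, String) =
    Function(r) String.Join("|",
        {r("FirstName").ToString().Trim().ToLower(),
         r("LastName").ToString().Trim().ToLower()}.OrderBy(Function(s) s))

Dim unique As DataTable = dt.AsEnumerable() _
    .GroupBy(keyOf) _
    .Select(Function(g) g.First()) _
    .CopyToDataTable()
```

One caveat: a criterion like "3 out of 5 fields match" is not transitive (row A can match B, and B match C, without A matching C), so it cannot be expressed as a single grouping key; it needs pairwise comparison or a clustering step instead.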
I'm also interested in a solution for this, as I have a similar requirement. Please update me if you get a breakthrough.