Remove duplicates from a specific column

Hello everyone ,
I have to remove duplicates only from a specific column so that my Bot does not pick the repeated value in the next iteration. Currently I am using Invoice key and Unique reference to login to an application make a search . The Invoice Key has duplicates whereas Unique reference doesn’t have duplicates. How can I eliminate duplicates only from Invoice Key column keeping the actual Data table structure intact without deleting other rows in the Data table ?

Any sort of input would be much appreciated . Thanks !

Good morning Shailesh,

Does this post help resolve your question?

Hi @Shailesh123,
In this link I have shown 2 way to get the distinct values from datatable.

Regards
Balamurugan.S

@chenderson-Good morning . The mentioned post would be helpful if I want to remove all the other row values as well. In my current scenario I do not want to remove values from other rows. I just want to delete duplicates from a Column so that it does not effect rest of the other values from my Data table. Does that give you an idea of what I actually needed?

I’m still a little unclear Shailesh.

If row 1 and 2 have the same value in ColumnA, do you want the value in row 2 (for this column) set to a blank/empty string?

Hi @chenderson- in my scenario only out of 50 columns, only one Column is having duplicate values and I want to delete/empty those cells which have repeated values from that specific column only . How can i do that is what I meant to ask.

Regards
Shailesh

so you have a datatable with 50 rows and 50 column. There is a duplicate value in column 1 rows 1 and row 2. So you want the output to have 50 rows, but the value in row 2, column 1 should be blank? Or did you want to completely remove row 2 altogether?

@Dave - I do not want to remove row 2 completely . I just want to remove value in row 2 column 1 which is duplicate and nothing else.

I do not want to remove row 2 completely . I just want to remove value in row 2 column 1 which is duplicate and nothing else

I think you can use LINQ to achieve this although I haven’t tested it. I am assuming the field you are comparing is of type string. I am assuming your datatable is a variable called dt

Create a variable of type Enumerable i’ll call RowsWithDuplicates

Assign RowsWithDuplicates = dt.AsEnumerable().GroupBy(Function(x) x.Field(Of String)(“YourColumnName”)).Select(Function(y) y.First())

Now use a for each activity to iterate through and update the column with an assign activity. Assign row.item(“YourColumnName”) = string.empty

I haven’t tested, so syntax might be off

@Dave I am unable to iterate through it as it a generic collection while I need is a datatable as the output .

Well I did forget to include that you wanted duplicates. The code above is selecting everything EXCEPT the duplicates. I had a chance to test this one and it works:

Assign RowsWithDuplicates = dt.AsEnumerable().GroupBy(Function(x) x.Field(Of String)(“YourColumnName”)).Where(Function(y) y.Count() > 1).Select(Function(y) y.First)

This is of type IEnumerable<DataRow>

Then in a for each loop, change the TypeArgument to DataRow.

Assign row.item(“YourColumnName”) = string.empty

I have tested it and this will work. It iterates through the collection of datarows directly from your datatable. Your output is still the original datatable. Here is a sample .xaml
Shailesh123.xaml (9.4 KB)

EDIT: I realized after posting that this will only work if there are only 2 rows that are duplicated. If it is 3+ it will still leave duplicates. This can be resolved by putting it all within a while loop, or updating the assign activity to get IEnumerable<iEnumberable<datarow>> type and doubling the for each loop. I attached the updated .xaml showing the second method here: Shailesh123-updated.xaml (10.7 KB)

@Shailesh123 did the solution provided work for you?

@Dave Hello Dave it did partially solve my problem . But now I want to read the relevant value of row 2 . For example : if Row1 has a Key I want to basically read Row2 as a Value for Row1.
So that when I pass Row1 value as a variable , I can read Row2 value as another variable which is equal to row1 .

Do you always want the row below it? If so, do it in a for each row loop.

For each row in dt
If (put your condition here to check for key value)
Assign row.item("YourColumnName") = dt.Rows(dt.rows.indexof(row)+1).item("YourColumnName").ToString

This will loop through the entire table and assign the value from the row below it into the current row for the column specified. Keep in mind if the last row meets your if condition it will throw an error, because there is no row below it to grab the value from