Identifying Duplicate number and changing associated number

datatable
excel
activities

#1

I need to automate a robot that is able to identify the existence of duplicate numbers in different cells in a column. If a duplicate exists I do not want to delete the duplicate. However, there will be another number in a different column and same row that will be generated off of that number. I need to delete that associated number and replace it. There will be dozens of duplicates in the data table that will need to identified among values that only exists once. How can I set the robot up to be able to identify these duplicates, delete the associated number, and insert a new number that will be generated from an outside application?


#2

This is easily done. Please search the forum for examples of looping through a datatable. You can easily amend the details of each row based on any information in that table.

RD


#3

As an additional tip, locating duplicates is very straightforward if you first sort the table column to be searched, thereby putting all duplicate values next to each other. You can do this within Excel with the activity Sort Table, before using Read Range. Alternatively, if the spreadsheet should be left in original order, you can use DataViews to sort/filter/modify the DataTable while preserving its row order.

I’ve given an example of this here:


#4

Seems like the easiest to understand method would be to sort the table, then in one ForEach store a string of the previous row to compare.

For Example,
str1 = ""
ForEach row In dt
If str1<>row.Item(columnIndex)
Then
str1 = row.Item(columnIndex)
Else
row.Item(columnIndex2) = value

I do have a question too though. Does anyone have an example of utilizing Array.ForEach() which (I would think) will be tons faster? I haven’t been able to figure it out in Studio Pro.


#5

I think I found out why. The function you pass to the ForEach method must be an Action(Of T), which is to say that it takes one argument and returns nothing, so it must be a Sub in VB parlance. However, it seems you cannot write your own expression for this as you can with Functions without getting the error “statement lambdas cannot be converted to expression trees”.

A For Each loop will have to do, but I’m fairly sure that’s what the ForEach method would also end up generating.


#6

I thought maybe I could create an Action variable but I keep getting Expression does not return a value, so I don’t think I set up the Action Expression correctly.


#8

I was able to get one to work with AddressOf Console.WriteLine as the action argument and an array of strings as input and the Invoke Method’s GenericTypeArgument set to String, which defines the “Of T” arguments. This works but quite limits the usefulness of this method. The most relevant material I could find is

Only single-line lambda expressions without ByRef parameters may be converted to expression trees. Of the single-line Sub lambdas, only invocation statements may be converted to expression trees.

See also:


#9

I am confused because the “greater than” command and the “equal to” command are the same expression “row.item(columnindex)”. How is that going to compare two different cells in the same column and delete any duplicates?

Suppose the column I want to remove the duplicates from is below…

column name
49123
49123
49123
49434
49434
49501
49602

I need the edited column to appear as so…

column name
" “
” “
49123
” "
49434
49501
49602


#10

@TomG

<> sign means “not equal to”

So,
It initializes str to “” just to start out with.
Then, ForEach row it checks if str equals the row item

  • if it doesn’t equal the row item then it changes str to that row item with an Assign activity like Assign str = row.Item(columnIndex).ToString, so when it goes to the next row item it is comparing the previous row item with the current row item to see if they are not equal
  • when they are equal (as in a duplicate) it does the Else side of the If Statement and changes an item in that row with the Assign activity like Assign row.Item(columnIndex) = “”

Hopefully that makes more sense or maybe I made it more complicated lol.

Thanks.


#12

Thanks for the explanation. I am trying this out I get the following error message for each expression in Condition, assign for Then and the assign for Else.

compiler error(s) encounterd processing expression “str<>row.item(columnindex)”.
'columnindex is not declared. It may be anaccessible due to its protection level.

I tried changing columnindex to columnname because I am using ColumnName to for the Get row item Activity. But that didn’t help.

What am I doing wrong?


#13

Hey sorry,

You get that error because it thinks columnIndex is a variable name when you put it in like that.

columnIndex is the index as an Integer of the column you are referencing. So like, row(3).ToString will give you the 4th item in the row.

If you want to use column name then change it to a string. so like, row(“columnname”).ToString will give you the item in that column.

Or you can store those in a variable too.

If you show a screenshot of your work I could help further, if it’s still an issue.

Thanks!


#15

Here is what I have. No error messages but its not working. I guess I am missing how to get it to iterate through each row in the column automatically and delete the duplicates.


#16

Get row Item activity is actually the same as just typing in row(“column”).ToString. So you don’t need that unless you want to store the item to a new variable before referencing it which is fine but need to replace all the row(“column”).ToString to your new variable.

If your goal is to change the value of a column in duplicate rows your logic looks ok. Have you tried using a Message box to output str and row(“Delivery”).ToString so you can see what they are equalling before your if condition. Also, you could place a Message Box inside both sides of the If condition so you know where it’s going, etc.

You might also use Write Range actiivty to output the datatable to the excel file again, because the spreadsheet isn’t updated until you have done that.

Thanks.

If your goal is to actually remove the row completely then I would probably play with this function
Assign DeliveryTable = DeliveryTable.AsEnumerable().Distinct(DataRowComparer.Default).CopyToDatatable()
in order to go through all the rows where there are no duplicates.


#18

I might not understand you fully.
Are you wanting to clear each duplicated cell within the same row? Or do you just mean clear other cells within the row where that row is a duplicate of another row?

Right now your logic is going through each row and when a row has the same value as the previous row, it will clear that cell. If you want to clear a different cell then just change the columnname in Assign row(“Delivery”) = “” to reference a different column.

If you are looking to instead clear the duplicates within the same row, then you would need a second For loop and it becomes a slightly longer workflow.


#19

Disregard the previous post… I thought I deleted before sending. I added the write range and its working perfectly!! This is awesome! This is step one in my process. I will give you a break from helping me out lol. I will take a stab at the next steps. But I might get hung up and might need some more help.

Thanks for all your help!!!