Remove Duplicate Entries with exception of the first one

Hi! I’ve an excel table that is similar to this:


| XXXX | YYYY | ZZZZ | AAAA | BBBB | CCCC |

| Val.1 | Val. 2 | Val. 1 | Val. 5 | Val. 9 | Val. 1 |
| Val.3 | Val. 3 | Val. 2 | Val. 8 | Val. 7 | Val. 3 |
| Val.6 | Val. 1 | Val. 3 | Val. 5 | Val. 5 | Val. 5 |
| Val.1 | Val. 5 | Val. 6 | Val. 3 | Val. 2 | Val. 7 |

And what I want is to remove duplicated row entries lefting behind the first value. Like For the column “XXXX” I want to stay only with “Val.1 , Val.3, Val.6” removing the last row. And then in the column “AAAA” I want only to stay with “Val 5,Val.8”. At the end my table should look like this:


| XXXX | YYYY | ZZZZ | AAAA | BBBB | CCCC |

| Val.1 | Val. 2 | Val. 1 | Val. 5 | Val. 9 | Val. 1 |
| Val.3 | Val. 3 | Val. 2 | Val. 8 | Val. 7 | Val. 3 |

Because the last and the third rows have duplicated values for the same column.

Any idea on how can I do this with a Datatable.Select ??

Hi,

Could you try running this workflow? Not sure it is exactly what you need

I added some testing rows inside the build datatable activity. Let me know if the prompt look logical to you.

Code:

For Each r As DataRow In in_Dt.AsEnumerable.Reverse()
For i As Integer = 0 To in_Dt.Columns.Count - 1
	
If in_Dt.AsEnumerable.Take(in_Dt.Rows.IndexOf(r)).Count(Function(r2) r2(i).ToString = r(i).ToString) > 0 Then 
	MsgBox("Row number : " & (in_Dt.Rows.IndexOf(r)+1).ToString & "has duplicate on column") : Exit  For
End If

Next
Next

If alright I will find something to keep the row with no duplicates rather than prompting the duplicate ones

Main.xaml (9.6 KB)
Cheers

Hi, (@Florent_Salendres)

One way to approach this is to ForEach col In dt.Columns
Then Assign the datatable to remove the duplicates using the solution found here:

So it could be something like this:

ForEach col In dt1.Columns
Assign dataTable = dataTable.AsEnumerable()
.GroupBy(Function(i) i.Field(Of String)(col.ColumnName))
.Select(Function(g) g.First)
.CopyToDataTable

That was just a quick idea I had, but there might be another way to do it without the Loop.

Regards

2 Likes

Hi Clayton,

That much clearer and way and logically makes sense.
Well done!

Cheers

@Florent_Salendres Thak you both! The solution was in the answer of ClaytonM. Thank you for your time and for beeing always available to help.

With best Regards

Hey @ClaytonM , Thank you ! this solution looks more clear and short.

But , If I want rows with only one value occurance and avoid the rows which has more than one, What way I can acheive this ?

Example :
Input :
Col1
AAA
CCC
DDD
BBB
CCC
DDD
EEE

Output :
Col1
AAA
BBB
EEE

Regards,
SP

Hello Amigos,

I need some help to avoid duplicate data while extracting text from text file to Excel. What I’m exactly doing is 1. Extracting data from pdf and store it in text file
2. From that text file I need to extract some mandatory data and paste it in excel
I’m using Index of string activity to identify the desired data. Here where the issue.
For example,

address
xxxxxxxxxxxx,
yyyyyy,
zz,
12345.
xxxxxxxxxxxx,
yyyyyy,
zz,
12345.
Total

In the above example I’m using “address” as first index and “Total” as second index to capture the data in between the two texts(address and Total) and place it in excel, I don’t need repetition in data
I need that address only once. How to do that? please help.

How exactly I need the data is like this:

xxxxxxxxxxxx,
yyyyyy,
zz,
12345.

Thanks,
Srujana

store address and total in excel than apply Linq (distinct) query than you can get result

Hi Indra,

Thanks for reply.I’m afraid to say that I’m very much new to Uipath. May I know how and where exactly I need to use this Linq.

@srujana.kalisetti Linq query will be written inside the for each row

references