Check for Duplicate Rows

I know this topic has varied responses but then my question is a bit different: I have around 500000 rows with 19 columns, I need to check for duplicates in these rows but in case there are no duplicates the bot is stuck there for more than 30 minutes . Even if there are duplicates I am not able to check if there are any because the same happens in that case.

I am using remove duplicate rows already and tried using distinct as well.

Can anyone help?

Try coding it. Take your source datatable and in an Assign activity edit the following code, outputting to another datatable

Duplicate records from the same data table and
If you want specific Column alone mention the column name

(From p in dt.Select() where( From q in dt.Select() where q("ColumnName").Equals(p("ColumnName")) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable()

1 Like

Hey @ashutoshkhantwal,

You can use the code below:

var duplicates = dt.AsEnumerable().GroupBy(r => r[0]).Where(gr => gr.Count() > 1).Select(g => g.Key)

Hi @ashutoshkhantwal,

I faced same scenario in one of my projects where the bot was taking too long time in hours or sometimes it became stuck.

Then we moved the total functionality to database. we created two tables one is for to keep the input data,another one is for output(data after removing the duplicates in your case) and one procedure to remove the duplicates from input data and write output to output table.

here the bot job is to read the input excel file and write the data into database table then we call the procedure from the bot then execute select query to get the data from output table and finally write it into excel file.

500000 records is not a small data and you need to do comparison as well. I strongly suggest above mentioned procedure.

Let me know for any queries.

@amarasto this code is in C# - you’d need to convert to the syntax for use in uipath

Hi @Dave Dave,

There is no need to convert this code.I am using the same syntax inside assign activity.

1 Like

Good to know! that syntax seems so much more intuitive

Hi , would this work even in case it doesn’t find any duplicates? So far I get stuck at that point.

Sure will give this a try

Seems doable , thanks for the help. Will get back to you in case I have any issue.

Hey @ashutoshkhantwal,

Yes it will work.If there are not any duplicate then duplicates object will be null.

Hi , if I may ask. What us gr and r?

It doesn’t work.

gr and r are the anonymous names.

are you getting any error?

Hi @ashutoshkhantwal, I wanted to know, how did you read 500000 rows in the excel? Because when I tried using the Read range activity (500000 rows and 85 columns), I get the “Job terminated unexpectedly” error. I have also tried reading the excel using SELECT Query but wasn’t able to read all the rows.

We had an excel file in which there were instances proposed to us by the business which could have such number of rows. I tried using the read range itself but it was of no use. I wasn’t aware of sql queries at that time so I couldn’t use that as well. I had to put such cases to exception only since their occurrence was less.