I know this topic has varied responses but then my question is a bit different: I have around 500000 rows with 19 columns, I need to check for duplicates in these rows but in case there are no duplicates the bot is stuck there for more than 30 minutes . Even if there are duplicates I am not able to check if there are any because the same happens in that case.
I am using remove duplicate rows already and tried using distinct as well.
Try coding it. Take your source datatable and in an Assign activity edit the following code, outputting to another datatable
Duplicate records from the same data table and
If you want specific Column alone mention the column name (From p in dt.Select() where( From q in dt.Select() where q("ColumnName").Equals(p("ColumnName")) Select q).ToArray.Count>1 Select p).ToArray.CopyToDataTable()
I faced same scenario in one of my projects where the bot was taking too long time in hours or sometimes it became stuck.
Then we moved the total functionality to database. we created two tables one is for to keep the input data,another one is for output(data after removing the duplicates in your case) and one procedure to remove the duplicates from input data and write output to output table.
here the bot job is to read the input excel file and write the data into database table then we call the procedure from the bot then execute select query to get the data from output table and finally write it into excel file.
500000 records is not a small data and you need to do comparison as well. I strongly suggest above mentioned procedure.
Hi @ashutoshkhantwal, I wanted to know, how did you read 500000 rows in the excel? Because when I tried using the Read range activity (500000 rows and 85 columns), I get the “Job terminated unexpectedly” error. I have also tried reading the excel using SELECT Query but wasn’t able to read all the rows.
We had an excel file in which there were instances proposed to us by the business which could have such number of rows. I tried using the read range itself but it was of no use. I wasn’t aware of sql queries at that time so I couldn’t use that as well. I had to put such cases to exception only since their occurrence was less.