Good day fellow UiPath RPA Developers,
Is it possible to remove transaction rows (compared on only certain required columns) that have duplicates in another data table, without merging the data tables? The reason they cannot be merged, and why there are multiple data tables, is that I have only scaled the example down. In the large-scale transactions they come from, Excel and UiPath cannot handle more than 1 million rows, which is why I split the transaction rows across multiple data tables.
These are the input data tables:
And this is the EXPECTED OUTPUT:
Note: Only ‘UID’, ‘INV_NO’, ‘TXN_AMOUNT’, and ‘TRACE_NO’ columns are the required columns for determining the duplicates.
Is it possible to implement this with a LINQ approach? Any kind of help is much appreciated!
Use the below LINQ:

(From itemA In firstDT.AsEnumerable()
Where (From itemB In secondDT.AsEnumerable()
Where String.Join(",", itemB.ItemArray).Equals(String.Join(",", itemA.ItemArray))
Select itemB).ToArray.Count = 0
Select itemA).CopyToDataTable.DefaultView.ToTable(True)

It keeps only the rows of firstDT that have no matching row in secondDT, and the final ToTable(True) removes duplicates within the result, so you get the unique records. If only the required columns should determine a duplicate, join those column values instead of the full ItemArray.
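For reference, the same logic can be sketched in plain Python (hypothetical sample data; keys are built from the required columns `UID`, `INV_NO`, `TXN_AMOUNT`, `TRACE_NO` mentioned in the question):

```python
# Minimal sketch: keep only rows of first_dt that have no duplicate in second_dt,
# comparing on the required key columns only.
KEY_COLS = ["UID", "INV_NO", "TXN_AMOUNT", "TRACE_NO"]

def row_key(row):
    # Build a comparable key, like String.Join(",", ...) in the LINQ version.
    return ",".join(str(row[c]).strip().lower() for c in KEY_COLS)

def remove_cross_duplicates(first_dt, second_dt):
    seen = {row_key(r) for r in second_dt}  # keys present in the other table
    return [r for r in first_dt if row_key(r) not in seen]

first_dt = [
    {"UID": 1, "INV_NO": "A1", "TXN_AMOUNT": 100, "TRACE_NO": "T1"},
    {"UID": 2, "INV_NO": "A2", "TXN_AMOUNT": 200, "TRACE_NO": "T2"},
]
second_dt = [
    {"UID": 2, "INV_NO": "A2", "TXN_AMOUNT": 200, "TRACE_NO": "T2"},
]
print(remove_cross_duplicates(first_dt, second_dt))  # only the UID 1 row survives
```

Building the key set once means each row of `first_dt` is checked in constant time, instead of re-scanning `second_dt` per row as the nested query does.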
There's one more solution for this: if your data is huge, you can use Excel as a data source and execute SQL queries on it.
Please refer to the below thread for the same:
I made changes to the connection string as below and it worked.
Connection String:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Source.xlsx;Extended Properties="Excel 12.0 Xml"
SQL Query:
SELECT * FROM [Sheet1$]
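Since the OLEDB approach runs plain SQL, the cross-table dedup itself is just a query. As a rough, database-agnostic illustration (using Python's built-in sqlite3 instead of ACE OLEDB, with made-up table contents), rows of one sheet that also appear in another can be filtered like this:

```python
import sqlite3

# Two "sheets" loaded as tables; only the required key columns are modeled.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sheet1 (UID, INV_NO, TXN_AMOUNT, TRACE_NO)")
con.execute("CREATE TABLE sheet2 (UID, INV_NO, TXN_AMOUNT, TRACE_NO)")
con.executemany("INSERT INTO sheet1 VALUES (?,?,?,?)",
                [(1, "A1", 100, "T1"), (2, "A2", 200, "T2")])
con.executemany("INSERT INTO sheet2 VALUES (?,?,?,?)",
                [(2, "A2", 200, "T2")])

# Keep sheet1 rows whose key combination does not appear in sheet2.
rows = con.execute("""
    SELECT * FROM sheet1
    EXCEPT
    SELECT * FROM sheet2
""").fetchall()
print(rows)  # [(1, 'A1', 100, 'T1')]
```

The same `EXCEPT` (or a `NOT EXISTS` subquery) pattern would apply against `[Sheet1$]`/`[Sheet2$]` through the OLEDB connection, letting the database engine handle the volume instead of UiPath memory.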
I think this will only be applicable for 2 data tables, but it gave me an idea of how to start the process, and I'll try to adapt it.
Thanks again sir
I'll look into it and explore it. Thanks for the reference!
We have some doubts that we would like to clear up.
The current description mentions duplicates from dt-N in dt-N+1.
Is this guaranteed, or could it also be the case that dt-N has a row that is duplicated in dt-N+2, 3, 4…?
As far as we understood, a large data source has been split into x data tables. That's why the different data tables have to be checked against each other, right?
To make it simpler, you can merge all the data into a single DT (clearing the old DTs to free memory) and remove the duplicates in a single query.
“The current description mentions duplicates from dt-N in dt-N+1. Is this guaranteed, or could it also be the case that dt-N has a row that is duplicated in dt-N+2, 3, 4…?”
It is possible to have N+2, 3, 4 and so on, depending on how large the original data source is.
“As far as we understood, a large data source has been split into x data tables. That's why the different data tables have to be checked against each other, right?”
Yes, you are right, sir. We are splitting a large data source, which is why it ends up split into x data tables.
We already tried that, because some of our data reaches more than 1 million transaction rows with 20+ columns. It didn't work with data of that size, and the UiPath job stopped.
Then you have to check for duplicates across all the available combinations.
E.g., say you have 4 tables: you need to check duplicates for 1 to 2, 2 to 3, 3 to 4, 4 to 1, 1 to 3, and 2 to 4, so this will be the better approach.
If you nest one query inside another, it might throw a job error due to memory utilization.
To get the list of combinations, you can refer to this thread:
c# - Get all pairs in a list using LINQ - Stack Overflow
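The pairwise combinations mentioned above (for 4 tables: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4) can be generated with a standard combinations helper; a minimal Python sketch:

```python
from itertools import combinations

tables = [1, 2, 3, 4]  # stand-ins for the 4 split data tables
pairs = list(combinations(tables, 2))
print(pairs)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Note that n tables produce n*(n-1)/2 pairs, so the number of heavy table-to-table comparisons grows quadratically, which is the cost the Dictionary approach below avoids.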
Not sure if this question came across and was answered by you. Let me rephrase:
Your sample did not include the case where Table1 has a row which is duplicated in Table3. Could this scenario also happen?
However, to preview the solution approach (when the split tables are to be kept):
loop over the tables and track the IDs in a Dictionary as unique keys - dictLKID
retrieve the IDs from the table being checked, compute the duplicated IDs using dictLKID, and remove the duplicated rows
The reason why we suggest this Dictionary approach is the outcome from another topic, where we reduced the processing time from 24+ hrs to seconds.
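The steps above can be sketched as a single pass over all the split tables: a key is kept the first time it is seen, and any later row with the same key (in the same table or a later one) is dropped. A minimal Python illustration with made-up data and `UID` as the key column:

```python
def dedupe_across_tables(tables, key_cols):
    """Single pass over all split tables; the first occurrence of a key wins."""
    dict_lkid = {}  # tracks keys already seen, like dictLKID
    cleaned = []
    for dt in tables:
        kept = []
        for row in dt:
            key = tuple(row[c] for c in key_cols)
            if key not in dict_lkid:
                dict_lkid[key] = True
                kept.append(row)
        cleaned.append(kept)
    return cleaned

t1 = [{"UID": 1}, {"UID": 2}]
t2 = [{"UID": 2}, {"UID": 3}]
t3 = [{"UID": 1}, {"UID": 4}]  # duplicates a row first seen in t1
print(dedupe_across_tables([t1, t2, t3], ["UID"]))
```

Because every key is hashed exactly once, the cost is linear in the total number of rows, no matter how many tables the source was split into, instead of growing with the number of table pairs.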
Let us introduce the next changes:
we create lists with the concatenated key strings from dtOld and dtNew
and build a dictionary of only the items common to both lists
dtNewSet.AsEnumerable.Select(Function (x) String.Join("#", arrColSet.Select(Function (k) x(k).ToString.ToLower.Trim))).Distinct().ToList
dtOldSet.AsEnumerable.Select(Function (x) String.Join("#", arrColSet.Select(Function (k) x(k).ToString.ToLower.Trim))).Distinct().ToList
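In plain Python, those two expressions amount to building one distinct key collection per table and intersecting them; a small sketch with hypothetical data (the `#` separator and `arrColSet` column list mirror the expressions above):

```python
arr_col_set = ["UID", "INV_NO"]  # required key columns, as in arrColSet

def key_set(dt):
    # Mirror of: String.Join("#", arrColSet.Select(...)) ... Distinct().ToList
    return {"#".join(str(r[c]).strip().lower() for c in arr_col_set) for r in dt}

dt_old = [{"UID": 1, "INV_NO": "A1"}, {"UID": 2, "INV_NO": "A2"}]
dt_new = [{"UID": 2, "INV_NO": "A2"}, {"UID": 3, "INV_NO": "A3"}]

common = key_set(dt_old) & key_set(dt_new)  # keys present in both tables
print(common)  # {'2#a2'}
```

The common keys are exactly the rows of dtNew that duplicate something in dtOld, so a final filter on this set removes them.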
Sorry, my English comprehension is not that good.
Yes, it is also possible that Table1 has a row which is duplicated in Table3. This scenario could happen.
Thank you, sir @ppr! I'll look into and explore the reference you suggested.
I see, I'll explore this approach and test whether it can be applied to the process.
Thank you, sir @megharajky, for your recommendation!
Kindly note the essential difference between the
Dictionary approach and the Cartesian-product approach (1 to 2, 2 to 3, 3 to 4, 4 to 1, 1 to 3, 2 to 4):
with the Dictionary approach, once the IDs have been seen and tracked, the heavy compare pairs can be omitted.