Is it possible to remove a transaction row (matched on only certain required columns) that has duplicate(s) in another data table, without merging the data tables? The reason they cannot be merged, and why there are multiple data tables, is that I only down-scaled the example here. If they were merged at the large-scale transaction volume, Excel and UiPath could not handle more than 1 million rows, which is why I divided the transaction rows across multiple data tables.
These are the input data tables:
And this is the EXPECTED OUTPUT:
Note: Only the ‘UID’, ‘INV_NO’, ‘TXN_AMOUNT’, and ‘TRACE_NO’ columns are used for determining duplicates.
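For illustration, a duplicate here means two rows that agree on all four of these columns, e.g. a composite key built from them (a sketch; ‘row’ is a hypothetical DataRow variable):

' Hypothetical composite key over the four required columns only
Dim key As String = String.Join("|", row("UID"), row("INV_NO"), row("TXN_AMOUNT"), row("TRACE_NO"))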
Is it possible to implement this with a LINQ approach? Any kind of help is much appreciated!
(From itemsA In firstDT.AsEnumerable()
 Where Not (From itemsB In secondDT.AsEnumerable()
            Where String.Join(",", itemsB.ItemArray).Equals(String.Join(",", itemsA.ItemArray))).Any()
 Select itemsA).CopyToDataTable().DefaultView.ToTable(True)
This keeps the rows from firstDT that have no match in secondDT, i.e. it removes the duplicates and returns the unique records.
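Note that the snippet above compares whole rows via ItemArray. Since only the four required columns determine a duplicate, a variant keyed on those columns might look like this (a sketch, assuming the firstDT/secondDT names above and the System.Data/System.Linq imports that UiPath provides by default):

' Composite key over the four required columns only
Dim keyOf = Function(r As DataRow) String.Join("|", r("UID"), r("INV_NO"), r("TXN_AMOUNT"), r("TRACE_NO"))
' Hash all keys already present in firstDT for O(1) lookups
Dim seen As New HashSet(Of String)(firstDT.AsEnumerable().Select(Function(r) keyOf(r)))
' Keep only the secondDT rows whose key does not occur in firstDT
Dim uniqueRows = secondDT.AsEnumerable().Where(Function(r) Not seen.Contains(keyOf(r)))
' CopyToDataTable throws on an empty sequence, so guard with Any()
Dim resultDT As DataTable = If(uniqueRows.Any(), uniqueRows.CopyToDataTable(), secondDT.Clone())

The HashSet lookup avoids rescanning firstDT for every row of secondDT.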
The current description mentions duplicates from dt-N in dt-N+1.
Is this guaranteed, or could it also be the case that dt-N has a row that is duplicated in dt-N+2, dt-N+3, dt-N+4, ...?
As far as we understand, a large data source has been split into x data tables. That is why the different data tables have to be checked against each other, right?
“The current description mentions duplicates from dt-N in dt-N+1.
Is this guaranteed, or could it also be the case that dt-N has a row that is duplicated in dt-N+2, dt-N+3, dt-N+4, ...?”
It is possible to have duplicates in dt-N+2, dt-N+3, dt-N+4, and so on, depending on how large the original data source is.
“As far as we understand, a large data source has been split into x data tables. That is why the different data tables have to be checked against each other, right?”
Yes, you are right, sir. We are splitting a large data source, which is why it is split into x data tables.
We already tried that, because some of our data reaches more than 1 million transaction rows with more than 20 columns. It did not work with that kind of data, and the UiPath job stopped.
Then you have to check for duplicates across all the available combinations.
E.g., say you have 4 tables: you need to check duplicates for 1 to 2, 2 to 3, 3 to 4, 4 to 1, 1 to 3, and 2 to 4, so this will be the better approach.
If you nest one query inside another query, it might throw a job error because of memory utilization.
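A sketch of that pairwise check (assuming a hypothetical List(Of DataTable) named tables holding the split chunks, with the same composite key as above):

' For every pair (i, j) with i < j, drop from tables(j) any row whose
' composite key already occurs in tables(i)
Dim keyOf = Function(r As DataRow) String.Join("|", r("UID"), r("INV_NO"), r("TXN_AMOUNT"), r("TRACE_NO"))
For i As Integer = 0 To tables.Count - 2
    ' Hash the keys of tables(i) once, then test every later table against it
    Dim keysOfI As New HashSet(Of String)(tables(i).AsEnumerable().Select(Function(r) keyOf(r)))
    For j As Integer = i + 1 To tables.Count - 1
        Dim keep = tables(j).AsEnumerable().Where(Function(r) Not keysOfI.Contains(keyOf(r))).ToList()
        tables(j) = If(keep.Count > 0, keep.CopyToDataTable(), tables(j).Clone())
    Next
Next

This still visits every table pair, which is what the next reply contrasts with the dictionary approach.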
@anthonyjr
Kindly note the essential difference between the
dictionary approach and the Cartesian product approach (1 to 2, 2 to 3, 3 to 4, 4 to 1, 1 to 3, 2 to 4):
with the dictionary approach, by tracking the IDs once they have been seen, the heavy compare pairs can be omitted.
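A minimal sketch of that dictionary/HashSet idea (same hypothetical tables list and key columns as above):

' Single pass: every composite key is kept the first time it is seen across
' all tables; later occurrences are dropped without any pairwise compares
Dim keyOf = Function(r As DataRow) String.Join("|", r("UID"), r("INV_NO"), r("TXN_AMOUNT"), r("TRACE_NO"))
Dim seen As New HashSet(Of String)()
For i As Integer = 0 To tables.Count - 1
    ' HashSet.Add returns False when the key is already present, so this
    ' keeps only first occurrences while recording them as seen
    Dim keep = tables(i).AsEnumerable().Where(Function(r) seen.Add(keyOf(r))).ToList()
    tables(i) = If(keep.Count > 0, keep.CopyToDataTable(), tables(i).Clone())
Next

Each row is then touched exactly once, no matter how many tables the source was split into.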