Is it possible to remove a transaction row (matched on only certain required columns) that has duplicate(s) in another data table, without merging the data tables? The reason they cannot be merged, and why there are multiple data tables, is that I only down-scaled the example here. If they were merged at the large-scale transaction volume, Excel and UiPath could not handle more than 1 million rows, which is why I divided the transaction rows across multiple data tables.
These are the input data tables:
And this is the EXPECTED OUTPUT:
Note: Only the ‘UID’, ‘INV_NO’, ‘TXN_AMOUNT’, and ‘TRACE_NO’ columns are used for determining duplicates.
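For illustration, a duplicate here means two rows that agree on all four of these columns, e.g. a composite key built from them (a sketch; ‘row’ is a hypothetical DataRow variable):

' Hypothetical composite key over the four required columns only
Dim key As String = String.Join("|", row("UID"), row("INV_NO"), row("TXN_AMOUNT"), row("TRACE_NO"))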
Is it possible to implement this with a LINQ approach? Any kind of help is much appreciated!
(From itemsA In firstDT.AsEnumerable()
 Where Not (From itemsB In secondDT.AsEnumerable()
            Where String.Join(",", itemsB.ItemArray).Equals(String.Join(",", itemsA.ItemArray))).Any()
 Select itemsA).CopyToDataTable().DefaultView.ToTable(True)
This keeps the rows from firstDT that have no match in secondDT, i.e. it removes the duplicates and returns the unique records.
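Note that the snippet above compares whole rows via ItemArray. Since only the four required columns determine a duplicate, a variant keyed on those columns might look like this (a sketch, assuming the firstDT/secondDT names above and the System.Data/System.Linq imports that UiPath provides by default):

' Composite key over the four required columns only
Dim keyOf = Function(r As DataRow) String.Join("|", r("UID"), r("INV_NO"), r("TXN_AMOUNT"), r("TRACE_NO"))
' Hash all keys already present in firstDT for O(1) lookups
Dim seen As New HashSet(Of String)(firstDT.AsEnumerable().Select(Function(r) keyOf(r)))
' Keep only the secondDT rows whose key does not occur in firstDT
Dim uniqueRows = secondDT.AsEnumerable().Where(Function(r) Not seen.Contains(keyOf(r)))
' CopyToDataTable throws on an empty sequence, so guard with Any()
Dim resultDT As DataTable = If(uniqueRows.Any(), uniqueRows.CopyToDataTable(), secondDT.Clone())

The HashSet lookup avoids rescanning firstDT for every row of secondDT.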
The current description mentions duplicates from dt-N in dt-N+1.
Is this guaranteed, or could it also be the case that dt-N has a row that is duplicated in dt-N+2, dt-N+3, dt-N+4, ...?
As far as we understand, a large data source has been split into x data tables. That is why the different data tables have to be checked against each other, right?
“The current description mentions duplicates from dt-N in dt-N+1.
Is this guaranteed, or could it also be the case that dt-N has a row that is duplicated in dt-N+2, dt-N+3, dt-N+4, ...?”
It is possible to have duplicates in dt-N+2, dt-N+3, dt-N+4, and so on, depending on how large the original data source is.
“As far as we understand, a large data source has been split into x data tables. That is why the different data tables have to be checked against each other, right?”
Yes, you are right, sir. We are splitting a large data source, which is why it is split into x data tables.
We already tried that, because some of our data reaches more than 1 million transaction rows with more than 20 columns. It did not work with that kind of data, and the UiPath job stopped.
Then you have to check for duplicates across all the available combinations.
E.g., say you have 4 tables: you need to check duplicates for 1 to 2, 2 to 3, 3 to 4, 4 to 1, 1 to 3, and 2 to 4, so this will be the better approach.
If you nest one query inside another query, it might throw a job error because of memory utilization.
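A sketch of that pairwise check (assuming a hypothetical List(Of DataTable) named tables holding the split chunks, with the same composite key as above):

' For every pair (i, j) with i < j, drop from tables(j) any row whose
' composite key already occurs in tables(i)
Dim keyOf = Function(r As DataRow) String.Join("|", r("UID"), r("INV_NO"), r("TXN_AMOUNT"), r("TRACE_NO"))
For i As Integer = 0 To tables.Count - 2
    ' Hash the keys of tables(i) once, then test every later table against it
    Dim keysOfI As New HashSet(Of String)(tables(i).AsEnumerable().Select(Function(r) keyOf(r)))
    For j As Integer = i + 1 To tables.Count - 1
        Dim keep = tables(j).AsEnumerable().Where(Function(r) Not keysOfI.Contains(keyOf(r))).ToList()
        tables(j) = If(keep.Count > 0, keep.CopyToDataTable(), tables(j).Clone())
    Next
Next

This still visits every table pair, which is what the next reply contrasts with the dictionary approach.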
@anthonyjr
Kindly note the essential difference between the
dictionary approach and the Cartesian product approach (1 to 2, 2 to 3, 3 to 4, 4 to 1, 1 to 3, 2 to 4):
with the dictionary approach, by tracking the IDs once they have been seen, the heavy compare pairs can be omitted.
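A minimal sketch of that dictionary/HashSet idea (same hypothetical tables list and key columns as above):

' Single pass: every composite key is kept the first time it is seen across
' all tables; later occurrences are dropped without any pairwise compares
Dim keyOf = Function(r As DataRow) String.Join("|", r("UID"), r("INV_NO"), r("TXN_AMOUNT"), r("TRACE_NO"))
Dim seen As New HashSet(Of String)()
For i As Integer = 0 To tables.Count - 1
    ' HashSet.Add returns False when the key is already present, so this
    ' keeps only first occurrences while recording them as seen
    Dim keep = tables(i).AsEnumerable().Where(Function(r) seen.Add(keyOf(r))).ToList()
    tables(i) = If(keep.Count > 0, keep.CopyToDataTable(), tables(i).Clone())
Next

Each row is then touched exactly once, no matter how many tables the source was split into.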