Compare Two dataTable With Massive Rows

Hello,
Thanks in advance for the help on this.
I am currently struggling to efficiently compare two massive datatables with thousands of rows. Here is by issues
I have dt1 which look like this as an example Screen Shot 2020-05-29 at 4.32.29 PM
and dt2 which looks like this Screen Shot 2020-05-29 at 4.33.01 PM
I need to compare dt1 to dt2 and output the result to a new datable called dt3. The condition is that the output datatable dt3 is every row of dt2 where the permission ID is the same as the permission ID in dt1

My result based on the screenshots should be this for dt3Screen Shot 2020-05-29 at 4.37.41 PM
Thanks

@hatakora
find starter help here
DT_Join_1Col.xaml (10.0 KB)

country is like your dt1
names is like your dt2

so the linq statement within the assign would look like:
left side: dt3
right side:

(From d1 In dt1.AsEnumerable
Join d2 In dt2.AsEnumerable
On d1("Perm ID").toString.Trim Equals d2("Perm ID").toString.Trim
Select d2).CopyToDataTable

You’re probably going to want to use .net to query the dt2 table to an array of datarows. Since it’s an array, you can process and update dt2, or if you choose, copy it to a new datatable dt3. However, normally it’s better to keep it as an array of datarows so you keep the entire data set together. - this will depend on your requirements though.

If you need to update values into the matched datarows, then this will require a For Each on the array. But, based on your post, you don’t need to do this.

Also, you are wanting to match with the Scope column it appears, so you’ll want a For each to loop over the unique scopes and match with the set of Perm IDs for the scope

So, essentially, it would like this psuedocode:

For each scope In dt1.AsEnumerable.Select(Function(r) r("Scope").ToString.Trim).ToArray.Distinct
    permIDs = dt1.AsEnumerable.Where(Function(r) r("Scope").ToString.Trim = scope).Select(Function(r) r("Perm ID").ToString.Trim ).ToArray
    permIDRows = dt2.AsEnumerable.Where(Function(r) permIDs.Contains(r("Perm ID").ToString.Trim) ).ToArray

    IF: permIDRows.Count > 0
        <perform and update rows, or Assign: dt3 = permIDRows.CopyToDataTable>

So what that does is…
— Loop over unique scopes
------ Create array of strings of the Perm IDs that match the unique scope
------ Create array of datarows in dt2 that is contained in the array of Perm IDs from dt1’s scope
------ Check that a match was found using .Count>0
---------- Execute process and update the Array of Rows, or if desired, just assign it using .CopyToDataTable and output it

Hope this method to query your sets of data efficiently is helpful.

Regards.

2 Likes

So when i try this (From d1 In dt1.AsEnumerable Join d2 In dt2.AsEnumerable On d1("Perm ID").toString.Trim Equals d2("Perm ID").toString.Trim Select d2).CopyToDataTable the results do not include the Scope column. which i need to be in the result. I need the result o return the following Columns: Role Name. Role ID, Perm ID, Perm Desc, Perm Val and Scope

@ClaytonM this does make some sense to me so far. I am just a bit lost or more like don’t understand why the scope column is the one being matched, if you wouldn’t mind explaining that part a bit?
Also i am doing any particular update to the data. I just need the final result to return the content of dt2 where the perm Id matches that of dt1 and include the scope column as well . So dt3 should be this Role Name. Role ID, Perm ID, Perm Desc, Perm Val and Scope where the perm id macthes those in dt1

@hatakora
the demo xaml are showcasing 2 approaches

  • the join datatable activity way
  • the LINQ way

So have a look on join datatable activity with debugging and refer the result before removing the joined data columns

Kindly note: in your post the expected output was described with the column structure of dt2. This you will get in the demo from above.

Having more cols you can do:
use a build datatable activity and define the column structure as needed - returning dt3
use a clear datatatable on dt3
use an assign activity
left side dt3
Right side:

(From d1 In dt1.AsEnumerable
Join d2 In dt2.AsEnumerable
On d1("Perm ID").toString.Trim Equals d2("Perm ID").toString.Trim
Let ra = new Object(){d2("Rolename"),d2("Role ID"),d2("Perm ID"),d2("Perm Desc"),d2("Perm Val"),d1("Scope")}
Select dt3.Rows.Add(ra)).CopyToDataTable

I was under that assumption based on your output results you were looking for:
image
All 5 are under the scope of 2

If you need to include the Scope column, then a Join or GroupBy might be more ideal, if you can get ppr’s ideas working.

Ooh @ClaytonM the fact that all 5 are under the scope of two has nothing to do with it. It just happen to be a coincidence. There are other numbers in the scope. The only common thing between both datatable are Perm ID, and Perm Desc. So would your example still work based on the picture i gave of what the dt3 should be with just those columns?

https://forum.uipath.com/t/can-not-connected-data-collect/227271?u=arifudinmyf

please help me

Ok @ppr the join way seemed to have worked . I am going tpo try it on the real data. ANd the linq way worked but did not have all the columns i expected. I will also try this and see. (From d1 In dt1.AsEnumerable Join d2 In dt2.AsEnumerable On d1("Perm ID").toString.Trim Equals d2("Perm ID").toString.Trim Let ra = new Object(){d2("Rolename"),d2("Role ID"),d2("Perm ID"),d2("Perm Desc"),d2("Perm Val"),d1("Scope")} Select dt2.Rows.Add(ra)).CopyToDataTable

@hatakora
Perfect that it starts. I can recommend to you to be active and take the control.
with the second statement the cols from d1, d2 are collected. Give a try and let it run

@ppr so i tried that last one and i am getting error Message Box: Input array is longer than the number of columns in this table. I think that is because this array Let ra = new Object(){d2("Rolename"),d2("Role ID"),d2("Perm ID"),d2("Perm Desc"),d2("Perm Val"),d1("Scope")} has more columns than the columns in dt2 where it is being added Select dt2.Rows.Add(ra)).CopyToDataTable

@hatakora
ah bad typo:

Select dt3.Rows.Add(ra)).CopyToDataTable

replace the line with above

Yup. That is what i thought. I t currently running wit dt3. @ppr waiting to see the result

1 Like

@ppr that did work perfectly. I do have one question about the Linq statement. What if i want to query the opposite where the Perm ID is not the same on both sides?
On d1("Perm ID").toString.Trim Equals d2("Perm ID").toString.Trim

@hatakora
There are multiple Options Like

  • Join Datatable Activity witn the different config settings
  • Components in the Connect Plattform
  • Linq on Perm ID Base

In General the approach is about Findung the Common Perm IDS and Then do an exept on IT

D1keys as list<String> =
dt1.AsEnumerable.Select(function ( r ) r(“Perm ID”).toString.trim).distinct().to list

Similar for d2

CommonKeys aß list<String> =
D1keys.intersect(D2Keys).ToList

NoCommonD1Keys aß list<String> =
D1Keys.Except(CommonKeys).toList

And so on as Just a quick Intro. With the Keys WE can Filter the rows in the different datatable

@ppr this makes sense but what about the columns? The initial approach allowed to pick and add different column to the final datatable because dt1 and dt2 have totally different columns and the only common things id the Perm ID. But we still want to be able to add columns from both dt1 and dt2 to the final Datatable. Same thing for the datatbalse with different perm ID

sorry @ppr don’t mean to be a pain but i really did not get that last part. My initial options this (From d1 In dt1.AsEnumerable Join d2 In dt2.AsEnumerable On d1("Perm ID").toString.Trim Equals d2("Perm ID").toString.Trim Let ra = new Object(){d2("Rolename"),d2("Role ID"),d2("Perm ID"),d2("Perm Desc"),d2("Perm Val"),d1("Scope")} Select dt3.Rows.Add(ra)).CopyToDataTable allows me to get dt3 which is the datatable where the rows from dt1 and dt2 with the same Perm ID are all in one.
I would like to get the opposite as well. So the remaining datatable where the Perm ID from dt1 and dt2 are not the same. Your last comment was a little bit confusion. I get these parts

for d1
D1keys as list<String> = dt1.AsEnumerable.Select(function ( r ) r(“Perm ID”).toString.trim).distinct().to list

Similar for d2
CommonKeys aß list<String> = D1keys.intersect(D2Keys).ToList

NoCommonD1Keys aß list<String> = D1Keys.Except(CommonKeys).toList
Buit the last piece is confusion g as i am not sure on which datatable to filter as you mentioned.

I tried filtering on a join of dt1 nd dt2 but i do not get the expected result. Please assist. Thanks

@hatakora
OK If the Case ist still Open i will have a Look on IT today evening and will come Back to you

Sounds good. See attached screenshot sample of what the data is and what the output that i am trying to achieve is.
dt1 and dt2 are the initial tables. dt3 and dt4 is what i am trying to get
dt1Screen Shot 2020-06-03 at 2.18.23 PM
dt2Screen Shot 2020-06-03 at 2.18.40 PM

With the common Role ID from both dt1 and dt2, dt3 should be the following
Screen Shot 2020-06-03 at 2.19.20 PM

And dt4 should be the opposite or difference ( essentially everything where the role id from dt1 and dt2 are not the same and the commonality is that this datatable will be anything without a role owner)
dt4
Screen Shot 2020-06-03 at 2.20.44 PM