LINQ query running for long hours

Hi Everyone,

I am using below linq query to compare two datatables based on some columns and then update Yes/No as per comparison results.

The query is taking more than 24 hrs. to execute and breaching my very strict SLA
Please help on the scenario.

(From d1 in dtOldSet.AsEnumerable
let ja = String.Join("", {d1("Host name"),d1("Date of the health check"),d1("Policy name"),d1("Check name"),d1("Check Section"),d1("Check description"),d1("Agreed to settings"),d1("Reason of the non compliance")})
let chk = dtNewSet.AsEnumerable.Any(Function (x) String.Join("",{x("Host name"),x("Date of the health check"),x("Policy name"),x("Check name"),x("Check Section"),x("Check description"),x("Agreed to settings"),x("Reason of the non compliance")}).Equals(ja))
let rs = If(chk, "yes","no")
let ra = d1.ItemArray.Append(rs).ToArray
Select r = dtOutVal.Rows.Add(ra)).CopyToDataTable

dtOldSet contains around 400k rows and dtNewSet contains around 200k rows. I tried with smaller data chunks as well like 100K rows but the execution time is same.

Please help anyone!

@ppr @Yoichi please guide

Your case remains a little unclear. Maybe you can clarify.

Do you want to compare row by row over all rows in 2 tables? (resulting in ca. 200k bool values for you example) Or do you want verify, that the entire tables are the same? (which clearly is not the case since Old set has 400k rows while new set has 200k rows). Please explain.

have a check if working with prepared lists or Check dictionaries let increase the processing performance

arrColSet | String( ) - String Array =

{"Host name","Date of the health check","Policy name","Check name","Check Section","Check description","Agreed to settings","Reason of the non compliance"}

we will check stepwise:
ListDT2 | List(Of String) =
dtNewSet.asEnumerable.Select(Function (x) String.Join(“#”, arrColSet.Select(Function (k) x(k).toString.ToLower.Trim))).Distinct().toList

let us know how fast this list can be generated from the bot. If it is fast enough we can go ahead for the next step

Hi @ppr,

This list over the dtNewSet containing 277k rows was generated in around 3 minutes.

This is working good!. Please guide ahead

Thanks

Hi @c.rahn,

I want to verify the mentioned column in the above query from both the tables if the particular set match then i want to update yes that it matched or else no that it did not match. Also I fetch all the data like all the column/rows data based on only these conditional column matching

Let me know if you got idea on this or will explain in more clear way

Sorry, still not clear to me, what you want to achieve in the end.

It sounds to me, you want to find those rows, that are contained in both tables, right? That is an intersection.

So, as Peter showed before, just get both tables into a list, let’s call them ListDT2 and ListDT1.

Using LinQ Intersect the intersection of both sets can be found (sorry, C# style):

var result = ListDT1.Intersect( ListDT2 ).ToList();

then lets check if following will be faster

(From d1 in dtOldSet.AsEnumerable
let ja = String.Join("", {d1("Host name"),d1("Date of the health check"),d1("Policy name"),d1("Check name"),d1("Check Section"),d1("Check description"),d1("Agreed to settings"),d1("Reason of the non compliance")})
let chk = ListDT2.Contains(ja)
let rs = If(chk, "yes","no")
let ra = d1.ItemArray.Append(rs).ToArray
Select r = dtOutVal.Rows.Add(ra)).CopyToDataTable

Hi @ppr,

Just one doubt before executing the query.

In above we Used String.Join(“#”) for joining and in below query we are checking contains but we have used String.Join(“”) to join the column values. Will this give correct result or we need to keep the placeholder same in both the queries

Thanks!

oh you are right, i correct the statement above to

(From d1 in dtOldSet.AsEnumerable
let ja = String.Join("#", arrColSet.Select(Function (k) d1(k).toString.ToLower.Trim))
let chk = ListDT2.Contains(ja)
let rs = If(chk, "yes","no")
let ra = d1.ItemArray.Append(rs).ToArray
Select r = dtOutVal.Rows.Add(ra)).CopyToDataTable

Thanks!! I will run and let you know the results

Hi @ppr,

I ran your provided solution(did not verify the results yet)
The bot took around 3~4 hours to complete, thanks a lot for making this efficient first of all.
Also, can you please help with some other approaches like you mentioned dictionary or something which can make this more efficient as I have to perform this comparison at multiple places. If bot takes 3 hours one time and I do comparison at 4 places it will sum upto 12 hours for this piece of processing only.

Looking for your more valuable inputs

@Dhruvi_Arumugam
kindly note you are just in phase of optimzation iterations

so now next step we are checking the parallelization

prepare dtOldSet that the status column is present (we dont know the name, you can used add data column activity)

then decompose the LINQ to a parallel for each approach

Hi @ppr,

Can you please tell what is the RHS value in both the assign in the loop so that i do not make any mistake in execution.

Also one doubt, when we are performing String.join on dtOldSet using “#” placeholder as visible below, this value is already present in datatable dtOldSet as a part of text in some columns like “Reason of non compliance” which we are joining. So in this case do we need to change the placeholder to some other symbol or we are good with this “#” only ?

Thanks

ja = String.Join("#", arrColSet.Select(Function (k) d1(k).toString.ToLower.Trim))

similar to the LINQ

d1("StatusCol") = IF(ListDT2.Contains(ja),"yes","no")

we just created a longer string for the many col compair for identifying same rows on a column set level. When the already present # will confuse the identities / unique key then feel free to change it to any other string / char

Kindly note:

  • just run the imp in run mode and not in slowed down debug mode
  • do not add any other things like log messages… as we are focusing on performance optimization
  • with adding .Take(length of part segment as integer) you can run smaller set
    dtOldSet.AsEnumerable.Take(500) - running only the first 500 rows

OK I will test now and let you know the results.
Also can I take Unique key be a combination of characters like “#@” ?

Hi @ppr

I am getting the below error in this assign

image

image

Please suggest!

Thanks!!

show us output from immediate panel, after Check column was added to dtOldSet
use statement: dtOldSet.Columns

Understanding the 6 Debugging Panels of UiPath in the easiest way possible! - News / Tutorials - UiPath Community Forum

I checked and found out that I did silly mistake of adding the required column in different datatable. I will fix that and run then let you know for the results.

Thanks again you are savior !!

Hi @ppr

I tried this parallel for each approach but the time is almost same what we were getting from LINQ

As u suggested I also tried with Chunking the datatable into 100k rows at a time but on average the total time is outcoming as the same.

Please suggest.

Thanks again !!

if possible let us refer to your XAML
Performance tuning can sometimes require more iterations

In your case it is necessary to find out where processing time is spent.

We do check in a first round how fast the for each / for each row / parallel for each will perform for only looping

Parallel for each row in dtVar.AsEnumerable

  • inside the body: do nothing

with this outcome we get a better feeling on its core performance

then we start more and more to explore the next action time consumptions