I am using below linq query to compare two datatables based on some columns and then update Yes/No as per comparison results.
The query is taking more than 24 hrs. to execute and breaching my very strict SLA
Please help on the scenario.
(From d1 in dtOldSet.AsEnumerable
let ja = String.Join("", {d1("Host name"),d1("Date of the health check"),d1("Policy name"),d1("Check name"),d1("Check Section"),d1("Check description"),d1("Agreed to settings"),d1("Reason of the non compliance")})
let chk = dtNewSet.AsEnumerable.Any(Function (x) String.Join("",{x("Host name"),x("Date of the health check"),x("Policy name"),x("Check name"),x("Check Section"),x("Check description"),x("Agreed to settings"),x("Reason of the non compliance")}).Equals(ja))
let rs = If(chk, "yes","no")
let ra = d1.ItemArray.Append(rs).ToArray
Select r = dtOutVal.Rows.Add(ra)).CopyToDataTable
dtOldSet contains around 400k rows and dtNewSet contains around 200k rows. I tried with smaller data chunks as well like 100K rows but the execution time is same.
Your case remains a little unclear. Maybe you can clarify.
Do you want to compare row by row over all rows in 2 tables? (resulting in ca. 200k bool values for you example) Or do you want verify, that the entire tables are the same? (which clearly is not the case since Old set has 400k rows while new set has 200k rows). Please explain.
have a check if working with prepared lists or Check dictionaries let increase the processing performance
arrColSet | String( ) - String Array =
{"Host name","Date of the health check","Policy name","Check name","Check Section","Check description","Agreed to settings","Reason of the non compliance"}
we will check stepwise:
ListDT2 | List(Of String) =
dtNewSet.asEnumerable.Select(Function (x) String.Join(“#”, arrColSet.Select(Function (k) x(k).toString.ToLower.Trim))).Distinct().toList
let us know how fast this list can be generated from the bot. If it is fast enough we can go ahead for the next step
I want to verify the mentioned column in the above query from both the tables if the particular set match then i want to update yes that it matched or else no that it did not match. Also I fetch all the data like all the column/rows data based on only these conditional column matching
Let me know if you got idea on this or will explain in more clear way
(From d1 in dtOldSet.AsEnumerable
let ja = String.Join("", {d1("Host name"),d1("Date of the health check"),d1("Policy name"),d1("Check name"),d1("Check Section"),d1("Check description"),d1("Agreed to settings"),d1("Reason of the non compliance")})
let chk = ListDT2.Contains(ja)
let rs = If(chk, "yes","no")
let ra = d1.ItemArray.Append(rs).ToArray
Select r = dtOutVal.Rows.Add(ra)).CopyToDataTable
In above we Used String.Join(“#”) for joining and in below query we are checking contains but we have used String.Join(“”) to join the column values. Will this give correct result or we need to keep the placeholder same in both the queries
oh you are right, i correct the statement above to
(From d1 in dtOldSet.AsEnumerable
let ja = String.Join("#", arrColSet.Select(Function (k) d1(k).toString.ToLower.Trim))
let chk = ListDT2.Contains(ja)
let rs = If(chk, "yes","no")
let ra = d1.ItemArray.Append(rs).ToArray
Select r = dtOutVal.Rows.Add(ra)).CopyToDataTable
I ran your provided solution(did not verify the results yet)
The bot took around 3~4 hours to complete, thanks a lot for making this efficient first of all.
Also, can you please help with some other approaches like you mentioned dictionary or something which can make this more efficient as I have to perform this comparison at multiple places. If bot takes 3 hours one time and I do comparison at 4 places it will sum upto 12 hours for this piece of processing only.
Can you please tell what is the RHS value in both the assign in the loop so that i do not make any mistake in execution.
Also one doubt, when we are performing String.join on dtOldSet using “#” placeholder as visible below, this value is already present in datatable dtOldSet as a part of text in some columns like “Reason of non compliance” which we are joining. So in this case do we need to change the placeholder to some other symbol or we are good with this “#” only ?
we just created a longer string for the many col compair for identifying same rows on a column set level. When the already present # will confuse the identities / unique key then feel free to change it to any other string / char
Kindly note:
just run the imp in run mode and not in slowed down debug mode
do not add any other things like log messages… as we are focusing on performance optimization
with adding .Take(length of part segment as integer) you can run smaller set
dtOldSet.AsEnumerable.Take(500) - running only the first 500 rows
I checked and found out that I did silly mistake of adding the required column in different datatable. I will fix that and run then let you know for the results.