LINQ query running for hours

ppr · August 24, 2022, 11:47am

Let us introduce the following changes.

We create two lists of key strings from dtNewSet and dtOldSet (the values of the columns in arrColSet, joined with "#"),
and then a dictionary containing only the items common to both lists:

dtNewSet.AsEnumerable.Select(Function(x) String.Join("#", arrColSet.Select(Function(k) x(k).ToString.ToLower.Trim))).Distinct().ToList
dtOldSet.AsEnumerable.Select(Function(x) String.Join("#", arrColSet.Select(Function(k) x(k).ToString.ToLower.Trim))).Distinct().ToList
ListDT1.Intersect(ListDT2).ToDictionary(Function(x) x, Function(x) True)
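The same idea can be sketched outside UiPath. The following is a minimal Python translation, assuming each DataRow is represented as a plain dict (a stand-in, not the UiPath DataTable API). It builds the normalized "#"-joined keys, removes duplicates, and keeps only the keys present in both tables as a dictionary:

```python
def build_keys(rows, key_columns):
    """Join the key columns of each row into one normalized string, deduplicated.

    Mirrors String.Join("#", ...) with ToString.ToLower.Trim and Distinct();
    dict.fromkeys drops duplicates while keeping the first-seen order.
    Rows are plain dicts here, standing in for DataRows (assumption).
    """
    keys = ("#".join(str(row[col]).lower().strip() for col in key_columns) for row in rows)
    return list(dict.fromkeys(keys))


def common_key_lookup(new_rows, old_rows, key_columns):
    """Dictionary of the keys present in both tables (Intersect + ToDictionary)."""
    new_keys = build_keys(new_rows, key_columns)
    old_keys = set(build_keys(old_rows, key_columns))
    # keep keys of the first list that also occur in the second one, value True
    return {key: True for key in new_keys if key in old_keys}


new_rows = [{"id": 1, "name": " Alice "}, {"id": 2, "name": "Bob"}]
old_rows = [{"id": 1, "name": "alice"}, {"id": 3, "name": "Carl"}]
print(common_key_lookup(new_rows, old_rows, ["id", "name"]))  # {'1#alice': True}
```

Because the values are lower-cased and trimmed before joining, " Alice " and "alice" produce the same key.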

Then we simplified the processing by keeping the ja string
and using the ContainsKey result for the column value update.
(It now writes "True"/"False" instead of Yes/No, as we are currently only interested in speed.)
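A Python sketch of this update step follows, assuming the same dict-based rows as before and a hypothetical result column named "Match". Each row's key is rebuilt exactly as when the lookup was created. A single dictionary membership test (the ContainsKey equivalent) then decides the value, and str(True) or str(False) is written to the row:

```python
def mark_matches(rows, key_columns, lookup, result_column="Match"):
    """Write "True"/"False" into result_column depending on lookup membership.

    result_column ("Match") is a hypothetical column name. The key is built the
    same way as for the lookup: joined with "#", lower-cased, trimmed.
    """
    for row in rows:
        key = "#".join(str(row[col]).lower().strip() for col in key_columns)
        # one hash lookup per row, like Dictionary.ContainsKey
        row[result_column] = str(key in lookup)  # "True" or "False"
    return rows


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
mark_matches(rows, ["id", "name"], {"1#alice": True})
print([row["Match"] for row in rows])  # ['True', 'False']
```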

Important to know about Intersect: it also deduplicates its result. This must be taken into account when duplicates need to be kept as well (e.g. when finding all matches).
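The difference can be illustrated with a short Python sketch. The function names below are illustrative only. The first function behaves like LINQ Intersect and returns each common value once. The second filters the first sequence and keeps every occurrence that has a match:

```python
def intersect_distinct(first, second):
    """Like LINQ Intersect: distinct items of `first` that also occur in `second`."""
    second_set = set(second)
    return list(dict.fromkeys(x for x in first if x in second_set))


def matches_keep_duplicates(first, second):
    """Keep every occurrence from `first` whose value also occurs in `second`."""
    second_set = set(second)
    return [x for x in first if x in second_set]


first = ["a", "b", "a", "c"]
second = ["a", "c", "c"]
print(intersect_distinct(first, second))       # ['a', 'c']
print(matches_keep_duplicates(first, second))  # ['a', 'a', 'c']
```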

Summary of the last optimization actions:

reduce the checklist
switch from List.Contains to Dictionary.ContainsKey, to check whether the hashed lookup is faster

Result: the execution time was reduced from 24+ hours to 10-20 seconds.
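Why the switch from List.Contains to ContainsKey matters can be seen in a small timing comparison. A list membership check scans the elements one by one, while a dictionary check is a hash lookup. The Python sketch below uses arbitrary sizes chosen for illustration, not data from this thread, and measures both with timeit. The probe keys are taken from the end of the list, the worst case for the linear scan:

```python
import timeit


def compare_lookups(n=20_000, probes=500):
    """Time membership checks against a list (linear scan) and a dict (hash lookup).

    n and probes are illustrative sizes, not values from the original workflow.
    Returns (list_seconds, dict_seconds).
    """
    keys = [f"{i}#name{i}" for i in range(n)]
    as_list = keys
    as_dict = dict.fromkeys(keys, True)
    targets = keys[-probes:]  # worst case for the list: matches sit at the end
    list_time = timeit.timeit(lambda: [t in as_list for t in targets], number=1)
    dict_time = timeit.timeit(lambda: [t in as_dict for t in targets], number=1)
    return list_time, dict_time


list_seconds, dict_seconds = compare_lookups()
print(f"list: {list_seconds:.4f} s, dict: {dict_seconds:.6f} s")
```

The absolute numbers depend on the machine. The list cost grows with the list length for every single check, while the dictionary cost per check stays roughly constant.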


General Analysis Pattern for Performance Optimization

When execution time needs to be optimized, the following general actions can be applied:

Cleansing

Remove any unneeded actions (e.g. Log Message / Write Line…) from the core block whose execution is to be optimized.

Measurement

Surround the core block with a Stopwatch and trace the execution time.
Run the core block and cancel it when the execution takes too long.
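As a rough Python analogue of the Stopwatch step, the following sketch uses time.perf_counter to trace the elapsed time. It aborts the loop once a time budget is exceeded. The Budget class and limit_seconds parameter are hypothetical helpers for illustration, and the per-item work is only a placeholder:

```python
import time


class Budget:
    """Stopwatch-style timer that can abort a long-running loop.

    limit_seconds is a hypothetical stand-in for "cancel it when the execution
    takes too long".
    """

    def __init__(self, limit_seconds):
        self.limit = limit_seconds
        self.start = time.perf_counter()

    def elapsed(self):
        return time.perf_counter() - self.start

    def check(self):
        if self.elapsed() > self.limit:
            raise TimeoutError(f"core block exceeded {self.limit} s")


def run_core_block(items, limit_seconds=5.0):
    """Process items, tracing the elapsed time; abort if the budget is exceeded."""
    budget = Budget(limit_seconds)
    processed = 0
    for item in items:
        budget.check()
        processed += 1  # placeholder for the real per-item work
    print(f"processed {processed} items in {budget.elapsed():.3f} s")
    return processed


run_core_block(range(100_000))
```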

Data Volume Reduction

Apply techniques such as Take(x) or a reduced test data set to get an initial understanding of the core block's execution time, and extrapolate it to the full data volume.
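Extrapolating from a sample only works if the growth behaviour is known. The sketch below scales a measured sample time to the full row count. It also estimates the growth exponent from two sample sizes. Comparing every row against a list with Contains grows roughly quadratically with the table sizes, so a linear extrapolation would underestimate it badly. The function names and figures are illustrative assumptions, not measurements from this thread:

```python
import math


def extrapolate(sample_seconds, sample_rows, full_rows, exponent=1.0):
    """Scale a runtime measured on a sample (e.g. Take(x)) to the full volume.

    exponent=1 assumes linear growth, exponent=2 quadratic growth (e.g. a
    list.Contains check inside a loop over the rows). Verify the exponent by
    measuring at least two sample sizes.
    """
    return sample_seconds * (full_rows / sample_rows) ** exponent


def growth_exponent(rows1, seconds1, rows2, seconds2):
    """Estimate the exponent k in time ~ rows**k from two measurements."""
    return math.log(seconds2 / seconds1) / math.log(rows2 / rows1)


# hypothetical measurements: 2 s for 1,000 rows, 8 s for 2,000 rows
k = growth_exponent(1_000, 2.0, 2_000, 8.0)  # about 2, i.e. quadratic
print(extrapolate(2.0, 1_000, 100_000, exponent=k) / 3600, "hours")
```

With these hypothetical numbers, a linear estimate would give 200 s for 100,000 rows, while the quadratic estimate gives about 20,000 s (over 5 hours).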

Parallelization

Check whether the core block can be executed in parallel and enable it if possible. The core block may need to be rewritten or modified for this.
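In VB.NET this step would typically mean PLINQ's AsParallel or a Parallel For Each activity. The Python sketch below only shows the general chunk-and-merge structure with concurrent.futures.ThreadPoolExecutor: the keys are split into independent chunks, each chunk is checked against the lookup on a worker, and the results are merged in input order. For CPU-bound pure-Python work, threads are limited by the GIL and ProcessPoolExecutor would be the usual choice. The helper names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def mark_chunk(keys, lookup):
    """Independent per-chunk work: no shared state is modified."""
    return [key in lookup for key in keys]


def parallel_mark(keys, lookup, workers=4, chunk_size=1000):
    """Check all keys against the lookup, chunk by chunk, on a worker pool."""
    merged = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields the chunk results in input order
        for part in pool.map(partial(mark_chunk, lookup=lookup), chunked(keys, chunk_size)):
            merged.extend(part)
    return merged


keys = [f"k{i}" for i in range(10)]
print(parallel_mark(keys, {"k0": True, "k5": True}, chunk_size=3))
```

The precondition for parallelizing is that each chunk can be processed without reading or writing the results of other chunks.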

Isolation

Isolate the time-consuming parts within the core block.
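One way to isolate the expensive part is to time each stage of the core block separately and compare the results. The Python sketch below assumes a core block with two hypothetical stages, building the lookup and marking the rows. It records their durations with a small context manager and reports the slowest stage:

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, timings):
    """Record the wall-clock duration of one stage of the core block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def profile_core_block(new_keys, old_keys):
    """Run two illustrative stages and return the results plus per-stage timings."""
    timings = {}
    with stage("build lookup", timings):
        lookup = dict.fromkeys(old_keys, True)
    with stage("mark rows", timings):
        marks = [key in lookup for key in new_keys]
    slowest = max(timings, key=timings.get)
    return marks, timings, slowest


marks, timings, slowest = profile_core_block(["a", "b"], ["b"])
print(marks, slowest, timings)
```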

Optimization

Replace the time-consuming parts with alternatives and run the test series again.

Finalization

Once the optimized core block implementation is found, run and measure it on the entire data volume.
