Hi @ppr
Apologies for the late response.
Sure, I will share the xaml file shortly so that you can also have a look at it, and I will also run the parallel for each loop empty to check the total time it takes to complete.
Thanks!!
Hi @ppr
Below is the xaml file
MatchData.xaml (13.3 KB)
Note - I have used the UiPathTeam.XLExcel.Activites package for reading extra-large Excel files
Please suggest! Also let me know if you need anything else.
I have also started testing the empty-loop approach on my system.
Thanks
Perfect, the xaml is minimally modeled.
Now go into the different rounds as mentioned above:
Round 1 - do nothing, only loop
Round 2 - loop and set a hardcoded value
The intention of this is to get a feeling for the general performance and to isolate the low-performance action (we do suspect that the match check will get more attention in the next rounds).
Let us know the result. Feel free to surround it with a Stopwatch:
myStopWatch.Elapsed.ToString("hh\:mm\:ss\.fff")
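For reference, a minimal sketch of such a surrounding, assuming a workflow variable named `myStopWatch` of type `System.Diagnostics.Stopwatch` (the activity names in the comments are the standard UiPath ones):

```vbnet
' Assign (before the block under test):
'   myStopWatch = System.Diagnostics.Stopwatch.StartNew()
'
' ... Parallel For Each body under test ...
'
' Log Message (after the block):
'   "Elapsed: " + myStopWatch.Elapsed.ToString("hh\:mm\:ss\.fff")
```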
Hi @ppr
I worked on the things you mentioned
Both rounds were fairly quick.
Round 1 - Empty parallel for each loop
Result -
Round 2 - Hardcoded value
Result
Please suggest further.
Thanks !!
Perfect, now let's explore the performance break:
(using the original implementation from above e.g. your XAML)
Round 1 -
dtOldSet.AsEnumerable.Take(1)
Round 2 -
dtOldSet.AsEnumerable.Take(2)
Round 3 -
dtOldSet.AsEnumerable.Take(25)
We would like to get a feeling for the parallelization effort and the behaviour of the contains check.
Again let us know the timings from the Stopwatch.Elapsed … log.
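A sketch of how this segmenting can be done, assuming the rows are fed to the Parallel For Each via its Values input (only this one input changes between rounds):

```vbnet
' Values input of the Parallel For Each, limited to the first n rows:
dtOldSet.AsEnumerable().Take(1)    ' Round 1
dtOldSet.AsEnumerable().Take(2)    ' Round 2
dtOldSet.AsEnumerable().Take(25)   ' Round 3
```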
Hi @ppr
I ran the other rounds as you suggested; below are the results.
Round 1 - dtOldSet.AsEnumerable.Take(1)
Round 2 - dtOldSet.AsEnumerable.Take(2)
Round 3 - dtOldSet.AsEnumerable.Take(25)
Thanks!!
It is confusing why the full job (including the contains check and the data column update)
is faster than the do-nothing / hardcoded-value update.
But now let's increase the segments to take 100, 250, 500, 1000, 5000 …
and explore.
BTW: is the execution bot machine well equipped (e.g. RAM, processor), and could it be the case that other processes are running and consuming power?
Hi @ppr
I increased the segments; below are the results.
Round 1 - For 100
Round 2 - For 250, please ignore the value 416125
Round 3 - For 500
Round 4 - for 1000
Round 5 - for 5000
When I run the bot in full-fledged mode I close other processes, so the main things consuming significant memory/RAM are UiPath and the robot only.
System specs - Processor - Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
RAM - 16 GB
Thanks!!
Perfect. Estimating from the 5,000 set to a 400,000 set we would come to numbers of roughly 2-3 hrs, which still takes considerable time.
We identified that the contains check is the main target for the tuning.
Do we have duplicates within dtNewSet? Do we have entries within dtNewSet which do not have a match in dtOldSet? What is a typical/average string length when all 8 columns are concatenated?
For preparing alternates for the match check let us also check the following (surrounding it with the stopwatch and its measuring helps much and is recommended).
To start an exploration of alternates, also check how long it takes to create a lookup dictionary:
dictNewSetLK | Dictionary (of String, Boolean) =
dtNewSet.AsEnumerable.Select(Function (x) String.Join("#", arrColSet.Select(Function (k) x(k).ToString.ToLower.Trim))).Distinct().ToDictionary(Function (x) x, Function (x) True)
The idea behind this is to get faster checks by using Dictionary.ContainsKey instead of List.Contains.
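A sketch of how the dictionary would then be used inside the loop (the variables `ja` and `hasMatch`, and the reuse of `arrColSet`, are illustrative assumptions; `row` stands for the current DataRow):

```vbnet
' Build the same "#"-joined key for the current dtOldSet row:
ja = String.Join("#", arrColSet.Select(Function (k) row(k).ToString.ToLower.Trim))
' Hash lookup, roughly O(1), instead of the O(n) scan done by List.Contains:
hasMatch = dictNewSetLK.ContainsKey(ja)
```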
Hi @ppr
Please find my response below -
Also I have one question on the below:
Does the above query need to be in the parallel loop, or do you just require the stopwatch execution time of this query when used in a single Assign activity?
Thanks
Hi @ppr
I tried the above code in an Assign activity; below is the result.
Code ran -
Also, the answers to the questions you asked are provided in the post before this one.
Thanks!!!
yes, we are working on it. Just give us a little time. Thanks
Yes Sure.
Thanks a lot for researching this and helping me.
Really appreciated.
Thanks
Let us introduce the next changes:
we are creating lists with the concatenated strings from dtOldSet and dtNewSet,
and creating a dictionary of only the items common to both lists
ListDT1 = dtNewSet.AsEnumerable.Select(Function (x) String.Join("#", arrColSet.Select(Function (k) x(k).ToString.ToLower.Trim))).Distinct().ToList
ListDT2 = dtOldSet.AsEnumerable.Select(Function (x) String.Join("#", arrColSet.Select(Function (k) x(k).ToString.ToLower.Trim))).Distinct().ToList
ListDT1.Intersect(ListDT2).ToDictionary(Function (x) x, Function (x) True)
Then we simplified the processing by keeping the ja string
and using the ContainsKey result for the column value update.
(It is now writing "True"/"False" instead of yes/no; we are interested in speed currently.)
Important to know about the Intersect: it is deduplicating as well. This needs to be respected when we need to keep duplicates as well (find all matches).
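Putting the statements together, a sketch of the full flow (the names ListDT1, ListDT2, dictLK and the result column "Match" are illustrative; each assignment corresponds to one Assign activity):

```vbnet
ListDT1 = dtNewSet.AsEnumerable.Select(Function (x) String.Join("#", arrColSet.Select(Function (k) x(k).ToString.ToLower.Trim))).Distinct().ToList
ListDT2 = dtOldSet.AsEnumerable.Select(Function (x) String.Join("#", arrColSet.Select(Function (k) x(k).ToString.ToLower.Trim))).Distinct().ToList
dictLK  = ListDT1.Intersect(ListDT2).ToDictionary(Function (x) x, Function (x) True)
' Inside the loop, per row of dtOldSet:
row("Match") = dictLK.ContainsKey(ja).ToString   ' writes "True" / "False" for now
```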
Summary of the last optimization actions:
Result: reducing the execution time from 24+ hrs to 10-20 secs.
When the execution time is to be optimized, the following general actions can be applied:
Hi @ppr
Thanks a lot for your efforts.
I have a few questions.
Currently the logic is writing True/False; how can we change it to Yes/No in the code? Please suggest.
We are running a parallel for each, which runs asynchronously and not in a sequential
manner; will the results be the same and as correct as those we would obtain from the LINQ query?
Also, if we need to take some more steps forward on implementation part please guide.
Thanks!!
Let us first summarize - we progressed and reduced from 24+hrs to seconds, right?
we are removing duplicates within the checklist
we are not removing rows from dtOldSet
you can also check the rows count dtOldSet.Rows.Count after the run
As we progressed we can maybe invest some milliseconds for the yes/no:
If(dictLK.ContainsKey(ja),"yes", "no")
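In the column-update Assign this would then look like the following (the "Match" column name is an illustrative assumption):

```vbnet
row("Match") = If(dictLK.ContainsKey(ja), "yes", "no")
```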
To answer the question in detail we would have to look at a few technical things behind the scenes. We expect that the parallel for each will work as intended.
If you find an inconsistency, then also share it with the UiPath tech team for a replication and analysis round.
Can we finalize the topic?
Hi @ppr
Thanks for the clarification.
In summary, we were able to make the process efficient, going from hours to minutes.
Big thanks for working on the issue and helping on it.
I will let you know if I face any issues with this processing, logic-wise or otherwise, when I verify the final results.
Thanks!!