[Benchmarking] - Filtering a datatable using native approaches in UiPath Studio

This post describes the datatable filtering benchmark using four native datatable filtering approaches in UiPath Studio. The aim of this post is to improve upon a previously reported benchmarking exercise.

Input datatable generation

Datatables with column count of 13 and row count of 10000, 40000, 160000, 640000, 2560000, 5120000, 10240000 were generated using the Faker library in python.

Following are two sample rows from a generated table.

job company ssn residence current_location blood_group website username name sex address mail birthdate
Secondary school teacher Bradley, Nunez and Powell 311-70-5289 91561 Kim Burg Suite 437 New Joyceton, KS 93417 (-61.1402965,-117.434753) B+ http://www.gonzalez.biz/ williampetersen Danielle Smith F 451 Hahn Track Apt. 835 North Kenneth, WY 17314 [csolomon@gmail.com (mailto:csolomon@gmail.com) 1913-12-29
Midwife Park Inc 064-08-7579 216 Bradley Ports Suite 342 Sydneyport, AZ 54808 (78.6754295, -125.981205) A- https://www.smith.info/ https://cantu.com/ http://www.sims.com/ zjones Tracy Reed F 002 Jennifer Mall Cynthiamouth, UT 67079 [davidstone@hotmail.com (mailto:davidstone@hotmail.com) 1963-07-12

Filtering definition

As reported in the previous tests, the filter condition used in all approaches translates to the same logic, i.e., return rows where the input table column job has the value Social worker. In simple SQL syntax this translates to:

SELECT * FROM  TESTDATATABLE
WHERE job='Social worker';
  • The Start and Stop timer activities were used to time only the unit/operation under test (the filtering operation).
    Visually each filtering approach has a dedicated workflow and iterates through the different data sizes as shown below.

        flowchart LR
        
            subgraph For Each Data Variation
           PreTestActivities[Pre test activities] -.- StartTimer
            StartTimer[Start timer]--> FilterOperation(Filtering approach)         
           FilterOperation-->StopTimer[Stop timer]   
        StopTimer -.- PostTestActivities[Post test activities]
            end
    
  • To avoid any debugging lags, the benchmark was executed using the UiPath UiRobot.exe and not run within Studio.

  • To test repeatability, 3 runs for all input data variations were executed.

  • The latest UiPath official activities (dependencies) were used (24.10.x).

Hardware specifications

To avoid any performance limitations imposed by the operating system, the filtering benchmark was run on a laptop when connected to the power outlet.

Specification Value
Name Intel(R) Core™ i7-8850H CPU @ 2.60GHz
NumberOfCores 6
NumberOfLogicalProcessors 12
Architecture x64
CurrentClockSpeed 2592
Availability RunningOrFullPower
CsPhysicallyInstalledMemory 31.79 GB
RAM Speed 3200 Mhz
OsName Microsoft Windows 11 Pro

Observations

The observations from three runs were post-processed in python to create visualizations of filter execution time in seconds and potential gains in percentages.

The potential gains chart shows the potential gains possible when comparing the bottom ranked approach with the top ranked approach for the given run and input data variation.

The charts and the corresponding observed data are as follows.

First run

Benchmarking table
Tool RowCount FilterExecutionTimeSeconds TotalETLExecutionTimeSeconds RowsKept Rank
Linq 10000 0.0033031 0.378116 16 1
FilterDataTable 10000 0.0276463 0.401704 16 2
Select 10000 0.0295979 0.424341 16 3
ForEachRow 10000 0.11087 0.47255 16 4
Linq 40000 0.0047663 0.413629 71 1
FilterDataTable 40000 0.0280976 0.339016 71 2
Select 40000 0.0450852 0.395191 71 3
ForEachRow 40000 0.0877472 0.401331 71 4
Linq 160000 0.0183264 1.64421 267 1
FilterDataTable 160000 0.0833674 1.78958 267 2
Select 160000 0.226707 1.85155 267 3
ForEachRow 160000 1.68622 3.39573 267 4
Linq 640000 0.0818976 5.47745 941 1
FilterDataTable 640000 0.341406 5.8552 941 2
Select 640000 0.973144 8.11462 941 3
ForEachRow 640000 1.19504 6.74479 941 4
Linq 2560000 0.314549 27.3203 4073 1
FilterDataTable 2560000 1.43852 30.0722 4073 2
Select 2560000 4.6206 32.5585 4073 3
ForEachRow 2560000 5.06532 29.1012 4073 4
Linq 5120000 0.622156 49.2164 7906 1
FilterDataTable 5120000 2.93253 52.199 7906 2
ForEachRow 5120000 9.15946 58.1783 7906 3
Select 5120000 10.1409 57.193 7906 4
Linq 10240000 1.38597 109.236 16320 1
FilterDataTable 10240000 6.34079 110.518 16320 2
ForEachRow 10240000 19.1895 123.934 16320 3
Select 10240000 24.9072 128.171 16320 4

Potential gains

Run1Chart_Gains

Potential gains table
RowCount max min PotentialGainsinPercent
0 10000 0.11087 0.0033031 97.0207
1 40000 0.0877472 0.0047663 94.5681
2 160000 1.68622 0.0183264 98.9132
3 640000 1.19504 0.0818976 93.1469
4 2560000 5.06532 0.314549 93.7901
5 5120000 10.1409 0.622156 93.8649
6 110240000 24.9072 1.38597 94.4355

Second run

Benchmarking table
Tool RowCount FilterExecutionTimeSeconds TotalETLExecutionTimeSeconds RowsKept Rank
Linq 10000 0.0031322 0.352156 16 1
Select 10000 0.0128726 0.387855 16 2
FilterDataTable 10000 0.0254493 0.357849 16 3
ForEachRow 10000 0.076267 0.406852 16 4
Linq 40000 0.0051803 0.319606 71 1
FilterDataTable 40000 0.0284668 0.364492 71 2
Select 40000 0.0443647 0.378848 71 3
ForEachRow 40000 0.0814616 0.406687 71 4
Linq 160000 0.019078 1.65689 267 1
FilterDataTable 160000 0.0852833 1.80503 267 2
Select 160000 0.22467 1.90338 267 3
ForEachRow 160000 1.6825 3.27962 267 4
Linq 640000 0.0866304 5.42237 941 1
FilterDataTable 640000 0.333395 5.84891 941 2
Select 640000 0.951289 6.18091 941 3
ForEachRow 640000 1.18859 6.36936 941 4
Linq 2560000 0.330386 26.4394 4073 1
FilterDataTable 2560000 1.41783 30.1093 4073 2
Select 2560000 4.5383 32.9299 4073 3
ForEachRow 2560000 4.77842 28.4826 4073 4
Linq 5120000 0.669727 47.9845 7906 1
FilterDataTable 5120000 2.86944 50.2407 7906 2
ForEachRow 5120000 9.56838 56.9043 7906 3
Select 5120000 10.1764 56.6302 7906 4
Linq 10240000 1.42513 103.732 16320 1
FilterDataTable 10240000 6.20137 109.485 16320 2
ForEachRow 10240000 19.2645 121.815 16320 3
Select 10240000 25.0563 128.818 16320 4

Potential gains

Run2Chart_Gains

Potential gains table
RowCount max min PotentialGainsinPercent
0 10000 0.076267 0.0031322 95.8931
1 40000 0.0814616 0.0051803 93.6408
2 160000 1.6825 0.019078 98.8661
3 640000 1.18859 0.0866304 92.7115
4 2560000 4.77842 0.330386 93.0859
5 5120000 10.1764 0.669727 93.4188
6 10240000 25.0563 1.42513 94.3123

Third run

Benchmarking table
Tool RowCount FilterExecutionTimeSeconds TotalETLExecutionTimeSeconds RowsKept Rank
Linq 10000 0.0029612 0.381278 16 1
Select 10000 0.0123129 0.387587 16 2
FilterDataTable 10000 0.0276669 0.38571 16 3
ForEachRow 10000 0.0836655 0.429397 16 4
Linq 40000 0.0052128 0.345098 71 1
FilterDataTable 40000 0.0285085 0.350228 71 2
Select 40000 0.0437795 0.418821 71 3
ForEachRow 40000 0.0743338 0.394631 71 4
Linq 160000 0.0184848 1.67329 267 1
FilterDataTable 160000 0.0942167 1.80083 267 2
Select 160000 0.232881 1.85961 267 3
ForEachRow 160000 1.64556 3.28754 267 4
Linq 640000 0.0877967 5.30925 941 1
FilterDataTable 640000 0.339033 5.87826 941 2
Select 640000 0.943874 6.25742 941 3
ForEachRow 640000 1.13419 6.38768 941 4
Linq 2560000 0.328358 26.3376 4073 1
FilterDataTable 2560000 1.4417 30.5573 4073 2
Select 2560000 4.57539 30.9571 4073 3
ForEachRow 2560000 4.67768 28.5096 4073 4
Linq 5120000 0.660485 49.2912 7906 1
FilterDataTable 5120000 2.87999 52.147 7906 2
ForEachRow 5120000 9.27352 58.0156 7906 3
Select 5120000 10.2306 56.7211 7906 4
Linq 10240000 1.49578 103.258 16320 1
FilterDataTable 10240000 6.34189 112.4 16320 2
ForEachRow 10240000 18.8852 121.139 16320 3
Select 10240000 25.3838 127.883 16320 4

Potential gains

Run3Chart_Gains

Potential gains table
RowCount max min PotentialGainsinPercent
0 10000 0.0836655 0.0029612 96.4607
1 40000 0.0743338 0.0052128 92.9873
2 160000 1.64556 0.0184848 98.8767
3 640000 1.13419 0.0877967 92.2591
4 2560000 4.67768 0.328358 92.9803
5 5120000 10.2306 0.660485 93.544
6 10240000 25.3838 1.49578 94.1073

References

Questions

For questions on your specific filtering case, kindly open a new topic and get individual support.

8 Likes

Great evaluation :clap:t2:

I also realized that the UiPath ForEachRow activity takes so long to filter DataTable, whereas the .NET ForEach statement should be very fast. The problem is complex filterings require only the ForEach activity.

Therefore, sometimes I filter using LINQ beforehand and then use ForEach for processing. Other times, I create a static method using UiPath Coded Automation and call the method within the LINQ for more complex scenarios. This can significantly reduce turnaround time and increase efficiency.

Hope to see more benchmarkings like this.

Regards,
Azeem

@azeem_rosli
Thanks for your feedback

we (@jeevith, @ppr) are working on it