Is the duration of the Excel Read Range activity linear?

Is the duration of the Excel Read Range activity linear?

The short answer is “yes”, I think that it is.

What I did

I’ve been wondering this for a long time, and I finally sat down today to test it. I made an excel sheet with 20 columns and 500k rows of random numbers (It’s too big to attach to this forum post) and wrote some logic that will read the first n lines in that big excel document and time how long the whole thing takes (in milliseconds).

I used a while loop to test larger and larger ranges read from the excel sheet, and the loop terminated when the duration of the most recent read range exceeded some value. The highest value I tried was 300e3 milliseconds or 5 minutes. I would have done more, but I didn’t :stuck_out_tongue: .

Results

image

While I haven’t tried any regressions on the values I recorded, just looking at the plot they make, I think that we reasonably say that the duration of the read range increases linearly with the number of rows read, at least in the small range that I tested.

I’m sure that if you tried this on your machine, you would see different numbers based on the specs of your machine and the nature of the excel file you were working with, but I would guess that you would also see a linear relationship.

n_rows times
10 12057.5088
20 14174.8974
30 13452.7809
40 14294.506
50 13944.5653
60 14537.177
70 15109.347
80 15833.2845
90 16208.639
100 16851.1495
110 18917.3468
121 22090.1927
133 23154.3922
146 23774.9967
161 24708.841
177 25784.5947
195 26776.4849
215 28342.3057
237 30301.6545
261 34794.3974
287 32041.9359
316 39649.3006
348 36851.3344
383 41491.15
421 42827.7526
463 45265.4795
509 48839.7059
560 52621.2287
616 44918.0589
678 50116.3744
746 64826.3361
10 822.4768
50 2829.9788
90 4881.1144
130 7231.2023
214 12792.2468
353 20315.2468
582 35465.6658
960 59352.0499
1584 97641.3646
2614 139688.0574
10 912.3637
50 2696.1846
90 4801.7623
130 7425.0385
170 9025.4188
280 16144.509
462 27297.1799
762 47783.9471
1257 78220.7533
2074 115583.8565
3422 211907.7464
5646 348370.7987

The project folder

You’ll have to find your own suitably large excel file to use, but if you don’t want to write the logic yourself, you can try to use my project. Fair warning, I’m just outputting the results as log messages and parsing the output.txt file you can create from studio to get results.

LargeExcelDiagnostics.zip (3.9 KB)

Why didn’t you try X?

Man, I have a bunch of questions after this experiment like

  • Is the read time still linear if the workbook has formulas instead of static number values?
  • Is the relationship still linear when you get in the 10k-50k row range?
  • Does the number of columns matter? I only tried all 20 columns today.

I just put this together because it’s the kind of post I went looking for about a year ago and couldn’t find. It might be cool to improve the project (like making the output more friendly or doing the calcs with UiPath excel activities instead of python), but down that path lies madness. What I presented here is just the result of me spending about half of a day exploring a UiPath question I couldn’t find the answer to.

Maybe I’ll try to do X in my next attempt to waste time productively.

1 Like