Is the duration of the Excel Read Range activity linear?
The short answer is “yes”, as far as I can tell.
What I did
I’ve been wondering this for a long time, and I finally sat down today to test it. I made an Excel sheet with 20 columns and 500k rows of random numbers (it’s too big to attach to this forum post) and wrote some logic that reads the first n rows of that big Excel document and times how long the whole read takes (in milliseconds).
I used a while loop to test larger and larger ranges read from the Excel sheet, and the loop terminated when the duration of the most recent Read Range exceeded some threshold. The highest threshold I tried was 300e3 milliseconds, i.e. 5 minutes. I would have done more, but I didn’t.
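If you want to see the shape of the loop without opening the project, here’s a rough Python sketch of the same idea. The growth factor, starting point, and the openpyxl stand-in for the Read Range activity are my assumptions, not exactly what the workflow does, and `big.xlsx` is a hypothetical filename:

```python
import time

def run_sweep(read_fn, limit_ms=300e3, start=10, growth=1.1):
    # Time reads of the first n rows, growing n each iteration,
    # and stop once a single read exceeds limit_ms (5 min by default).
    results = []
    n = start
    while True:
        duration_ms = read_fn(n)
        results.append((n, duration_ms))
        if duration_ms > limit_ms:
            break
        n = max(n + 1, int(n * growth))
    return results

def timed_excel_read(path):
    # Build a read_fn that times loading the first n rows of `path`
    # with openpyxl -- a stand-in for the UiPath Read Range activity.
    def read_first_n(n):
        import openpyxl  # pip install openpyxl
        start_t = time.perf_counter()
        wb = openpyxl.load_workbook(path, read_only=True)
        rows = list(wb.active.iter_rows(max_row=n, values_only=True))
        wb.close()
        return (time.perf_counter() - start_t) * 1000.0
    return read_first_n

# usage (with a real file):
#   for n, ms in run_sweep(timed_excel_read("big.xlsx")):
#       print(f"{n} | {ms:.4f}")
```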
Results
While I haven’t tried any regressions on the values I recorded, just looking at the plot they make, I think we can reasonably say that the duration of the Read Range increases linearly with the number of rows read, at least in the small range that I tested.
I’m sure that if you tried this on your machine, you would see different numbers depending on your specs and the nature of the Excel file you were working with, but I would guess you would still see a linear relationship.
n_rows | time (ms) |
---|---|
10 | 12057.5088 |
20 | 14174.8974 |
30 | 13452.7809 |
40 | 14294.506 |
50 | 13944.5653 |
60 | 14537.177 |
70 | 15109.347 |
80 | 15833.2845 |
90 | 16208.639 |
100 | 16851.1495 |
110 | 18917.3468 |
121 | 22090.1927 |
133 | 23154.3922 |
146 | 23774.9967 |
161 | 24708.841 |
177 | 25784.5947 |
195 | 26776.4849 |
215 | 28342.3057 |
237 | 30301.6545 |
261 | 34794.3974 |
287 | 32041.9359 |
316 | 39649.3006 |
348 | 36851.3344 |
383 | 41491.15 |
421 | 42827.7526 |
463 | 45265.4795 |
509 | 48839.7059 |
560 | 52621.2287 |
616 | 44918.0589 |
678 | 50116.3744 |
746 | 64826.3361 |

n_rows | time (ms) |
---|---|
10 | 822.4768 |
50 | 2829.9788 |
90 | 4881.1144 |
130 | 7231.2023 |
214 | 12792.2468 |
353 | 20315.2468 |
582 | 35465.6658 |
960 | 59352.0499 |
1584 | 97641.3646 |
2614 | 139688.0574 |

n_rows | time (ms) |
---|---|
10 | 912.3637 |
50 | 2696.1846 |
90 | 4801.7623 |
130 | 7425.0385 |
170 | 9025.4188 |
280 | 16144.509 |
462 | 27297.1799 |
762 | 47783.9471 |
1257 | 78220.7533 |
2074 | 115583.8565 |
3422 | 211907.7464 |
5646 | 348370.7987 |
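If you want to go beyond eyeballing the plot, here’s a quick least-squares fit you can run yourself (plain standard-library Python, no regression package needed). The numbers below are the last run from the table above:

```python
# Ordinary least squares over one run of the recorded data,
# to check how linear duration-vs-rows actually is.
n_rows = [10, 50, 90, 130, 170, 280, 462, 762, 1257, 2074, 3422, 5646]
times_ms = [912.3637, 2696.1846, 4801.7623, 7425.0385, 9025.4188,
            16144.509, 27297.1799, 47783.9471, 78220.7533,
            115583.8565, 211907.7464, 348370.7987]

def linear_fit(xs, ys):
    # Returns (slope, intercept, r_squared) for y = slope*x + intercept.
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

slope, intercept, r2 = linear_fit(n_rows, times_ms)
print(f"~{slope:.1f} ms per row, r^2 = {r2:.4f}")
```

An r² close to 1 would back up the “looks linear” read of the plot for this run.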
The project folder
You’ll have to find your own suitably large Excel file to use, but if you don’t want to write the logic yourself, you can try my project. Fair warning: I’m just outputting the results as log messages and parsing the output.txt file you can create from Studio to get the results.
LargeExcelDiagnostics.zip (3.9 KB)
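If you’d rather write your own parsing step, here’s a minimal sketch. The log format is an assumption on my part: it just grabs the first integer and the first decimal number from each line, which works as long as the row count and the duration are the only numbers in each log message:

```python
import re

# Hypothetical log line shape: "... read 50 rows in 2829.9788 ms"
# -- one integer (row count) followed later by one decimal (duration).
LINE_RE = re.compile(r"(\d+)\D+(\d+\.\d+)")

def parse_log(text):
    # Return (n_rows, duration_ms) pairs from matching lines,
    # skipping any lines that don't fit the pattern.
    results = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            results.append((int(m.group(1)), float(m.group(2))))
    return results
```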
Why didn’t you try X?
Man, I have a bunch of questions after this experiment, such as:
- Is the read time still linear if the workbook has formulas instead of static number values?
- Is the relationship still linear when you get in the 10k-50k row range?
- Does the number of columns matter? I only tried all 20 columns today.
I just put this together because it’s the kind of post I went looking for about a year ago and couldn’t find. It might be cool to improve the project (like making the output friendlier, or doing the calculations with UiPath Excel activities instead of Python), but down that path lies madness. What I presented here is just the result of spending about half a day exploring a UiPath question I couldn’t find the answer to.
Maybe I’ll try to do X in my next attempt to waste time productively.