There is no need to try to avoid For Each loops on large amounts of data

I’ve seen the discussion many times about avoiding For Each (Row) because looping through a lot of rows is inefficient.

So I wrote a test. It creates a datatable of however many rows we want (Name, Age, City). Default values are Mary, 28, Dallas. Every 10th row is Tom, 27, Miami. Goal is to update Tom to Boston.

The first method is simply a single For Each with nested Assign to change Tom’s City to Boston.

The second method filters into two separate DTs: one DT that holds just Tom’s rows, and the main DT with Tom removed. It then uses For Each Row on Tom’s DT to update Miami to Boston, and merges the two back together. This method only has to For Each Row through 10% of the rows.
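For readers who want to try the same comparison outside UiPath, here is a rough Python analogue of the two methods. It is only a sketch under assumptions: a list of dicts stands in for the datatable, and `build_rows`, `update_single_loop`, `update_filter_merge` and `timed` are hypothetical names, not anything from the attached workflow. Timings from it say nothing about UiPath itself.

```python
import time

def build_rows(n):
    """Build n rows; every 10th row is Tom/27/Miami, the rest Mary/28/Dallas."""
    rows = []
    for i in range(1, n + 1):
        if i % 10 == 0:
            rows.append({"Name": "Tom", "Age": 27, "City": "Miami"})
        else:
            rows.append({"Name": "Mary", "Age": 28, "City": "Dallas"})
    return rows

def update_single_loop(rows):
    """Method 1: one pass over all rows, updating Tom's city in place."""
    for row in rows:
        if row["Name"] == "Tom":
            row["City"] = "Boston"
    return rows

def update_filter_merge(rows):
    """Method 2: split into Tom / not-Tom, loop over Tom only, then merge."""
    toms = [dict(r) for r in rows if r["Name"] == "Tom"]    # first filter: full pass, copies rows
    others = [dict(r) for r in rows if r["Name"] != "Tom"]  # second filter: another full pass
    for row in toms:
        row["City"] = "Boston"
    return others + toms  # merge; row order differs from the input

def timed(fn, rows):
    """Run fn(rows) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(rows)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    n = 1_000_000
    _, t1 = timed(update_single_loop, build_rows(n))
    _, t2 = timed(update_filter_merge, build_rows(n))
    print(f"single loop: {t1:.3f}s, filter/update/merge: {t2:.3f}s")
```

The sketch makes the structure visible: the second method still touches every row twice while filtering, before the smaller loop even starts.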

I had to go to 1 million rows before any meaningful processing time difference appeared. Here is the result of 1 million rows:

image

Here is 5 million rows:

image

The filter method is significantly slower, as the first filter has to process all 5 million rows, then you update 500,000 rows, then merge 5 million rows.

Also, the speed at which a single For Each Row is processing millions of rows should indicate all this concern over finding other more efficient methods is pointless. For Each Row is very fast.

Hi @postwick,

Standard UiPath activities are quick enough for most automations. There is probably a LINQ query in the code that UiPath itself uses as well. So I cannot understand why so many people avoid the standard loop activities.

It would be great if you could share your .xaml files, and we could ask @Yoichi or @ppr to help us find out how much faster LINQ queries can be on the same data.

Null Hypothesis : Linq query is 25% faster
Alternative Hypothesis : There is negligible difference

It’s just Build Datatable with 3 columns: Name (str), Age (int32), City (str)

Then Repeat activity to populate with however many rows you want.

Then they can try the Linq query, I’d be interested in seeing how that does. Of course, we’d have to keep in mind differences in the computers we are using. If someone could just tell me the Linq query, I’ll add it to my tests and get the numbers.

Agreed, the test results will differ based on system resources. Let’s hope someone who is good at LINQ can help with an optimized query so that we can test this objectively.

But just for the sake of completeness, you could add the .xaml file. People who want to try it will save some time setting it up.

Thank you for starting a discussion on this topic.

Here’s the XAML for anyone who wants it.

LoopTesting.xaml (20.7 KB)

@postwick thanks for shifting this discussion topic to facts
@jeevith thanks for your involvement

I have tried to stay out of these discussions in the past, because the problem is not LINQ, defects in For Each, or dramatic performance issues. It was more about mistaking learning topics and conflating the benefits of LINQ with defects in the other options that do not exist. It also ignored the nature of LINQ and how it gets misused.

However, some topics have been added to the Blog/Tutorial pipeline and a preview has already been released (LINQ vs Loops like for, for each and while - #2 by ppr), so it’s OK to step into this discussion as an exception.

First of all: there is nothing wrong or bad about using the For Each … activity.

Case 1:
image

First of all, it will not update every fifth row:
image

It will update every record where the last digit of the current item is a 5.

Updating every fifth row can be done with a Mod check:
image
image
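The difference between the two checks is easy to see in a small Python sketch. This is an analogue only: the original screenshots of the UiPath conditions are not reproduced here, and `ends_with_five` and `every_fifth` are hypothetical names chosen for illustration.

```python
def ends_with_five(index):
    """True when the last digit of the row index is 5 (5, 15, 25, ...)."""
    return str(index).endswith("5")

def every_fifth(index):
    """True for every fifth row (5, 10, 15, 20, ...), using a modulo check."""
    return index % 5 == 0

indices = range(1, 31)
print([i for i in indices if ends_with_five(i)])  # [5, 15, 25]
print([i for i in indices if every_fifth(i)])     # [5, 10, 15, 20, 25, 30]
```

The last-digit check hits one row in ten, while the modulo check hits one row in five.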

Doing task 1 with LINQ could look like the following. Kindly note: this LINQ suggestion is one of many options.

ra1 = {"Mary", 28, "Dallas"}
ra2 = {"Tom", 27, "Miami"}

dtData =
(From x In Enumerable.Range(1,1000000)
Let ra = If(x Mod 5 = 0, ra2, ra1)
Select r = dtData.Rows.Add(ra)).CopyToDataTable

Here we only use CopyToDataTable for psychological reasons; it is not needed.

Case 2:
image

Case 2 is similar to the For Each Row plus Filter Data Table approaches (losing the reference to the original datatable, etc.), but this can be avoided as follows:

  • use LINQ to feed the already filtered rows into a For Each (do not use For Each Row)
  • update City
    image

And that’s all: just loop over only the needed rows.
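The idea of looping over only the matching rows while still updating the original table can be sketched in Python. This is an analogue, not the LINQ from the screenshot: a list of dicts stands in for the datatable, and `update_city` is a hypothetical helper. The point it illustrates is that the filter hands back references to the original rows, so no copy and no merge are needed.

```python
rows = [
    {"Name": "Mary", "Age": 28, "City": "Dallas"},
    {"Name": "Tom", "Age": 27, "City": "Miami"},
    {"Name": "Mary", "Age": 28, "City": "Dallas"},
    {"Name": "Tom", "Age": 27, "City": "Miami"},
]

def update_city(rows, name, new_city):
    """Update City on every row matching name; return how many rows changed."""
    # The generator yields references to the original row objects, not copies,
    # so assigning to row["City"] changes the rows inside the original list.
    matching = (row for row in rows if row["Name"] == name)
    count = 0
    for row in matching:
        row["City"] = new_city
        count += 1
    return count

updated = update_city(rows, "Tom", "Boston")
print(updated, rows[1]["City"], rows[0]["City"])
```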

Case 3:
This is covered by Case 2, so splitting and merging the datatable can be avoided.

Sorry yes I fixed the “every 5th” in another post and forgot to fix it here. It’s every 10th. It doesn’t need to be every 5th, it could be any ratio.

That’s absolutely fine and will not produce an opposite result.
But when we have these discussions, the code and the description should match.

Cross References

The cross references are good, although the second one is where I first posted the above tests.

The point of my test is performance. The test shows that For Each Row is not slower than the method proposed to avoid it; it is much faster. And in either event, For Each Row can process 5 million rows in a matter of seconds, so worrying about performance is pointless.

Maybe reread the post and the links again, and pause for a while.

I didn’t disagree with you. As your references point out, there are other considerations besides performance when deciding how to achieve a goal. My test was simply looking at processing time for a specific example.

Hi @ppr ,

Thank you for taking the time to give your input on this test. I have not read all the linked reference posts yet, but will do so in the coming days.

I completely agree with you here about avoiding magical black boxes. Not everyone can be a magician. In other words, a complicated LINQ query may do great on robot execution time, but when the code has to be maintained by someone who does not understand the black box, it can be quite a challenge and can lead to downtime.

https://forum.uipath.com/t/linq-vs-loops-like-for-for-each-and-while/341344/2

  • avoid black boxes:
    • a LINQ statement either passes or fails, but if it fails we also need to find out what is going wrong.
    • Apply techniques that will support you in issue analysis
    • Combine essential approaches with LINQ (e.g. grouping data with LINQ, processing the group members with essential methods)
The only area where I’ve seen a performance issue with For Each is in Debug mode.

Example: Iterate over 1,000 integers.

  • Debug: 248 seconds
  • Release: 4 milliseconds

Something to be aware of for those who need to debug a workflow that loops over a large set.