I’ve seen it discussed many times that you should avoid For Each Row because looping through a lot of rows is inefficient.
So I wrote a test. It creates a DataTable with however many rows we want (Name, Age, City). The default values are Mary, 28, Dallas, and every 10th row is Tom, 27, Miami. The goal is to update Tom’s City to Boston.
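A quick sketch of that test data in Python (a list of dicts standing in for the .NET DataTable the workflow builds; the structure is from the description above, not the actual workflow):

```python
# Build the test table: default row is Mary/28/Dallas,
# every 10th row is Tom/27/Miami.
def build_table(n):
    return [{"Name": "Tom", "Age": 27, "City": "Miami"} if i % 10 == 0
            else {"Name": "Mary", "Age": 28, "City": "Dallas"}
            for i in range(1, n + 1)]

table = build_table(100)
print(sum(1 for r in table if r["Name"] == "Tom"))  # -> 10
```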
The first method is simply a single For Each Row with a nested Assign that changes Tom’s City to Boston.
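In Python terms, method one looks roughly like this (a sketch, with a list of dicts standing in for the DataTable):

```python
# Method 1: one pass over every row, updating Tom in place --
# the analogue of For Each Row with a nested Assign.
def update_in_place(rows):
    for row in rows:                   # For Each Row
        if row["Name"] == "Tom":       # If Name = "Tom"
            row["City"] = "Boston"     # Assign City = "Boston"
    return rows

sample = [{"Name": "Mary", "City": "Dallas"},
          {"Name": "Tom", "City": "Miami"}]
update_in_place(sample)
print(sample[1]["City"])  # -> Boston
```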
The second method is filtering into two separate DTs, so you have one DT that’s just Tom’s rows; then you remove Tom’s rows from the main DT, run a For Each Row over the Tom-only DT to update Miami to Boston, and merge the two back together. This method only has to For Each Row through 10% of the rows.
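The filter-and-merge approach, sketched the same way (the two list comprehensions stand in for the Filter Data Table activities, and the concatenation stands in for Merge Data Table):

```python
# Method 2: split off Tom's rows, update only that ~10%, merge back.
def filter_and_merge(rows):
    toms = [r for r in rows if r["Name"] == "Tom"]   # filter: keep only Tom
    rest = [r for r in rows if r["Name"] != "Tom"]   # filter: remove Tom
    for r in toms:                                   # For Each Row over ~10%
        r["City"] = "Boston"
    return rest + toms                               # merge back together

sample = [{"Name": "Mary", "City": "Dallas"},
          {"Name": "Tom", "City": "Miami"}]
merged = filter_and_merge(sample)
print(sorted(r["City"] for r in merged))  # -> ['Boston', 'Dallas']
```

Note that merging this way changes the original row order, which may or may not matter for a given workflow.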
I had to go to 1 million rows before any meaningful difference in processing time appeared. Here is the result at 1 million rows:
Here is 5 million rows:
The filter method is significantly slower: the first filter still has to process all 5 million rows, then you update 500,000 rows, and then the merge has to build a 5-million-row table again.
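The comparison can be reproduced end to end in a self-contained Python harness (the absolute timings will differ from the .NET workflow above; the point it illustrates is that the filter steps still touch every row):

```python
import time

def build_table(n):
    return [{"Name": "Tom", "Age": 27, "City": "Miami"} if i % 10 == 0
            else {"Name": "Mary", "Age": 28, "City": "Dallas"}
            for i in range(1, n + 1)]

def update_in_place(rows):          # method 1: single loop
    for row in rows:
        if row["Name"] == "Tom":
            row["City"] = "Boston"
    return rows

def filter_and_merge(rows):         # method 2: filter, update 10%, merge
    toms = [r for r in rows if r["Name"] == "Tom"]
    rest = [r for r in rows if r["Name"] != "Tom"]
    for r in toms:
        r["City"] = "Boston"
    return rest + toms

n = 1_000_000
for fn in (update_in_place, filter_and_merge):
    rows = build_table(n)
    start = time.perf_counter()
    fn(rows)
    print(fn.__name__, round(time.perf_counter() - start, 3), "s")
```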
Also, the speed at which a single For Each Row gets through millions of rows suggests that all the concern over finding more efficient alternatives is misplaced. For Each Row is very fast.