Parallel For Each limit

There are situations where launching the same job in 10 parallel processes speeds things up, but launching it in 20 parallel processes is slower than running the jobs one by one.

So it would be useful to add a property to the Parallel For Each activity that limits the maximum number of parallel processes. The work would then be sliced into smaller batches automatically.
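Conceptually, the proposed limit behaves like a bounded worker pool: at most N items run at once and the rest wait until a slot frees up. Below is a minimal Python sketch of that idea, not UiPath's implementation. `parallel_for_each`, `max_parallel` and `ConcurrencyProbe` are hypothetical names used only for illustration. The probe records how many calls overlap, so you can check that the cap is respected.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def parallel_for_each(items, job, max_parallel):
    """Apply job to every item, with at most max_parallel calls running at once.

    Hypothetical helper: items beyond the limit wait in the pool's queue until
    a worker is free, which gives the "automatic slicing into smaller batches"
    behaviour. Results are returned in the same order as items.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(job, items))


class ConcurrencyProbe:
    """Hypothetical job that records how many calls overlap in time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, item):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.02)  # simulate work, e.g. one scraping step
        with self._lock:
            self.active -= 1
        return item * 2


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    results = parallel_for_each(range(20), probe, max_parallel=5)
    print(results[:3], "peak concurrency:", probe.peak)
```

The sweet spot (10 vs. 20 in the example above) depends on the workload, so in a sketch like this `max_parallel` would be a tuning parameter rather than a fixed value.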

Hey Ciprian,

I agree, this would be useful. In the interim, would a multidimensional collection work? Say you have an array of strings to go through. You could split it into as many arrays as you want processed in parallel, then feed an array of those arrays to the Parallel For Each. Inside it you'd need an inner For Each that is not parallel, but it may get the job done. That workaround is a lot of effort, though, and your suggestion would be far more efficient.
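The array-of-arrays workaround can be sketched in Python as follows. This is an illustration of the idea rather than a UiPath workflow. `split_into_chunks`, `process_chunk` and `chunked_parallel_for_each` are hypothetical names. The thread pool stands in for the outer Parallel For Each, and a plain list comprehension stands in for the inner, non-parallel For Each.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous lists of near-equal size."""
    if not items:
        return []
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk, job):
    """Inner, non-parallel For Each: handle one chunk's items one by one."""
    return [job(item) for item in chunk]


def chunked_parallel_for_each(items, job, n_parallel):
    """Outer Parallel For Each over the chunks; each branch runs sequentially."""
    chunks = split_into_chunks(items, n_parallel)
    with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
        per_chunk = list(pool.map(lambda c: process_chunk(c, job), chunks))
    # Flatten back into the original item order.
    return [result for chunk_results in per_chunk for result in chunk_results]


if __name__ == "__main__":
    names = [f"file{i}.txt" for i in range(10)]
    print(split_into_chunks(names, 3))
    print(chunked_parallel_for_each(names, str.upper, n_parallel=3))
```

Because the number of chunks caps the number of branches, `n_parallel` acts as the limit. It could, for example, be set from `os.cpu_count()`.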


I have already worked this way (with other technology, in other times): checking how many CPUs are free and dividing the queue into smaller queues that run in parallel to occupy all CPUs. But it is additional work, and the proposed limit would make it much simpler 🙂


Hi,
Thank you for your suggestion. I added it to our internal ideas tracker for our team to consider.


Hey, sorry to resurrect the thread 🙂 but is there any news on this topic? I have been having issues with Data Scraping inside a Parallel For Each that then uses a (non-parallel) For Each Row inside it. Somehow the data gets overwritten between the different threads, and CurrentRow and even the index do not have the correct values.

So my temporary solution would be to merge all the scraped DataTables and run a parallel For Each Row over the result. But I will probably have more than 200 rows, and each one uses a Chrome tab, so things will definitely not go smoothly unless I can set up the limit Ciprian suggested.

I managed to fix it by using a While loop. Before the loop, I assigned a variable with the extracted DataTable's total row count and used it in the loop condition. This way the parallel threads no longer mixed up the index and total-row values between them, as they had with the other types of variables.
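In general terms, the fix keeps the loop counter and the row count local to each parallel branch instead of in state the branches share. A rough Python analogue is below. It assumes each branch receives its own list of rows as a stand-in for the extracted DataTable. It does not model UiPath's variable scoping. `scrape_branch` and `run_branches` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_branch(branch_id, rows):
    """One parallel branch: walk its own rows with a While-style loop.

    total_rows and index are local variables, so every thread has its own
    copy and no other branch can overwrite them mid-loop.
    """
    total_rows = len(rows)  # assigned once, before the loop
    index = 0
    processed = []
    while index < total_rows:
        processed.append(f"branch {branch_id}: row {index} = {rows[index]}")
        index += 1
    return processed


def run_branches(tables):
    """Run one branch per table in parallel; results are grouped per branch."""
    with ThreadPoolExecutor(max_workers=len(tables) or 1) as pool:
        futures = [pool.submit(scrape_branch, i, rows) for i, rows in enumerate(tables)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    tables = [["a", "b"], ["c", "d", "e"], ["f"]]
    for lines in run_branches(tables):
        print(lines)
```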