Processing multiple records in Excel in parallel

Hello,

I have 400-odd records in Excel, and each row takes about 2-3 minutes to process. I have 5 bots in Orchestrator. How can I make sure my bots always pick unique records and, as soon as they are finished, move on to the next row in Excel?

Depends on your requirements. If writing back to Excel is a requirement, then obviously having every bot write to the same file in parallel wouldn’t work. Check out Queues and Transactions first. Let’s say you need to fetch data from Excel, then - per row - visit a web page, provide data, scrape some more data, and finally write the result back to Excel. Here’s what I would recommend (a rough sketch follows the list):

  1. Have a single robot read the Excel file and generate a transaction item in a queue for each row.
  2. Have multiple robots process the items in the queue.
  3. At the end, have a single robot write all processed items back to Excel.
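
To make the split concrete, here is a minimal Python sketch of the same idea. An in-process queue stands in for an Orchestrator queue, and the row data and "processing" are made up for illustration; the real thing would of course be built from UiPath activities.

```python
# Conceptual sketch of the dispatcher / performer / collector split.
# A thread-safe in-process queue stands in for an Orchestrator queue;
# the row data and the processing step are invented for illustration.
import queue
import threading

work_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def dispatcher(rows):
    """Single 'robot': turn every Excel row into one transaction item."""
    for row in rows:
        work_queue.put(row)

def performer(worker_id):
    """One of several 'robots': each item is handed out exactly once."""
    while True:
        try:
            row = work_queue.get_nowait()   # unique item per worker
        except queue.Empty:
            return
        processed = {**row, "status": f"done by worker {worker_id}"}
        with results_lock:
            results.append(processed)

def collector():
    """Single 'robot': write all processed items back in one place."""
    for item in sorted(results, key=lambda r: r["row_id"]):
        print(item)   # the real process would write this row back to Excel

if __name__ == "__main__":
    rows = [{"row_id": i, "value": f"record {i}"} for i in range(10)]
    dispatcher(rows)
    workers = [threading.Thread(target=performer, args=(n,)) for n in range(5)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    collector()
```

Because every row becomes exactly one queue item, no two workers ever touch the same record, and the write-back happens in a single place at the end.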

I’m trying to get better with queues myself, but I have some questions here.
—When you say to have a single robot generate the queue, are you implying a separate job that runs, or is this built into the main job of the process? I probably need to look over Queues some more so I can understand this part better, in particular how it adds the queue items and then processes them within the same job.
—When you say to have a single robot export the items to Excel, I have to disagree slightly. And this goes with my first question as well… you don’t want to have another job that needs to run. So I feel like it might be better to wait for the Excel file to not be in use before updating it, which would also allow an end user to see the status during the run. Unless there is a way to have the Excel update done as part of the job and have it run throughout the job on the same robot while queue items are being processed.

@rohit_nirantar I agree, look into Queues. Also, look over the ReFramework, which is a template in Studio, and see how it implements queues. The alternative would be to split your data set and run it on multiple robots, which I think requires more effort.

Regards.

Apologies, I should have used more precise wording. I am working with products from different vendors, too - and they define robots entirely differently (i.e. a workflow is a robot, not an individual machine or executing entity).

I think that depends on the task, but divide and conquer remains key. For the sake of precision, let’s define a process as either a sequence, flowchart or state machine - something that you would create in Studio, and something that is executed by a schedule on a robot backed by at least one physical machine. Usually, I follow this procedure:

  • One sequence prepares the work, i.e. creates the transaction items - and nothing else.
  • Another sequence is responsible for all the heavy lifting. It takes the next item from the queue, unless the queue is empty.

Naturally, you could use parallelism in either case if required. So, here’s an example - imagine I want to fetch all custom components created for UiPath from Go!, together with most of their metadata, such as published date, updated date, download count, and more:

  • My first sequence opens the browser, extracts each component URL, and adds it to a queue. It visits all pages and terminates after the last item has been extracted.
  • The second sequence takes the next available transaction item, browses to that precise URL (example: https://go.uipath.com/component/firebase-connector), extracts all required data, and, let’s say, inserts it into a table in SQL Server. Note that UiPath inherently makes sure the same URL - or rather the same transaction item - is processed only once (unless it failed and needs to be retried). A rough sketch of that second sequence follows below.
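
To illustrate the performer side, here is a small Python sketch. sqlite3 stands in for SQL Server, fetch_metadata() is a stub for the actual scraping, and the table and column names are invented for the example:

```python
# Sketch of the second sequence: pop a URL, scrape its metadata, store it.
# sqlite3 stands in for SQL Server; fetch_metadata() is a placeholder for
# the real scraping, which would be done by UiPath activities.
import sqlite3
from queue import Queue, Empty

def fetch_metadata(url):
    # Placeholder for scraping published date, download count, etc.
    return {"url": url, "published": "2019-01-01", "downloads": 0}

def run_performer(url_queue, db_path="components.db"):
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS component "
        "(url TEXT PRIMARY KEY, published TEXT, downloads INTEGER)"
    )
    while True:
        try:
            url = url_queue.get_nowait()   # each URL is handed out only once
        except Empty:
            break
        meta = fetch_metadata(url)
        # INSERT OR REPLACE keeps a retried item from creating duplicate rows
        con.execute(
            "INSERT OR REPLACE INTO component VALUES (:url, :published, :downloads)",
            meta,
        )
        con.commit()
    con.close()

if __name__ == "__main__":
    q = Queue()
    q.put("https://go.uipath.com/component/firebase-connector")
    run_performer(q)
```

The INSERT OR REPLACE mirrors the only-once guarantee of the queue: even if a transaction item is retried, it doesn’t create a duplicate row.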

Could you do all of this in a single sequence, or in two sequences invoked by the same flowchart? Sure, but is that approach better? Imagine you need to make a change to the process - data now needs to be posted to a RESTful service instead of being inserted into SQL Server. The first sequence wouldn’t be affected, as the way transactions are prepared hasn’t changed.

When you say to have a single robot export the items to Excel, I have to disagree slightly. And this goes with my first question as well… you don’t want to have another job that needs to run. So I feel like it might be better to wait for the Excel file to not be in use before updating it, which would also allow an end user to see the status during the run.

Agreed, yet this depends on the precise requirements. If someone needs to receive a daily report, then why not have a dedicated sequence doing so? We could even store any kind of data in a database and then populate an Excel file before sending it via email, as in the sketch below. If they want a robot assisting a human by filling in parts of the same Excel file they’re currently working on, that wouldn’t be the ideal solution.
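
As a rough sketch of that reporting step, assuming the results ended up in the same components table as above (sqlite3 again standing in for a real database, openpyxl for the Excel writing, and all file and column names invented):

```python
# Sketch of a dedicated reporting sequence: read the stored results from a
# database and populate a fresh Excel file before it gets emailed.
import sqlite3
from openpyxl import Workbook

def export_report(db_path="components.db", xlsx_path="daily_report.xlsx"):
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT url, published, downloads FROM component").fetchall()
    con.close()

    wb = Workbook()
    ws = wb.active
    ws.append(["URL", "Published", "Downloads"])   # header row
    for row in rows:
        ws.append(list(row))
    wb.save(xlsx_path)                             # written once, at the very end

if __name__ == "__main__":
    export_report()
```

Since the report file is only created and saved at the very end, there is no contention with robots that are still processing queue items.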


@redlynx82 and @ClaytonM thanks for your replies. But in my case I need to read an Excel row, perform some actions, and write some details back to Excel.

Can I use queues and transactions for this scenario?