Need design help: handling a large Excel file with 200+ columns and 20k+ rows


I have a large Excel file to process, with 200 columns and 20k+ rows. Out of the 200 columns, only two matter for the automation: i. the primary key column (e.g., emp id) and ii. the result column (initially blank, to be updated by the bot with values like 'Processed' or 'Exception'). Once the bot has processed all the rows (all employee id data), the end users expect to open the file, see the result of the process for each id, and still have all the other columns available for manual reference.

Can you please suggest how it should be designed? For example:

1. Break the entire Excel file into smaller ones → read rows within an Excel Application Scope → use For Each → update the DataTable with the result or exception → finally write the DataTable into the result file.
2. Create a queue → upload the path of each Excel file → let multiple bots handle one file each.
3. Create a queue → upload all the emp id info → have multiple bots execute the process in parallel (but then how do we control the execution time, and how do we gather all the results back into the original file so that the end user has everything in one place?)
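For the third option, the gather step boils down to a key-based merge: each bot emits (emp id, result) pairs, and a final step writes them back into the original table by primary key. A minimal pure-Python sketch of that merge (column names `EmpID`/`Result` and the fallback value are illustrative assumptions, not from any UiPath API):

```python
def merge_results(original_rows, result_map, key_col="EmpID", result_col="Result"):
    """Write each worker's result back into the matching row of the original table."""
    for row in original_rows:
        # Ids with no recorded result are flagged so nothing is silently dropped.
        row[result_col] = result_map.get(row[key_col], "Exception")
    return original_rows

# Demo: the 200+ reference columns are reduced to one here for brevity.
rows = [
    {"EmpID": "1", "Dept": "HR", "Result": ""},
    {"EmpID": "2", "Dept": "IT", "Result": ""},
]
collected = {"1": "Processed"}  # results gathered from the parallel bots
merged = merge_results(rows, collected)
```

Because the merge is keyed on emp id, it does not matter which bot processed which row or in what order the results arrive.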

I am just thinking out loud; these three approaches may not be good enough, or may be limited in nature. Please suggest.

Thanks, Sourav.

Hi @humanity2003,

Welcome back to the community!!

If I understood right, out of 200 columns you have to work on only two. There can be many approaches for that.
Here is one quick idea that came to mind:

Read the file using an Excel Application Scope → extract only the two columns that need to be worked on (i.e., copy just those two columns into a separate DataTable) → process those two columns (either with a loop or with LINQ) → after processing, delete those columns from the Excel sheet (via VB, Invoke Code, or Invoke Method) → then add the columns back with the updated data using Invoke Method, restoring their original position with SetOrdinal (or do the whole thing in VB).
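Setting UiPath activities aside, the core of the approach above is: touch only the key and result columns, and leave every other column exactly as it was. A small Python sketch of that idea (the column names and the "numeric id" business rule are placeholders for whatever the real process does):

```python
def process_rows(rows, key_col="EmpID", result_col="Result"):
    """Fill the result column for every row, leaving all other columns untouched."""
    for row in rows:
        # Placeholder business rule: purely numeric ids count as processed.
        row[result_col] = "Processed" if row[key_col].isdigit() else "Exception"
    return rows

# Demo: the many reference columns are untouched by the processing step.
data = [
    {"EmpID": "1001", "Name": "A", "Dept": "HR", "Result": ""},
    {"EmpID": "10X2", "Name": "B", "Dept": "IT", "Result": ""},
]
processed = process_rows(data)
```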



Hi Shikhar,

Thanks for such prompt reply. :slight_smile:
It is a simple approach but hugely effective.

A few questions:

  1. Imagine each id requires 2 minutes to process; for 20K+ records, the total run time would be huge, and I would have to engage the bots for quite a long time.
    Also, if there is a catastrophic failure within that window (network unavailability, an Excel exception, an application issue), how can we make sure that the data already processed is not lost?
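On the failure-recovery worry, a common pattern (independent of UiPath, and essentially what Orchestrator queues give you via per-transaction status) is to checkpoint each result as soon as it is produced, then skip already-done ids on restart. A minimal sketch using a plain text checkpoint file (the file name and the "Processed" placeholder are illustrative assumptions):

```python
import os
import tempfile

def load_done(path):
    """Return the set of ids already processed in a previous run."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def run(ids, path):
    """Process each id once, appending it to the checkpoint after each success."""
    done = load_done(path)
    results = {}
    with open(path, "a") as log:
        for emp_id in ids:
            if emp_id in done:
                continue  # already handled before the crash; skip re-work
            results[emp_id] = "Processed"  # placeholder for the real 2-minute step
            log.write(emp_id + "\n")
            log.flush()  # persist progress immediately, not at the end
    return results

# Demo: the second call simulates a restart after a crash mid-batch.
path = os.path.join(tempfile.mkdtemp(), "processed_ids.log")
first = run(["1001", "1002"], path)
second = run(["1001", "1002", "1003"], path)
```

Because progress is flushed after every id rather than once at the end, a crash at any point loses at most the one in-flight record.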

  2. Do you think we could somehow use a Queue to simplify and control the execution? I am just learning the REFramework and was wondering whether using a queue would be a good approach here.

Thanks a lot! :+1: