Iterating through rows of an Excel table without storing the table itself

Hi, I have an Excel table of 18 columns and I want to iterate through the table row by row in ReFramework.

I want to do this without selecting the entire thing as a huge DataTable and iterating through the whole thing in memory. This is to limit how much RAM is used in the background while the process is running.

I am currently using Excel Read Row, but this returns an iEnumerable instead of a DataRow. This iEnumerable does not contain information on what each column means. I would rather not hardcode what each column title is because the workflow should be flexible in deciding which column title is in which column index.

How do I keep the Excel worksheet open and only read one row at a time, while preserving the information on column titles?

Hello @DEATHFISH

You can use queues you can process the items you want.

Read excel… Update to queue then u can retrieve the values as a data row.

Hi, this workflow is supposed to be done on the local machine without Orchestrator, how do I achieve this? Thanks

  1. Working on one row at a time
  2. Preserving column information
  3. No hardcoding which column id/index refers to which column name

@DEATHFISH

  1. use Read Range Activity to read the data and will give you output as dataTable.

  2. Then you can read one by one row like this:

System.Data.DataRow dr = YourDT(index)

Where index is of type interger and initialize with 0. And then increment index value by 1 to process next DataRow.

It’s better to use REFramework Template to work with this kind of input data.

Hi @lakshman

My concern is that in earlier versions of UiPath I have encountered error while reading large amounts of data in a DataTable and procesing it in memory. For large datasets, the workflow will hang halfway through due to RAM allocation.

Thus I would preferably like to persist the Excel data table within the application on the screen, and use the Excel application on the screen to read the data rows one by one.

The problem I am now facing is that if I read the rows one by one, I have difficulty storing the column names and data types (within each row) which are from the Excel data table itself



I’ve had an idea that I read only the first row with ‘Add headers’ so I get an empty table with the schema, and then use Transaction Number to read each Excel row (Read Row activity). This would produce the row data as an iEnumerable, but I don’t know how to associate this iEnumerable with the empty table schema I’ve created earlier. Any suggestions on how to approach this?

@DEATHFISH
Each data row is passed to process transaction from GetTransaction as TransactionItem. So firstly, you need to change it’s type to DataRow from argument or variable panel.
If still the error persists, then where transactionItem is initialised to ‘Nothing’, delete it and retype it. (Not sure why this happens but it seem to happen only if changing the TransactionItem variable data type to datarow after the assign activity has been set)

Now as you are getting the data row you can compare it with specific columns with the index or column name, say, ‘dr’ is your datarow, then you can use dr[“ColumnName”] or dr.row.Item(“columnIndex”).

Hope this helps!

@Anmol_Yadav

I use Excel Read Row activity to read individual rows one by one but all I get is an iEnumerable. The information regarding columns and data types are gone. How do I convert this iEnumerable to a DataRow?

Also, is there any alternative method to retrieve DataRows one by one without reading the entire DataTable?

The variable in which you are passing the data to next state, change it’s type to Datarow.
You can use queue as an alternative approach, or a for each, but there is no need to use for each if you using ReFramework.

If I assign its type to DataRow I get Option Strict On disallows implicit conversions to DataRow

If you absolutely need to keep it as a datatable what you can do is:

  • Build an empty datatable (dtA) with the columns (schema) you need. This should be a “global” variable (as in, the scope should be such that you can pass it around your workflows without much issue). You can do this, or alternatively, read the datatable (read range) but only one row of it
  • Get the total number of rows in your excel file
  • Use Read Range with “add headers” disabled to read one row (dtB). Keep the scope very limited
  • Transfer your original datatable (dtA) schema to the new one (dtB): Column names, data types (if they didn’t match), null values.

You now have a 1 row datatable with the schema you needed.

Keep in mind, as far as memory usage goes, you’re still at the mercy of the garbage collector. It should work fine as long as you keep your scope limited.