Iterating through rows of an Excel table without storing the table itself

DEATHFISH · July 22, 2019, 7:15am

Hi, I have an Excel table of 18 columns and I want to iterate through the table row by row in ReFramework.

I want to do this without selecting the entire thing as a huge DataTable and iterating through the whole thing in memory. This is to limit how much RAM is used in the background while the process is running.

I am currently using Excel Read Row, but this returns an iEnumerable instead of a DataRow. This iEnumerable does not contain information on what each column means. I would rather not hardcode what each column title is because the workflow should be flexible in deciding which column title is in which column index.

How do I keep the Excel worksheet open and only read one row at a time, while preserving the information on column titles?

Ajju · July 22, 2019, 7:30am

Hello @DEATHFISH

You can use queues you can process the items you want.

Read excel… Update to queue then u can retrieve the values as a data row.

DEATHFISH · July 22, 2019, 7:35am

Hi, this workflow is supposed to be done on the local machine without Orchestrator, how do I achieve this? Thanks

Working on one row at a time
Preserving column information
No hardcoding which column id/index refers to which column name

lakshman · July 22, 2019, 3:25pm

@DEATHFISH

use Read Range Activity to read the data and will give you output as dataTable.
Then you can read one by one row like this:

System.Data.DataRow dr = YourDT(index)

Where index is of type interger and initialize with 0. And then increment index value by 1 to process next DataRow.

It’s better to use REFramework Template to work with this kind of input data.

DEATHFISH · July 23, 2019, 2:31am

Hi @lakshman

My concern is that in earlier versions of UiPath I have encountered error while reading large amounts of data in a DataTable and procesing it in memory. For large datasets, the workflow will hang halfway through due to RAM allocation.

Thus I would preferably like to persist the Excel data table within the application on the screen, and use the Excel application on the screen to read the data rows one by one.

The problem I am now facing is that if I read the rows one by one, I have difficulty storing the column names and data types (within each row) which are from the Excel data table itself

…
I’ve had an idea that I read only the first row with ‘Add headers’ so I get an empty table with the schema, and then use Transaction Number to read each Excel row (Read Row activity). This would produce the row data as an iEnumerable, but I don’t know how to associate this iEnumerable with the empty table schema I’ve created earlier. Any suggestions on how to approach this?

Anmol_Yadav · July 23, 2019, 2:57am

@DEATHFISH
Each data row is passed to process transaction from GetTransaction as TransactionItem. So firstly, you need to change it’s type to DataRow from argument or variable panel.
If still the error persists, then where transactionItem is initialised to ‘Nothing’, delete it and retype it. (Not sure why this happens but it seem to happen only if changing the TransactionItem variable data type to datarow after the assign activity has been set)

Now as you are getting the data row you can compare it with specific columns with the index or column name, say, ‘dr’ is your datarow, then you can use dr[“ColumnName”] or dr.row.Item(“columnIndex”).

Hope this helps!

DEATHFISH · July 23, 2019, 3:05am

@Anmol_Yadav

I use Excel Read Row activity to read individual rows one by one but all I get is an iEnumerable. The information regarding columns and data types are gone. How do I convert this iEnumerable to a DataRow?

Also, is there any alternative method to retrieve DataRows one by one without reading the entire DataTable?

Anmol_Yadav · July 23, 2019, 3:47am

The variable in which you are passing the data to next state, change it’s type to Datarow.
You can use queue as an alternative approach, or a for each, but there is no need to use for each if you using ReFramework.

DEATHFISH · July 23, 2019, 4:03am

If I assign its type to DataRow I get Option Strict On disallows implicit conversions to DataRow

JCaravaca · July 23, 2019, 6:19am

If you absolutely need to keep it as a datatable what you can do is:

Build an empty datatable (dtA) with the columns (schema) you need. This should be a “global” variable (as in, the scope should be such that you can pass it around your workflows without much issue). You can do this, or alternatively, read the datatable (read range) but only one row of it
Get the total number of rows in your excel file
Use Read Range with “add headers” disabled to read one row (dtB). Keep the scope very limited
Transfer your original datatable (dtA) schema to the new one (dtB): Column names, data types (if they didn’t match), null values.

You now have a 1 row datatable with the schema you needed.

Keep in mind, as far as memory usage goes, you’re still at the mercy of the garbage collector. It should work fine as long as you keep your scope limited.

Topic		Replies	Views
Reading entire row of a DataTable Studio	2	946	June 24, 2020
Excel Related : Read multiple columns in Excel Help excel , activities , question	3	1458	January 11, 2021
Activity to read multiple excel columns and write it to a application Studio studio , question , activities_panel	8	304	September 19, 2023
Read from excel without using the loop Help	8	4985	September 30, 2020
Iterate through Column of datatable/xlsx File Help datatable , excel , activities , question	3	1498	January 2, 2020

Most Active Users - Yesterday
Anil_G
ashokkarale
Ajay_Mishra
Gautham_Pattabiraman
BHUSHAN_NAGAONKAR1
vrdabberu
ABHIMANYU_THITE1
lrtetala
samantha_shah
shyamala_shyamu
More details...

Iterating through rows of an Excel table without storing the table itself

Related Topics