Which option helps avoid a System.OutOfMemoryException?

Hi, I have a couple of Excel files from which I have to read a range (rows may go up to 10k). From those datatables, I have to count the rows that fulfill specific criteria for different entities.

I currently have a Do While loop that iterates over each entity.

I would like to ask which option is more feasible in terms of avoiding a System.OutOfMemoryException:

Option 1: Place “For Each Excel File - Read Range” inside the Do While loop, so only one datatable is held at a time, but the same file may have to be read several times.

Option 2: Place “For Each Excel File - Read Range” outside the Do While loop, so several datatables are kept in memory at the same time, but each file does not have to be re-read for every entity.

Workflow attached below: AML_Excel_UpdateDashboard.xaml (52.9 KB)

You don’t need to loop through the rows, just use a select query like yourDataTable.Select(filterExpression);
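
For example, a minimal sketch of what that could look like (the column names “Entity” and “Amount” and the value ‘ABC’ are only placeholders, adjust them to your own sheet):

    ' Count rows that match the criteria without looping over them (column names are assumed)
    matchingRows = yourDataTable.Select("[Entity] = 'ABC' AND [Amount] > 0")
    rowCount = matchingRows.Length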

Yes, I do not intend to loop through the rows.

I’m wondering if I need to loop over the datatables, as I have multiple Excel files to read (and thus multiple datatables).

Even if you have multiple Excel files, why do you have to loop?

What is your requirement?

Because I need to perform the select method on each of the datatables. So I was wondering whether a For Each excel path (in a list of excel paths) with a Read Range inside would use a lot of memory, because I’d have to read at least 6 Excel files for each of the entities (and I have 9 of them).

Hi @inyourgravity

I’m not completely sure of your requirements, because you didn’t annotate or rename your activities with descriptive names. But I can point out some tips for dealing with large amounts of Excel data.

- Have the list of files in an array or list variable (you can use the Directory.GetFiles() method to find the files in a folder, or hardcode them into an Array of String variable)
- Then use a For Each file in fileList (this way, you only need one Excel scope and don’t need to repeat the steps for each file)
- Filter your data with query methods to instantly pull only the items you wish to look at
- If necessary, process that new set of filtered data using a For Each Row
- If you don’t need to process the items and only want to check the count, use .Count on your filtered data

Just a simple example in pseudocode to show what I mean:

For each file in Directory.GetFiles(folderpath, "*.xlsx")
    Excel scope (filepath: file)
        Read Range (store in a data table variable, I'll call it dt1)
        Assign filteredData = dt1.AsEnumerable.Where(Function(r) If(IsNumeric(r(1).ToString.Trim), CDbl(r(1).ToString.Trim) > 0, False)).ToArray
        If filteredData.Count > 0
            <perform some tasks>
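
And if you only need the count (as in the last tip above), something like this should do instead of the If block, assuming the same filter:

    Assign matchCount = dt1.AsEnumerable.Where(Function(r) If(IsNumeric(r(1).ToString.Trim), CDbl(r(1).ToString.Trim) > 0, False)).Count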

I would not recommend using Do While loops with data table automation.

Hope it helps.

Regards.

A good practice is to split the file, even if we can manage a large amount of data… :slight_smile:

If all the Excel sheets have the same structure, you don’t need to loop through them one by one; you can just merge them all together and write one select query.
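
Roughly like this, as a sketch (dtAll and dtCurrent are just example variable names, and the filter string depends on your columns; in UiPath the Merge Data Table activity does the same merge step):

    ' dtAll is created once before the read loop (e.g. New DataTable)
    ' Inside the loop: merge each file's datatable into the combined table
    dtAll.Merge(dtCurrent)
    ' After the loop: one select query over the combined data
    matchingRows = dtAll.Select("[Entity] = 'ABC'")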

Yes, my question is whether it would take more memory if I loop (holding around 10k rows at a time) or if I merge all the datatables together (holding 9 x 10k rows at a time).

That’s what I am saying: you don’t need to loop. Have one new datatable, then add those filtered data rows to your new datatable.
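
A rough sketch of that, assuming filteredData is a DataRow array from a Where/Select like the ones above and dtResults was created once beforehand (e.g. with dt1.Clone so it has the same columns):

    ' CopyToDataTable throws on an empty collection, so check first
    If filteredData.Any() Then
        dtResults.Merge(filteredData.CopyToDataTable())
    End If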

This is just from my own understanding, but storing 9 tables of 10k rows is about the same as storing 1 table with 90k. So whether to merge the data into one single table just depends on your requirements, for example if you need to look at the entire data set to make a comparison. However, if you need to look at each individual data set for a comparison, then I would suggest looping through each file. This also lets you store one data table of 10k rows at a time, rather than one large one of 90k, because you would overwrite the table as you go through each file and perform some tasks or comparisons. But, like I said, merging the data into one data set depends on your requirements.

Most of the resources being used will come from running through massive loops, for example looping through something with 90k items. The longer the process runs, the more memory it will eventually use up, so keep your loops short. This is also where query methods are useful, because you can pull out the data you want to look at in one step rather than needing to loop through all 90k rows.
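
As a rough illustration of that (the “Entity” column name is only an assumption), you could get the count per entity in one expression over the merged table instead of running a Do While per entity:

    ' Count rows per entity in a single pass, result is a Dictionary(Of String, Integer)
    countsPerEntity = dtAll.AsEnumerable _
        .GroupBy(Function(r) r("Entity").ToString.Trim) _
        .ToDictionary(Function(g) g.Key, Function(g) g.Count())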

Please forgive me if I’m wrong.

Regards.

