Hi, I have a couple of Excel files from which I have to read a range (rows may go up to 10k). From those DataTables, I have to count the rows that fulfill specific criteria for different entities.
I currently have a Do While loop over each entity.
I would like to ask which option is more feasible in terms of avoiding a System.OutOfMemoryException:
Option 1: Place "For Each Excel File - Read Range" inside the Do While loop, so only 1 DataTable is held in memory at a time, but the same file may have to be read several times.
Option 2: Place "For Each Excel File - Read Range" outside the Do While loop, so several DataTables are held in memory at the same time, but none of them has to be re-read for each entity.
This is because I need to run the Select method on each DataTable. So I was wondering whether a "For Each excel path (in list of excel paths) - Read Range" would use a lot of memory, since I'd have to read at least 6 Excel files for each entity (and I have 9 entities).
I'm not completely sure of your requirements, because you didn't annotate or rename your activities with descriptive names, but I can point out some tips for dealing with large amounts of Excel data.
—Keep the list of files in an array or list variable (you can use the Directory.GetFiles() method to find the files in a folder, or hardcode the paths into an Array of String variable)
—Then use a For Each file in fileList (this way you only need one Excel scope and don't have to repeat the same steps for each file)
—Filter your data with query methods to pull only the items you want to look at in one step
—If necessary, process that filtered data with a For Each Row
—If you don't need to process the items and only want the count, use .Count on your filtered data.
Just a simple example of pseudocode to show what I mean:
For Each file in Directory.GetFiles(folderPath, "*.xlsx")
    Excel Application Scope (filepath: file)
        Read Range (store in a DataTable variable, I'll call it dt1)
        Assign filteredData = dt1.AsEnumerable.Where(Function(r) If(IsNumeric(r(1).ToString.Trim), CDbl(r(1).ToString.Trim) > 0, False)).ToArray ' keeps only rows whose second column is numeric and > 0
        If filteredData.Count > 0
            <perform some tasks>
I would not recommend using Do While loops for DataTable automation.
Yes, my question is: would it take more memory to loop (holding around 10k rows at a time), or to merge all the DataTables together (holding 9 × 10k rows at once)?
This is just from my own understanding, but storing 9 tables of 10k rows takes about the same memory as storing 1 table with 90k rows. So whether to merge the data into one table depends on your requirements, for example if you need to look at the entire data set at once to make a comparison. However, if you need to compare each data set individually, then I would suggest looping through each file. That also lets you hold just one 10k-row DataTable at a time rather than one large 90k-row table, because the table is overwritten as you go through each file and perform your tasks or comparisons. But, like I said, merging everything into one data set depends on your requirements.
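To illustrate that per-file approach, here is a rough pseudocode sketch (the entityList variable, the "Entity" column name, and the count comparison are placeholders for whatever your actual criteria are):

For Each file in Directory.GetFiles(folderPath, "*.xlsx")
    Excel Application Scope (filepath: file)
        Read Range (store in dt1 — this overwrites the previous file's data, so only ~10k rows are ever held at once)
    For Each entity in entityList
        Assign entityCount = dt1.AsEnumerable.Count(Function(r) r("Entity").ToString.Trim = entity)
        <use entityCount for your comparison>

Because dt1 is reused on every pass, memory stays at roughly one file's worth of rows instead of all nine files combined.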
Most of the resource usage comes from running through massive loops, like looping over 90k items. The longer the process runs, the more memory it will eventually use, so keep your loops short. This is also where query methods are useful: you can pull out the data you want to look at in one step rather than looping through all 90k rows.
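For example, a single query expression in an Assign can replace an entire counting loop (a sketch only — the "Status" column and the "Approved" value are made-up placeholders for your own criteria):

Assign matchCount = dt1.AsEnumerable.Count(Function(r) r("Status").ToString.Trim = "Approved")

This gives you the number of matching rows directly, with no For Each Row over the whole table.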