I am getting an exception while performing a Read CSV operation on a .csv file that is around 800 MB (~1.3 million records/rows).
Use case: to perform data manipulation using DataTable operations on a .csv file containing raw input data records.
Below is the exception encountered while performing a Read CSV activity on this 800 MB file.
Message: Job stopped with an unexpected exit code: 0xE0434352
Exception Type: System.Exception
RemoteException wrapping System.Exception: Job stopped with an unexpected exit code: 0xE0434352
It would really help if there is a supporting package for large CSV imports that can handle this exception, or an alternate solution to efficiently read data from large .csv files into a DataTable to work with.
PS: I have tested the same activity on a .csv file with far fewer records, and it successfully imports the data into a DataTable, so it seems to be an issue with the sheer size of the file.
I quickly tested a script using StreamReader and got results, but it has a performance impact.
Compared to a direct Read CSV operation, reading with StreamReader seems to take quite a bit of time.
Also, processing each CSV record, converting it to a DataRow, and then adding it to the DataTable with the Add Data Row activity adds to the computation time, and doing this for close to 1.3 million records takes a while.
Is there any way around this to load this CSV data into a DataTable more efficiently?
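For what it's worth, one thing that may cut down the Add Data Row overhead is loading rows in bulk through the DataTable's own bulk-load API instead of one Add Data Row activity per record. Below is a minimal C# sketch (e.g. for an Invoke Code activity) of that idea; the file path and the naive comma split are assumptions — a real CSV with quoted fields would need a proper parser such as Microsoft.VisualBasic.FileIO.TextFieldParser.

```csharp
using System.Data;
using System.IO;

var dt = new DataTable();

// Hypothetical path; replace with your actual CSV location.
using (var reader = new StreamReader(@"C:\data\input.csv"))
{
    // Build columns from the header row.
    string header = reader.ReadLine();
    foreach (string col in header.Split(','))
        dt.Columns.Add(col);

    // BeginLoadData suspends notifications, index maintenance and
    // constraint checking while rows are loaded, which is much faster
    // than adding rows one at a time.
    dt.BeginLoadData();
    string line;
    while ((line = reader.ReadLine()) != null)
        dt.LoadDataRow(line.Split(','), fAcceptChanges: false);
    dt.EndLoadData();
}
```

Note the caveat again: `Split(',')` breaks on fields that contain embedded commas or quotes, so treat this strictly as a performance sketch, not a drop-in CSV parser.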
Hey guys, @badita @ClaytonM @Palaniyappan
Looping y’all into this thread to seek direction on a solution, and to check whether mine and Anthony’s understanding is correct with respect to loading a large CSV into a DataTable using Read CSV without running into exceptions.
Also, is there any external code/module that can perform this operation without having to parse each of the million records individually to add them to a DataTable?
Let me know if you guys have any thoughts on this.
You could check the Component Marketplace to see if there are any “Large DataTable/Excel activities”. I can’t remember for certain if someone created something like that.
I kind of think that the process should change so that handling such large CSV files is not needed (unless it’s just a long backlog of items that needs to be cleared up).
For example, where is this CSV data coming from — a database? If that is the case, why not execute a query to pull in only the data you need? A good query can take just a few seconds even with millions of records.
I would also add that CSV is text-based, so technically you could use Read Text File and string manipulation to pull in only the data you need, then write it back to CSV using Write Text File. Iterating the data as a string can be super fast with .NET code. This method is not the most sustainable, though.
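As a rough illustration of that text-based pre-filtering idea, the snippet below streams the file line by line and writes out only matching lines, so the full 800 MB is never held in memory at once. The paths and the filter condition are placeholders for whatever criterion applies to your data.

```csharp
using System.IO;
using System.Linq;

// File.ReadLines is lazy: it yields one line at a time rather than
// loading the whole file, unlike File.ReadAllLines.
var wanted = File.ReadLines(@"C:\data\input.csv")
                 .Where(l => l.Contains("2023"));  // hypothetical filter

// Write only the matching rows; this smaller file can then be fed
// into Read CSV without hitting the large-file exception.
File.WriteAllLines(@"C:\data\filtered.csv", wanted);
```

The resulting filtered CSV should be small enough for the standard Read CSV activity to handle.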
What I don’t recommend is processing the entire data set at once: first extract only the records you need to process (100 or 1000, etc.). Then you can output only those records that were processed, for review or whatever else.
The thing with this file size is that this is being done to support a data migration, where we do not have any direct access to the system from which this data is extracted and dumped into CSV files. And yes, these files contain historical data.
So these huge files are something that I have to work with, unfortunately.
I will check the Marketplace to see if I can find something, but otherwise I will give the StreamReader solution a run and see whether it leads to an effective solution.