Issue while reading 30million rows CSV file

Hi,
I am having problem in reading large CSV file.
The file size is 360 MB. There are approximately 30million rows.
Some time bot is able to read the file and some time it is throwing System Exception.

I tried this code and the error is

Code----

Dim file As New FileInfo(“Path\Dump.CSV”)
Dim HeaderDone As Boolean= False
Using reader As StreamReader = file.OpenText()
While Not reader.EndOfStream
Dim nextLine As String = reader.ReadLine()
Dim Values As String() = nextLine.Split(","c)
If Not HeaderDone
For Each value As String In Values
inDT.Columns.Add(value)
Next
HeaderDone = True
Else
Dim valueToFind As String = “Support”
If Values(10)=valueToFind
inDT.Rows.Add(Values)
End If
End If
End While
End Using

ERROR-----

20.6.0-beta.93+Branch.release-v20.6.0.Sha.520fc7e20bd9baa56becf2fa30e75d1a9000a01d

Source: Invoke code

Message: Exception has been thrown by the target of an invocation.

Exception Type: System.Reflection.TargetInvocationException

RemoteException wrapping System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. —> RemoteException wrapping System.IndexOutOfRangeException: Index was outside the bounds of the array.
at UiPathCodeRunner_135f88763df64f17aab5e2b060568d98.Run(DataTable& inDT)
— End of inner exception stack trace —
at System.RuntimeMethodHandle.InvokeMethod(Object target, Object arguments, Signature sig, Boolean constructor)
at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object parameters, Object arguments)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object parameters, CultureInfo culture)
at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object providedArgs, ParameterModifier modifiers, CultureInfo culture, String namedParams)
at UiPath.Activities.System.Utilities.InvokeCode.CompilerRunner.Run(Object args)
at UiPath.Activities.System.Utilities.InvokeCode.NetCodeInvoker.Run(String userCode, List1 inArgs, IEnumerable1 imps, Object args)
at UiPath.Core.Activities.InvokeCode.Execute(CodeActivityContext context)
at System.Activities.CodeActivity.InternalExecute(ActivityInstance instance, ActivityExecutor executor, BookmarkManager bookmarkManager)
at System.Activities.ActivityInstance.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
at System.Activities.Runtime.ActivityExecutor.ExecuteActivityWorkItem.ExecuteBody(ActivityExecutor executor, BookmarkManager bookmarkManager, Location resultLocation)

Can anyone suggest me any other way of reading CSV file?

1 Like

@bcorrea Plz suggest

1 Like

Hi
we can try to read such large files with python PANDAS and CSV
and that python code can be invoked with UiPath Python Activities…
There are lot of coding resources on this using python and it can be easily integrated with uipath.

Kindly try that and let know for any queries or clarification

Cheers @venkat_patibandla

one way, you can divide the rows from in different datatable i.mean read excel in various datatable i.e. 70k row in one datatable in this way and perform the operation on these datatable…

Yah we can try that as well,
But still we need to read the data with an activity @Shirish and reading a CSV file of 360 MB will obviously take time
Usually the csv file doesn’t take much memory and if so then there is huge number of records

Cheers @venkat_patibandla

That is where your error is. My guess is that you may have lines that are empty or is not a real row with 11 columns.

HI @bcorrea

plz suggest with code I need to filter “support” in group column and “June” data in solved at column
Note:- there is no empty in group column

When i worked with your sample data, the column number you want to filter is not index 10, but index 9… So this needs to be like this: If Values(9)=valueToFind. Also When i created my CSV, i used semicolon instead of comma and all rows went ok with no error.

HI @bcorrea

I found the problem in My CSV its bcoz of one column having a comma inside it so I tried below code
Dim Values As String() = nextLine.Split(",(?=([^""] “”[^""] “”) [^""] $)".ToCharArray)
Instead of this
Dim Values As String() = nextLine.Split(","c)

now header’s also splitting to sperate column bcoz its having space
then header’s getting duplicate names in the datatable

RemoteException wrapping System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. —> RemoteException wrapping System.Data.DuplicateNameException: A column named ‘Requester’ already belongs to this DataTable.

plz suggest the code to get complete columns or if not possible how to filter that “comma inside” columns

Note:- that comma column having multiple special charcters

Try this one:
Dim values As String() = System.Text.RegularExpressions.Regex.Split(sentence, "(?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)")

Getting error

I have updated the sampedata with “comma inside” columns plz write a code

just double the quotes there inside the expression.

@bcorrea pls help me to remove that column

Done

again showing same error

RemoteException wrapping System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. —> RemoteException wrapping System.IndexOutOfRangeException: Index was outside the bounds of the array.

 Dim inDT As DataTable = New DataTable
        Dim file As New FileInfo("D:\UserData\Downloads\Sampledata - Sheet1.csv")
        Dim HeaderDone As Boolean = False
        Using reader As StreamReader = file.OpenText()
            While Not reader.EndOfStream
                Dim nextLine As String = reader.ReadLine()
                Dim Values As String() = System.Text.RegularExpressions.Regex.Matches(nextLine, "(?:^|,)(?=[^""]|("")?)""?((?(1)[^""]*|[^,""]*))""?(?=,|$)").Cast(Of System.Text.RegularExpressions.Match).[Select](Function(m) m.Value.Replace(",", "")).ToArray()
                If Not HeaderDone Then
                    For Each value As String In Values
                        inDT.Columns.Add(value)
                    Next
                    HeaderDone = True
                Else
                    Dim valueToFind As String = "Support"
                    If Values(9) = valueToFind Then
                        inDT.Rows.Add(Values)
                    End If
                End If
            End While
        End Using

I got same error bro

RemoteException wrapping System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. —> RemoteException wrapping System.IndexOutOfRangeException: Index was outside the bounds of the array.

Plz check I will be online

then your csv is not the same you showed me…

HI @bcorrea
Yes

I am trying to apply this on 30mllion rows csv file

its showing error and I found which columns are affected

I have updated final sample data

we can solve this issue by writing regex to that strings

i think we are having problems on lines that look like this:
1,1039621,JP Tees,376868120332,,jptees18@gmail.com,gmail.com,JP Tees,Supriya AH,Support,"Order #6989861 इसमे दो झुमके ऑर्डर किये थे, एक टूटा निकला, मेने एक रिटर्न किया था, पर कोरियर वाला दोनो लेके गया। रिफंड मुझे एक का ही मिला।",ss_order_confirmation supriya.ah,Closed,Low,Mail,Ticket,5/14/19 10:24,5/16/19 18:32,5/14/19 10:28,,,5/14/19 10:28,5/14/19 18:22,8,Offered,1,1,0,2,313,313,478,478,478,478,0,0,478,478,0,0,-,-,-,0,-,Supriya.AH,-,-,-,0,SS_Order confirmation,-,-,-,-,-,-,-,-,-,-,-,-

So i changed the code to this:

Dim inDT As DataTable = New DataTable
        Dim file As New FileInfo("D:\UserData\Downloads\Sampledata - Sampledata - Sheet1.csv")
        Dim HeaderDone As Boolean = False

        Using parser As New Microsoft.VisualBasic.FileIO.TextFieldParser(file.FullName)
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {","}
            parser.HasFieldsEnclosedInQuotes = True
            Dim Values() As String = Nothing
            While Not parser.EndOfData
                Values = parser.ReadFields()

                If Not HeaderDone Then
                    For Each value As String In Values
                        inDT.Columns.Add(value)
                    Next
                    HeaderDone = True
                Else
                    Dim valueToFind As String = "Support"
                    If Values.Length > 8 AndAlso Values(9) = valueToFind Then
                        inDT.Rows.Add(Values)
                    End If
                End If
            End While
        End Using

HI Bcorrea.,

This is code perfectly working and i want filter current month from date column(16) instead of filtering Support in Column 9

Regards,
Venkat.