Efficient way to filter large data files

Hi Team, I have 5 files:

1. Repo file - contains 1-2 lakh rows; each row represents an individual's complete information
2. Input file - contains 10K-15K rows; each row represents the information of an individual to update
3. Updated file - blank initially
4. Reference file - blank initially, only headers are present
5. Non-Updated file - blank initially

Now, for each input row, I have to check whether that unique individual is present in the repo file, which I am doing with the DataTable.Select method. If not, I need to add the row to the Non-Updated file and the Reference file, which I am doing by building a DataTable and appending it to the CSV.
If the individual is present in the repo file, I need to update that row in the repo file based on the input file. (Note: the number of columns in the repo and input files differs; only 5-6 columns are common, and only 2 columns need to be updated.) I also need to take the updated row from the repo file and append it to the Updated file and the Reference file.
I am doing this by selecting a row from the repo file, updating the values, and adding the row to the Updated file and the Reference file. This method does not update the row in the repo file itself. So to update it, I am looping through each row in the repo file (1-2 lakh rows); if the individual's information matches, I update that row and then write all the rows back to the CSV file. That means I am currently reading and writing 1-2 lakh rows 10K-15K times, which is very time consuming.
Can anyone please help me with an efficient way to do this?
I thought writing the 1-2 lakh rows in a single go would save a lot of time, but in that scenario I would first need to remove a row from the reference file, which is again time consuming.
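In case a concrete sketch helps: the whole per-record scan can be replaced by building a one-time dictionary index on the unique ID, updating the matched rows in memory, and writing the repo back only once at the end. This is a language-agnostic illustration in Python (in UiPath the equivalent would be a `Dictionary(Of String, DataRow)` or LINQ `ToDictionary` in Invoke Code); the key and column names are placeholders, not your real headers.

```python
def apply_updates(repo_rows, input_rows, key, update_cols):
    """Update repo rows in memory using a dict index instead of row-by-row scans.

    repo_rows / input_rows are lists of dicts (e.g. from csv.DictReader).
    Returns (updated, not_updated); repo_rows is modified in place.
    """
    # One-time index: unique ID -> repo row. Each lookup is then O(1)
    # instead of scanning 1-2 lakh rows per input record.
    index = {row[key]: row for row in repo_rows}
    updated, not_updated = [], []
    for row in input_rows:
        match = index.get(row[key])
        if match is None:
            not_updated.append(row)      # goes to Non-Updated + Reference files
        else:
            for col in update_cols:      # only the 2 shared columns change
                match[col] = row[col]
            updated.append(match)        # goes to Updated + Reference files
    return updated, not_updated
```

With this shape you read the repo CSV once, call `apply_updates`, append `updated`/`not_updated` to their respective files, and rewrite the repo CSV once, so the big file is read and written a single time instead of 10K-15K times.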

I don’t know if it will help you much, but there are a few possibilities for handling large data files:

  1. Use C# or Python code
    If you know C#, you can use the Invoke Code activity to write C# code in UiPath.

  2. Use LINQ statements
    They are similar to SQL queries; you can get started here:
    [HowTo] - Exploring the LINQ Universe (VB.Net)

  3. Process optimization
    A “For Each Row” loop can be very time consuming, which is why using LINQ statements is better.
    With the right statement you get the desired row and can update that specific row by its index.
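To illustrate the index-based update in point 3 (a sketch with placeholder names; in UiPath the same idea would be a LINQ query over `dt.AsEnumerable()` followed by an indexed assignment):

```python
def find_row_index(rows, key, value):
    """Return the index of the first row whose `key` column equals `value`, or -1."""
    return next((i for i, row in enumerate(rows) if row[key] == value), -1)

# Placeholder data; in practice `rows` would come from the repo CSV.
rows = [{"ID": "10", "Email": "old@x"}, {"ID": "11", "Email": "keep@x"}]

i = find_row_index(rows, "ID", "10")
if i != -1:
    rows[i]["Email"] = "new@x"   # update the specific row by its index
```

This finds the row once and mutates it in place, instead of copying the row out, editing the copy, and rescanning the whole table to write it back.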

Side note:
If you are running multiple robots, you should divide your input file accordingly.

Hey @Karan_Katle

Your files look normal in size to me.

You can process them with Excel activities.

If you still need more speed, you can go with LINQ!


@Smars @Nithinkrishna Thanks for the suggestion. I tried using a LINQ query, removed the loop, and reduced the processing time from 13 sec to 9 sec per record, which is roughly a 30% reduction.

One more thing I observed: our requirement means we need to read the repo CSV inside the loop, so a complete transaction now takes 9 sec, of which 4-5 sec is consumed just reading the repo CSV. Is there an efficient way to do this?

Hey @Karan_Katle

You may try using the File.ReadLines method, but I feel there may not be much of a difference.


As I mentioned before, you can split the files into smaller datasets and join them after you have processed all the repo files.
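A minimal sketch of that splitting step, assuming the rows are already loaded into a list (the chunk count would match your robot count; this is an illustration, not UiPath-specific code):

```python
def split_into_chunks(rows, n_chunks):
    """Split rows into n roughly equal chunks, e.g. one per robot."""
    size, rem = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `rem` chunks take one extra row so nothing is lost.
        end = start + size + (1 if i < rem else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks
```

After each robot has processed its chunk, concatenating the chunks in order reproduces the full dataset for the final write.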