Hi Team, I have 5 files:
1. Repo file: 1-2 lakh rows; each row represents an individual's complete information.
2. Input file: 10K-15K rows; each row represents the information of an individual to update.
3. Updated file: blank initially.
4. Reference file: blank initially; only headers are present.
5. Non-Updated file: blank initially.
Now, for each input row, I have to check whether that unique individual is present in the repo file, which I am doing with the DataTable.Select method. If not, I need to add the row to the Non-Updated file and the Reference file, which I am doing by building a DataTable and appending it to the CSV.
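One thing that may help regardless of the rest of the design: instead of calling a Select/filter against the repo table once per input row, the repo can be read once into a dictionary keyed by the unique-ID column, so each existence check is a constant-time lookup. A minimal sketch in Python, assuming a hypothetical `"ID"` column as the unique key (substitute your real key column):

```python
# Hypothetical unique-key column -- replace with your real one.
KEY = "ID"

def build_index(repo_rows):
    """Map each unique ID to its row, so each input row costs one
    dictionary lookup instead of a full-table Select/scan."""
    return {row[KEY]: row for row in repo_rows}

# Small in-memory stand-in for the repo CSV (1-2 lakh rows in practice).
repo_rows = [
    {"ID": "101", "Name": "Asha", "City": "Pune"},
    {"ID": "102", "Name": "Ravi", "City": "Delhi"},
]
index = build_index(repo_rows)

print("101" in index)  # membership test replaces the per-row Select
```

The same idea works with a .NET `Dictionary(Of String, DataRow)` built once before the input loop.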
If the unique individual is present in the repo file, I need to update that repo row based on the input file. (Note: the repo and input files have different numbers of columns; only 5-6 columns are common, and the information in only 2 columns needs updating.) I also need to take that updated repo row and append it to the Updated file and the Reference file.
I am doing this by selecting the row from the repo file, updating its values, and appending the row to the Updated file and the Reference file. But this does not update the row inside the repo file itself. So, to update it, I am looping through every row of the repo file (1-2 lakh rows), updating the row whose individual information matches, and then writing all the rows back to the CSV file. That means I am currently reading and writing 1-2 lakh rows 10K-15K times, which is very time-consuming.
Can anyone please help me with an efficient way to do this?
I thought writing the 1-2 lakh rows in a single go would save a lot of time, but in that scenario I would first need to remove a row from the Reference file, which is again time-consuming.
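To show what I mean by the single-go approach, here is a rough sketch of the whole flow: read the repo once, update matched rows in memory while splitting the input into updated / non-updated sets, and only then write each output file exactly once. This is illustrative Python, not the actual workflow; the `"ID"` key and the `UPDATE_COLS` names are assumptions standing in for the real common columns:

```python
import csv

KEY = "ID"                       # hypothetical unique-key column
UPDATE_COLS = ("City", "Phone")  # stand-ins for the 2 columns to update

def process(repo_rows, input_rows):
    """Single pass over the input: mutate matched repo rows in memory and
    collect the Updated / Non-Updated / Reference row sets, so each CSV
    is written once at the end instead of once per input row."""
    index = {row[KEY]: row for row in repo_rows}
    updated, non_updated, reference = [], [], []
    for inp in input_rows:
        repo_row = index.get(inp[KEY])
        if repo_row is None:
            non_updated.append(inp)
            reference.append(inp)
        else:
            for col in UPDATE_COLS:
                if col in inp:
                    repo_row[col] = inp[col]  # updates the repo in memory
            updated.append(repo_row)
            reference.append(repo_row)
    return updated, non_updated, reference

def write_csv(path, fieldnames, rows):
    """One write per file; extra columns in a row are simply ignored."""
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        w.writeheader()
        w.writerows(rows)

# Tiny stand-ins for the real files.
repo = [
    {"ID": "101", "Name": "Asha", "City": "Pune", "Phone": "111"},
    {"ID": "102", "Name": "Ravi", "City": "Delhi", "Phone": "222"},
]
inputs = [
    {"ID": "101", "City": "Mumbai", "Phone": "333"},  # present -> update
    {"ID": "999", "City": "Goa", "Phone": "444"},     # absent -> non-updated
]
updated, non_updated, reference = process(repo, inputs)
# `repo` already holds the updated rows, so writing it back is one
# write_csv("repo.csv", ...) call after the loop, not one per input row.
```

With this shape, the repo CSV is read once and written once for the whole run, and nothing ever has to be removed from the Reference file afterwards, because rows are only appended to the in-memory lists once their final state is known.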