How to read through two large .csv files in an efficient way?

I’m working on my first UiPath task. I have two large .csv files (about 5000 rows). Let’s call them DT1 and DT2. Both have the same ID number column, and DT1 also contains a start date and an end date. I need to find all rows in DT1 that have a certain end date, for example the current day. For each row found I have to check whether that ID also exists in DT2. If it doesn’t, I should write the row from DT1 to the output. Could you recommend how to do this in the most efficient way? So far I have managed to create a program that reads all the rows with a certain end date…but it is already quite slow. How can I avoid it getting even slower when I add the reading of the second file? I would appreciate answers with examples. Thanks!

Hi @satu.nieminen,
I would say that you could keep both files as DataTable variables and iterate through each table using some If statements.
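For example, something along these lines could go in an Invoke Code activity (a minimal VB.NET sketch, assuming DT1 and DT2 were already loaded with Read CSV into dt1 and dt2, that dtOut is an Out argument, and that the columns are called “ID” and “EndDate”; adjust the names to your files):

```vb
' Invoke Code (VB.NET) sketch. In arguments: dt1, dt2 (DataTable); Out argument: dtOut.
Dim targetDate As DateTime = DateTime.Today

' Collect every ID that exists in DT2 once, so the per-row check below is fast.
Dim dt2Ids As New HashSet(Of String)
For Each r2 As DataRow In dt2.Rows
    dt2Ids.Add(r2("ID").ToString.Trim)
Next

' Keep the DT1 rows whose end date matches and whose ID is missing from DT2.
dtOut = dt1.Clone()
For Each r1 As DataRow In dt1.Rows
    If DateTime.Parse(r1("EndDate").ToString).Date = targetDate _
       AndAlso Not dt2Ids.Contains(r1("ID").ToString.Trim) Then
        dtOut.ImportRow(r1)
    End If
Next
```

Building the HashSet first means each DT1 row is checked against DT2 in constant time, instead of scanning all of DT2 for every row, which is usually where the slowness comes from.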

So I should not use the Lookup method?

The Lookup method is fine if you want to do this inside Excel. You asked for an efficient way, so I assumed you want to do it in Studio.

Hi @satu.nieminen

Can you try a join query?
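For what it is worth, the join idea could be sketched in a single Assign like this (strictly a left anti-join, since you keep the DT1 rows that have no match in DT2). The column names “ID” and “EndDate” are assumptions, and CopyToDataTable throws if nothing matches, so guard it with an If in a real workflow:

```vb
' Value of an Assign activity (To: dtOut).
' AsEnumerable/CopyToDataTable come from System.Data.DataSetExtensions.
dt1.AsEnumerable() _
   .Where(Function(r1) DateTime.Parse(r1("EndDate").ToString).Date = DateTime.Today _
          AndAlso Not dt2.AsEnumerable() _
                        .Any(Function(r2) r2("ID").ToString.Trim = r1("ID").ToString.Trim)) _
   .CopyToDataTable()
```

With roughly 5000 rows per file the nested Any is still workable, but building a HashSet of the DT2 IDs first (as in the earlier sketch) scales better.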

Yes, in Studio @Pablito

@NIVED_NAMBIAR Could you give me a complete example? I’m a newbie with UiPath. :slight_smile: Is it an efficient way to handle large files?

Hi @satu.nieminen

Can you share a sample Excel file?

@NIVED_NAMBIAR I can’t share the originals but I can try to create similar files later today.

Please share it when you have prepared it.

Also, please share a screenshot of the output format you need.

@NIVED_NAMBIAR Yes I will.

You can use an online CSV editor and perform a join on the two large tables; I use a tool called acho studio for this. For your use case, it sounds like you want to do a “VLOOKUP” between the two tables, so you would probably have to write SQL queries for it.

Hello Satu,
For large CSV files I have this video:

Thanks,
Cristian

Hi @Cristian_Negulescu ,

I tried your way of reading a very large CSV file, but it doesn’t work correctly when the separator is a semicolon. I tried adding “delim=59” (delimit on semicolon) to the connection string, but it still only takes the first value in each row (i.e. everything before the first “;” character). If the separator is a comma, as in your case, it works fine.

Any ideas?

Hello dVni,
The only idea I have is to replace the characters outside, before the file reaches ODBC.

The correct swap would be like this, in my view:
Replace all “,” with “ ” (space)
Replace all “;” with “,”
Then ODBC should not have an issue.
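A minimal Invoke Code sketch of that pre-processing, assuming a simple file where these replacements are safe (the path is only an example):

```vb
' Swap the separators in the raw CSV text before the ODBC read.
Dim filePath As String = "C:\Temp\input.csv"
Dim text As String = System.IO.File.ReadAllText(filePath)
text = text.Replace(",", " ")   ' 1. replace every "," with a space
text = text.Replace(";", ",")   ' 2. replace every ";" with ","
System.IO.File.WriteAllText(filePath, text)
```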
Thanks,
Cristian