How to remove duplicates from a text file with more than 700,000 records

Hi Team,

I have a text file containing more than 700,000 records. Can someone help me remove the duplicates from the file?
We tried UiPath activities, but they didn't help much: the run takes more than one hour to complete.
Can someone help me resolve this issue, please?
@J0ska

Thanks.

Hey! Welcome to the community!

Are we trying to remove the duplicates from an Excel file or a text file?

Regards,
NaNi

You should provide a sample of the text file, highlighting what you mean by “duplicate”.

Cheers

Give this a try.
Deduplication:
arrLines = File.ReadAllLines("YourPath").Distinct().ToArray()

Getting the content back as a string:
strContent = String.Join(Environment.NewLine, arrLines)
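
For context, `Distinct()` compares whole lines using a hash-based set, so the work grows roughly linearly with the number of lines, which is likely why it finishes far sooner than row-by-row activities. Below is a minimal sketch as a standalone VB.NET console module, assuming the whole file fits in memory. The module name and the temp files are stand-ins for illustration, and the sample rows are the ones posted in this thread.

```vb
Imports System
Imports System.IO
Imports System.Linq

Module DistinctLinesSketch
    Sub Main()
        ' Hypothetical temp files standing in for the real input and output paths.
        Dim inputPath As String = Path.GetTempFileName()
        Dim outputPath As String = Path.GetTempFileName()

        ' Sample rows from the thread: the first and third rows are identical.
        File.WriteAllLines(inputPath, New String() {
            "ABCD ALL_OTHERS 1000000123 15.9 IND 2434 9999",
            "BDCG LISM_RKP 1000005673 19.1 AUD 2215 99999",
            "ABCD ALL_OTHERS 1000000123 15.9 IND 2434 9999"})

        ' Distinct() drops exact repeats of a line.
        Dim arrLines As String() = File.ReadAllLines(inputPath).Distinct().ToArray()

        ' Write the unique lines back out, one per line.
        File.WriteAllLines(outputPath, arrLines)

        Console.WriteLine("Unique lines: " & arrLines.Length.ToString())
    End Sub
End Module
```

In a UiPath workflow only the `Distinct()` expression goes into the Assign activity; writing the result back can be done with `File.WriteAllLines` in an Invoke Code activity or with a Write Text File activity on `strContent`.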

@Thumu_Suresh,

Check this xaml, it may help you:
RemoveDuplicateLines.xaml (6.2 KB)

Can you share a sample of the input format? It would make it easier to find a solution.

Please find the sample report.
The same text file will contain more than 5 lakh (500,000) records. We have to remove the duplicates.
In the txt file the data looks like this:
ABCD ALL_OTHERS 1000000123 15.9 IND 2434 9999
BDCG LISM_RKP 1000005673 19.1 AUD 2215 99999
ABCD ALL_OTHERS 1000000123 15.9 IND 2434 9999

I tried the data table activities, but removing the duplicates that way takes too long.

We are trying to remove duplicates in text files.
Thanks.

It should work with the approach provided above.

Hi Peter,

I used the test data below.

I am getting a unique count of 3 instead of 2.

Can you help me with this?

Sure. Please share the text file with us, so we can replicate and analyse the issue with the same data.

Please find the test files
test.txt (684 Bytes)

This could be caused by empty lines being counted as well, when they are present in the file.

We used your text file (screenshots of the input and of the result were attached here).

Here too we get a +1 for the empty line.
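
The +1 can be reproduced, and avoided, by dropping blank lines before calling `Distinct()`. Below is a minimal sketch on in-memory lines. The array is made-up test input built from the sample rows in this thread, not the attached test.txt, and the module name is hypothetical.

```vb
Imports System
Imports System.Linq

Module SkipBlankLinesSketch
    Sub Main()
        ' Made-up input: two distinct records, one repeat, and one empty line.
        Dim lines As String() = {
            "ABCD ALL_OTHERS 1000000123 15.9 IND 2434 9999",
            "BDCG LISM_RKP 1000005673 19.1 AUD 2215 99999",
            "",
            "ABCD ALL_OTHERS 1000000123 15.9 IND 2434 9999"}

        ' The empty line counts as its own distinct value here.
        Dim withBlank As Integer = lines.Distinct().Count()

        ' Filtering blank lines first leaves only the real records.
        Dim withoutBlank As Integer = lines.
            Where(Function(l) Not String.IsNullOrWhiteSpace(l)).
            Distinct().
            Count()

        Console.WriteLine(withBlank.ToString() & " vs " & withoutBlank.ToString())
    End Sub
End Module
```

With this input the first count includes the empty line and the second does not, which matches the 3 versus 2 difference reported above.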

As an alternative, we implemented a more defensive deduplication, with the idea of removing all spaces, tabs… before comparing the lines:

(From x In File.ReadAllLines(myFilePath)
Group x By k=System.Text.RegularExpressions.Regex.Replace(x, "\s+", String.Empty) Into grp=Group
Where Not String.IsNullOrEmpty(k.Trim())
Select v=grp.First()).ToArray()
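
For very large files, a streaming variant of the same idea may use less memory than grouping. Note that this is a swapped-in technique, not the query above: it reads lines lazily with `File.ReadLines` and keeps only the whitespace-stripped keys in a `HashSet(Of String)`. The module name, the temp file, and the sample rows (including the extra spaces in the third row) are assumptions for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module StreamingDedupSketch
    Sub Main()
        ' Hypothetical temp file standing in for myFilePath.
        Dim myFilePath As String = Path.GetTempFileName()
        File.WriteAllLines(myFilePath, New String() {
            "ABCD ALL_OTHERS 1000000123 15.9 IND 2434 9999",
            "",
            "ABCD  ALL_OTHERS 1000000123 15.9 IND 2434 9999 ",
            "BDCG LISM_RKP 1000005673 19.1 AUD 2215 99999"})

        Dim seen As New HashSet(Of String)()
        Dim result As New List(Of String)()

        For Each currentLine As String In File.ReadLines(myFilePath)
            ' Same key as the query: the line with all whitespace removed.
            Dim key As String = Regex.Replace(currentLine, "\s+", String.Empty)
            ' Skip blank lines; Add returns False when the key was seen before.
            If key.Length > 0 AndAlso seen.Add(key) Then
                result.Add(currentLine)
            End If
        Next

        Console.WriteLine(result.Count.ToString() & " unique lines kept")
    End Sub
End Module
```

One design consequence to keep in mind: because all whitespace is removed from the key, two lines that differ only in spacing (for example `A B` and `AB`) are treated as the same record. If field boundaries matter, collapsing runs of whitespace to a single space instead would be the more conservative key.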

Hi Peter,

I tried the solution above but am facing the issue below.
Can you please help?

Sorry for the delay in responding.

Thanks,
Suresh.

The UiPath project is set to VB.Net as the selected programming language, right?

Just isolate the issue as follows:

  • create a variable with the path, similar to how it was done in the sample, and use it instead of the hard-coded file path
  • isolate the statement into a single Assign activity:
arrLines = File.ReadAllLines(myFilePath)

Debug and prototype it within the Immediate panel:
Understanding the 6 Debugging Panels of UiPath in the easiest way possible! - News / Tutorials - UiPath Community Forum

It is working now.
Thank you so much for your help.