Search duplicate files

#1

Hi,
I want to delete all duplicate files from a directory. My plan is to build a list of the files in the 1st folder and check those names against the 2nd folder; if the same file is found in the 2nd folder, it is moved to a duplicates folder. If no duplicates are found, a new list is created by combining folders 1 and 2, which is then used to check folder 3 for duplicates, and so on. Is there a better way to find duplicate files in a directory? Please suggest any other way to achieve this. Many thanks.
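A minimal Python sketch of the folder-by-folder approach described above, for illustration only (Python stands in for the UiPath workflow, and `find_repeated_names` is a hypothetical helper name). A running set of names replaces the "combined list", so each folder is checked against all earlier folders at once. It only reports the repeated files without moving them, and it only looks at the top level of each folder.

```python
from pathlib import Path


def find_repeated_names(folders):
    """Return files whose name already appeared in an earlier folder.

    Folders are checked in the given order; the set `seen` plays the role
    of the combined list (folder 1 + folder 2 + ...). Only the top level
    of each folder is scanned (no recursion).
    """
    seen = set()
    repeated = []
    for folder in folders:
        for entry in sorted(Path(folder).iterdir()):
            if not entry.is_file():
                continue
            if entry.name in seen:
                repeated.append(entry)  # same name found in an earlier folder
            seen.add(entry.name)
    return repeated
```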


#2

Hi.

From my understanding, you want to check for duplicate files and, if there are any, create a new “folder #”.

My thinking was that you could get all files in the home directory, select only the file names, and take the Distinct file names. Then compare the Count of the distinct names with the Count of all names:

System.IO.Directory.GetFiles("C:\HomeDirectory","*",SearchOption.AllDirectories).Select(Function(x) System.IO.Path.GetFileName(x)).ToArray.Distinct.Count < System.IO.Directory.GetFiles("C:\HomeDirectory","*",SearchOption.AllDirectories).Select(Function(x) System.IO.Path.GetFileName(x)).ToArray.Count

So basically: if .Distinct.Count < .Count, Distinct removed some names from the list, which means at least one file name occurs more than once.
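The same Distinct-versus-Count check, sketched in Python under the assumption that only file names (not contents) matter. `has_duplicate_names` is a hypothetical name, and `os.walk` plays the role of `SearchOption.AllDirectories`.

```python
import os


def has_duplicate_names(root):
    """True if any file name occurs more than once anywhere under root."""
    names = [name
             for _dirpath, _dirnames, filenames in os.walk(root)
             for name in filenames]
    # A set keeps one copy of each name, like .Distinct in the VB.NET expression.
    return len(set(names)) < len(names)
```

Note that this comparison is case-sensitive, while Windows file names are not; lowercasing the names first would make "Example.txt" and "example.txt" count as duplicates.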

If that condition is true, create a new folder whose name is based on the last folder name: increment its number with Regex.Replace, or append " 1" if it has no number.

If(System.Text.RegularExpressions.Regex.IsMatch(System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last, "[0-9]+$"), _
    System.Text.RegularExpressions.Regex.Replace(System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last, "[0-9]+$", _
        Function(m) (CInt(m.Value) + 1).ToString), _
    System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last + " 1")

You can use an expression like this in the Create Directory activity; feel free to store parts of it in variables to make it easier to read.
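The folder-name rule can be checked in isolation. This Python sketch (hypothetical `next_folder_name`, with Python's `re` standing in for `System.Text.RegularExpressions.Regex`) increments a trailing number or appends " 1" when there is none.

```python
import re


def next_folder_name(last_folder):
    """Increment a trailing number ("Folder 3" -> "Folder 4") or,
    if there is none, append " 1" ("Folder" -> "Folder 1")."""
    match = re.search(r"(\d+)$", last_folder)
    if match:
        return last_folder[:match.start()] + str(int(match.group(1)) + 1)
    return last_folder + " 1"
```

One caveat: sorting plain strings puts "Folder 10" before "Folder 9", so `.OrderBy(Function(x) x).Last` can return "Folder 9" once there are ten or more folders, and the next name would then collide with the existing "Folder 10". Sorting by the parsed number avoids this.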

Anyway, that was my idea.

Apologies if I got some of the logic wrong. What I suggested assumes there are no duplicates left once all the folders are combined into one list: if the duplicates stay in their folders, the check will always find a duplicate and create a new folder. So you might want to remove the duplicate files, and if that is what you want, I may be able to suggest a way to do it later.

Regards.


#3

Thanks for the solution, and sorry for the late response. Could we do it in a better way? Suppose I go into a folder and take a file name, e.g. Example.txt, then look for that file on the whole drive. If I find it in a folder other than the current one, could I copy that file to a new folder, let's say a newly created DUPLICATE folder?
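A rough Python sketch of this per-file search, for illustration only: `find_copies` is a hypothetical name, `root` stands in for the drive, and the copy step is left out.

```python
import os
from pathlib import Path


def find_copies(root, file_path):
    """Return every file under root with the same name as file_path,
    excluding file_path itself."""
    target = Path(file_path).resolve()
    copies = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if target.name in filenames:
            candidate = (Path(dirpath) / target.name).resolve()
            if candidate != target:  # skip the file we started from
                copies.append(candidate)
    return copies
```

Each call walks the whole tree, so checking every file this way means one full scan per file; the .GroupBy idea in #4 needs only a single scan.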


#4

I could probably revise it.

What if you did it like this?
- Take all files in the home directory
- Filter the list down to only the files that have duplicates
- Loop through each file that has a duplicate
    - Copy or move the file to the Duplicate folder

There are a few ways to filter the list down to only the duplicates; here is one using .GroupBy:

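System.IO.Directory.GetFiles("C:\HomeDirectory","*.*",SearchOption.AllDirectories).GroupBy(Function(x) System.IO.Path.GetFileName(x)).Where(Function(g) g.Count > 1).SelectMany(Function(g) g.Skip(1))

This groups the full paths by file name (grouping by the full path itself would never produce a group with more than one member, since every path is unique) and returns every copy except the first one in each group, so one copy of each file stays where it is.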
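For comparison, a Python sketch of the same grouping step (hypothetical `group_by_name`). Like the expression, it keys on the file name rather than the full path, and it compares names only, not file contents.

```python
import os
from collections import defaultdict


def group_by_name(root):
    """Map each file name that occurs more than once under root
    to the sorted list of full paths where it occurs."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            groups[name].append(os.path.join(dirpath, name))
    # Keep only groups with more than one member, like .Where(Function(g) g.Count > 1)
    return {name: sorted(paths) for name, paths in groups.items() if len(paths) > 1}
```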

The pseudocode would look like this:

Assign duplicates = System.IO.Directory.GetFiles("C:\HomeDirectory","*.*",SearchOption.AllDirectories).GroupBy(Function(x) System.IO.Path.GetFileName(x)).Where(Function(g) g.Count > 1).SelectMany(Function(g) g.Skip(1))

For each file in duplicates
    Copy File (or Move File) into the Duplicate folder

Then the only missing piece is which folder name to use as the destination for the duplicates.
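One possible answer to the destination question, sketched in Python under assumptions that are not from the thread: the first copy (in sorted path order) stays put, every other copy is moved into a single Duplicate folder, and clashing names get " (1)", " (2)", ... appended. `move_duplicates` and this naming scheme are hypothetical.

```python
import os
import shutil
from collections import defaultdict
from pathlib import Path


def move_duplicates(root, duplicate_dir):
    """Keep the first copy of each file name under root (in sorted path
    order) and move every other copy into duplicate_dir.

    Name clashes inside duplicate_dir are resolved by appending " (1)",
    " (2)", ... before the extension (a hypothetical naming scheme).
    """
    root = Path(root).resolve()
    duplicate_dir = Path(duplicate_dir).resolve()
    duplicate_dir.mkdir(parents=True, exist_ok=True)

    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not scan the Duplicate folder itself if it lives under root.
        dirnames[:] = [d for d in dirnames
                       if (Path(dirpath) / d).resolve() != duplicate_dir]
        for name in filenames:
            groups[name].append(Path(dirpath) / name)

    moved = []
    for name, paths in groups.items():
        for extra in sorted(paths)[1:]:  # every copy except the first
            target = duplicate_dir / name
            counter = 1
            while target.exists():
                target = duplicate_dir / f"{extra.stem} ({counter}){extra.suffix}"
                counter += 1
            shutil.move(str(extra), str(target))
            moved.append(target)
    return moved
```

Keeping everything in one folder with numbered names is only one choice; mirroring the source folder structure inside Duplicate would avoid renaming altogether.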

Hopefully, that’s closer to what you need. Let me know.

Regards.