Search duplicate files

studio

#1

Hi,
I want to delete all duplicate files from a directory. I am creating a list of the files in the 1st folder and checking those names against the 2nd folder; if the same file is found in the 2nd folder, I move it to a duplicates folder. If no duplicates are found, a new list is created combining folders 1 and 2, and that list is checked against folder 3 for duplicates, and so on. Is there a better way to find duplicate files in a directory? Please suggest any other way to achieve this. Many thanks.


#2

Hi.

From my understanding, you want to check for duplicate files and, if there are any, create a new “folder #”.

So my thinking was that you could get all files in the home directory, select only the filenames, and then take the Distinct filenames. Then compare the Count of the distinct names with the Count of all names:

System.IO.Directory.GetFiles("C:\HomeDirectory","*",System.IO.SearchOption.AllDirectories).Select(Function(x) System.IO.Path.GetFileName(x)).Distinct.Count < System.IO.Directory.GetFiles("C:\HomeDirectory","*",System.IO.SearchOption.AllDirectories).Select(Function(x) System.IO.Path.GetFileName(x)).Count

So basically do If .Distinct.Count < .Count. When that is true, Distinct removed at least one name from the list, meaning some filename occurs more than once. Note that this compares filenames only, not file contents.

If the condition is true, create a new folder by taking the last folder name and incrementing its number with Regex.Replace.
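The Count comparison only says whether any duplicate name exists. To see which names are duplicated and where they live, grouping by filename is one option. This is a minimal sketch, not tested inside UiPath; the module and the helper FindDuplicateNames are hypothetical names, and it assumes that "duplicate" means "same filename, ignoring case", as in the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module DuplicateNames
    ' Hypothetical helper: maps each file name that occurs more than once
    ' under root to the full paths that share it.
    ' Assumption: names are compared case-insensitively; contents are not compared.
    Function FindDuplicateNames(root As String) As Dictionary(Of String, List(Of String))
        Return Directory.GetFiles(root, "*", SearchOption.AllDirectories) _
            .GroupBy(Function(p) Path.GetFileName(p), StringComparer.OrdinalIgnoreCase) _
            .Where(Function(g) g.Count() > 1) _
            .ToDictionary(Function(g) g.Key, Function(g) g.ToList(), StringComparer.OrdinalIgnoreCase)
    End Function

    Sub Main()
        ' Print every duplicated name followed by the paths where it was found.
        For Each entry In FindDuplicateNames("C:\HomeDirectory")
            Console.WriteLine(entry.Key & ": " & String.Join(", ", entry.Value))
        Next
    End Sub
End Module
```

In a workflow, the body of FindDuplicateNames could go into a single Assign activity; an empty result means there are no duplicate names.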

If(System.Text.RegularExpressions.Regex.IsMatch(System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last, "[0-9]+$"), _
    System.Text.RegularExpressions.Regex.Replace(System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last, "[0-9]+$", _
	Function(m) (CInt(m.Value)+1).ToString), _
	System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last+" 1")

So you can use something like this in the Create Directory activity, and feel free to store some of the code segments in variables.
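As an illustration of storing the segments in variables, the folder-name logic could be split up roughly as follows. This is only a sketch under my own assumptions: TrailingNumber and NextFolderPath are hypothetical helper names, the number is assumed to sit at the end of the folder name, and folders are ordered by that number rather than alphabetically.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

Module NextFolderName
    ' Trailing number of a folder path, or 0 when the name does not end in digits.
    Function TrailingNumber(folder As String) As Integer
        Dim m = Regex.Match(folder, "[0-9]+$")
        Return If(m.Success, CInt(m.Value), 0)
    End Function

    ' Hypothetical helper: path of the next numbered folder, given the existing ones.
    ' Throws InvalidOperationException if folders is empty.
    Function NextFolderPath(folders As IEnumerable(Of String)) As String
        ' Sort numerically so that "folder 10" comes after "folder 2".
        Dim last = folders.OrderBy(Function(f) TrailingNumber(f)).Last()
        Dim m = Regex.Match(last, "[0-9]+$")
        If m.Success Then
            ' Replace the trailing number with number + 1.
            Return last.Substring(0, m.Index) & (CInt(m.Value) + 1).ToString()
        End If
        ' No number yet: start numbering at 1.
        Return last & " 1"
    End Function

    Sub Main()
        Dim target = NextFolderPath(Directory.GetDirectories("C:\HomeDirectory"))
        Directory.CreateDirectory(target)
    End Sub
End Module
```

With folders "folder 2" and "folder 10" this should give "folder 11", whereas an alphabetical sort would pick "folder 2" as the last one. In UiPath the result of NextFolderPath would be passed to the Create Directory activity instead of calling Directory.CreateDirectory.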

Anyway, that was my idea.

Apologies if I got some of the required logic wrong. What I suggested assumes there are no duplicates left once all the folders are combined into one list; if the duplicates stay in those folders, the check will always see a duplicate and create a new folder. So the duplicate files would need to be removed, and if that is desired, I might be able to provide a suggestion on that later.

Regards.