Search duplicate files

Hi,
I want to delete all duplicate files from a directory. My current approach is to build a list of the files in the 1st folder and check those names against the 2nd folder; if the same file is found in the 2nd folder, I move it to a duplicate folder. If no duplicates are found, a new list is created combining folders 1 and 2, which is then checked against folder 3 for duplicates, and so on. Is there a better way to find duplicate files in a directory? Please suggest any other way to achieve this. Many thanks.

Hi.

From my understanding, you want to check for duplicate files and, if there are any, create a new “folder #”.

So my thinking was that you could get all files in the home directory, select only the filenames, and then take the Distinct filenames. Then compare the Count of the distinct list with the Count of the full list:

System.IO.Directory.GetFiles("C:\HomeDirectory","*.*",SearchOption.AllDirectories).Select(Function(x) System.IO.Path.GetFileName(x)).ToArray.Distinct.Count < System.IO.Directory.GetFiles("C:\HomeDirectory","*.*",SearchOption.AllDirectories).Select(Function(x) System.IO.Path.GetFileName(x)).ToArray.Count

So basically do If .Distinct.Count < .Count, which tells you that Distinct removed at least one repeated filename from the list, i.e. a duplicate exists.
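As a rough sketch, the same check can be written with the filename list stored in a variable so the folder is only scanned once. The module name, the `root` path, and the case-insensitive comparer are assumptions added for illustration (Windows filenames are not case-sensitive); they are not part of the expression above.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module DuplicateCheck
    Sub Main()
        ' Hypothetical root folder; replace with your own path.
        Dim root As String = "C:\HomeDirectory"

        ' File names only (no folder part), across all subfolders.
        Dim names As String() = Directory.GetFiles(root, "*.*", SearchOption.AllDirectories).
            Select(Function(p) Path.GetFileName(p)).
            ToArray()

        ' Assumption: names are compared ignoring case.
        ' If Distinct drops anything, at least one name occurs more than once.
        Dim hasDuplicates As Boolean =
            names.Distinct(StringComparer.OrdinalIgnoreCase).Count() < names.Length

        Console.WriteLine(If(hasDuplicates, "Duplicates found", "No duplicates"))
    End Sub
End Module
```

In a workflow, the two assignments would probably map to two Assign activities, with `hasDuplicates` used as the If condition.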

If the condition is true, create a new folder whose name is derived from the last folder name using Regex.Replace.

If(System.Text.RegularExpressions.Regex.IsMatch(System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last, "[0-9]{1,3}$"), _
    System.Text.RegularExpressions.Regex.Replace(System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last, "[0-9]{1,3}$", _
        (CInt(System.Text.RegularExpressions.Regex.Match(System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last, "[0-9]{1,3}$").Value)+1).ToString), _
    System.IO.Directory.GetDirectories("C:\HomeDirectory").OrderBy(Function(x) x).Last+" 1")

So you can use something like that in the Create Directory activity, and feel free to store some of the code segments in variables.
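To show what storing the segments in variables might look like, here is a minimal sketch of the folder-naming step as a small function. The name `NextFolderName` is hypothetical, and the sketch assumes the number sits at the very end of the folder name.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FolderNaming
    ' Returns the folder path with its trailing number increased by one,
    ' or with " 1" appended when the name has no trailing number.
    Function NextFolderName(lastFolder As String) As String
        Dim m As Match = Regex.Match(lastFolder, "[0-9]{1,3}$")
        If m.Success Then
            Dim nextNumber As Integer = CInt(m.Value) + 1
            ' Keep everything before the number and append the new number.
            Return lastFolder.Substring(0, m.Index) & nextNumber.ToString()
        End If
        Return lastFolder & " 1"
    End Function

    Sub Main()
        Console.WriteLine(NextFolderName("C:\HomeDirectory\folder 2")) ' C:\HomeDirectory\folder 3
        Console.WriteLine(NextFolderName("C:\HomeDirectory\folder"))   ' C:\HomeDirectory\folder 1
    End Sub
End Module
```

One thing to watch: OrderBy on strings sorts "folder 10" before "folder 2", so picking the last folder by name may need a numeric sort once there are ten or more folders.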

Anyway, that was my idea.

Apologies if I got some of the required logic wrong. What I suggested assumes that no duplicates remain once all the folders are combined into one list; if the duplicates stay in those folders, it will always see a duplicate and create a new folder. So you may want to remove the duplicate files, and if that is desired, I might be able to provide a suggestion on that later.

Regards.


Thanks for the solution, and sorry for the late response. Could we not do it in a better way? Suppose I go into a folder, take a file name, e.g. Example.txt, and then look for that file on the whole drive. If I find it in a folder other than the current one, I copy that file to a new folder, let's say by creating a DUPLICATE folder?

I could probably revise it.

What if you did it like this?:
- Take all files in a home directory
- Reduce the list of files to only those with duplicates
- Loop through each file that has a duplicate
    - Copy or move the file to a Duplicate folder

To reduce a list to only the duplicates, you can use a few methods, but here is one with .GroupBy. It groups the paths by file name, because every full path is unique on its own, and returns the full paths of all files whose name occurs more than once:

System.IO.Directory.GetFiles("C:\HomeDirectory","*.*",SearchOption.AllDirectories).GroupBy(Function(x) System.IO.Path.GetFileName(x)).Where(Function(g) g.Count > 1).SelectMany(Function(g) g)

Pseudocode would look like this:

Assign duplicates = System.IO.Directory.GetFiles("C:\HomeDirectory","*.*",SearchOption.AllDirectories).GroupBy(Function(x) System.IO.Path.GetFileName(x)).Where(Function(g) g.Count > 1).SelectMany(Function(g) g)

For each file in duplicates
    Copy File

Then the only missing logic is which folder name to use as the destination for the duplicates.
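Put together as plain VB.NET, the steps could look roughly like the sketch below. The `C:\DUPLICATE` destination and the counter prefix on the copied file names are assumptions made for illustration; the prefix is only there so that copies with the same name do not overwrite each other.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module CopyDuplicates
    Sub Main()
        ' Hypothetical paths; adjust to your environment.
        Dim root As String = "C:\HomeDirectory"
        ' Assumption: the destination lives outside root so it is never scanned.
        Dim duplicateDir As String = "C:\DUPLICATE"

        ' Does nothing if the folder already exists.
        Directory.CreateDirectory(duplicateDir)

        ' Group full paths by file name and keep only names that occur more than once.
        Dim groups = Directory.GetFiles(root, "*.*", SearchOption.AllDirectories).
            GroupBy(Function(p) Path.GetFileName(p), StringComparer.OrdinalIgnoreCase).
            Where(Function(g) g.Count() > 1)

        For Each g In groups
            Dim index As Integer = 1
            For Each filePath In g
                ' Prefix with a counter: "1_demo.txt", "2_demo.txt", ...
                Dim target As String = Path.Combine(duplicateDir, index.ToString() & "_" & g.Key)
                File.Copy(filePath, target, True) ' True = overwrite an existing target
                index += 1
            Next
        Next
    End Sub
End Module
```

Note that this matches on file name only; two files with the same name but different content would still be treated as duplicates.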

Hopefully, that’s closer to what you need. Let me know.

Regards.

Thanks for the solutions, but I need some more guidance. I have attached a folder structure that contains duplicate files. I want to keep the last saved copy of the demo.txt file and move the other copies to a DUPLICATE folder.

Test.zip (1.2 KB)

I will wait to hear from you soon. Thanks.

Hi @raym174

I can help, but can you explain better how you would do this manually?

My current understanding is that you have a folder with many subfolders, each of which could contain a duplicate (you can’t have a duplicate filename within the same folder, only across subfolders).

So, take this example:
image
There is 1 duplicate in each subfolder, making a total of 3 duplicate files.

image
We can then move each file to a new folder (test) in order of last modified date, and create another new folder if a file with that name already exists in the first one.

If you can clarify this and represent how the folder structures will or can be, and how the folder structure will end up being, that would help me suggest a solution for you.
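In the meantime, here is a minimal sketch of one possible reading of the request: for every file name that occurs more than once, leave the most recently modified copy where it is and move the older copies to a single DUPLICATE folder. The paths and the `_1`, `_2` suffixes are assumptions for illustration, not something taken from the attached structure.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module KeepLatest
    Sub Main()
        ' Hypothetical paths; adjust to your environment.
        Dim root As String = "C:\HomeDirectory"
        ' Assumption: the destination lives outside root so it is never scanned.
        Dim duplicateDir As String = "C:\DUPLICATE"
        Directory.CreateDirectory(duplicateDir)

        ' Group full paths by file name and keep only names that occur more than once.
        Dim groups = Directory.GetFiles(root, "*.*", SearchOption.AllDirectories).
            GroupBy(Function(p) Path.GetFileName(p), StringComparer.OrdinalIgnoreCase).
            Where(Function(g) g.Count() > 1)

        For Each g In groups
            ' Newest first; Skip(1) leaves the most recently saved copy in place.
            Dim older = g.OrderByDescending(Function(p) File.GetLastWriteTime(p)).Skip(1).ToList()
            Dim index As Integer = 1
            For Each filePath In older
                ' Suffix with a counter, e.g. "demo_1.txt", so moved copies do not collide.
                Dim target As String = Path.Combine(duplicateDir,
                    Path.GetFileNameWithoutExtension(filePath) & "_" & index.ToString() & Path.GetExtension(filePath))
                ' File.Move throws if the target already exists (for example on a second run).
                File.Move(filePath, target)
                index += 1
            Next
        Next
    End Sub
End Module
```

If the copies should instead go to separate numbered folders, the inner loop is the part that would change.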

Regards.

FileCleaner.zip (66.3 KB)

Hi, I wish you a wonderful and prosperous new year. I have attached my approach, but I was not able to finish it. Please suggest how to progress further. Thanks in advance.

FileCleaner.zip (107.1 KB)

I recommend using DuplicateFilesDeleter.