Deleting duplicate files in a folder that have the exact same size

Hi all

I’m working on a project where I need to delete any duplicate records of a document in a folder. We’ve decided the best way to do this is to check the file sizes of the documents in the folder and delete any duplicates (retaining one original).

I’ve been given a bit of a lead, but I wasn’t sure what to do with it or where to put it:

filesize
I believe this is meant to be an Int64?

Could someone give me an idea of an XML that could achieve this? The documents will all be PDFs.
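A file size in bytes does fit in an Int64 (in .NET, `FileInfo.Length` is a `long`). As a rough sketch of the size-based idea, written in Python rather than UiPath and using a hypothetical helper name `group_by_size`, the PDFs can be grouped by byte count. Files sharing a size are only *candidates* for being duplicates; equal size alone does not prove equal content.

```python
import os
from collections import defaultdict


def group_by_size(folder: str) -> dict[int, list[str]]:
    """Map each file size in bytes to the PDF paths in `folder` with that size.

    Only sizes shared by two or more files are returned, since a file with a
    unique size cannot have an exact duplicate.
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name.lower().endswith(".pdf"):
            # os.path.getsize returns the size in bytes as a Python int
            groups[os.path.getsize(path)].append(path)
    return {size: paths for size, paths in groups.items() if len(paths) > 1}
```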

@Sheri
I don’t think it is a good idea to assume two files are the same just because they have the same size. Imagine two files with the contents 11112222 and 12121212 respectively: both files have the same size, but the data is not the same.
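The counterexample can be checked directly. This small Python snippet (an illustration, not part of the original workflow) shows that the two contents have identical lengths but differ, and that a content hash such as SHA-256 tells them apart.

```python
import hashlib

# The two example contents: same length, different bytes
a = b"11112222"
b = b"12121212"

print(len(a), len(b))  # → 8 8
print(a == b)          # → False
# A content hash distinguishes them even though the sizes match
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # → False
```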

Instead of checking the file size, compare the content of the PDF files:

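dt = New DataTable   ' columns: (0) PDF text, (1) file path
For Each pdf In Directory.GetFiles(Folder, "*.pdf")
    str = Read PDF Text(pdf)
    Add Data Row: ArrayRow = {str, pdf}, DataTable = dt
Next

' Group rows by PDF text; every row after the first in a group is a duplicate to delete
dtDeleteFile = dt.AsEnumerable().GroupBy(Function(r) r(0).ToString).SelectMany(Function(g) g.Skip(1)).CopyToDataTable()
' CopyToDataTable throws an exception if there are no duplicates, so check for that first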

For each row in dtDeleteFile

delete file > row(1).ToString

Next
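For comparison outside UiPath, here is a minimal Python sketch of the same keep-one-per-group logic. It swaps in a different comparison: it hashes each file’s raw bytes with SHA-256 instead of comparing text extracted by Read PDF Text. That catches exact byte-for-byte copies, but two PDFs with identical text and different metadata would not be flagged. `delete_duplicate_pdfs` and `file_digest` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str) -> str:
    """SHA-256 of the file's raw bytes, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def delete_duplicate_pdfs(folder: str, dry_run: bool = False) -> list[str]:
    """Delete every PDF whose content matches an earlier one; keep one per group.

    Returns the paths that were (or, with dry_run=True, would be) deleted.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name.lower().endswith(".pdf"):
            groups[file_digest(path)].append(path)

    to_delete: list[str] = []
    for paths in groups.values():
        # Keep the first path in each group of identical files, delete the rest
        to_delete.extend(paths[1:])

    if not dry_run:
        for path in to_delete:
            os.remove(path)
    return to_delete
```

Deleting every file after the first in each group, rather than a single file per group, is what leaves exactly one copy when three or more PDFs are identical. Running with `dry_run=True` first lists what would be removed without touching the folder.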

Thanks for helping!

I’ve almost got this built, but I need some clarification around delete file > row(1).ToString.

Does this go inside the For Each Row activity, as a Delete activity?

Am I putting file > row(1).ToString in the Path of the Delete activity?

I am getting an error when doing so…

Use only row(1).ToString as the Path.
In this case, row(1).ToString is the PDF file path.

Thanks! Will test it out now!

Have a look at this:
Delete Duplicate File.zip (411.7 KB)
