@Anshuman_Malik sorry, this one was making my head hurt and I didn’t get around to it sooner. I thought there might be a fast way to use GroupBy(), but I couldn’t wrap my head around it. So instead I took an easier approach: loop over the files, add them to a group while their combined size stays within the limit, then remove those files from the initial array and repeat.
pseudocode:
Assign files array from directory
Initialize List of file arrays to add each group to
While files.Count > 0
    Initialize fileGroup
    Initialize sizeSum
    For each f In files array
        If sizeSum + New FileInfo(f).Length <= sizeLimit
            Assign sizeSum = sizeSum + New FileInfo(f).Length
            Assign fileGroup = fileGroup.Concat({f})
    Assign files = files.Where(Function(f) Not fileGroup.Contains(f)).ToArray
    Add to Collection //adds fileGroup to the fileGroups List
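For anyone who wants to try the grouping logic outside of UiPath, here is a minimal Python sketch of the same loop. It is not the attached workflow: the function name `group_by_size_limit` and the `(name, size)` tuples standing in for `FileInfo` are assumptions made for illustration, and it assumes file names are unique.

```python
from typing import List, Tuple

# Hypothetical stand-in for FileInfo: (file name, size in bytes).
File = Tuple[str, int]


def group_by_size_limit(files: List[File], size_limit: int) -> List[List[str]]:
    """Greedy grouping: fill one group per pass, then drop the grouped files."""
    # A file larger than the limit can never fit in any group, so skip it up front.
    remaining = [f for f in files if f[1] <= size_limit]
    groups: List[List[str]] = []
    while remaining:
        group: List[str] = []
        size_sum = 0
        for name, size in remaining:
            # Take the file only if the group stays within the limit.
            if size_sum + size <= size_limit:
                size_sum += size
                group.append(name)
        # Remove the files that were just grouped, then repeat on the rest.
        grouped = set(group)
        remaining = [f for f in remaining if f[0] not in grouped]
        groups.append(group)
    return groups


sample = [("a.pdf", 4), ("b.pdf", 5), ("c.pdf", 3), ("d.pdf", 6), ("e.pdf", 2)]
print(group_by_size_limit(sample, 10))
# [['a.pdf', 'b.pdf'], ['c.pdf', 'd.pdf'], ['e.pdf']]
```

Each pass always takes at least the first remaining file, so the loop terminates. The result depends on the order of the input, which is why sorting the array first changes the groups.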
FileGroupsBySize.xaml (13.8 KB)
Here you go as an example. You can also turn this into a reusable workflow if you want by changing the List variable to an ‘out’ argument, and the directory, pattern, and sizeLimit to ‘in’ arguments.
I placed a Write Line to output the results of the fileGroups List, so you can see how it joined the files together. You can remove the Write Line, as it’s only temporary.
With the List (named fileGroups in this example), you can use it in a ForEach loop to upload into a system, add as attachments, or zip or join the PDFs and then upload, so it’s up to you.
Hey @ClaytonM, you are amazing. Thank you so much. I just have one more question. Your script picks files by file name, not by file size, when grouping them.
1st Image is the output I get running your script: (considers the logic based on filename)
2nd image is how I wanted it. Basically, one of the goals is to reduce the number of output files by optimizing how the files are grouped.
If this is not doable or too much of an ask I can totally understand. I almost banged my head on the wall at this logic since I am still very new to coding and UiPath.
Yeah, it is pulling in the filenames by alphabetical order. If you want to perform it by filesize order, then you just need to adjust that Assign activity at the beginning that stores the files into an array.
Instead of this:
files = Directory.GetFiles(dir, filePattern).Where(Function(f) New FileInfo(f).Length <= sizeLimit).ToArray
Use this:
files = Directory.GetFiles(dir, filePattern).Where(Function(f) New FileInfo(f).Length <= sizeLimit).OrderBy(Function(f) New FileInfo(f).Length ).ToArray
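As a rough illustration of what that expression does (filter by size limit, then sort smallest first), here is a Python sketch. The function name `files_by_size` is hypothetical, and it uses `fnmatch` for the pattern, which is only an approximation of how the .NET search pattern behaves.

```python
import fnmatch
import os
from typing import List


def files_by_size(directory: str, pattern: str, size_limit: int) -> List[str]:
    """Return matching file paths no larger than size_limit, smallest first."""
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, pattern)
    ]
    # Keep regular files that fit within the limit (mirrors the Where clause).
    paths = [p for p in paths if os.path.isfile(p) and os.path.getsize(p) <= size_limit]
    # Sort by size ascending (mirrors the OrderBy clause).
    return sorted(paths, key=os.path.getsize)
```

For example, `files_by_size("C:/pdfs", "*.pdf", 10 * 1024 * 1024)` would list the PDFs of up to 10 MB in that (hypothetical) folder, smallest first.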
The ForEach needs to be of type String[], since each item in fileGroups is an array of string. Although, you can technically use an Object as long as you convert it as you use each item. Change Object to Array(Of String)
It fails on the 2nd and 3rd ones, since the join activity needs at least 2 files. I guess my question is: how can I count the files inside each “item” in the list, so I can use that as a condition while running my script?
Even if you found a solution by now, as I started working on mine earlier, I will leave it here for you as another variant. PDFsMergeToSizeLimit.zip (24.1 KB)
You will need to add the pdfs in PDF\ folder prior to running.
This uses the subset sum approach -
computes each subset of the array of pdf file sizes
example: for an array = {1,2,3,4} => subsets: {1},{2},{3},{4},{1,2},{1,3},{1,4},{2,3},{2,4},{3,4},{1,2,3},{1,2,4},{1,3,4},{2,3,4},{1,2,3,4}
computes the sum of each subset and chooses the subsets that are closest to the set limit. Each element of the array is present only once through the subsets selected.
example: for array = {1,2,3,4,5,6} and limit = 7 =>
{1,2,4} sum: 7
{6} sum: 6
{5} sum: 5
{3} sum: 3
In this way, joining the PDFs is done in an efficient way, getting as close as possible to the 10 MB limit for each output file and minimizing the number of joins. This might come in handy, especially if the number of PDFs that you have to process is quite large.
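A minimal Python sketch of that idea, for illustration only: it is not the code from the attached zip, the names `best_subset` and `merge_groups` are made up here, and it assumes a brute-force search where ties keep the first subset found. Checking every subset costs on the order of 2^n sums for n files, so a plain brute force like this only suits a small number of files.

```python
from itertools import combinations
from typing import List, Sequence


def best_subset(sizes: Sequence[int], limit: int) -> List[int]:
    """Return indices of the subset whose sum is largest without exceeding limit."""
    best: List[int] = []
    best_sum = -1
    for r in range(1, len(sizes) + 1):
        for combo in combinations(range(len(sizes)), r):
            total = sum(sizes[i] for i in combo)
            # Strict comparison: on a tie, the first subset found is kept.
            if best_sum < total <= limit:
                best, best_sum = list(combo), total
    return best


def merge_groups(sizes: Sequence[int], limit: int) -> List[List[int]]:
    """Repeatedly pick the best-fitting subset until every size is used once."""
    # Sizes above the limit can never be part of a valid subset.
    remaining = [s for s in sizes if s <= limit]
    groups: List[List[int]] = []
    while remaining:
        picked = best_subset(remaining, limit)
        groups.append([remaining[i] for i in picked])
        picked_set = set(picked)
        remaining = [s for i, s in enumerate(remaining) if i not in picked_set]
    return groups


print(merge_groups([6, 3, 5, 4, 2], 10))
# [[6, 4], [3, 5, 2]]
```

Which subset wins a tie changes the final groups, so a different tie-breaking rule can produce a different (and sometimes larger) number of output files for the same input.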
Thanks for posting that, cause I was interested in the “best” sum algorithm too. You also have a good example to store both file and size into a dictionary, which I liked.
Using the pattern you can’t use deny logic, but since you are using LINQ, just add it to your Where statement, like:
files = Directory.GetFiles(dir, filePattern).Where(Function(f) New FileInfo(f).Length <= sizeLimit And Not f.Contains(".pdf")).OrderBy(Function(f) New FileInfo(f).Length ).ToArray
Change filePattern to "*.*" to include all files, then refer to @bcorrea’s suggestion. However, I would make one change to his example to look at the extension (technically a file could be “filename.pdf.xlsx” and contain .pdf, so it’s better to look at extension)
files = Directory.GetFiles(dir, "*.*").Where(Function(f) New FileInfo(f).Length <= sizeLimit And Not Path.GetExtension(f).ToUpper.Equals(".PDF")).OrderBy(Function(f) New FileInfo(f).Length ).ToArray
There could also be a GetFiles pattern to say “not .pdf”; however, I don’t know enough about the .NET GetFiles pattern syntax to say for sure and would need to research it, which I don’t have time for.
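To show why checking the extension is safer than checking whether the name contains ".pdf", here is a small Python sketch. The helper name `is_pdf` and the sample file names are made up for illustration.

```python
import os


def is_pdf(path: str) -> bool:
    """True only when the final extension is .pdf, ignoring case."""
    return os.path.splitext(path)[1].upper() == ".PDF"


names = ["report.pdf", "SCAN.PDF", "filename.pdf.xlsx", "notes.txt"]

# Extension check: only real PDFs are excluded.
non_pdfs = [n for n in names if not is_pdf(n)]
print(non_pdfs)  # ['filename.pdf.xlsx', 'notes.txt']

# Substring check: wrongly drops the .xlsx file and misses the upper-case PDF.
substring_filtered = [n for n in names if ".pdf" not in n]
print(substring_filtered)  # ['SCAN.PDF', 'notes.txt']
```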