Help with .select and regex

Hi,
I am trying to merge PDF files whose filenames contain the same number.
I extract the filenames from an array called files, which looks like this:
c:\Folder\Nr_11111111_2222.pdf
c:\Folder\Nr_11111111_2222_1.pdf
c:\Folder\Nr_33333333_2222.pdf
c:\Folder\Nr_33333333_2222.pdf
c:\Folder\Nr_44444444_2222.pdf
c:\Folder\Nr_44444444_2222.pdf

Using an Assign activity:
FileName=files.Select(Function (s) System.Text.RegularExpressions.Regex.Match(s,"(\d{8})_(\d{4})").Value.ToString).Distinct.ToArray

The result is like this:
11111111_2222.pdf
33333333_2222.pdf
44444444_2222.pdf
.pdf

The first three are correct; the last one, .pdf, is a file containing all of the above.
Why do I get the last one, and why does it contain all of the files?

Can someone tell me what I'm doing wrong, or just point me in the right direction?

Best regards Mikael

Check your "files" object. Apparently it contains more items than it should. I tested your regex and it's correct, and so is your Distinct.
Main.xaml (5.2 KB)

@Jorge_Cavalcante, thanks. You pointed me in the right direction. The problem was that there were several files in the folder with different names, like 234.pdf. When I deleted those, it worked as it is supposed to.

So it works as long as every file in the folder matches the pattern.

I thought that with the help of regex I could pick out the files that fit my pattern and skip the rest.
Could you help me understand exactly how this works and why I get the stray .pdf entry?

And what do I do to get only the ones that match the pattern?

Best regards Mikael

That happens because the expression System.Text.RegularExpressions.Regex.Match(s,"(\d{8})_(\d{4})").Value.ToString returns an empty string when there is no match. After Distinct, all non-matching files collapse into a single empty entry.

You should be able to remove the empty entry with:

FileName = FileName.Where(Function(x) Not String.IsNullOrWhiteSpace(x)).ToArray()
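
As an alternative to removing the empty entry afterwards, the non-matching files can be skipped before the value is read, by checking Match.Success. The following is only a sketch: the sample paths are modelled on this thread, and the module and variable names are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module FilterMatches
    Sub Main()
        ' Sample paths modelled on the thread; 234.pdf stands in for a non-matching file.
        Dim files As String() = {"c:\Folder\Nr_11111111_2222.pdf", "c:\Folder\Nr_11111111_2222_1.pdf", "c:\Folder\234.pdf"}
        Dim pattern As String = "(\d{8})_(\d{4})"

        ' Run the regex once per path and keep the Match object, not just its Value.
        Dim matches As IEnumerable(Of Match) = files.Select(Function(f) Regex.Match(f, pattern))

        ' Drop failed matches first, so no empty string ever enters the result.
        Dim keys As String() = matches.Where(Function(m) m.Success).Select(Function(m) m.Value).Distinct().ToArray()

        Console.WriteLine(String.Join(", ", keys))   ' prints: 11111111_2222
    End Sub
End Module
```

In an Assign activity the same idea should fit in one expression: files.Select(Function(f) System.Text.RegularExpressions.Regex.Match(f, "(\d{8})_(\d{4})")).Where(Function(m) m.Success).Select(Function(m) m.Value).Distinct.ToArray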

@ptrobot Thanx, that worked.
I combined the two like this:
files.Select(Function (s) System.Text.RegularExpressions.Regex.Match(s,"(\d{8})_(\d{4})").Value.ToString).Distinct.Where(Function(x) Not String.IsNullOrWhiteSpace(x)).ToArray()


I'm playing around a bit, just trying to learn.
So I tried to use Regex.Split instead, but then I get System.String, or s if I use Item(0).ToString.

The method I use is based on the previous code:

files.Select(Function (s) System.Text.RegularExpressions.Regex.Split(s, "(\d{8})_(\d{4})").ToString).Distinct.ToArray

How should I do it if I want to use the Split method? What did I miss this time?

Maybe this would be a topic of its own?

Best regards Mikael

Hi,

I don’t know if I understand correctly, but you can work with groups in regex:

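// Needs: using System; using System.Text.RegularExpressions;
// regex and line are example values based on this thread.
Regex regex = new Regex(@"(\d{8})_(\d{4})");
string line = @"c:\Folder\Nr_11111111_2222.pdf";
GroupCollection groups = regex.Match(line).Groups;
foreach (string groupName in regex.GetGroupNames())
{
    Console.WriteLine("Group: {0}, Value: {1}", groupName, groups[groupName].Value);
}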

You have to excuse me for being bad at explaining myself, but I think I just want to know the difference between these.
I have read the docs but do not feel that I have really understood how to use them.
But I will look into the groups, thanks for your effort.

Best regards Mikael

Regex.Split() splits the input at each match and returns an array, so I would not recommend it in this situation.

For example:

inputString = "Hello1234World9392This1919is00a0023test"
myArray = RegEx.Split(inputString, "\d+")

myArray will contain:
Hello
World
This
is
a
test
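
To connect this with the Split expression asked about earlier in the thread, here is a small sketch (module and variable names are made up) of what Regex.Split returns for one of the sample paths. It shows two things: calling ToString on the returned array only gives the type name, and because the pattern contains capturing groups, .NET also places the captured text in the array.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitDemo
    Sub Main()
        Dim filePath As String = "c:\Folder\Nr_11111111_2222.pdf"
        Dim parts As String() = Regex.Split(filePath, "(\d{8})_(\d{4})")

        ' ToString on an array returns its type name, not its elements.
        Console.WriteLine(parts.ToString())            ' prints: System.String[]

        ' The text before and after the match, plus the two captured groups in between.
        Console.WriteLine(String.Join(" | ", parts))   ' prints: c:\Folder\Nr_ | 11111111 | 2222 | .pdf
    End Sub
End Module
```

So with Split the wanted number is spread over indexes 1 and 2, which is why Match is the more direct tool here.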

ok.
Can Regex.Split handle arrays, or does it only work on strings?

It takes a single string as input and returns an array of strings.

No Problem.

You can use groups to give each part of your match a name.

With Split you get the pieces in an array and refer to each piece by its index.

If your string always has the same format and always yields the same number of pieces, Split works; but if the number of pieces varies, you will have trouble mapping the indexes.

With a group you look up the value by the group's name, regardless of how many groups your pattern returns.

These are two different approaches.
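
A minimal VB.NET sketch of the group approach, for comparison with index-based access. The group names "number" and "suffix" are invented for this example and do not come from the thread; the pattern is otherwise the one used above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    Sub Main()
        Dim filePath As String = "c:\Folder\Nr_11111111_2222_1.pdf"

        ' (?<name>...) gives a capturing group a name; these names are illustrative only.
        Dim m As Match = Regex.Match(filePath, "(?<number>\d{8})_(?<suffix>\d{4})")

        If m.Success Then
            ' Values are looked up by name, so their position in the pattern does not matter.
            Console.WriteLine(m.Groups("number").Value)   ' prints: 11111111
            Console.WriteLine(m.Groups("suffix").Value)   ' prints: 2222
        End If
    End Sub
End Module
```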

Regards.
