Get values from string with only space separators

Hello,

I am reading text from a PDF and isolating a table. I then split the table data into separate substrings in an array.

I am trying to split each substring further to get a name, a date range, and a cost value. The values are separated only by spaces, and the data looks something like the sample below (the values I need are the name, the date range, and the cost):

Scheduled Payment Mr John Smith 01/10/20-31/10/20 2,444.44
Scheduled Payment Mrs Paula Smith 01/10/20-31/10/20 5,283.98
Scheduled Payment Miss Susan May Smith 01/10/20-31/10/20 1,000.01
19423492 Sample Company Name (01)
Scheduled Payment Mr John Jones 01/10/20-31/10/20 2,444.44
Scheduled Payment Mrs Jess Mae Jones 01/10/20-31/10/20 5,283.98
Scheduled Payment Miss Susan Jess Jones 01/10/20-31/10/20 1,000.01

The data is semi-consistent: the only thing that changes is the cost value, and the name may include a middle name, as in the third line of the example. All of the “Scheduled Payment” lines are grouped under a company name.

What is the most efficient way of splitting up these values and assigning the sections to variables? I was thinking of a regex, but I can’t get my head around them. Could someone help?

Thanks in advance.

I would recommend using one regular expression with three groups:

(?<=Scheduled Payment )(.+) (\d{2}/\d{2}/\d{2}-\d{2}/\d{2}/\d{2}) (\d+,\d+\.\d+)

Use the Matches activity with the pattern above and then loop through the matches with a For Each activity. For each item you can get the values like this:

Name = item.Groups(1).ToString
Date = item.Groups(2).ToString
Cost = item.Groups(3).ToString
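To illustrate what the three groups capture, here is a minimal sketch of the same pattern using Python's re module instead of the UiPath Matches activity. This is an assumption for illustration only: for this particular pattern, Python and .NET regex should behave the same way. SAMPLE and extract_payments are hypothetical names; the sample text is taken from the question.

```python
import re

# Same pattern as above: a lookbehind for the "Scheduled Payment " prefix,
# then three capture groups for the name, the date range and the cost.
PATTERN = re.compile(
    r"(?<=Scheduled Payment )(.+) "
    r"(\d{2}/\d{2}/\d{2}-\d{2}/\d{2}/\d{2}) "
    r"(\d+,\d+\.\d+)"
)

# Sample lines from the question (hypothetical variable name).
SAMPLE = """Scheduled Payment Mr John Smith 01/10/20-31/10/20 2,444.44
Scheduled Payment Mrs Paula Smith 01/10/20-31/10/20 5,283.98
Scheduled Payment Miss Susan May Smith 01/10/20-31/10/20 1,000.01
19423492 Sample Company Name (01)
Scheduled Payment Mr John Jones 01/10/20-31/10/20 2,444.44"""


def extract_payments(text):
    """Return (name, dates, cost) tuples for every match in text."""
    results = []
    for m in PATTERN.finditer(text):
        name = m.group(1)   # corresponds to item.Groups(1).ToString
        dates = m.group(2)  # corresponds to item.Groups(2).ToString
        cost = m.group(3)   # corresponds to item.Groups(3).ToString
        results.append((name, dates, cost))
    return results


for row in extract_payments(SAMPLE):
    print(row)
```

Because `.` does not match a newline by default, `(.+)` stays within a single line, so the pattern can be run over the whole multi-line table text at once; the company header line simply produces no match.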


Hi @ptrobot,

This is the sequence including the Matches activity. I am watching the Locals panel while it runs, and the variable used in the activity, MatchedItems, ends up with no matches; see below:
[Screenshot: Locals panel showing MatchedItems]

Here is the Matches activity panel. Have I set it up correctly?

Here are the activities as well:

If you are using the Regex Builder, don’t use Literal; change it to Advanced.

If you already have the regular expression, I recommend simply pasting it into the Pattern field in the Properties panel:

[Screenshot: Pattern field in the Properties panel]

The TypeArgument for the For Each activity should be RegularExpressions.Match.

[Screenshot: TypeArgument setting of the For Each activity]

In the message box you need item.ToString to get the full match or item.Groups(x) to get the different groups.

Here’s an example: RegExGroupsTest.xaml (6.3 KB)


Hi @ptrobot,

I copied the properties you set and dragged in a new Matches activity, but I still get the same empty match I showed above.

When there are no matches, what is the value of your variable line? (Assuming that it’s the input string to your Matches activity.)


The first two lines are blank, but everything after that has the same format I specified in the initial post. I didn’t include the VAT payment amounts, though:
[Screenshot: redacted input lines]

Apologies for the redaction; it may be less clear now, but this is sensitive client information, not made-up names like the ones I used in my example.

Are you running Matches on each line separately as the input string? In that case, you won’t get any matches for the first two empty lines, nor for the VAT lines.
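If you do match line by line, lines that produce no match can simply be skipped. Here is a minimal sketch in Python, assuming the company header lines always look like the sample “19423492 Sample Company Name (01)”; the COMPANY pattern is a hypothetical guess based on that single line, and parse_lines is an illustrative name. VAT lines, whose format isn’t shown, just fall through as non-matches.

```python
import re

# Same payment pattern as in the answer above.
PAYMENT = re.compile(
    r"(?<=Scheduled Payment )(.+) "
    r"(\d{2}/\d{2}/\d{2}-\d{2}/\d{2}/\d{2}) "
    r"(\d+,\d+\.\d+)"
)

# Hypothetical header pattern, guessed from the one sample line
# "19423492 Sample Company Name (01)": digits, a name, then "(NN)".
COMPANY = re.compile(r"^\d+ (.+) \(\d+\)$")


def parse_lines(lines):
    """Group payment rows under the most recent company header."""
    current_company = None
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line: nothing to match
        header = COMPANY.match(line)
        if header:
            current_company = header.group(1)
            continue
        m = PAYMENT.search(line)
        if m is None:
            continue  # e.g. VAT lines or rows in some other format
        # (company, name, dates, cost)
        rows.append((current_company,) + m.groups())
    return rows
```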

Have you tested with removing the “For Each line in TableLines” and set the TableLines as input string to the Matches activity instead?


I used the string variable that initialises the TableLines variable as the input instead, and MatchedItems had no value in that case either. It skipped straight past the For Each item in MatchedItems loop.

OK, that is strange. Could you write TableLines (or the string variable that initializes it) to a text file, copy the file’s content into the inputString variable in my uploaded example, and run it? (Alternatively, you could modify the example to read the file into the inputString variable.) When you run my example after that change, do you get any matches?

If you still don’t get any matches, then the input text is probably formatted differently than the sample text you provided in your first post.
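One way to test whether spacing is the culprit is a more tolerant variant of the pattern. This is a hypothetical sketch in Python, assuming the PDF text may contain runs of spaces, tabs, or non-breaking spaces; nothing in this thread confirms that. It also loosens the cost group, because the original `(\d+,\d+\.\d+)` requires a comma and would therefore miss an amount like 999.99. The lookbehind is replaced by a plain prefix, since Python’s re does not allow variable-width lookbehind; the group numbers stay the same. TOLERANT and extract_tolerant are illustrative names.

```python
import re

# Horizontal whitespace only: any \s character except CR/LF, so a match
# cannot run onto the next line. For str patterns, \s also covers
# non-breaking spaces (U+00A0).
WS = r"[^\S\r\n]+"

TOLERANT = re.compile(
    r"Scheduled" + WS + r"Payment" + WS
    + r"(.+?)" + WS                                                    # 1: name
    + r"(\d{2}/\d{2}/\d{2}[^\S\r\n]*-[^\S\r\n]*\d{2}/\d{2}/\d{2})" + WS  # 2: dates
    + r"(\d[\d,]*\.\d{2})"                                             # 3: cost
)


def extract_tolerant(text):
    """Return (name, dates, cost) tuples, tolerating irregular spacing."""
    return [m.groups() for m in TOLERANT.finditer(text)]
```

If this variant finds matches where the original pattern finds none, the difference is most likely in the whitespace or number format of the extracted text.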


Hi @ptrobot,

I got some help from a colleague in the office. We worked out where the dates are by splitting the line by “/” and converting the substrings into integer variables. If the conversions didn’t fail, a full date was present, and we then split the line at the date value to get the name and the cost value. Thank you for your help!
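The approach above is only described in one sentence, so the following Python sketch is just one possible reading of it: split the line into space-separated tokens, treat a token as the date range if splitting it on “-” and “/” gives six parts that all convert to integers, then take everything before it as the name and everything after it as the cost. The helper names and the optional stripping of a “Scheduled Payment ” prefix are assumptions.

```python
def is_date_range(token):
    """True if token looks like dd/mm/yy-dd/mm/yy, checked by splitting on
    "-" and "/" and converting every part to an integer."""
    halves = token.split("-")
    if len(halves) != 2:
        return False
    for half in halves:
        parts = half.split("/")
        if len(parts) != 3:
            return False
        try:
            for part in parts:
                int(part)
        except ValueError:
            return False  # a conversion "broke": not a date
    return True


def split_payment_line(line, prefix="Scheduled Payment "):
    """Return (name, dates, cost), or None if the line has no date range."""
    if line.startswith(prefix):
        line = line[len(prefix):]  # hypothetical prefix handling
    tokens = line.split()
    for i, token in enumerate(tokens):
        if is_date_range(token):
            name = " ".join(tokens[:i])       # everything before the dates
            cost = " ".join(tokens[i + 1:])   # everything after the dates
            return name, token, cost
    return None
```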



