I am reading text from a PDF and isolating a table. I am then splitting the table data up into separate substrings in an array.
I am trying to split the substring data up, to get a name, a set of dates, and a cost value. The data is split up by spaces, and looks something like the data below (required values are in bold):
Scheduled Payment Mr John Smith01/10/20-31/10/202,444.44
Scheduled Payment Mrs Paula Smith01/10/20-31/10/205,283.98
Scheduled Payment Miss Susan May Smith01/10/20-31/10/201,000.01
19423492 Sample Company Name (01)
Scheduled Payment Mr John Jones01/10/20-31/10/202,444.44
Scheduled Payment Mrs Jess Mae Jones01/10/20-31/10/205,283.98
Scheduled Payment Miss Susan Jess Jones01/10/20-31/10/201,000.01
The data is semi-consistent: the only things that will change are the cost value and the name, which may include a middle name, as in the third line of the example. All of the “Scheduled Payments” are in groups under a company name.
What is the most efficient way of splitting up these values and assigning the sections to variables? I was thinking of a regex, but I can’t get my head around them. Could someone help?
Use the Matches activity with the pattern above and then loop through the matches with a For Each activity. For each item you can get the values like this:
Name = item.Groups(1).ToString
Date = item.Groups(2).ToString
Cost = item.Groups(3).ToString
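For reference, here is a minimal Python sketch of what a three-group pattern for the sample lines could look like. It is not the pattern from the answer above. The names PAYMENT_RE and parse_payment are made up for illustration, and the sketch assumes every payment line starts with "Scheduled Payment", uses two-digit day/month/year dates, and ends with a cost that has two decimal places.

```python
import re

# Hypothetical pattern (an assumption, not the one used in the thread).
PAYMENT_RE = re.compile(
    r"^Scheduled Payment\s+(.+?)\s*"          # group 1: name, matched lazily
    r"(\d{2}/\d{2}/\d{2}-\d{2}/\d{2}/\d{2})"  # group 2: dd/mm/yy-dd/mm/yy
    r"\s*(\d{1,3}(?:,\d{3})*\.\d{2})\s*$"     # group 3: cost such as 2,444.44
)


def parse_payment(line):
    """Return (name, dates, cost) for a payment line, or None for any other line."""
    match = PAYMENT_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


print(parse_payment("Scheduled Payment Miss Susan May Smith01/10/20-31/10/201,000.01"))
print(parse_payment("19423492 Sample Company Name (01)"))
```

Because the date part has a fixed width, the two-digit year cannot swallow the leading digits of the cost, and the lazy name group copes with middle names.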
This is the sequence including the Matches activity. I am looking at the locals panel during running, and the variable used in the activity, MatchedItems, does have a value, see below:
This is the Matches activity panel. Have I set it up correctly?
The first two lines are blanks, but everything after that is the same format I specified in the initial post. I didn’t add the VAT payment amounts though:
Apologies for the data redaction. It may not be as clear now, but this is sensitive information from a client, not the made-up names I used for the example.
Are you using Matches with each line as a separate input string? In that case, you won’t get any matches for the first two empty lines, or for the VAT lines either.
Have you tried removing the “For Each line in TableLines” and setting TableLines as the input string to the Matches activity instead?
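As a rough illustration of matching against the whole text rather than line by line, here is a Python sketch. The pattern, the sample text, and the names PATTERN and table_text are assumptions for the example, not taken from the actual workflow. Multiline mode lets ^ and $ anchor at every line, so blank lines and company lines are simply skipped.

```python
import re

# Hypothetical pattern; [ \t] is used instead of \s so a match never spans lines.
PATTERN = re.compile(
    r"^Scheduled Payment[ \t]+(.+?)[ \t]*"
    r"(\d{2}/\d{2}/\d{2}-\d{2}/\d{2}/\d{2})"
    r"[ \t]*([\d,]+\.\d{2})[ \t]*\r?$",  # \r? tolerates Windows line endings
    re.MULTILINE,
)

# Sample input: two blank lines, then payment lines around a company line.
table_text = (
    "\n\n"
    "Scheduled Payment Mr John Smith01/10/20-31/10/202,444.44\n"
    "19423492 Sample Company Name (01)\n"
    "Scheduled Payment Mrs Jess Mae Jones01/10/20-31/10/205,283.98\n"
)

# One (name, dates, cost) tuple per payment line found anywhere in the text.
matches = [m.groups() for m in PATTERN.finditer(table_text)]
for name, dates, cost in matches:
    print(name, dates, cost)
```

If this kind of whole-text match finds nothing on the real data, that would suggest the real lines differ from the sample in some way, for example in spacing or line endings.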
I used the string variable that initialises the TableLines variable, and this gave MatchedItems no value either. It skipped straight past the For Each over MatchedItems.
OK, that is strange. Could you write TableLines (or the string variable that initializes it) to a text file, copy the text file content to the inputString variable in my uploaded example and run it? (Or you could modify the example to read the file into the inputString variable.) If you run my example after the modification, do you get any matches?
If you still don’t get any matches, then the input text is probably formatted differently than the sample text you provided in your first post.
I got some help from a colleague in the office. We managed to work out where the dates are by splitting the line on “/” and assigning the numeric substrings to integer variables. If the conversions didn’t fail, a full date was present, and we split the line on the date value to get the name and the cost. Thank you for your help!
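For anyone finding this later, the split-based approach could look roughly like the Python sketch below. This is a reconstruction under assumptions, not our actual workflow: the function name split_payment is made up, and it assumes two-digit days, months and years and a "Scheduled Payment " prefix on every payment line.

```python
def split_payment(line):
    """Return (name, dates, cost) if the line holds a full date range, else None."""
    prefix = "Scheduled Payment "
    parts = line.split("/")
    # A dd/mm/yy-dd/mm/yy range contains four slashes, giving five pieces.
    if not line.startswith(prefix) or len(parts) != 5:
        return None
    try:
        # The middle piece looks like "20-31": start year, then end day.
        start_year, end_day = parts[2].split("-")
        pieces = [parts[0][-2:], parts[1], start_year, end_day, parts[3], parts[4][:2]]
        for piece in pieces:
            int(piece)  # raises ValueError if the piece is not a whole number
    except ValueError:
        return None
    date_range = "{}/{}/{}-{}/{}/{}".format(*pieces)
    # Splitting on the rebuilt date leaves the name before it and the cost after it.
    before, after = line.split(date_range, 1)
    return before[len(prefix):].strip(), date_range, after.strip()


print(split_payment("Scheduled Payment Mr John Smith01/10/20-31/10/202,444.44"))
```

Lines without a complete date range, such as the company name lines, come back as None and can be skipped.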