Extract each section separately in between strings

Hi. I want to extract each section of the data below separately. The data comes from reading a PDF, and I need to keep looping until every section has been extracted. Example input is shown below. The idea is to extract each section from one "Deliver to" up to the next "Deliver to". What would be the best way to go about it? I will then be putting this data into Excel.

  Deliver to AU12345                                                  Request #111111
  (16) on 16 July 2021 at 2:30 pm

  16 Jul 2021 9:30 am-3:30 pm, Standard meeting, 1 person
  ☐ 12x 4. Afternoon Tea Pack ( min 4 ppl ) - Chefs selection of sweet muffins or slices,
  seasonal fruit, tea, coffee and juice or sparkling water

  Deliver to AU-123456                                              Request #252525
  MF (14) on 19 July 2021 at 8:00 am

  19 Jul 2021 at 8:00 am - 19 Jul 2021 at 5:00 pm

  Standard meeting, 1 person
  ☐ 1x Service Request
  Comment: Boardroom style-TBC
  Catering- TBC

  Deliver to AU-888888                                                   Request #6666666
  Teams MF (10) on 19 July 2021 at 10:00 am

  19 Jul 2021 10:00 am Standard meeting, 1 person
  ☐ 1x Service Request
  Comment: New starter photo

Hi,

Hope the following helps you.

The regex pattern is "Deliver[\s\S]*?(?=Deliver|$)"
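To see how this pattern behaves outside the workflow, here is a minimal sketch in Python rather than UiPath's VB.NET. The pattern string itself is unchanged. The sample text is an abbreviated copy of the data in the question, and the variable names are only illustrative.

```python
import re

# Abbreviated sample shaped like the PDF text in the question.
text = """Deliver to AU12345          Request #111111
(16) on 16 July 2021 at 2:30 pm
☐ 12x 4. Afternoon Tea Pack

Deliver to AU-123456        Request #252525
MF (14) on 19 July 2021 at 8:00 am
☐ 1x Service Request
Comment: Boardroom style-TBC
"""

# Lazy match: start at "Deliver" and stop just before the next
# "Deliver" or at the end of the text, whichever comes first.
pattern = r"Deliver[\s\S]*?(?=Deliver|$)"

sections = [m.group(0).strip() for m in re.finditer(pattern, text)]

for number, section in enumerate(sections, start=1):
    print(f"--- section {number} ---")
    print(section)
```

Each match is one full section, so looping over the matches visits the sections one by one. The lookahead does not consume the next "Deliver", which is why the following match can start there.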

Sequence1.xaml (6.6 KB)

Regards,

Hi. Brilliant thank you this solution worked!

I have a follow-up question, in case there is a better way of going about this. For each extracted result I need to separate out some of the details inside it. For example, I was able to extract the Request ID number by using this in an Assign activity: item.ToString.Substring(item.ToString.IndexOf("Request #")+"Request #".Length).Split(Environment.NewLine.ToCharArray)(0)

I want to split each result into separate details (Deliver to, Request ID, List of requests, Comment, Catering). For example, the list of requests has a checkbox at the beginning of each entry.

I was able to extract what comes after the checkbox using item.ToString.Split("☐"c)(1), but I would like to extract everything from the checkbox up to Comment rather than splitting off only one line. Comment and Catering will be okay, as these are set strings, whereas anything before them could be anything.

Deliver to AU-123456 Request #252525
MF (14) on 19 July 2021 at 8:00 am

19 Jul 2021 at 8:00 am - 19 Jul 2021 at 5:00 pm

Standard meeting, 1 person
☐ 1x Service Request
Comment: Boardroom style-TBC
Catering- TBC

Hi,

We can achieve this using regex lookahead and lookbehind, for example as follows.

System.Text.RegularExpressions.Regex.Match(item.Value,"(?<=Deliver to)[\s\S]*?(?=Request)").Value.Trim
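The same lookbehind/lookahead idea can be extended to the other fields. The following is a rough sketch in Python, not the contents of the attached workflow. Only the "Deliver to" pattern comes from this reply. The other patterns, the `first_match` helper, and the field names are assumptions made for illustration.

```python
import re

# One extracted section, copied from the question.
section = """Deliver to AU-123456 Request #252525
MF (14) on 19 July 2021 at 8:00 am

19 Jul 2021 at 8:00 am - 19 Jul 2021 at 5:00 pm

Standard meeting, 1 person
☐ 1x Service Request
Comment: Boardroom style-TBC
Catering- TBC
"""

def first_match(pattern, text):
    """Return the trimmed first match of pattern, or "" if there is none."""
    found = re.search(pattern, text)
    return found.group(0).strip() if found else ""

fields = {
    # Pattern from the reply: text between "Deliver to" and "Request".
    "deliver_to": first_match(r"(?<=Deliver to)[\s\S]*?(?=Request)", section),
    # Assumed patterns below; adjust them to the real data.
    "request_id": first_match(r"(?<=Request #)\d+", section),
    # From the first checkbox up to "Comment", "Catering", or the end.
    "requests": first_match(r"☐[\s\S]*?(?=Comment|Catering|$)", section),
    "comment": first_match(r"(?<=Comment:).*", section),
    "catering": first_match(r"(?<=Catering-).*", section),
}

for name, value in fields.items():
    print(f"{name}: {value}")
```

Because the requests pattern stops at the first "Comment" or "Catering", a request description that itself contains one of those words would be cut short, so it is worth checking against real sections. Sections without a Comment or Catering line fall through to the end of the text.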

Hope the following sample helps you.

Sequence1.xaml (8.5 KB)

Regards,

Hi. This regex pattern worked like a charm, although I have hit another barrier: each section will now begin with either "Deliver to" or "Pick-up". Is there a way to add Pick-up to this regex, so that it matches from either Deliver or Pick-up up to the next Pick-up or Deliver?

Deliver[\s\S]*?(?=Deliver|$)

@ciaramkm - You can try it like this:

(Deliver|Pick-up)[\s\S]*?(?=Deliver|$)

Thanks, but what about when Pick-up also appears in the second part? The pattern above doesn't quite do it: in two sections that had both Deliver to and Pick-up, it didn't extract the correct part.
If I added Pick-up near the end as well, like the one below, would this be correct?
(Deliver|Pick-up)[\s\S]*?(?=Deliver|Pick-up|$)
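One way to check is to run both patterns from this exchange against a small sample. The sketch below does that in Python, with the pattern strings unchanged. The Pick-up block, including its AU number and request number, is made up for illustration, and `split_sections` is a hypothetical helper name.

```python
import re

# Sample with a Pick-up section between two Deliver sections.
# The Pick-up block is invented for this test.
text = """Deliver to AU-123456    Request #252525
☐ 1x Service Request

Pick-up AU-777777       Request #313131
☐ 2x Morning Tea Pack

Deliver to AU-888888    Request #6666666
☐ 1x Service Request
"""

# Stops only at the next "Deliver", so a Pick-up section that follows
# a Deliver section is swallowed into it.
stop_at_deliver = r"(Deliver|Pick-up)[\s\S]*?(?=Deliver|$)"

# Stops at whichever marker comes next.
stop_at_either = r"(Deliver|Pick-up)[\s\S]*?(?=Deliver|Pick-up|$)"

def split_sections(pattern, source):
    """Return each whole match, trimmed (group 0, not the capture group)."""
    return [m.group(0).strip() for m in re.finditer(pattern, source)]

print(len(split_sections(stop_at_deliver, text)))
print(len(split_sections(stop_at_either, text)))
for section in split_sections(stop_at_either, text):
    print("---")
    print(section)
```

On this sample the second pattern yields one match per section, while the first merges the Pick-up section into the Deliver section before it. Both patterns would still split early if the word "Deliver" or "Pick-up" appeared inside a request description. Anchoring the markers to the start of a line may be safer in that case, though that variant is not tested here.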