Split a string with no fixed length

I need to get this text into separate columns in Excel. The 3rd segment does not have a fixed length; it can vary. This is actually the output of a Read PDF activity: I read the PDF, then split by new line to get output like this. Can anyone please help with the next step?

If regex can be used, please advise.

Thanks in advance

I think a safe approach would be to take it part by part, leaving the 3rd segment as whatever text remains after the other parts are extracted.

  1. For 0001 and EFGH, split the line by space and take the items at index 0 and 1.
  2. For the dates, use the regex \d{2}/\d{2}/\d{4} and extract all the matches.
  3. For the 3rd segment, split the line by the text extracted in place of “EFGH” and by the first date; the 3rd segment is what lies between them.
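The three steps above can be sketched roughly as follows. This is Python used purely for illustration, not UiPath code; the function name `split_line` and the sample line are made up, since the real PDF output is not shown in the thread. It assumes every line starts with two space-separated tokens and contains at least one date.

```python
import re

def split_line(line):
    """Split one line into (code, tag, middle, dates), taking it part by part."""
    # Step 1: the first two space-separated tokens.
    parts = line.split(" ")
    code, tag = parts[0], parts[1]

    # Step 2: every date written as two digits / two digits / four digits.
    dates = re.findall(r"\d{2}/\d{2}/\d{4}", line)

    # Step 3: the text between the second token and the first date.
    after_tag = line.split(tag, 1)[1]
    middle = after_tag.split(dates[0], 1)[0].strip()

    return code, tag, middle, dates

# Hypothetical sample line (assumption, not taken from the original PDF).
sample = "0001 EFGH Some free text here 01/02/2020 03/04/2020 05/06/2020"
code, tag, middle, dates = split_line(sample)
```

Because the middle part is taken as "whatever is left", its length does not matter.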

It takes a bit of work but it should do the trick.


Thanks for the reply mate. Let me try it.


You only need one regex extraction, and then you have the groups to work with.
Check this example.


Hi @amithvs,

Using a regular expression, we can get the data in groups.


Thanks for replying. I am not familiar with regex. Can you explain a bit with a sample?

Thanks for replying. So I have to use this regex pattern in the Matches activity?

Hi @amithvs,

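Since you want to take all the dates separately, I would write the pattern like this:
(\d{4})\s*([A-Z]{4})\s*([^0-9]+?)\s*(\d{2}/\d{2}/\d{4})\s*(\d{2}/\d{2}/\d{4})\s*(\d{2}/\d{2}/\d{4})

The \s* between the groups allows for the spaces between fields, and ([^0-9]+?) captures the variable-length 3rd segment, provided that segment contains no digits.
Use this pattern in a Matches activity and create an output variable for it.
Then pass that output to a For Each activity.
Inside the loop you can read each part with item.Groups(1).Value, item.Groups(2).Value, and so on.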
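To show what each group ends up holding, here is a small Python sketch of the same idea (illustration only; in UiPath the pattern goes into the Matches activity). It uses the same six groups as the pattern above with optional whitespace (`\s*`) between them, on the assumption that the fields are separated by spaces. The name `parse_line` and the sample line are hypothetical.

```python
import re

# Six groups: 4 digits, 4 capital letters, free text without digits, 3 dates.
PATTERN = re.compile(
    r"(\d{4})\s*([A-Z]{4})\s*([^0-9]+?)\s*"
    r"(\d{2}/\d{2}/\d{4})\s*(\d{2}/\d{2}/\d{4})\s*(\d{2}/\d{2}/\d{4})"
)

def parse_line(line):
    """Return the six captured groups as a tuple, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groups() if match else None

# Hypothetical sample line (assumption, not taken from the original PDF).
sample = "0001 EFGH Some free text here 01/02/2020 03/04/2020 05/06/2020"
fields = parse_line(sample)
# fields[0] and fields[1] are the fixed parts, fields[2] is the
# variable-length 3rd segment, fields[3:] are the three dates.
```

The 3rd group is lazy (`+?`), so it stops before the spaces in front of the first date. If the 3rd segment can contain digits, `[^0-9]` would need to be replaced with something looser.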


@amithvs,

After splitting the text by line, split each line again by space.

PDFText.Split(Environment.NewLine.ToCharArray)(0).Split(" ".ToCharArray)(0) - 0000

PDFText.Split(Environment.NewLine.ToCharArray)(0).Split(" ".ToCharArray)(1) - ABCD

etc…
