Regex TEXT TO COLUMNS

Hi Community,

I have a PDF file that I have converted to text, and I am trying to build a data table from it with the help of regex.

ANGICAM-LT GEL 15GM PUREandCUR 08/24
DICLOTAL AQ INJ 1ML AKUMS - HA 08/24
LIPONORM 10 TABLETS 15’S BCL-NSK NLEM 06/24
R-RD CAPS 15’S PUREandCUR 05/24

This is the sample data. I want to split each line into three columns.
Column 1: starts with any letter and ends with a word like TABS, TABLETS, CAPS, INJ or GEL.
Column 2: packing related. Starts with a number and ends with ML, ’S or GM, for example 1ML, 20GM, 10’S.
Column 3: expiry date related. Starts with a number; extract the following characters up to the next space, for example 05/24 as in the sample above.

How can this be done? A .xaml would be appreciated.
As highlighted in bold above, I want to extract only the bold parts as three separate columns and export them to Excel.

Thanks

Have a look at the following strategy:

  • mark the end of each column with a clear delimiter
  • parse the marked text into a datatable using the Generate Data Table activity

First marking iteration:

Feel free to also play with other patterns, e.g.:

(?<=TABS|TABLETS|CAPS|INJ|GEL|ML|’S|GM)|(?=\d{2}\/\d{2})
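To try the marking idea outside Studio first, here is a minimal Python sketch of it. It is only an illustration under a few assumptions: `DELIM`, `mark_columns` and `to_fields` are made-up names, the `;` delimiter is an arbitrary choice, and because Python's `re` only accepts fixed-width lookbehinds, each keyword gets its own lookbehind instead of the single alternation shown above. The matched positions should be the same.

```python
import re

# Keywords that close column 1 or column 2 (taken from the pattern above).
KEYWORDS = ["TABS", "TABLETS", "CAPS", "INJ", "GEL", "ML", "’S", "GM"]

# Zero-width positions where a column ends: right after a keyword,
# or right before an expiry date such as 08/24.
COLUMN_END = re.compile(
    "|".join(f"(?<={re.escape(k)})" for k in KEYWORDS) + r"|(?=\d{2}/\d{2})"
)

DELIM = ";"  # assumption: any character that never occurs in the data


def mark_columns(line):
    """Insert DELIM at every column-end position."""
    return COLUMN_END.sub(DELIM, line)


def to_fields(line):
    """Split the marked line on DELIM and trim each field."""
    return [field.strip() for field in mark_columns(line).split(DELIM)]


if __name__ == "__main__":
    for row in [
        "ANGICAM-LT GEL 15GM PUREandCUR 08/24",
        "DICLOTAL AQ INJ 1ML AKUMS - HA 08/24",
    ]:
        print(to_fields(row))
```

Note that this yields four fields per line, because the text between the packing and the expiry date becomes a field of its own. For the three columns asked for, keep fields 1, 2 and 4.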

Hi @GaneshPalaniappan

How about this expression?

Column 1

System.Text.RegularExpressions.Regex.Match(YourString,"\S.*(?<=TABS|TABLETS|CAPS|INJ|GEL)").ToString

Column 2

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=TABS|TABLETS|CAPS|INJ|GEL).*(?<=ML|’S|GM)").ToString.Trim

Column 3

System.Text.RegularExpressions.Regex.Match(YourString,"\d*\W\d*$").ToString.Trim

Regards
Gokul
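
As a quick way to check the three-column split against all four sample lines, here is a hedged Python sketch. It is not the expressions above: Python's `re` rejects variable-width lookbehinds, so this swaps in a single pattern with named groups. The names `ROW` and `split_row` are made up for the example, and the pattern assumes every line follows the same layout as the samples. In .NET the named groups would be written `(?<name>...)` instead of `(?P<name>...)`.

```python
import re

# One pattern, three named groups:
#   name   - from the first non-space character up to TABS/TABLETS/CAPS/INJ/GEL
#   pack   - a number followed by ML, ’S or GM
#   expiry - an MM/YY style date at the end of the line
ROW = re.compile(
    r"^(?P<name>\S.*?\b(?:TABS|TABLETS|CAPS|INJ|GEL))\s+"
    r"(?P<pack>\d+\s*(?:ML|’S|GM))\b.*?"
    r"(?P<expiry>\d{2}/\d{2})\s*$"
)


def split_row(line):
    """Return (name, pack, expiry), or None if the line does not fit."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("pack"), match.group("expiry")


if __name__ == "__main__":
    sample = [
        "ANGICAM-LT GEL 15GM PUREandCUR 08/24",
        "DICLOTAL AQ INJ 1ML AKUMS - HA 08/24",
        "LIPONORM 10 TABLETS 15’S BCL-NSK NLEM 06/24",
        "R-RD CAPS 15’S PUREandCUR 05/24",
    ]
    for line in sample:
        print(split_row(line))
```

Lines that do not match return `None`, so they can be logged and reviewed by hand rather than silently producing a wrong row.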