Using regex(replace action) to select row

Hey guys, hoping someone can help me figure this out.
I’m doing a read range from a rather messy excel file, in which i need to pick certain rows.
It looks like this(might have to max window to see it properly), each line is a row.
/===========================================================================
| HEROMA S I G N A L L I S T A Sida: 1 |
| DATABAS:TRV_PP01 PERIOD:2019-11 Alla Utskriven: 19-12-05 09:59 |
===========================================================================/

         Databas: TRV_PP01                               Löneperiod:        

         PATEAM  1001000a
       /­­­­­­­­­­­­­*­­­­­*­­­­­*­­­­­­­­­­­­­­­­­*­­­­­­*­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­\
       | Personnummer|IDNR |Lart |  From  -  Tom   |Felkod| Feltext                                           |
       |             |     |     |                 |      |                                                   |
       *­­­­­­­­­­­­­+­­­­­+­­­­­+­­­­­­­­­­­­­­­­­+­­­­­­+­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­*
       | 553445-7193 | 1000|     |                 | 5009 | Komptidsutbetalning via autolart kan ej göras     |
       | 553445-7193 | 1000|     |                 | 5009 | Kkjsdhfasdkfj jkhsdflkjhasdfh jkhasdfjkhsafsdj     |
       | 634521-7114 | 1000|     |                 | 5009 | Komptidsutbetalning via autolart kan ej göras     |
       | 534531-7114 | 1000|     |                 | 5009 | Komptidsutbetalning via autolart kan ej göras     |
       | 442332-5192 | 1000|     |                 | 5009 | random text yadda     |
       | 532984-5192 | 1000|     |                 | 5009 | random text yadda     |
       | 428374-4899 | 1000|     |                 | 5009 | random text yadda    |
       | 342493-4899 | 1000|     |                 | 5009 | random text random text     |

It goes on for thousands of rows ususally. as you see there are tons of spaces and | characters everywehere.

Now, what i want get out should look like this:
342493-4899 1000 5009 whatevertext is here
etc…

Or im ok with just removing the spaces and | characters so it look like this
342493-489910005009whatevertextishere
etc…

i’ve tried to get it to pick it out using the regexbuilder with the replace action, but not succeding.

any ideas on how to build the regex? let me know if i need to clarify it more.

thanks! Tom

Hi,

How about the following?

System.Text.RegularExpressions.Regex.Replace(strYourLine,"[\W\s]","",RegexOptions.Multiline)

Regards,

thanks for the reply. what should i put in the strYourLine? Cant seem to get it to work quite right…

Hi,

System.Text.RegularExpressions.Regex.Replace(strYourLine,"[\W\s]","",System.Text.RegularExpressions.RegexOptions.Multiline)

is for case of assign activity.

In Regex Builder of Replace activity, my suggestion is the following.

However, it might not fit for you because of removing some whitespaces in string data.

So, the following expression might be better.
(\||\s{2,})

Please set "" in replacement property.

Regards,

awesome, that worked flawlessly. big thanks. now, would you mind explaining what the different parts of (||\s{2,}) means? just to get a better understanding of the code. if not, totally cool :slight_smile:

Hi,

Sure,
First, perhaps you should get syntax : (A|B) means A or B.
In this case
A is \| and B is \s{2,}

\| means just ’|’. ‘|’ is special character in Regex, if you handle ‘|’ as a normal character, need to add \ in front of ‘|’.
\s{2,} means 2 or greater length of whitespaces.
‘\s’ means whitespace. ‘{2,}’ means 2 or greater.

Hope it helps you.

Regards,

aaah thanks man, really appreciate this!

1 Like