Split a string by regex in a text file

@SamanGuruge

Please find the screenshot of the results I am getting:

As you can see, it is also highlighting other characters that are not needed.

Also, since the last line has no newline (\n) character, the regex is not matching it correctly.

Could you please help fix these two cases?

Regards,
@hacky

@mukeshkala, can you please help?
@Arpit_Kesharwani
@Karthik_Kulkarni


@hacky Is this what you are looking for?
https://regex101.com/r/iUMDDn/2


@Arpit_Kesharwani

Thanks, this is what I was looking for.

I changed your regex to `(\|[a-zA-Z0-9]{3}$)` and it's working fine.

Your approach gave me the idea.
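A quick way to check the last-field pattern outside UiPath is the Python sketch below. It uses Python's `re`, not the .NET engine, though both treat `$` the same way in multiline mode, and the sample text is made up. It shows two things: the pipe has to be escaped as `\|`, because an unescaped `|` inside the group is an alternation whose first branch is empty, so it matches an empty string right at the start; and in multiline mode `$` also matches at the very end of the text, so the last line is found even without a trailing newline.

```python
import re

# Hypothetical sample: three records, the last one has no trailing newline.
text = "JAN-19|A|B|000\nFEB-19|C|D|CA0\nMAR-19|E|F|T09"

# Escaped pipe, three alphanumerics, end of line (re.MULTILINE makes $ work per line).
LAST_FIELD = re.compile(r"\|[a-zA-Z0-9]{3}$", re.MULTILINE)
print(LAST_FIELD.findall(text))  # ['|000', '|CA0', '|T09']

# Unescaped pipe: "(|...)" means "empty string OR three alphanumerics at end of line",
# and the empty branch succeeds immediately at position 0.
first = re.search(r"(|[a-zA-Z0-9]{3}$)", text, re.MULTILINE)
print(first.span())  # (0, 0): an empty match at the very start
```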


@Arpit_Kesharwani, @supermanPunch, @Palaniyappan

Could you please take this text file, Test_File.txt (4.2 KB), and use the regex `(\|[a-zA-Z0-9]{3}$)` to split the input so that each record ends up on one line, as described earlier in this thread?
https://regex101.com/r/xst7kl/1

I am not getting the correct results when I use String Split in UiPath with this regex. If you open the link above, you will see the regex working fine, but the UiPath regex viewer shows incorrect results.
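One likely cause of this mismatch, assuming the file has Windows `\r\n` line endings (the attachment isn't available to confirm): text pasted into regex101 ends lines with `\n` only, and in multiline mode `$` matches only before `\n`, in .NET as well as in Python. So on a CRLF file `\|[a-zA-Z0-9]{3}$` can only match on the last line. A small Python sketch of the effect and of two workarounds (an optional `\r` in a lookahead, or normalizing line endings first); the sample text is hypothetical:

```python
import re

# Hypothetical sample with Windows (CRLF) line endings.
crlf_text = "JAN-19|A|B|000\r\nFEB-19|C|D|CA0\r\nMAR-19|E|F|T09"

plain = re.compile(r"\|[a-zA-Z0-9]{3}$", re.MULTILINE)
print(plain.findall(crlf_text))  # ['|T09']: '$' never matches before '\r'

# Workaround 1: allow an optional '\r' before the end of line, without consuming it.
crlf_aware = re.compile(r"\|[a-zA-Z0-9]{3}(?=\r?$)", re.MULTILINE)
print(crlf_aware.findall(crlf_text))  # ['|000', '|CA0', '|T09']

# Workaround 2: normalize line endings, then apply the original pattern.
normalized = crlf_text.replace("\r\n", "\n")
print(plain.findall(normalized))  # ['|000', '|CA0', '|T09']
```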

Regards,
hacky


@hacky Do you want each line that starts with || (which is really part of the previous line) to be appended back to that previous line?

@supermanPunch

Yes, that is the purpose of splitting the string by regex;

the regex is the one discussed in this thread.

Thanks in advance

Regards,
@supermanPunch

@hacky Do you need the output as a DataTable? And why does your sign-off mention me? :rofl:

Oops, sorry mate! :smile: :smile: :smile: :rofl: :rofl: :rofl: :joy:

I was in a hurry and my fingers slipped!

Anyway, you got it right, but the DataTable is the stage where more data massaging is needed, and I already have that part in hand.

For now I just need this module: the robot has to understand what the last field of each line in the input file (a string variable) is. I want to use regex to identify that last field, split the string at each regex match, and keep appending the records to a new string variable (let's say strNewResult). This module is only concerned with those steps. So yes, if the regex can identify the last field, we can keep appending the records to strNewResult and we will be good to go.

I hope that makes my intentions clear.

Once we have strNewResult in hand, a separate loop handles the rest of the processing. That part is already in place.
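A minimal Python sketch of that module, assuming (as described above) that a record is complete as soon as its accumulated text ends with `|` plus three alphanumerics. The function name `build_new_result` and the sample text are hypothetical; in UiPath the same loop would iterate over the lines of the input string and append to strNewResult.

```python
import re

# A record is complete once its text ends with the last field, e.g. "|000" or "|CA0".
LAST_FIELD_AT_END = re.compile(r"\|[a-zA-Z0-9]{3}$")


def build_new_result(text: str) -> str:
    """Join wrapped physical lines so each logical record sits on one line."""
    records = []
    buffer = ""
    for line in text.split("\n"):
        buffer += line.rstrip("\r")            # drop a CR left over from CRLF endings
        if LAST_FIELD_AT_END.search(buffer):   # last field reached: record complete
            records.append(buffer)
            buffer = ""
    if buffer:                                 # leftover text with no recognizable last field
        records.append(buffer)
    return "\n".join(records)


# Hypothetical sample: the first record is wrapped onto a second line.
sample = "JAN-19|A|note split\n||over lines|000\nFEB-19|C|D|CA0"
str_new_result = build_new_result(sample)
print(str_new_result)
```

One limitation to keep in mind: if a wrapped line happens to end in something like `|ABC` mid-record, the record is closed too early, so this relies on that ending really being unique to the last field.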

Thanks and Regards,
@hacky (this time it's correct, mate… lol)

@hacky If every line that needs to be appended to the previous line starts with a | symbol, then I think I already have the solution.
Check this workflow and let me know if it is not the expected output.
Text_To_Datatable.zip (18.4 KB)
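The attached workflow isn't reproduced here, but the idea as described (append every line that starts with `|` to the previous line) can be sketched in Python like this; `merge_continuation_lines` and the sample rows are hypothetical:

```python
def merge_continuation_lines(text: str) -> list:
    """Append every line that starts with '|' to the line before it."""
    merged = []
    for line in text.splitlines():
        if line.startswith("|") and merged:
            merged[-1] += line    # continuation of the previous record
        else:
            merged.append(line)   # start of a new record
    return merged


# Hypothetical sample: the second physical line continues the first record.
rows = merge_continuation_lines("JAN-19|A|B\n||C|000\nFEB-19|D|E|CA0")
print(rows)  # ['JAN-19|A|B||C|000', 'FEB-19|D|E|CA0']
```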

@supermanPunch

Thanks, this looks good!

But then there is one challenge.
Every line that needs to be appended to the previous line does not necessarily start with | in real-world scenarios. That's why I stopped worrying about how a line STARTS and started looking at using regex to identify the last field and use it as the separator.

As you can see, the input text file can be messed up in many ways, and the only pattern I could find is in the last field (e.g. |000 followed by a new line), where

the last field will be “|CA0”, “|000”, “T09”, etc., followed by a newline (on the last line there is no newline character).

So the pattern I worked out for the last field is: a “|” pipe, followed by three alphanumeric characters, then a newline character (except on the last line).

So the regex I came up with was: `(\|[a-zA-Z0-9]{3}$)`

What I am unsure about is how to use this regex as the separator for the records.
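One way to use the last-field pattern as the separator, sketched in Python: split only on a line break that is directly preceded by `|` and three alphanumerics. The lookbehind keeps the last field inside its record, and any line break left inside a record is a wrapped line, so it is removed. `split_records` and the sample are hypothetical; .NET's `Regex.Split` also supports lookbehind, but this exact pattern has not been tried in UiPath here.

```python
import re

# Split points: a line break (optionally CRLF) that directly follows the last field.
RECORD_BREAK = re.compile(r"(?<=\|[a-zA-Z0-9]{3})\r?\n")


def split_records(text: str) -> list:
    records = RECORD_BREAK.split(text)
    # Any line break still inside a record came from a wrapped line: remove it.
    return [re.sub(r"\r?\n", "", record) for record in records]


# Hypothetical sample: a wrapped first record, mixed line endings, no final newline.
sample = "JAN-19|A|wrapped\n||text|000\r\nFEB-19|C|D|CA0\nMAR-19|E|F|T09"
print(split_records(sample))
# ['JAN-19|A|wrapped||text|000', 'FEB-19|C|D|CA0', 'MAR-19|E|F|T09']
```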

Please share your input on this.

Thanks and Regards,
@hacky

@hacky Actually, I have been working on this since you posted it; it was quite a challenging task :sweat_smile:. But I was not able to use regex to get the data as needed. Instead, I used two lists to check the data, splitting it and appending each || line to the previous line.

It still seems possible with regex, since the starting field follows a month-year pattern, but I hit a blocker while implementing it. I will post that regex so you can understand it and adapt it to get the needed output.

@supermanPunch

I know, right? The input text files are really messed up and it's really hard to find the pattern (some input files have 1,000,000+ lines, so finding the trouble-maker line and then reworking the logic is a very tedious job)!
My project is ready and users are able to consume this process; the only challenge we face is the WHAT IF scenarios, which keep me busy looking for more options.

Also, regarding the start of the line being a three-character month followed by a year: even that is not fixed. It can be AJD1-19, ADJ2-17, or any other text marking the record as an ADJUSTMENT record for that year… So as you can see, that makes it even harder to pin down all the variations… lol :joy: :rofl: That's why I was more interested in the last part of the line than the start…

I am still exploring and trying my luck. Please get back to me if you find anything that could help.

Thanks very much for your inputs and help. Really appreciate it!

Regards,
@hacky


@hacky Maybe the columns have a fixed count :thinking:; if so, we might be able to use those “|” separators as a reference :sweat_smile:

@supermanPunch

Guess what! Those separators come from the Oracle application, and users type some of the fields by hand, using | on their end in any field, which puts more | pipe characters than expected in a few lines. So yeah, we can't really use the number of pipes to count the columns.

In fact, I did that before, but it turned out that the human entries always got in the way.

There are 80+ users of this process globally, in different countries, so we can't really stop users from misusing the separators. They are going to be there all the time.

@hacky But if those separators are not used consistently for the columns across rows, the column values will be mismatched, so the count should be fixed. If not, then I guess the data is not in the proper format :sweat_smile:

@supermanPunch

We, as RPA developers, had to fix that! LOL

So I group the columns: the PIPE characters are not misused in group 1 (the first 10 columns) and group 3 (the last 3 columns), because that data is computer-generated, so those are fine. All the remaining data goes into group 2 and is merged into one single column.

So there are 14 columns in total = group 1 (10 columns) + group 2 (1 trouble-maker column) + group 3 (3 columns)
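The column grouping can be sketched in Python as follows, assuming a record has no leading or trailing pipe and yields at least 14 fields when split on `|`. `regroup` is a hypothetical helper: it keeps the first 10 and last 3 fields as they are and joins everything in between (stray pipes included) into the single trouble-maker column.

```python
def regroup(record: str) -> list:
    """Return exactly 14 columns: 10 fixed + 1 free-text + 3 fixed."""
    fields = record.split("|")
    if len(fields) < 14:
        raise ValueError(f"expected at least 14 fields, got {len(fields)}")
    group1 = fields[:10]               # computer-generated, pipes trusted
    group3 = fields[-3:]               # computer-generated, pipes trusted
    group2 = "|".join(fields[10:-3])   # human-entered text, may contain stray pipes
    return group1 + [group2] + group3


# Hypothetical record: 10 fixed columns, free text containing two extra pipes, 3 fixed columns.
row = "|".join(f"c{i}" for i in range(1, 11)) + "|free|text with|pipe|x|y|000"
cols = regroup(row)
print(len(cols), cols[10])  # 14 free|text with|pipe
```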

So this is what happens in the next loop, once the current module has brought each record onto one line.

My logic keeps appending the next line to the previous one until there are 16 or more pipes, and then a second loop makes sure there are only 14 columns.
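The pipe-counting step could look roughly like this in Python. The threshold of 16 is taken from the post; whether it is right depends on how a given file's rows are delimited, so it is a parameter here. `join_by_pipe_count` is a hypothetical name.

```python
def join_by_pipe_count(lines, min_pipes=16):
    """Keep appending lines to the current record until it has min_pipes pipes."""
    records = []
    buffer = ""
    for line in lines:
        buffer += line
        if buffer.count("|") >= min_pipes:   # enough separators: record is complete
            records.append(buffer)
            buffer = ""
    if buffer:                               # trailing partial record, kept for review
        records.append(buffer)
    return records


# Small hypothetical example with a lower threshold.
parts = ["a|b|c", "|d|e", "f|g|h|i"]
print(join_by_pipe_count(parts, min_pipes=4))  # ['a|b|c|d|e', 'f|g|h|i']
```

As noted earlier in the thread, human-entered pipes can push a record over the threshold early, which is the weakness the last-field regex is meant to avoid.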

Things are in place, but we are worried about WHAT IF cases. I wanted to give regex a try; I got halfway there but still need to explore it further.

If I can split the lines based on the last field, it will be huge progress in my findings…


Hi @hacky,

I looked at your input text file; I think this same requirement was handled previously. If I am wrong, please provide the exact requirement with screenshots and we will look into it.

Regards,
Omkar P

With myText as your text file content, the following will give you a Match for each occurrence. You can tweak the pattern to work directly with groups on each Match. As far as I can tell from your example, trimming values is mandatory: you could make a first sanitization pass with Regex.Replace(myText, "\s*\|\s*", "|").

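```vb
' Namespace to import (UiPath Imports panel): System.Text.RegularExpressions

' Assign (String): one record, from a month-year start up to "|000" at end of line
myPattern = "^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-\d{2}.+?\|000$"

' Assign (RegexOptions): ^/$ work per line, and "." also matches newlines
myOptions = RegexOptions.Multiline Or RegexOptions.Singleline

' Assign (MatchCollection): one Match per record
myMatches = Regex.Matches(myText, myPattern, myOptions)
```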

You can see the pattern in action here:
https://regex101.com/r/hLlGm2/1
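For testing the same idea outside UiPath, here is a rough Python translation of the steps above (`re.MULTILINE | re.DOTALL` correspond to .NET's `Multiline` and `Singleline`). As posted, the pattern only ends a record at `|000`; widening the ending to `\|[a-zA-Z0-9]{3}` to also cover endings like `|CA0` is an assumption based on the requirement earlier in the thread, and the sample text is made up.

```python
import re

# Hypothetical input: one record wrapped over two lines, with spaces around pipes.
my_text = "JAN-19 | A | wrapped\n text | 000\nFEB-19|B|C|CA0"

# First sanitization pass from the post: trim whitespace around every pipe.
my_text = re.sub(r"\s*\|\s*", "|", my_text)

# Record = month-year start, then lazily up to a last field at end of line.
# DOTALL lets '.+?' cross line breaks; MULTILINE makes ^ and $ work per line.
my_pattern = (r"^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-\d{2}"
              r".+?\|[a-zA-Z0-9]{3}$")
my_matches = list(re.finditer(my_pattern, my_text, re.MULTILINE | re.DOTALL))

for m in my_matches:
    print(repr(m.group(0)))
```

Note that `\s` also matches line breaks, so the sanitization pass also pulls a line that starts or ends with a pipe onto its neighbor, which happens to help here.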

