Regex issues with Studio

Hey guys, I’m getting frustrated with the regex implementation in Studio. Using the Matches activity, I’ve got the following string (extracted from a longer string):

Ange timpris Timpris i kronor
Ange timpris
715

I need to get the digits on the third line. Using the pattern:
(?<=Ange\stimpris\sTimpris\si\skronor\sAnge\stimpris\s).+
in the regex builder, the correct digits are marked in grey. But when I try to assign the result of the match to a string using “stringvariable(0).tostring”, nothing is found.

I’ve tried other patterns too that also show up in the builder as if they are correct, but then when running the match it’s empty. It’s driving me mad that the regex builder thinks the match is correct but it still doesn’t work. Also, using an external site like regexstorm.net doesn’t help: matches that work there seldom work in Studio.

Any ideas what’s wrong?

Check whether the Multiline option is activated and test again. I’ll have a closer look at it.

What about the following pattern \d+?

@thomasb82

Check as below

I’m able to fetch the result with your expression.

Check the scope of the variable and try again.

As @ppr suggested, you can also try that expression.

Hope this helps you

Thanks

Weird, if I use +? in the builder, 715 is marked, but when converting it to a string only 7 is shown.
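A likely explanation for the +? behaviour: the ? after + makes the quantifier lazy, so each match stops after a single digit. The builder presumably highlights three separate one-digit matches (7, 1, 5) next to each other, which looks like one match on 715. That part is an assumption about the builder, not something verified here. A minimal sketch using Python's `re` as a stand-in for the .NET engine (lazy quantifiers behave the same way in both):

```python
import re

sample = "715"

# Lazy: "+?" stops as soon as one digit satisfies the pattern.
lazy_first = re.search(r"\d+?", sample).group()

# Greedy: "+" keeps consuming digits for as long as it can.
greedy_first = re.search(r"\d+", sample).group()

# Every lazy match in the sample: three separate one-digit matches.
lazy_all = re.findall(r"\d+?", sample)

print(lazy_first, greedy_first, lazy_all)
```

So taking the first match of \d+? gives only 7, while plain \d+ gives the whole number.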

How about this:

Changed the scope and now it works using the “Ange timpris Timpris i kronor
Ange timpris
715”

string. However, I’m actually using a longer string with a ton of other stuff before this, coming from the “read PDF text” activity... I can’t show it since it’s super long. The builder still shows only one match, which is 715, when I use the full string. But then it finds no match when running the process. So weird. Oh, and note that 715 can be any three-digit combination.

@thomasb82
As for the root cause, I would guess it’s the Windows line break \r\n.

If this is incorporated into the pattern, then it works. Just check your original string in the debugger for occurrences of \r.

Ah, it looks like this when viewed in debug (adding a few more lines before and after 715):
Testledare IT\r\n\r\nAnge timpris Timpris i kronor\r\nAnge timpris\r\n715\r\nTidsredovisning
The variable that I assign the result to gets this value: CastIterator { }

So basically I need to incorporate this into the pattern?

Yes, it needs attention.

Have a look at the screenshot, produced in UiPath Studio by debugging and working with the Watch panel. Here we can see that the missing \r makes the pattern fail.
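To make the \r point concrete, here is a small sketch of why the pattern from the first post fails on Windows line breaks. It uses Python's `re` as a stand-in for the .NET engine that Studio uses, and the two test strings are shortened versions of the text in this thread. The pattern allows exactly one \s between "kronor" and "Ange", but \r\n is two whitespace characters, so the lookbehind can never line up.

```python
import re

# The lookbehind from the first post: every gap is exactly one \s.
PATTERN = r"(?<=Ange\stimpris\sTimpris\si\skronor\sAnge\stimpris\s).+"

# The same text with the two line-ending styles discussed above.
lf_text = "Ange timpris Timpris i kronor\nAnge timpris\n715\nTidsredovisning"
crlf_text = lf_text.replace("\n", "\r\n")

lf_match = re.search(PATTERN, lf_text)      # one \n fits the single \s
crlf_match = re.search(PATTERN, crlf_text)  # \r\n is two characters, no fit

print(lf_match.group() if lf_match else None)
print(crlf_match)
```

With \n only, the match is 715; with \r\n there is no match at all, which fits the empty result seen at runtime.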

OK, so I’ve narrowed it down some. It seems to be the read PDF activity that’s the issue. If I paste the entire document from the PDF into a variable and run it, it works. But when I set the source of the match to be the variable that comes from the read PDF text activity, the match fails... weird.

OK, so I attached the process here. Using the “pdfread” variable that comes from the read text file activity, the match fails. But using “pdfread2”, which is the same string just pasted into a string variable, it works. I cannot for the life of me figure out why.
testpdfread.zip (3.8 KB)

@thomasb82
I had no chance to open the xaml.
Did you compare the different types of line breaks, \r\n vs \n?
Maybe there is a mismatch.
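One quick way to compare the line breaks of two strings is simply to count them. The sketch below is in Python for illustration only; `count_line_breaks` and the two sample strings are hypothetical names made up for this example, not anything from Studio.

```python
def count_line_breaks(text: str) -> dict:
    # Count Windows pairs first, then whatever bare \n or \r is left over.
    crlf = text.count("\r\n")
    return {
        "crlf": crlf,
        "lf_only": text.count("\n") - crlf,
        "cr_only": text.count("\r") - crlf,
    }

# Hypothetical samples: same text, different line endings.
sample_a = "Ange timpris Timpris i kronor\nAnge timpris\n715\n"
sample_b = "Ange timpris Timpris i kronor\r\nAnge timpris\r\n715\r\n"

print(count_line_breaks(sample_a))
print(count_line_breaks(sample_b))
```

If the two strings report different kinds of line breaks, a pattern written against one of them will not necessarily match the other.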

The string from the PDF doc looks like this:
"17805

Befattning Välj befattning i lista
DDDDSDSD IT

Ange timpris Timpris i kronor
Ange timpris
715
Tidsredovisning Ska tidredovisa men utan BAS-påslag
TDOK "

The string from the read PDF activity looks like this (from the Watch panel):

"17805\r\n\r\nBefattning Välj befattning i lista\r\nTestledare IT\r\n\r\nAnge timpris Timpris i kronor\r\nAnge timpris\r\n715\r\nTidsredovisning Ska tidredovisa men utan BAS-påslag\r\nTDOK

And the variable that works (when I paste the string into it):

"17805\n\nBefattning Välj befattning i lista\nTestledare IT\n\nAnge timpris Timpris i kronor\nAnge timpris\n715\nTidsredovisning Ska tidredovisa men utan BAS-påslag\nTDOK

And the pattern I’m using is:
(?<=Ange\stimpris\sTimpris\si\skronor\nAnge timpris\n).+

In the regex builder it seems to work, 715 is marked, but when I run it, it fails if I use the string from the PDF activity.

So there are a few differences in how \n and \r are used, but I can’t figure out the pattern for it to work on the read PDF string...

I only briefly looked at this thread, but noticed you had some newlines potentially being an issue.

I suggest placing those in brackets maybe.
[\r\n]{1,2}
for example, would catch variations with the newline characters.

EDIT: corrected to use backslashes!


Dude! That solved it.
Using: (?<=Ange\stimpris\sTimpris\si\skronor[\r\n]{1,2}Ange timpris[\r\n]{1,2}).+

Man, thanks a lot both of you for your time!

Super annoying though that the regex builder in Studio thinks that both: (?<=Ange\stimpris\sTimpris\si\skronor[\r\n]{1,2}Ange timpris[\r\n]{1,2}).+

and

(?<=Ange\stimpris\sTimpris\si\skronor\nAnge timpris\n).+

work the same way.
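A possible reason the builder treats both patterns alike, which is a guess and not verified: the sample text pasted into the builder may not carry the same line breaks as the string produced at runtime. The sketch below compares the two patterns on both kinds of text, using Python's `re` as a stand-in for the .NET engine. Two things differ from the patterns above and are choices made for this sketch: Python rejects a variable-width lookbehind, so a capture group is used instead, and `(\d+)` replaces `.+` because `.` also matches \r, so `.+` on \r\n text would return the digits with a trailing \r that would then need trimming.

```python
import re

# Stand-ins for the two patterns above, with a capture group
# instead of the lookbehind (see the note before this block).
LF_ONLY = r"Ange\stimpris\sTimpris\si\skronor\nAnge timpris\n(\d+)"
EITHER = r"Ange\stimpris\sTimpris\si\skronor[\r\n]{1,2}Ange timpris[\r\n]{1,2}(\d+)"

def extract(pattern: str, text: str):
    """Return the captured digits, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

lf_text = "Testledare IT\n\nAnge timpris Timpris i kronor\nAnge timpris\n715\nTidsredovisning"
crlf_text = lf_text.replace("\n", "\r\n")

for name, pattern in (("LF_ONLY", LF_ONLY), ("EITHER", EITHER)):
    print(name, extract(pattern, lf_text), extract(pattern, crlf_text))
```

On \n text both patterns find 715, so they look identical there. Only the [\r\n]{1,2} version also finds it on \r\n text.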

