Regex, extract data from string

Hello, I have a challenge with regex. I have 3 texts extracted from a PDF, and I need one pattern that extracts the 09999999V or the X09999999V from each of them.

Thanks!!

Text 1:

Finibus Bonorum et Malorum

Lorem ipsum, consectetur adipiscing 09999999V 99 9999999999 28

Finibus Bonorum et Malorum

Text 2:

Finibus Bonorum et Malorum

Lorem ipsum dolor sit amet, consectetur adipiscing elit 09999999V 99 9999999999 28

Finibus Bonorum et Malorum

Text 3:

Finibus Bonorum et Malorum

Lorem ipsum dolor sit amet, consectetur adipiscing elit X09999999V 99 9999999999 28

Finibus Bonorum et Malorum

@RPASOFT - Assuming the number always ends with V, please check this pattern…

The letter V is only an example. Any letter can appear at the beginning of that numeric string as well as at the end. I would also need to handle the following, where nothing comes after the value:

Lorem ipsum, consectetur adipiscing 09999999V
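One way to express that requirement (an optional leading letter, a run of digits, a closing letter) is a pattern that describes the token itself. The sketch below is in Python purely for illustration, and it assumes the value always has exactly eight digits, as in the samples; the names `ID_PATTERN` and `find_ids` are hypothetical. The constructs used (`\b`, character classes, `{8}`, `?`) are also available in .NET regex.

```python
import re

# Assumption: exactly eight digits, as in the sample texts.
# Optional leading letter, eight digits, one closing letter, on word boundaries.
ID_PATTERN = re.compile(r"\b[A-Za-z]?[0-9]{8}[A-Za-z]\b")

def find_ids(text):
    """Return every token in the text that looks like the identifier."""
    return ID_PATTERN.findall(text)

samples = [
    "Lorem ipsum, consectetur adipiscing 09999999V 99 9999999999 28",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit X09999999V 99 9999999999 28",
    "Lorem ipsum, consectetur adipiscing 09999999V",
]

for sample in samples:
    print(find_ids(sample))
```

If the number of digits can vary, the `{8}` quantifier would need to be adjusted to match the real data.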

Hi @RPASOFT

Please try the below regex and let me know if it meets your expectations.

Regex: (?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z]
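To see what this pattern returns on the three sample lines, here is a small check. It is written in Python only for illustration (the lookaheads and character classes used here behave the same way in .NET), and the variable names are made up for the example.

```python
import re

# The pattern exactly as posted above.
PATTERN = re.compile(r"(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z]")

lines = [
    "Lorem ipsum, consectetur adipiscing 09999999V 99 9999999999 28",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit 09999999V 99 9999999999 28",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit X09999999V 99 9999999999 28",
]

# Take the first match on each line.
results = [PATTERN.search(line).group(0) for line in lines]
print(results)  # ['09999999V', '09999999V', 'X09999999V']
```

Because `.+` is greedy, each match starts at the first digit (or letter followed by a digit) and runs to the last letter on the same line, so the result depends on what else that line contains.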

Feel free to reach out at any time if you have any questions. Thanks.

Happy Automation

This option works fine for me, but if I add some text, I get several matches. How can I select only match 2? Thanks

Could you please share those samples and show us which match you would like to extract?

For example, this text gives two matches. I only need the second one, and I need it to work with this regex:

(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z]

Text example

Rdr street 45 city

empl (name) D.N.I. Number S.S.

Finibus Bonorum et Malorum

Lorem ipsum, consectetur adipiscing 09999999V 99 9999999999 28

Finibus Bonorum et Malorum
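The two matches on this sample can be reproduced with the short check below. It is a Python sketch for illustration only; the sample is joined with plain line breaks, which is an assumption about how the PDF text arrives, and the variable names are hypothetical.

```python
import re

# The pattern exactly as posted above.
PATTERN = re.compile(r"(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z]")

# Assumption: the extracted text is separated by line breaks.
text = "\n".join([
    "Rdr street 45 city",
    "empl (name) D.N.I. Number S.S.",
    "Finibus Bonorum et Malorum",
    "Lorem ipsum, consectetur adipiscing 09999999V 99 9999999999 28",
    "Finibus Bonorum et Malorum",
])

matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)  # ['45 city', '09999999V']

# Picking the second match by position (index 1), guarding against too few matches.
second = matches[1] if len(matches) > 1 else None
print(second)  # 09999999V
```

Selecting by position only works while the unwanted match keeps appearing before the wanted one, so it is fragile if the surrounding text changes.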

Thanks for the sample. Quick question: does this value always come before a two-digit number, or is any of the surrounding text always constant? Based on that we can derive a pattern…

Hi @RPASOFT

Kindly confirm the following:

After this number, will there always be a capital letter, or can it be either a capital or a lowercase letter?

Thanks.

If the letter is always a capital, then use the pattern below:

(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Z]

Or, try the below one:

(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z].(?=[0-9])

Thanks
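A quick check of the second pattern against the longer sample, sketched in Python for illustration (variable names are hypothetical, and the line-break-separated input is an assumption):

```python
import re

# The second pattern above: the last alternative now requires "letter, any character, digit".
REVISED = re.compile(r"(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z].(?=[0-9])")

text = "\n".join([
    "Rdr street 45 city",
    "empl (name) D.N.I. Number S.S.",
    "Finibus Bonorum et Malorum",
    "Lorem ipsum, consectetur adipiscing 09999999V 99 9999999999 28",
    "Finibus Bonorum et Malorum",
])

revised_matches = [m.group(0) for m in REVISED.finditer(text)]
print(revised_matches)  # ['09999999V ']  (note the trailing space)

# The extra "." consumes the character after the letter, so trim the result.
cleaned = [m.strip() for m in revised_matches]
print(cleaned)  # ['09999999V']

# When nothing follows the value, the trailing "(?=[0-9])" cannot be satisfied.
end_of_line = REVISED.search("Lorem ipsum, consectetur adipiscing 09999999V")
print(end_of_line)  # None
```

In this check the "45 city" match disappears, but the result carries a trailing space and the case where the value ends the line is no longer matched.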

(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z].(?=[0-9])

This works, but imagine there were more text and it found more matches … how can I make UiPath take only the first one?

Yes, you can get only the first match in UiPath. Please find the method below.

wordMatch = System.Text.RegularExpressions.Regex.Matches(YourText,"(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z].(?=[0-9])")(0)

Note: the wordMatch variable type should be System.Text.RegularExpressions.Match.

Try this method and let me know if it meets your expectation. Thanks.
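The same "take the first match" idea, sketched in Python for illustration: `re.search` returns the first match or `None`, which makes the no-match case explicit. In .NET, indexing `Matches(...)(0)` on an empty collection throws, while `Regex.Match` returns a `Match` whose `Success` property can be checked first. The helper name `first_match` is hypothetical.

```python
import re
from typing import Optional

# The pattern used in the expression above.
PATTERN = re.compile(r"(?=[A-Za-z]).(?=[0-9]).+[A-Z]|(?=[0-9]).+[A-Za-z].(?=[0-9])")

def first_match(text: str) -> Optional[str]:
    """Return the first match, trimmed, or None when nothing matches."""
    m = PATTERN.search(text)
    # strip() removes the trailing space this pattern can capture.
    return m.group(0).strip() if m else None

print(first_match("Lorem ipsum, consectetur adipiscing 09999999V 99 9999999999 28"))
print(first_match("Finibus Bonorum et Malorum"))
```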
