[REGEX] - Capture Group expression works in .NET Regex Tester

Hey everyone,

Once again I face the issue where a certain REGEX expression captures the intended output in .NET REGEX Tester but not in UiPath.

I made sure to escape the \r and \n but still nothing.

Here is my link to .NET Regex Tester. The expression should capture 4 groups, which correspond to departures and arrivals:

http://regexstorm.net/tester?p=(%3F<%3DO\%2FD\\r\\n).*%3F(%3F%3D\%3B|\\r\\n|%24)&i=FATURA\r\nORIGINAL\r\nNº+DOC.+DATA+PÁG.\r\nAv.+Infante+D.+Henrique%2C+73%2C+1º+ZF2E+1%2F1190007073+30-11-2021+1+%2F+54\r\n1900-263+Lisboa+CLIENTE+CONTRIBUINTE\r\nTelefone%3A+21+10+21+040+111527+PT500297177\r\nTelefax%3A+21+10+21+029\r\n\r\nVIAGENS+ABREU+SA-PORTO\r\nPRAÇA+DA+TRINDADE+142%2C+4º\r\n4000-539+++PORTO\r\n\r\nVossa+Referência%3A+NOV%2F2021-PVE+Galileo+NPC+Documento+interno+%3A++2210705158\r\nData+de+Vencimento%3A+30-12-2021\r\nCondições+de+pagamento%3A+A+30+dias+da+data+do+documento+(Unidade%3AEUR)\r\nDescrição+Quant.+Valor+Unitário+Valor+Líquido+IVA\r\n\r\n10+Transporte+de+Passageiros+-+Av´s+1+58%2C49++++++++++58%2C49+6%\r\nNº+Bilh+3729-121577+%3B+D.+Venda+22-11-2021+%3B+BS+LC%2FRG+%3B+O%2FD\r\nBraga%2FLisboa+Oriente+%3B+FS+TF+3729%2F125238\r\n\r\n20+Transporte+de+Passageiros+-+Av´s+1+42%2C08++++++++++42%2C08+6%\r\nNº+Bilh+3729-121516+%3B+D.+Venda+22-11-2021+%3B+BS+LC%2FRG+%3B+O%2FD\r\nLisboa+S.+Apolónia%2FPorto+Campanhã+%3B+FS+TF+3729%2F125177\r\n\r\n30+Transporte+de+Passageiros+-+Av´s+1+30%2C09++++++++++30%2C09+6%\r\nNº+Bilh+3729-121521+%3B+D.+Venda+22-11-2021+%3B+BS+LC%2FRG+%3B+O%2FD\r\nPorto+Campanhã%2FEntrecampos+%3B+FS+TF+3729%2F125182\r\n\r\n40+Transporte+de+Passageiros+-+Av´s+1+48%2C58++++++++++48%2C58+6%\r\nNº+Bilh+3729-119794+%3B+D.+Venda+29-10-2021+%3B+BS+LC%2FRG+%3B+O%2FD\r\nPorto+Campanhã%2FLisboa+Oriente+%3B

Any help would be greatly appreciated.

Thank you.

@andre.f.pires

It’s capturing correctly in UiPath, check below

Can you tell me what issue you are facing?

Thanks

Hey Srini84, thanks for replying.

Well, my sequence is as follows:

First, I’m reading the PDF file using Read PDF Text activity, outputting the result to the string output.

Then, assign mc, of MatchCollection type = System.Text.RegularExpressions.Regex.Matches(output,"(?<=O/D\r\n).*?(?=;|\r\n|$)")

Then, inside a For Each, I’m logging each item of mc, which is empty, implying that nothing is being captured.

Printscreen of sequence:

Thank you.

It looks like the line-break characters are being mixed up in the Regex tester prototype: the pattern there uses the escaped form \\n, which matches the visible text \n, whereas \n on its own matches an actual line feed. The same applies to \r.
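To make the \n versus \\n distinction concrete, here is a minimal sketch in TypeScript. The language is only a stand-in: the thread uses .NET, and I'm assuming the JavaScript regex engine treats these particular constructs the same way. The sample strings and the helper firstMatch are made up for this example.

```typescript
// Two inputs that look identical when shown in escaped form.
// This one contains a real carriage return + line feed:
const withRealBreak: string = "BS LC/RG ; O/D\r\nBraga/Lisboa Oriente ; FS TF";
// This one contains the four visible characters \ r \ n:
const withEscapedText: string = String.raw`BS LC/RG ; O/D\r\nBraga/Lisboa Oriente ; FS TF`;

// Pattern as typed in the workflow: \r\n means "CR followed by LF".
const realBreakPattern = new RegExp(String.raw`(?<=O/D\r\n).*?(?=;|\r\n|$)`);
// Pattern as encoded in the tester link: \\r\\n means "backslash, r, backslash, n".
const escapedTextPattern = new RegExp(String.raw`(?<=O/D\\r\\n).*?(?=;|\\r\\n|$)`);

// Hypothetical helper: return the first match, or null when nothing matches.
function firstMatch(pattern: RegExp, text: string): string | null {
  const m = pattern.exec(text);
  return m === null ? null : m[0];
}

// firstMatch(realBreakPattern, withRealBreak)     gives "Braga/Lisboa Oriente " (trailing space included)
// firstMatch(realBreakPattern, withEscapedText)   gives null
// firstMatch(escapedTextPattern, withEscapedText) gives "Braga/Lisboa Oriente "
// firstMatch(escapedTextPattern, withRealBreak)   gives null
```

Each pattern only matches the kind of input it was written for, so a pattern that works on text pasted into a tester does not necessarily work on a string containing real line breaks.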




@andre.f.pires

Check as below

It’s working. Make sure that the For Each TypeArgument is set to System.Text.RegularExpressions.Match.


Hope this will help you

Thanks

That’s weird.

When you assign the entire body to a string it does work, but if you obtain the string from the Read PDF Text activity it no longer works. I double-checked.

Assign the output to variable, and it matches the values intended:

Get the body through the Read PDF Text activity and it doesn’t match anything:

What could be wrong?

@andre.f.pires

You can compare the string coming from the Read PDF Text activity with the string on which the Regex is working.

Maybe it will help

Thanks
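Comparing the two strings by eye is unreliable, because spaces, CR and LF all look alike in a log. A minimal sketch of one way to compare them, in TypeScript for illustration only: the helpers reveal and firstDifference are hypothetical, and the two sample values (including the trailing space) are assumptions, not the actual Read PDF Text output.

```typescript
// Replace invisible characters with visible tags so they show up in a log.
function reveal(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === 13) out += "<CR>";
    else if (code === 10) out += "<LF>";
    else if (code === 9) out += "<TAB>";
    else if (code === 32) out += "<SP>";
    else if (code === 160) out += "<NBSP>";
    else out += text.charAt(i);
  }
  return out;
}

// Index of the first position where the two strings differ, or -1 if they are identical.
function firstDifference(a: string, b: string): number {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a.charCodeAt(i) !== b.charCodeAt(i)) return i;
  }
  return a.length === b.length ? -1 : shared;
}

// Assumed values: a hand-assigned string and what an extraction step might return.
const assigned: string = "O/D\r\nBraga";
const extracted: string = "O/D \r\nBraga"; // trailing space is an assumption for illustration

// reveal(assigned)                     gives "O/D<CR><LF>Braga"
// reveal(extracted)                    gives "O/D<SP><CR><LF>Braga"
// firstDifference(assigned, extracted) gives 3
```

Logging the revealed form of both strings near each O/D would show whether the line endings really are identical.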

They are the same.

I’m getting the PDF string by logging the Read PDF Text output with a Log Message activity, which gives this:

FATURA\r\nORIGINAL\r\nNº DOC. DATA PÁG.\r\nAv. Infante D. Henrique, 73, 1º ZF2E 1/1190007073 30-11-2021 1 / 54\r\n1900-263 Lisboa CLIENTE CONTRIBUINTE\r\nTelefone: 21 10 21 040 111527 PT500297177\r\nTelefax: 21 10 21 029\r\n\r\nVIAGENS ABREU SA-PORTO\r\nPRAÇA DA TRINDADE 142, 4º\r\n4000-539 PORTO\r\n\r\nVossa Referência: NOV/2021-PVE Galileo NPC Documento interno : 2210705158\r\nData de Vencimento: 30-12-2021\r\nCondições de pagamento: A 30 dias da data do documento (Unidade:EUR)\r\nDescrição Quant. Valor Unitário Valor Líquido IVA\r\n\r\n10 Transporte de Passageiros - Av´s 1 58,49 58,49 6%\r\nNº Bilh 3729-121577 ; D. Venda 22-11-2021 ; BS LC/RG ; O/D\r\nBraga/Lisboa Oriente ; FS TF 3729/125238\r\n\r\n20 Transporte de Passageiros - Av´s 1 42,08 42,08 6%\r\nNº Bilh 3729-121516 ; D. Venda 22-11-2021 ; BS LC/RG ; O/D\r\nLisboa S. Apolónia/Porto Campanhã ; FS TF 3729/125177\r\n\r\n30 Transporte de Passageiros - Av´s 1 30,09 30,09 6%\r\nNº Bilh 3729-121521 ; D. Venda 22-11-2021 ; BS LC/RG ; O/D\r\nPorto Campanhã/Entrecampos ; FS TF 3729/125182\r\n\r\n40 Transporte de Passageiros - Av´s 1 48,58 48,58 6%\r\nNº Bilh 3729-119794 ; D. Venda 29-10-2021 ; BS LC/RG ; O/D\r\nPorto Campanhã/Lisboa Oriente ; FS TF 3729/123455\r\n\r\n50 Transporte de Passageiros - Av´s 1 42,08 42,08 6%\r\nNº Bilh 3729-121517 ; D. Venda 22-11-2021 ; BS LC/RG ; O/D\r\nLisboa S. Apolónia/Porto Campanhã ; FS TF 3729/125178\r\n\r\nTOTAL A TRANSPORTAR\r\n\r\nTotal Líquido: 221,32\r\n\r\nKdOk - Processado por programa certificado Nº 631/AT\r\n\r\nSede: Calçada do Duque, 20 1249-109 LISBOA / Telef: 351.211023000 FAX: 351.211023411\r\nContribuinte Nº 500498601 / Capital Social: € 3 959 489 351,01 / RegistoC.R.C.L. Nº 109 / Internet http://www.cp.pt

The only difference is that when you assign this string directly to a variable, the REGEX expression works.

I am getting the exact same output from UiPath as well.
Could you please share your code so I can take a look?

@andre.f.pires

Can you try the expression below?

(?<=O/D\n).*?(?=;|\n|$)


Hope this may help you

If it’s still not working, can you share a sample PDF file?

Thanks

Here’s a snippet of the code which reads the PDF file and applies the REGEX expression.
readPdf.zip (70.9 KB)

@andre.f.pires

Try the expression below

(?<=O/D\s+).*?(?=;|\n|$)


This will help you

Thanks
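The thread does not establish what exactly differs in the Read PDF Text output, but the \s+ version accepts any run of whitespace after O/D, so it no longer depends on one specific line ending. A minimal sketch in TypeScript, for illustration only: the three inputs are assumed variants, the helper extract is hypothetical, and I'm assuming the JavaScript regex engine is close enough to .NET for these patterns.

```typescript
// Patterns quoted from the thread, as raw strings so the backslashes reach the regex engine unchanged.
const strictPattern: string = String.raw`(?<=O/D\r\n).*?(?=;|\r\n|$)`;
const tolerantPattern: string = String.raw`(?<=O/D\s+).*?(?=;|\n|$)`;

// Assumed inputs: the same invoice line with three different endings after "O/D".
const withCrLf: string = "BS LC/RG ; O/D\r\nBraga/Lisboa Oriente ; FS TF 3729/125238";
const withLf: string = "BS LC/RG ; O/D\nBraga/Lisboa Oriente ; FS TF 3729/125238";
const withTrailingSpace: string = "BS LC/RG ; O/D \r\nBraga/Lisboa Oriente ; FS TF 3729/125238";

// Hypothetical helper: collect every match, trim it, and drop empty results.
function extract(pattern: string, text: string): string[] {
  const found = text.match(new RegExp(pattern, "g"));
  if (found === null) return [];
  return found.map((m) => m.trim()).filter((m) => m.length > 0);
}

// extract(strictPattern, withCrLf)            gives ["Braga/Lisboa Oriente"]
// extract(strictPattern, withLf)              gives []
// extract(strictPattern, withTrailingSpace)   gives []
// extract(tolerantPattern, withCrLf)          gives ["Braga/Lisboa Oriente"]
// extract(tolerantPattern, withLf)            gives ["Braga/Lisboa Oriente"]
// extract(tolerantPattern, withTrailingSpace) gives ["Braga/Lisboa Oriente"]
```

The empty-result filter is there for a reason: in this sketch, with CR LF input the \s+ lookbehind is also satisfied between the CR and the LF, which yields an extra empty match before the real one. It may be worth checking whether the same happens in UiPath and skipping empty items inside the For Each.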

Thank you @Srini84, you’re a lifesaver.
