Need help with a regular expression

Hi,
Sorry for the off-topic question, but this regular expression is driving me mad…

I want to parse a PDF with UiPath. The PDF is a purchase order with several positions (line items).

It looks like this:

18 123456 25,00 Stck 100,00 2.500,00 EUR

.
.
.
some text

Ihre Art.-Nr. 1690431

Liefertermin: 21.11.2019

.
.
.
some text

incl.Mindermengenzuschlag

entspricht: 222,00 EUR

The problem is that I need two optional groups: the “Liefertermin” (delivery date) and the “Mindermengenzuschlag” (small-quantity surcharge) may or may not be present.

This is my RegEx so far:

\n1\s(\d{2,8})\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{1,10})\s(\d{0,3}(\.\d{3})*,\d*)\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{3}).+?Ihre Art.-Nr.\s(\d+).+?(Liefertermin:\s(\d{2}.\d{2}.\d{4}))+?.+?(Mindermengenzuschlag.+?entspricht:\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{1,10}))+?

Of course, this doesn’t work if the optional values are not there.
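For reference, a quick Python check of why (Python's `re` behaves like .NET for these particular constructs; `re.DOTALL` is assumed to stand in for the Singleline option that lets `.+?` cross line breaks in UiPath). `(...)+?` means "one or more times, as few as possible", so the group is still required at least once:

```python
import re

# "(group)+?" = one or more repetitions, lazily: the group is still mandatory.
# re.DOTALL lets "." match newlines (assumed equivalent of .NET RegexOptions.Singleline).
MANDATORY = re.compile(
    r"Art\.-Nr\.\s(\d+).+?(Liefertermin:\s(\d{2}\.\d{2}\.\d{4}))+?",
    re.DOTALL,
)

with_date = "Ihre Art.-Nr. 1690431\nLiefertermin: 21.11.2019\n"
without_date = "Ihre Art.-Nr. 1690431\nsome text\n"

print(MANDATORY.search(with_date).group(3))  # 21.11.2019
print(MANDATORY.search(without_date))        # None: no match at all
```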

IMO it should look like this:

\n1\s(\d{2,8})\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{1,10})\s(\d{0,3}(\.\d{3})*,\d*)\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{3}).+?Ihre Art.-Nr.\s(\d+).+?(Liefertermin:\s(\d{2}.\d{2}.\d{4}))?.+?(Mindermengenzuschlag.+?entspricht:\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{1,10}))?

but then it doesn’t match the optional values at all…
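A likely reason, sketched in Python under the same assumptions (`re.DOTALL` in place of Singleline, and blank lines between the PDF lines as in the sample above): after each lazy `.+?` has consumed a single character, the engine tries the optional group at exactly that spot. If the keyword does not start there, `?` lets the group match nothing, the pattern reaches its end, and the match is reported. Nothing forces `.+?` to keep going until it reaches "Liefertermin".

```python
import re

AMOUNT = r"\d{0,3}(?:\.\d{3})*,\d*"  # German format, e.g. 2.500,00

# Tail of the "it should look like this" pattern: lazy filler, then optional groups.
OPTIONAL_AFTER_LAZY = re.compile(
    r"Ihre Art\.-Nr\.\s(\d+).+?(Liefertermin:\s(\d{2}\.\d{2}\.\d{4}))?"
    r".+?(Mindermengenzuschlag.+?entspricht:\s(" + AMOUNT + r"))?",
    re.DOTALL,
)

text = (
    "Ihre Art.-Nr. 1690431\n\n"
    "Liefertermin: 21.11.2019\n\n"
    "some text\n\n"
    "incl.Mindermengenzuschlag\n\n"
    "entspricht: 222,00 EUR\n"
)

m = OPTIONAL_AFTER_LAZY.search(text)
print(m.group(1))              # 1690431
print(m.group(3), m.group(5))  # None None, although both values are in the text
print(repr(m.group(0)))        # 'Ihre Art.-Nr. 1690431\n\n': the match ends as early as it can
```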

Can anyone help?

@Dave.Remmel This video might help you parse the PDF using regex: “Using Regular Expression Extract Data from Multiple files (PDF & Word) and Convert into Excel file” (YouTube).

The UiPath part is working fine. I just have problems with the regular expression.

…is there anyone who can help me?

@Dave.Remmel Do you need to access the values with one regular expression? Since you have two values to extract, it might be easier to use a separate regular expression for each of them. Also, can you highlight the exact values you want? When you say they’re optional, that means they may or may not be present. What do you want to do when a value is not present?
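A minimal Python sketch of the one-regex-per-value idea (the dictionary keys and the `extract` helper are hypothetical names, and the input is assumed to be the text of a single position). Each small pattern simply yields `None` when its value is missing, so nothing has to be optional inside one big expression:

```python
import re

# One small pattern per value (hypothetical names). DOTALL lets ".*?" cross line breaks.
PATTERNS = {
    "article": re.compile(r"Ihre Art\.-Nr\.\s*(\d+)"),
    "delivery_date": re.compile(r"Liefertermin:\s*(\d{2}\.\d{2}\.\d{4})"),
    "surcharge": re.compile(
        r"Mindermengenzuschlag.*?entspricht:\s*(\d{1,3}(?:\.\d{3})*,\d+)\s*EUR",
        re.DOTALL,
    ),
}


def extract(block: str) -> dict:
    """Return every value found in one position's text, or None where it is missing."""
    result = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(block)
        result[name] = m.group(1) if m else None
    return result


full = ("Ihre Art.-Nr. 1690431\n\nLiefertermin: 21.11.2019\n\nsome text\n\n"
        "incl.Mindermengenzuschlag\n\nentspricht: 222,00 EUR\n")
print(extract(full))
# {'article': '1690431', 'delivery_date': '21.11.2019', 'surcharge': '222,00'}
print(extract("Ihre Art.-Nr. 1690431\n\nsome text\n"))
# {'article': '1690431', 'delivery_date': None, 'surcharge': None}
```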

Actually I do that… I loop through the order positions to match only one position at a time.

Yes. I need the values, if they are present…
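When looping over the positions one at a time, cutting the text into per-position blocks first keeps any search from running on into the next position. A Python sketch, assuming (hypothetically) that every position starts on a line of the form "position, article number, amount", as in the line 18 123456 25,00 Stck 100,00 2.500,00 EUR. The `split_positions` helper, the header line and the second position in the example are made up:

```python
import re

# Hypothetical rule: a position header line looks like "<pos> <article no.> <digit>...".
POSITION_START = re.compile(r"^(?=\d+\s\d{2,8}\s\d)", re.MULTILINE)


def split_positions(text: str) -> dict:
    """Map each position number to its text, up to the next position header."""
    blocks = {}
    for chunk in POSITION_START.split(text):  # zero-width split needs Python 3.7+
        head = re.match(r"(\d+)\s", chunk)
        if head:  # skips any preamble before the first position
            blocks[int(head.group(1))] = chunk
    return blocks


order = (
    "header text\n"                            # made-up preamble
    "18 123456 25,00 Stck 100,00 2.500,00 EUR\n"
    "Ihre Art.-Nr. 1690431\n"
    "Liefertermin: 21.11.2019\n"
    "19 654321 5,00 Stck 10,00 50,00 EUR\n"    # made-up second position
    "Ihre Art.-Nr. 1690432\n"
)
blocks = split_positions(order)
print(sorted(blocks))                # [18, 19]
print("Liefertermin" in blocks[19])  # False: position 18's date stays in its own block
```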

Here in bold the values I’d like to extract:

18 123456 25,00 Stck 100,00 2.500,00 EUR

.
.
.
some text

Ihre Art.-Nr. 1690431

Liefertermin: 21.11.2019

.
.
.
some text

incl.Mindermengenzuschlag

entspricht: 222,00 EUR

@Dave.Remmel Check these links:

You can get the date value with this expression:
System.Text.RegularExpressions.Regex.Match(yourInputString, "(?<=Liefertermin:).*").Value.Trim()

Likewise you can use the other regexes in the same way to extract the values.
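For comparison, the same lookbehind idea in Python (the `after_label` helper is a hypothetical name). `.*` stops at the end of the line because no DOTALL flag is set, and the space after the colon is part of the match, hence the `strip()`. One difference worth knowing: in .NET a failed `Match` has an empty `Value`, whereas Python's `re.search` returns `None`, so the helper checks for that.

```python
import re


def after_label(text: str, label: str) -> str:
    """Rest of the line after `label`, trimmed; "" if the label is absent (like .NET's Match.Value)."""
    m = re.search(r"(?<=" + re.escape(label) + r").*", text)
    return m.group(0).strip() if m else ""


block = ("Ihre Art.-Nr. 1690431\n\nLiefertermin: 21.11.2019\n\n"
         "incl.Mindermengenzuschlag\n\nentspricht: 222,00 EUR\n")
print(after_label(block, "Liefertermin:"))              # 21.11.2019
print(after_label(block, "entspricht:"))                # 222,00 EUR
print(repr(after_label("some text", "Liefertermin:")))  # ''
```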

If you find this useful, we can work on extracting the first line next. Is there a keyword that identifies it uniquely? For example, is the word “Stck” always there in that line?

Yes, I’m using these… Thanks anyway.

The first line is working fine… only the optional values are the problem…
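The amount sub-pattern used for the first line, `\d{0,3}(\.\d{3})*,\d*`, covers the German number format (dot as thousands separator, comma as decimal separator). A small Python sketch of it, with the inner group made non-capturing and a hypothetical `to_float` helper that turns the captured text into a number:

```python
import re

# Amount sub-pattern with "(?:...)" instead of "(...)" so findall returns whole
# amounts rather than the inner thousands group.
GERMAN_AMOUNT = re.compile(r"\d{0,3}(?:\.\d{3})*,\d*")


def to_float(amount: str) -> float:
    """'2.500,00' -> 2500.0: drop the thousands dots, then use '.' as the decimal point."""
    return float(amount.replace(".", "").replace(",", "."))


line = "18 123456 25,00 Stck 100,00 2.500,00 EUR"
amounts = GERMAN_AMOUNT.findall(line)
print(amounts)                         # ['25,00', '100,00', '2.500,00']
print([to_float(a) for a in amounts])  # [25.0, 100.0, 2500.0]
```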

@Dave.Remmel Were you able to get the regex working? I got 18 matches, but none of them include the entspricht value, even though it is present.

I got it working late yesterday evening, with a hint from Stack Overflow:

"\n"+ positionINT.ToString() + "\s(\d{2,8})\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{1,10})\s(\d{0,3}(\.\d{3})*,\d*)\s(\d{0,3}(\.\d{3})*,\d*)\s(\w{3}).+?Ihre Art.-Nr.\s(\d+).+?(?:Liefertermin:\s(\d{2}.\d{2}.\d{4}).+?)?(?:Mindermengenzuschlag.+?entspricht:\s(\d{0,3}(\.\d{3})*,\d*)\sEUR)?"

I just moved the .+? into the group

So instead of this:

(?:Liefertermin:\s(\d{2}.\d{2}.\d{4})).+?

I use this

(?:Liefertermin:\s(\d{2}.\d{2}.\d{4}).+?)

I still don’t understand why the first version didn’t work… but well… it works.
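A possible explanation, checked in Python against the tail of the posted pattern (copied as-is; `re.DOTALL` is assumed to match the Singleline setup): moving `.+?` into the group changes less than it seems. The `.+?` in front of the first optional group still stops after one character, and a group that does not match right there is skipped. So the date is captured only when "Liefertermin" begins one character after the article number, and the surcharge only when "Mindermengenzuschlag" begins one character after the date. Whether that holds depends on how the text comes out of the PDF.

```python
import re

# Tail of the posted working pattern, from "Ihre Art.-Nr." onward (unchanged).
POSTED_TAIL = re.compile(
    r"Ihre Art.-Nr.\s(\d+).+?(?:Liefertermin:\s(\d{2}.\d{2}.\d{4}).+?)?"
    r"(?:Mindermengenzuschlag.+?entspricht:\s(\d{0,3}(\.\d{3})*,\d*)\sEUR)?",
    re.DOTALL,
)

# Single line breaks: "Liefertermin" starts one character after the article number.
tight = ("Ihre Art.-Nr. 1690431\nLiefertermin: 21.11.2019\nsome text\n"
         "incl.Mindermengenzuschlag\nentspricht: 222,00 EUR\n")
# Blank lines in between, as in the sample at the top of the thread.
spaced = tight.replace("\n", "\n\n")

print(POSTED_TAIL.search(tight).group(2, 3))   # ('21.11.2019', None)
print(POSTED_TAIL.search(spaced).group(2, 3))  # (None, None)
```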

Thanks a lot

@Dave.Remmel It doesn’t extract the Liefertermin and entspricht values, though.
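One way to capture both optional values (a sketch, not necessarily what was suggested on Stack Overflow): put the lazy filler at the front of each optional group, i.e. `(?:.*?Liefertermin:\s(\d{2}\.\d{2}\.\d{4}))?` instead of `.+?(?:Liefertermin:\s(\d{2}.\d{2}.\d{4}).+?)?`. Because `?` is greedy, the engine first tries to enter the group, and the inner `.*?` then scans forward until it finds the keyword; the group is skipped only if the keyword does not occur anywhere later. Two assumptions: the date comes before the surcharge (as in the sample), and the pattern runs on one position's text, since on the whole document the filler could reach the next position's date.

```python
import re

AMOUNT = r"\d{0,3}(?:\.\d{3})*,\d*"  # German format, e.g. 2.500,00

# Lazy filler moved to the FRONT of each optional group. The greedy "?" makes the
# engine try the group first; ".*?" then searches forward for the keyword, and the
# group is left out only if the keyword never appears. Assumes the date comes before
# the surcharge and that the input is the text of a single position.
TAIL = re.compile(
    r"Ihre Art\.-Nr\.\s(\d+)"
    r"(?:.*?Liefertermin:\s(\d{2}\.\d{2}\.\d{4}))?"
    r"(?:.*?Mindermengenzuschlag.*?entspricht:\s(" + AMOUNT + r")\sEUR)?",
    re.DOTALL,
)

full = ("Ihre Art.-Nr. 1690431\n\nLiefertermin: 21.11.2019\n\nsome text\n\n"
        "incl.Mindermengenzuschlag\n\nentspricht: 222,00 EUR\n")
no_date = full.replace("Liefertermin: 21.11.2019\n\n", "")
neither = "Ihre Art.-Nr. 1690431\n\nsome text\n"

for block in (full, no_date, neither):
    print(TAIL.search(block).groups())
# ('1690431', '21.11.2019', '222,00')
# ('1690431', None, '222,00')
# ('1690431', None, None)
```

The same change should carry over to the .NET pattern in UiPath, as long as RegexOptions.Singleline is set so that `.` also matches line breaks.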