Regex help capturing total amount

Hello,

I have the following sample text:

I alt 1.644.700,56

Certificeret iht. DS/EN ISO 9001, DS/EN ISO 14001 og DASt Richtlinie 022

28.02.2022 301111269 Faktura 20.03.2022 11.597,05
Saldo 11.597,05
Specifikation af aktivposter
28.02.2022 301111269 Faktura 20.03.2022 11.597,05

Årets køb
25.666,39
Saldo vor favør
11.597,05
Total forfalden
0,00
Forfalden senere
11.597,05

Transport-saldo…: 1.760,24

28-02-2022 3798187 3798187 20-03-2022 1.888,83

Saldo 1.888,83
Specifikation af aktivposter

Forfalden saldo 0 til 30 dage
0,00
Forfalden saldo 31 til 60 dage
0,00
Forfalden saldo over 60 dage
0,00

31-jul.-2021
31-jan.-2022
09-feb.-2022
15-feb.-2022

Faktura #6000017609
30-aug.-2021
02-mar.-2022
11-mar.-2022
kr 6.751,50
kr 12.383,41
kr 6.956,93
kr 9.240,25

03-02-2022 Payment 03-02-2022 DKK 0,00 18.573,98 5.042,05
21-02-2022 3422350 Faktura 30-03-2022 DKK 2.064,50 0,00 7.106,55
28-02-2022 Ultimo DKK 7.106,55

Poster DKK
Primosaldo 3.584,38
25-02-22 86754 Bank Indbetaling 25-02-22 -3.584,38 0,00 0,00
28-02-22 3527792 Faktura 20-03-22 3.584,38 3.584,38 3.584,38

Ultimosaldo I alt DKK 3.584,38

28-02-22 1364015 Faktura 1364015 30-03-22 25.822,50 25.822,50 248.138,11
28-02-22 1364051 Faktura 1364051 30-03-22 10.679,00 10.679,00 258.817,11
I alt DKK 258.817,11

09-02-2022 85326041 FA 85326041 D 58191020 28-03-2022 1.221,64
28-02-2022 UltimoSaldo os tilgode 1.221,64

Beløb Skyldig
kr 64.929,20

And this regex: \s(saldo|i alt|ultimo|Beløb skyldig).{0,19}?\n?([0-9.,-]+.,\d*)

Everything in bold matches correctly except the last one:

Beløb Skyldig
kr 64.929,20

Can anyone figure out why that is?

EDIT: Might be easier with this link: Regex storm

Hi @MaxDS1

Try this regex expression, where InputString is your string variable:

System.Text.RegularExpressions.Regex.Match(InputString,"(?<=I\salt\s)(\d.+)|(?<=Saldo\s)(\d.+)|(?<=Saldo\svor\sfavør\n)(\d.+)|(?<=Ultimosaldo\sI\salt DKK\s)(\d.+)|(?<=I\salt\sDKK\s)(\d.+)|(?<=Beløb\sSkyldig\nkr\s)(\d.+)").ToString

Regards
Gokul


Hi @MaxDS1

If you need the whole text (label and amount), try this expression:

System.Text.RegularExpressions.Regex.Match(InputString,"(I\salt\s)(\d.+)|(Saldo\s)(\d.+)|(Saldo\svor\sfavør\n)(\d.+)|(Ultimosaldo\sI\salt DKK\s)(\d.+)|(I\salt\sDKK\s)(\d.+)|(Beløb\sSkyldig\nkr\s)(\d.+)").ToString.Trim
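The difference between the two expressions above is the lookbehind: with (?<=...) the label has to be there but is left out of the match, while with a plain group the label becomes part of the match. A minimal sketch of that difference, using Python's re module only as a convenient way to run the pattern (the thread itself uses .NET, and the one-entry sample string is an assumption for illustration):

```python
import re

# One entry from the sample text in the question.
sample = "Beløb Skyldig\nkr 64.929,20"

# Lookbehind: the label must precede the match but is not part of it.
only_amount = re.search(r"(?<=Beløb\sSkyldig\nkr\s)(\d.+)", sample)

# Plain groups: the label is consumed, so it shows up in the full match.
with_label = re.search(r"(Beløb\sSkyldig\nkr\s)(\d.+)", sample)

print(only_amount.group(0))      # 64.929,20
print(repr(with_label.group(0)))  # 'Beløb Skyldig\nkr 64.929,20'
print(with_label.group(2))       # 64.929,20
```

With the plain-group version the amount is still available on its own, but only through the second group rather than the full match.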

Regards
Gokul


Hei @MaxDS1,

In short, the last one has the currency prefix “kr” at the start of the new line, before the digits. Your pattern expects the amount to begin right after the optional newline, so it did not capture the last one.

Just as you did with pipes for the anchor words, you can use | to look for two different amount formats. It is basically OR logic.

\s(saldo|i alt|ultimo|Beløb\sskyldig).{0,19}?\n?(kr [\d*.\d*]+.,\d*|[\d*.\d*]+.,\d*)
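To make the explanation concrete, here is a small sketch that runs the original pattern and the pattern above against the failing entry. It uses Python's re module purely for illustration and assumes case-insensitive matching, as the matches described in the question imply. The third pattern, with an optional non-capturing (?:kr )?, is a hypothetical variant that is not from the thread.

```python
import re

# The failing entry, with the blank line that precedes it in the sample.
sample = "\nBeløb Skyldig\nkr 64.929,20\n"

# Pattern from the question: the amount must start right after the optional \n.
original = r"\s(saldo|i alt|ultimo|Beløb skyldig).{0,19}?\n?([0-9.,-]+.,\d*)"

# Pattern from this reply: an extra alternative that starts with "kr ".
alternation = (r"\s(saldo|i alt|ultimo|Beløb\sskyldig).{0,19}?\n?"
               r"(kr [\d*.\d*]+.,\d*|[\d*.\d*]+.,\d*)")

# Hypothetical variant (an assumption, not from the thread):
# skip an optional "kr " without capturing it.
optional_prefix = (r"\s(saldo|i alt|ultimo|Beløb skyldig).{0,19}?\n?"
                   r"(?:kr )?([0-9.,-]+.,\d*)")

m_original = re.search(original, sample, re.IGNORECASE)
m_alternation = re.search(alternation, sample, re.IGNORECASE)
m_optional = re.search(optional_prefix, sample, re.IGNORECASE)

print(m_original)              # None: "kr " sits between \n and the digits
print(m_alternation.group(2))  # kr 64.929,20
print(m_optional.group(2))     # 64.929,20
```

Note that with the alternation the second group includes the “kr ” prefix, so it would need trimming if only the number is wanted.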

From Regex101 (I like this website because it shows the group matches and an explanation of the pattern).

One more thing I noticed: the “ultimo” and “i alt” anchors can capture only part of the anchor text in Group 1. For example, on the line “Ultimosaldo I alt DKK 3.584,38” Group 1 holds only “Ultimo”. Compare the full match with the Group 1 match.
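The partial anchor text can be reproduced with a short sketch. It uses Python's re module only to run the pattern from the question, with case-insensitive matching assumed; the single-line sample is taken from the text in the question.

```python
import re

pattern = re.compile(
    r"\s(saldo|i alt|ultimo|Beløb skyldig).{0,19}?\n?([0-9.,-]+.,\d*)",
    re.IGNORECASE,
)

# One line from the sample text, preceded by a newline so that \s can match.
sample = "\nUltimosaldo I alt DKK 3.584,38\n"

m = pattern.search(sample)

print(repr(m.group(0)))  # '\nUltimosaldo I alt DKK 3.584,38'
print(m.group(1))        # Ultimo
print(m.group(2))        # 3.584,38
```

The alternative “ultimo” matches the first six letters of “Ultimosaldo”, and the rest of the label (“saldo I alt DKK ”) is consumed by .{0,19}? rather than by Group 1.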


Hi,

Another solution:

The following will work.

mc = System.Text.RegularExpressions.Regex.Matches(yourString,"(?<=^|\s)(saldo|i alt|ultimo|Beløb skyldig)[\s\S]*?([.,\d-]+)(?=\s*\n|$)",System.Text.RegularExpressions.RegexOptions.IgnoreCase)
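A rough sketch of why this pattern also catches the “kr” case: [\s\S]*? matches any character, including newlines, so it can step over “\nkr ” before the amount. The sketch uses Python's re module for illustration. Python only allows fixed-width lookbehinds, so (?<=^|\s) is swapped for the non-capturing group (?:^|\s) here; that swap is an assumption of this sketch, it changes the full match slightly but not the groups. The two-entry sample is also an assumption.

```python
import re

pattern = re.compile(
    # (?:^|\s) stands in for the .NET lookbehind (?<=^|\s).
    r"(?:^|\s)(saldo|i alt|ultimo|Beløb skyldig)[\s\S]*?([.,\d-]+)(?=\s*\n|$)",
    re.IGNORECASE,
)

# Two entries from the sample text in the question.
sample = "I alt DKK 258.817,11\n\nBeløb Skyldig\nkr 64.929,20"

# Collect (anchor text, amount) for every match.
results = [(m.group(1), m.group(2)) for m in pattern.finditer(sample)]

print(results)
# [('I alt', '258.817,11'), ('Beløb Skyldig', '64.929,20')]
```

Because [\s\S]*? has no upper bound, an anchor word with no amount nearby will reach forward across lines to the next number that ends a line, so it is worth checking the matches against the full input.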

Sequence.xaml (6.9 KB)

Regards,


Hi,

Thank you so much for the answer. I can’t believe I missed that the new line starts with something other than digits. Doh.

I have two more questions for you:

Why did you change this part:

[0-9.,-]

into this part:

[\d*.\d*]

Is that not basically the same thing, or am I missing something? :)

Concerning the last part, I’m not sure I totally understand. The end goal for me is to get the amount (e.g. 3584,38); the text is only used to find the correct number. Or am I misunderstanding you? :)

Thanks again for all the replies, everyone! :)

It was just a personal preference to keep the pattern short and readable; the currency formatting stays the same, with . and , as separators, as in your data.

Maybe your data contains other layouts, and that is why you used “{0,19}?\n?([0-9.,-]+.,\d*)”. Do try both approaches against a range of your possible input strings and check that each returns the value you are interested in.
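For anyone comparing the two character classes, a small sketch may help when testing both approaches. Inside square brackets * is a literal character, so [\d*.\d*] is the set of digits, “*” and “.”, whereas [0-9.,-] is digits, “.”, “,” and “-”. The sketch uses Python's re module for illustration, and the test strings are assumptions rather than values from the thread.

```python
import re

old_class = re.compile(r"[0-9.,-]+")   # digits, '.', ',' and '-'
new_class = re.compile(r"[\d*.\d*]+")  # digits, '*' and '.' only

# The new class has no ',' or '-', but it does accept a literal '*'.
print(bool(old_class.fullmatch("-3.584,38")))  # True
print(bool(new_class.fullmatch("-3.584,38")))  # False
print(bool(new_class.fullmatch("3*5")))        # True

# With the trailing ".,\d*" from the thread, a positive amount is still
# found by both, but only the old class keeps a leading minus sign.
old_amount = re.search(r"[0-9.,-]+.,\d*", "-3.584,38").group(0)
new_amount = re.search(r"[\d*.\d*]+.,\d*", "-3.584,38").group(0)

print(old_amount)  # -3.584,38
print(new_amount)  # 3.584,38
```

So the two forms behave the same on the positive amounts in the sample, but they are not equivalent in general; a negative value such as the -3.584,38 in the “Bank Indbetaling” line would lose its sign with the second form.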

Sure. It only matters if you later want to add logic in your process based on the Ultimo group (Group 1), for example to tell whether a match came from “Ultimo” or from “Ultimosaldo I alt”. It is just a heads-up; I don't think it will affect your current implementation, but seeing only part of the anchor text is not what I am used to.

When I say anchor text, I am referring to any one of these: “Saldo|I alt|Ultimo|Beløb\sSkyldig”
