Regex Based Extractor

Hey guys :),
I'm facing a problem, and it's my first time working with regex. Can you give me some tips?

I use the regex based extractor and have the following text:

HK - Allgemeinanteil 359,25 33,00 Ant. 10,886364 1,00 Ant. 10,89 20% 2,18
HK - Energieabgabe 264,27 2.143,02 m² 0,123317 93,52 m² 11,53 20% 2,31
HK - Gebrauchsabgabe 28,11 2.143,02 m² 0,013117 93,52 m² 1,23 20% 0,25
HK - Grundpreis 6.083,69 2.143,02 m² 2,838840 93,52 m² 265,49 20% 53,10
HK - Messpreis 530,58 2.143,02 m² 0,247585 93,52 m² 23,15 20% 4,63

This is OCR data from a PDF.

This data is written into a txt file from where it is to be processed further in SAP.

Now my problem: I only need two of the columns.

The HK string column and the column with the values 10,89 / 11,53 / 1,23 / 265,49 / 23,15.

I need these two columns for further processing.

The value can have at most 4 digits before the decimal comma and 2 after it.

Can you please help me?

Best regards, Chris

Can you share the PDF with me?

Hi,

Can you try the following expressions?

System.Text.RegularExpressions.Regex.Matches(strData,"HK - .*?(?=\s)")

System.Text.RegularExpressions.Regex.Matches(strData,"[\d,]+(?=\s\d+%)")
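
To try the two expressions outside the workflow, here is a minimal sketch in Python rather than .NET. It assumes `re.findall` as a stand-in for `Regex.Matches`, and the variable names `str_data`, `labels` and `amounts` are made up for illustration. The patterns themselves are copied unchanged from the two lines above.

```python
import re

# Sample OCR text from the question (first two rows only).
str_data = (
    "HK - Allgemeinanteil 359,25 33,00 Ant. 10,886364 1,00 Ant. 10,89 20% 2,18\n"
    "HK - Energieabgabe 264,27 2.143,02 m² 0,123317 93,52 m² 11,53 20% 2,31\n"
)

# Label: "HK - " followed lazily by anything up to the first whitespace.
labels = re.findall(r"HK - .*?(?=\s)", str_data)

# Amount: digits and commas that sit directly before " <digits>%".
amounts = re.findall(r"[\d,]+(?=\s\d+%)", str_data)

# Pair the two result lists by position.
for label, amount in zip(labels, amounts):
    print(label, amount)
```

Pairing by position only works as long as every row yields exactly one match for each pattern, so it is worth comparing the two list lengths before relying on it.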

I've attached a sample for reference:

Sample20200102b.zip (13.9 KB)

Regards,

Hi @bertc,

Split the whole output string on newlines and iterate over the lines,

then use the following regex patterns to identify the column 1 and column 2 values:

Regex pattern for HK String: "HK - \w+"
Regex pattern for Column 2: "\d+,\d+(?= \d+%)"
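
The split-and-iterate approach could look roughly like this. It is a sketch in Python rather than the .NET workflow, and the function name `extract_columns` and the extra `AMOUNT_THOUSANDS` pattern are my own assumptions, not part of the reply. The variant exists because the Column 2 pattern would cut a value such as 1.234,56 down to 234,56.

```python
import re

# Patterns from the reply above.
LABEL = re.compile(r"HK - \w+")
AMOUNT = re.compile(r"\d+,\d+(?= \d+%)")

# Hypothetical variant (not from the thread): also accept "." as a
# thousands separator, so "1.234,56" is matched as a whole.
AMOUNT_THOUSANDS = re.compile(r"[\d.]+,\d+(?= \d+%)")

def extract_columns(text, amount_pattern=AMOUNT):
    """Return (label, amount) pairs, one per line that contains both."""
    rows = []
    for line in text.splitlines():
        label = LABEL.search(line)
        amount = amount_pattern.search(line)
        if label and amount:  # skip blank or unrelated lines
            rows.append((label.group(), amount.group()))
    return rows

sample = """HK - Allgemeinanteil 359,25 33,00 Ant. 10,886364 1,00 Ant. 10,89 20% 2,18
HK - Grundpreis 6.083,69 2.143,02 m² 2,838840 93,52 m² 265,49 20% 53,10
"""

for label, amount in extract_columns(sample):
    print(f"{label};{amount}")
```

Matching line by line keeps each label tied to the amount from the same row, so a row where the OCR dropped one of the two values is skipped instead of shifting all later pairs.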

Thanks!


Thank you, it works! <3

Best regards, Chris

