Having a problem with regex based extractor

Hi everyone,
I am having a problem with a regex. I have three different PDF documents and I need to extract some data from them. The data changes based on the requirement, which is why I am using a regex based extractor to extract the data if it is available.
First, I need to extract the weight (only the value, i.e. 2.0/2.2/1.5). I have written a regex for that, but it doesn't work.

Weight (kg) : 2.0 Kg
Wt. (kg) : 2.2 Kg
Weight(kg) : 1.5 Kg

Wt.|Weight\s*(kg)\s*:\s*(\d*.\d*)\s*Kg

Second, I need to extract the length, width and height (only the values) separately. But I am confused about the regex because the data is given in a different format. How can I write it?
For example:

100 / 105 (length)

100 / 85 (width)

150 / 150 (height)


Regards,
Ekram
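
The length/width/height part of the question is not answered in the thread. As a rough sketch only, here is one way lines like the examples above could be parsed in Python. The names DIM_RE and extract_dimensions are made up for illustration, and since the thread does not say which of the two numbers is wanted, both are returned.

```python
import re

# Two numbers separated by "/" followed by a label in parentheses,
# e.g. "100 / 105 (length)". Assumption: the label is always one of
# length, width or height, as in the examples from the question.
DIM_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)\s*\(\s*(length|width|height)\s*\)",
    re.IGNORECASE,
)

def extract_dimensions(text):
    """Return {label: (first_value, second_value)} for every match in text."""
    dims = {}
    for first, second, label in DIM_RE.findall(text):
        dims[label.lower()] = (first, second)
    return dims

if __name__ == "__main__":
    sample = "100 / 105 (length)\n\n100 / 85 (width)\n\n150 / 150 (height)"
    print(extract_dimensions(sample))
```

The values come back as strings, so they can be converted with float() once it is clear which of the two numbers is the one needed.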

For extracting the weight:

(?<=(Weight|Wt\.) ?\(kg\) : )\d+\.\d+(?= Kg)

I tried your solution. Unfortunately, it did not work, because it allows only a single whitespace between the parts. In some cases there is more than one whitespace in my dataset.

You can just modify the regex and add a quantifier (+ or *) after the spaces to include those cases.

(?<=(Weight|Wt\.) *\(kg\) +: +)\d+\.\d+(?= +Kg)
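
For anyone testing this outside the extractor: the lookbehind above has variable width, which not every regex engine accepts. Python's re module, for example, rejects it. A minimal sketch of the same idea with a capture group instead, assuming Python (WEIGHT_RE and extract_weight are illustrative names, not part of any tool mentioned here):

```python
import re

# Same pattern as above, but the number is captured in group 1 instead of
# being isolated with lookarounds. \s* allows zero or more whitespace
# characters between the parts. The parentheses around "kg" and the dot in
# "Wt." are escaped so they match literally.
WEIGHT_RE = re.compile(r"(?:Weight|Wt\.)\s*\(kg\)\s*:\s*(\d+\.\d+)\s*Kg")

def extract_weight(text):
    """Return the weight value as a string, or None if nothing matches."""
    match = WEIGHT_RE.search(text)
    return match.group(1) if match else None

if __name__ == "__main__":
    samples = [
        "Weight (kg) : 2.0 Kg",
        "Wt. (kg) : 2.2 Kg",
        "Weight(kg) : 1.5 Kg",
        "Weight (kg)   :   2.0   Kg",
    ]
    for line in samples:
        print(line, "->", extract_weight(line))
```

Grouping the alternation as (?:Weight|Wt\.) also avoids the problem in the first attempt, where the top-level | split the whole pattern into two alternatives and the unescaped (kg) acted as a group rather than literal parentheses.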

It worked. Thanks.

Anyway, I solved the issue. I used an if condition to separate the PDF invoices and then used the regex based extractor to extract the data.

Regards,
Ekram

Can you please help me to learn regex?
