Possible Bug with Regex Based Extractor

Hello all,

I have been experimenting with the Regex Based Extractor component in the Intelligent OCR package and have run into strange behavior with certain regex patterns. The value I expect is not extracted correctly when the pattern contains a capture group (group 1). For example, using

(\d|\.)*(?=(\s|\n)*PLEASE)

on

"143.05

PLEASE PAY FROM

CUSTOMER"

returns a full match of “143.05”,
a group 1 of “5”,
and a group 2 of “”.
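
To make the group behavior concrete, here is a minimal sketch in Python rather than in the extractor itself. It assumes the sample text has a blank line between the amount and PLEASE, and that Python's `re` treats repeated capture groups the same way as the engine behind the extractor (a repeated group keeps only its last iteration). It only reproduces the full match and group 1 values reported above; it says nothing about how the extractor picks its result.

```python
import re

# Sample text from the post; the exact whitespace is an assumption.
text = "143.05\n\nPLEASE PAY FROM\n\nCUSTOMER"
pattern = r"(\d|\.)*(?=(\s|\n)*PLEASE)"

m = re.search(pattern, text)

# Group 0 is the full match: everything consumed by (\d|\.)*.
full_match = m.group(0)

# Group 1 sits inside the * repetition, so it is overwritten on every
# iteration and ends up holding only the last character matched.
group1 = m.group(1)

print(repr(full_match), repr(group1))
```

So a tool that returns group 1 instead of group 0 would yield “5” here, which matches what I am seeing.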

For some reason, the Regex Based Extractor always takes the group 1 match, not just in this example but in the other expressions I have tried as well. I can work around this by adding an extra pair of parentheses around the part I want to capture, like so:

((\d|\.)*)(?=(\s|\n)*PLEASE)
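
A rough Python sketch of why the extra parentheses help, plus two variants that keep the wanted value as the only capture group. The non-capturing `(?:...)` form and the character class `[\d.]` are my own suggestions, not something confirmed for the extractor; I am assuming its regex engine supports them.

```python
import re

# Sample text from the post; the exact whitespace is an assumption.
text = "143.05\n\nPLEASE PAY FROM\n\nCUSTOMER"

# Workaround from the post: the outer parentheses become group 1 and
# capture the whole repetition; the inner group is pushed to group 2.
wrapped = re.search(r"((\d|\.)*)(?=(\s|\n)*PLEASE)", text)

# Variant: make the helper groups non-capturing so group 1 is the only group.
non_capturing = re.search(r"((?:\d|\.)*)(?=(?:\s|\n)*PLEASE)", text)

# Variant: a character class avoids the inner alternation group entirely.
char_class = re.search(r"([\d.]*)(?=\s*PLEASE)", text)

for m in (wrapped, non_capturing, char_class):
    print(repr(m.group(1)))
```

In all three cases group 1 holds the full amount, so a tool that returns group 1 would give the expected value.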

But this feels more like a band-aid than a real fix. If this were a Matches activity, I could specify the exact match and group I wanted, but I did not see any option for this in the extractor. I am wondering whether I missed something and there is some way to specify which group to extract, and also whether what I am running into is the intended functionality. Thank you all for any insights!

Hi @JosephNehl

Try replacing * with + in your pattern.

That might let you extract the data.
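
For reference, a small Python sketch of what switching the quantifier changes. It assumes the original pattern with only the * swapped for + and the same sample text; the engine used by the extractor may differ in details. In this sketch, + removes the empty matches that * allows, but group 1 still holds only the last character.

```python
import re

# Sample text from the post; the exact whitespace is an assumption.
text = "143.05\n\nPLEASE PAY FROM\n\nCUSTOMER"

star = r"(\d|\.)*(?=(\s|\n)*PLEASE)"
plus = r"(\d|\.)+(?=(\s|\n)*PLEASE)"

# With *, the pattern can also match the empty string just before PLEASE.
star_matches = [m.group(0) for m in re.finditer(star, text)]

# With +, at least one digit or dot is required, so only the amount matches.
plus_matches = [m.group(0) for m in re.finditer(plus, text)]

# The capture group is still inside the repetition, so it keeps only the
# last iteration regardless of the quantifier.
plus_group1 = re.search(plus, text).group(1)

print(star_matches, plus_matches, repr(plus_group1))
```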

Mark as solution and like it if this helps you :slight_smile:

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:


Thank you for the input, I will try it and see if that option works as well.

However, it’s not that I can’t find a workaround to include the full match in group 1. My question is whether there is any way to specify the group you wish to capture using the new Regex Based Extractor in the Intelligent OCR package, and whether taking the group 1 match instead of the full match is the intended functionality.