Possible Bug with Regex Based Extractor

JosephNehl · July 22, 2020, 7:37pm

Hello all,

I have been experimenting with the Regex Based Extractor component in the Intelligent OCR package, and I have run into a strange occurrence while using different regex patterns. I noticed that the extractor does not return the expected value in certain cases where the pattern contains a group 1. For example, using

(\d|\.)*(?=(\s|\n)*PLEASE)

on

"143.05

PLEASE PAY FROM

CUSTOMER"

returns a full match of “143.05”
a group1 of “5”
and a group2 of “”
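To check that these groups come from the regex itself and not from the extractor, here is a minimal sketch in Python. It assumes Python's `re` module treats this particular pattern the same way the .NET engine behind UiPath does, and the variable names are only for illustration. A capture group under `*` is overwritten on every repetition, so group 1 ends up holding only the last character it matched.

```python
import re

# Sample text from the post (the blank line is part of the input).
text = "143.05\n\nPLEASE PAY FROM\n\nCUSTOMER"

# Original pattern: group 1 sits inside the repetition, so each
# iteration of * replaces the previously captured character.
pattern = re.compile(r"(\d|\.)*(?=(\s|\n)*PLEASE)")

match = pattern.search(text)
full_match = match.group(0)  # everything the pattern consumed
group1 = match.group(1)      # only the final iteration of (\d|\.)

print(repr(full_match))
print(repr(group1))
```

So a tool that returns group 1 whenever one exists would hand back the last digit instead of the whole number.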

The Regex Based Extractor for some reason always takes the group 1 match, not just in this example but in the other expressions I have tried as well. I can work around this by changing my regex to include extra parentheses around the part I want to capture, like so:

((\d|\.)*)(?=(\s|\n)*PLEASE)

But this feels more like a band-aid than a real fix. If this were in a Matches activity, I could specify the exact group and match I wanted, but I did not see any option for this in the extractor. I am wondering if I missed something and there is some way to specify which group you wish to extract, and also whether what I am running into is the intended functionality. Thank you all for any insights!
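The workaround above can be checked outside UiPath with a short Python sketch. Again this assumes Python's `re` module behaves like the .NET engine for these patterns. The two extra variants (non-capturing inner groups and a character class) are not from the original post; they are possible ways to make group 1 the only group, which may matter if the extractor really does always return group 1.

```python
import re

text = "143.05\n\nPLEASE PAY FROM\n\nCUSTOMER"

# Workaround from the post: an outer group becomes group 1.
wrapped = re.compile(r"((\d|\.)*)(?=(\s|\n)*PLEASE)")

# Variant (assumption, not from the post): inner groups made
# non-capturing with (?:...), so group 1 is the only capture.
non_capturing = re.compile(r"((?:\d|\.)*)(?=(?:\s|\n)*PLEASE)")

# Variant (assumption, not from the post): a character class
# replaces the alternation; \s already covers \n.
char_class = re.compile(r"([\d.]+)(?=\s*PLEASE)")

results = {
    name: rx.search(text).group(1)
    for name, rx in [
        ("wrapped", wrapped),
        ("non_capturing", non_capturing),
        ("char_class", char_class),
    ]
}

for name, value in results.items():
    print(name, repr(value))
```

In all three cases group 1 is the full amount. In the wrapped version the inner group still exists as group 2 and still holds only the last character.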

Pratik_Wavhal · July 22, 2020, 7:47pm

Hi @JosephNehl

Try replacing * with + as shown below

You might extract the data

Mark as solution and like it if this helps you

Happy Automation

Best Regards
Er Pratik Wavhal

JosephNehl · July 22, 2020, 7:53pm

Thank you for the input, I will try it and see if that option works as well.

However, it’s not that I can’t find a workaround to include the full match in group 1. My question is whether there is any way to specify the group you wish to capture using the new Regex Based Extractor in the Intelligent OCR package, and whether taking the group 1 match instead of the full match is the intended functionality.
