Intelligent OCR Regex Based Extractor Not Returning Values

btc653 · March 11, 2020, 3:44pm

I followed the Document Understanding directions (set up taxonomy, load taxonomy, digitize document, classify document, data extraction, and present validation), but I cannot get the Regex Based Extractor to match any values. I’ve configured the expressions and tested that they work, and in Configure Extractors I’ve checked the boxes for the fields that should be extracted by the Regex Extractor. But when the validation station pops up, it shows that none of the values were extracted. Exporting the extraction results shows the same: nothing was pulled out. I’ve done this with position-based and ML extractors and haven’t had issues. What could I be missing?

bcorrea · March 12, 2020, 1:01pm

Hi, welcome to the community!
Is there anything more detailed that you can share with us?

btc653 · March 12, 2020, 8:57pm

Here are several screenshots of the process.

I have the ML extractor and Regex extractor inside the Data Extraction Scope:

I input all of the variables to this scope activity from the previous steps according to the Doc Understanding instructional video:

I set the Configure Extractors to take NAIC ID from the Regex Extractor:

Here’s the Regex expression with the actual text that’s being read by the OCR as the sample, and there is in fact a match:

Why is there still no value extracted from the Regex portion?

preetith · March 16, 2020, 1:48am

Any updates on this?

btc653 · March 16, 2020, 1:37pm

I have no solution yet. Are you having the same issue?

preetith · March 17, 2020, 1:02am

Yes, I am noticing this issue. If there are multiple extractors, I see only the first one is working.

btc653 · March 17, 2020, 1:00pm

Could you get regex to work as the only extractor? That was failing for me as well.

I’m thinking of trying a workaround of doing the regex Matches activity on the DocText string, and then feeding the result into the ExtractionResults variable. Not sure if that’s easily modifiable, though.

preetith · March 18, 2020, 1:09am

Let me know if it works.
I will try one extractor at a time and post my results here as I complete them.

amithvs · March 24, 2020, 5:38am

Hi, I am also facing the same situation. I followed the same steps, but at the Present Validation Station I still always have to select the items manually.

btc653 · March 24, 2020, 6:53pm

I don’t think my idea of editing the ExtractionResults variable to add the Matches text is feasible.

I’d like to hear from a UiPath person who can speak to the issue with the Regex Based Extractor.

amithvs · March 25, 2020, 3:57am

@alexcabuz It would be much appreciated if you could take a look at this. I am also facing the exact same issue when using multiple extractors. When tested separately, my regex matches, but at the Present Validation Station no data has been extracted.

marios · March 30, 2020, 8:20pm

@btc653 I’m having this same issue and couldn’t edit the extraction results either. I got around it by exporting the extraction results into a dataset and then going through the tables and filling in any missing values using the Matches activity. However, this makes the Regex extractor useless, so it’d be good to hear back from someone on support about whether it’s user error or just a delayed feature fix.
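
For anyone wanting to see the shape of that workaround, here is a minimal sketch in Python rather than in a UiPath workflow. The function name `fill_missing`, the field names, and the sample text are all made up for illustration; in Studio the equivalent would be done with the exported dataset and the Matches activity.

```python
import re

def fill_missing(fields, doc_text, patterns):
    """Return a copy of fields with empty values filled from regex matches."""
    filled = dict(fields)
    for name, pattern in patterns.items():
        if filled.get(name):
            # Keep any value an extractor already found.
            continue
        match = re.search(pattern, doc_text)
        if match:
            # The first capture group holds the value of interest.
            filled[name] = match.group(1)
    return filled

# Hypothetical sample data, not taken from the documents in this thread.
doc_text = "Company: Example Mutual\nNAIC ID: 12345\n"
fields = {"Company": "Example Mutual", "NAIC ID": None}
patterns = {
    "Company": r"Company:\s*(.+)",
    "NAIC ID": r"NAIC ID:\s*(\d+)",
}

result = fill_missing(fields, doc_text, patterns)
print(result)
```

Only the empty field is touched, so values from other extractors are left alone.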

For the record I’d also like to add that I can’t get the regex extractors to work by themselves or with other extractors, but the regex wizard does confirm a match (I’m at the point where I just match a word literal to try and get something in the results).

Niels1 · April 10, 2020, 2:31pm

Having the same issue, though I am using only one extractor. My workflow is:

load taxonomy
digitize document (using Tesseract OCR)
classify document scope, with keyword based classifier
data extraction scope, with regex based extractor

Classification is working (document is correctly classified as invoice as defined by my taxonomy), but the regex based extractor is not returning anything. The regular expressions themselves work on test data.

UPDATE: solved, I didn’t know you have to indicate “capture” for every regular expression you make!

marios · April 13, 2020, 6:04pm

Hey Niels,

Just wanted to update you on something, I figured out that my extraction wasn’t working because I wasn’t surrounding my regex with a capture group. The regex extractor returns every captured group as an option in the extracted results, so keep that in mind.

Turns out in the regex extractor instructions it does specify that you need to surround them in a capture group but I must have glossed over it.
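
To make the difference concrete, here is a small sketch using Python's `re` module as a stand-in. UiPath runs on the .NET regex engine, but plain capturing groups should behave the same way in both. The sample text is invented, since the actual OCR output from this thread isn't available.

```python
import re

# Hypothetical OCR text; the thread's real sample is not shown.
text = "NAIC ID: 12345  Group: 678"

# Same expression, first without and then with a capture group.
without_group = re.search(r"NAIC ID:\s*\d+", text)
with_group = re.search(r"NAIC ID:\s*(\d+)", text)

# Two capture groups yield two captured values.
two_groups = re.search(r"NAIC ID:\s*(\d+)\s+Group:\s*(\d+)", text)

print(without_group.group(0))  # the pattern matches...
print(without_group.groups())  # ...but there are no captured groups
print(with_group.groups())
print(two_groups.groups())
```

The first expression matches, which is presumably why the regex wizard reports success, yet it captures nothing that could be handed back as a field value.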

VanjaV · April 16, 2020, 9:37am

Hi marios, could you explain this in more detail?
I would like the extracted data to be prepared for human validation in finer detail than the predefined ML extractors provide, e.g. the address split further into street, street number, …
I already have these fields predefined in Taxonomy Manager.
Regards,
Vanja

Ioana_Gligan · April 20, 2020, 5:02am

Hello all,

Sorry for the late reply!

Please have a look over this: How to use the IntelligentOCR Package - it might help in seeing how the regex based extractor needs to be configured for both a simple field as well as a table field

Have fun,

Ioana

VanjaV · April 29, 2020, 7:47am

Hi, I got the solution from another source:

extractionResults.ResultsDocument.Fields.First(Function(i As ResultsDataPoint) i.FieldName = "Street").Values(0).Value = vcStreet
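
As a rough illustration of the same idea outside UiPath, the Python sketch below splits an address into street and number with named capture groups, then overwrites the first value of the matching field. The list-of-dicts layout only mimics the expression above and is not the real ExtractionResult type; the `StreetNr` field name, the helper names, and the "street then number" address format are assumptions.

```python
import re

# Assumes the house number comes last, e.g. "Main Street 12a".
ADDRESS = re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+\w*)$")

def split_address(address):
    """Return (street, number), or None if the address does not fit."""
    m = ADDRESS.match(address.strip())
    if m is None:
        return None
    return m.group("street"), m.group("number")

def set_field(fields, field_name, value):
    """Overwrite the first value of the first field with the given name.

    Like First(...) and Values(0) above, this raises if the field is
    missing or has no values yet.
    """
    field = next(f for f in fields if f["FieldName"] == field_name)
    field["Values"][0]["Value"] = value

# Hypothetical stand-in for the extraction results.
fields = [
    {"FieldName": "Street", "Values": [{"Value": ""}]},
    {"FieldName": "StreetNr", "Values": [{"Value": ""}]},
]

street, number = split_address("Main Street 12a")
set_field(fields, "Street", street)
set_field(fields, "StreetNr", number)
print(fields)
```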

Regards,
Vanja

mkiepal · June 16, 2020, 6:42am

Hello, did you resolve the issue?

Maciej_Witos · July 28, 2020, 1:05pm

I had the same issue. I used two different extractors:
Intelligent Form Extractor
Regex Based Extractor
and the Regex Based Extractor didn’t give me any results.
The solution is to wrap your regex in parentheses. After that, everything works.
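
If you want to check an expression before pasting it into the extractor, something like this Python sketch can tell you whether it already contains a capturing group and wrap it if not. The helper name `ensure_capture` is made up, and Python's `re` is only standing in for the .NET engine that UiPath uses.

```python
import re

def ensure_capture(pattern):
    """Wrap the pattern in parentheses if it has no capturing group."""
    # A compiled pattern's .groups attribute counts its capturing groups;
    # non-capturing groups such as (?:...) are not counted.
    if re.compile(pattern).groups == 0:
        return "(" + pattern + ")"
    return pattern

wrapped = ensure_capture(r"\d{5}")
print(wrapped)
print(re.search(wrapped, "NAIC 12345").group(1))
```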

Kesavaraj_K · August 9, 2020, 8:13pm

Hi guys, the solution is actually that the Regex Editor works completely differently from how we have been matching with VB methods.

You have to adjust the expression until your sample text is highlighted with a yellow background (text highlighted in gray will sometimes not work). Yellow means it’s a group match and can be extracted.

Please note that there is a CAPTURE checkbox in the Regex Editor. Checking that box worked in my case.

Mark as Solution if I provided the answer😁

Thanks!
