Intelligent OCR Regex Based Extractor Not Returning Values

I followed the Document Understanding directions (set up taxonomy, load taxonomy, digitize document, classify document, data extraction, and present validation), but I cannot get the Regex Based Extractor to match any values. I’ve configured the expressions and tested that they work, and in Configure Extractors I’ve checked the boxes for the fields that should come from the Regex Based Extractor. But when the validation station pops up, it shows that none of the values were extracted. Exporting the extraction results shows the same thing: nothing was pulled out. I’ve done this with position-based and ML extractors and haven’t had issues. What could I be missing?

Hi, welcome to the community!
Is there anything more detailed that you can share with us?

Here are several screenshots of the process.

I have the ML extractor and Regex extractor inside the Data Extraction Scope:

I input all of the variables to this scope activity from the previous steps according to the Doc Understanding instructional video:
image

I set the Configure Extractors to take NAIC ID from the Regex Extractor:

Here’s the Regex expression with the actual text that’s being read by the OCR as the sample, and there is in fact a match:


Why is there still no value extracted from the Regex portion?
image

Any updates on this?

I have no solution yet. Are you having the same issue?

Yes, I am noticing this issue. If there are multiple extractors, I see only the first one is working.

Could you get regex to work as the only extractor? That was failing for me as well.

I’m thinking of trying a workaround of doing the regex Matches activity on the DocText string, and then feeding the result into the ExtractionResults variable. Not sure if that’s easily modifiable, though.

Let me know if it works.
I will try one extractor at a time and post my results here as I complete each test.

Hi, I am also facing the same situation. I followed the same steps, but in the Present Validation Station I still have to select the items manually every time.

I don’t think my idea of editing the ExtractionResults variable to add the Matches text is feasible.

I’d like to hear from a UiPath person who can speak to the issue with the Regex Based Extractor.

@alexcabuz It would be much appreciated if you could take a look at this. I am also facing the exact same issue when using multiple extractors. When I test my regex separately it matches, but in the Present Validation Station no data is extracted.

@btc653 I’m having this same issue and couldn’t edit the extraction results either. I got around it by exporting the extraction results into a dataset and then going through the tables and filling in any missing values using the Matches activity. However, this makes the Regex extractor useless, so it’d be good to hear back from someone on support on whether it’s user error or a feature fix that is still pending.
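
For readers who want to see the shape of that workaround outside of UiPath, here is a minimal Python sketch. It is not the IntelligentOCR API: the sample text, the `extracted` dictionary standing in for the exported results, and the `fill_missing` helper are all hypothetical, and the NAIC ID format is assumed for illustration.

```python
import re

# Hypothetical stand-ins: the digitized document text and the exported
# extraction results flattened into a plain dict (None = nothing extracted).
doc_text = "Company: Acme Insurance\nNAIC ID: 12345\nState: OH"
extracted = {"Company": "Acme Insurance", "NAIC ID": None}

# Hypothetical fallback patterns, one capture group per field.
fallback_patterns = {"NAIC ID": r"NAIC ID:\s*(\d+)"}


def fill_missing(results, text, patterns):
    """Return a copy of results with empty fields filled from regex matches."""
    filled = dict(results)
    for field, pattern in patterns.items():
        if filled.get(field):
            continue  # keep values the extractor already produced
        match = re.search(pattern, text)
        if match:
            filled[field] = match.group(1)  # first captured group only
    return filled


completed = fill_missing(extracted, doc_text, fallback_patterns)
print(completed)
```

Fields that already have a value are left alone, so the fallback only touches what the extractor missed.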

For the record I’d also like to add that I can’t get the regex extractors to work by themselves or with other extractors, but the regex wizard does confirm a match (I’m at the point where I just match a word literal to try and get something in the results).

Having the same issue, I am using only 1 extractor. My workflow is

  1. load taxonomy
  2. digitize document (using Tesseract OCR)
  3. classify document scope, with keyword based classifier
  4. data extraction scope, with regex based extractor

Classification is working (document is correctly classified as invoice as defined by my taxonomy), but the regex based extractor is not returning anything. The regular expressions themselves work on test data.

UPDATE: solved, I didn’t know you have to indicate “capture” for every regular expression you make!

Hey Niels,

Just wanted to update you on something: I figured out that my extraction wasn’t working because I wasn’t surrounding my regex with a capture group. The regex extractor returns every captured group as an option in the extracted results, so keep that in mind.

Turns out in the regex extractor instructions it does specify that you need to surround them in a capture group but I must have glossed over it.
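
To illustrate the difference between a match and a captured group, here is a small sketch using Python’s `re` module rather than the extractor itself. The sample text and the NAIC ID format are assumptions; the point is only that a pattern can match while capturing nothing.

```python
import re

text = "NAIC ID: 12345"  # hypothetical OCR output

# Without parentheses the pattern matches, but there are no captured groups.
no_group = re.search(r"NAIC ID:\s*\d+", text)

# With parentheses around the part you want, that part becomes a captured group.
with_group = re.search(r"NAIC ID:\s*(\d+)", text)

print(no_group.group(0))    # NAIC ID: 12345  (the whole match)
print(no_group.groups())    # ()              (nothing captured)
print(with_group.groups())  # ('12345',)      (the captured value)
```

A regex tester that only reports "match" will succeed for both patterns, which would be consistent with the wizard confirming a match while the results stay empty.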

Hi marios, could you please explain it in more detail?
I would like the extracted data presented for human validation to be more fine-grained than what the predefined ML extractors give, e.g. the address further split into street, street number, and so on.
I already have these fields predefined in Taxonomy Manager.
Regards,
Vanja
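
As a rough idea of how an address could be split with capture groups, here is a minimal Python sketch. It assumes a hypothetical format where the street name comes first and the house number last (for example "Main Street 12a"); the pattern, the group names, and the `split_address` helper are illustrative and would need adjusting for real data.

```python
import re

# Assumed format: "<street name> <house number>", e.g. "Main Street 12a".
# Named groups make it clear which part maps to which taxonomy field.
ADDRESS = re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+\w*)$")


def split_address(address):
    """Return {'street': ..., 'number': ...}, or None if the format differs."""
    match = ADDRESS.match(address.strip())
    if match is None:
        return None
    return match.groupdict()


parts = split_address("Main Street 12a")
print(parts)  # {'street': 'Main Street', 'number': '12a'}
```

Addresses that do not end in a number return None, so they can be left for manual validation instead of being split incorrectly.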

Hello all,

Sorry for the late reply!

Please have a look at this: How to use the IntelligentOCR Package. It might help in seeing how the Regex Based Extractor needs to be configured for both a simple field and a table field.

Have fun,

Ioana

Hi, I got the solution from another source:

extractionResults.ResultsDocument.Fields.First(Function(i As ResultsDataPoint) i.FieldName = "Street").Values(0).Value = vcStreet

Regards,
Vanja

Hello, did you resolve the issue?

I had the same issue. I used two different extractors:
Intelligent Form Extractor
Regex Based Extractor
and the Regex Based Extractor didn’t give me any results.
The solution is to wrap your regex in parentheses. After that, everything works.

Hi guys, the solution is that the Regex Editor works completely differently from how we have been matching with VB methods.

You have to adjust the expression until your sample text is highlighted with a yellow background (gray-highlighted text sometimes will not work). The yellow highlight means it’s a group match and can be extracted.

Please note that there is a CAPTURE checkbox in the Regex Editor. Checking that box made it work in my case.

Mark as Solution if I provided the answer😁

Thanks!