Regex Based Extractor Not Working

Hi all,

First, I am new to UiPath, so please pardon me if I am not using the right terminology.

We have just deployed a project to parse PDF invoices via the data extractor, which includes a position based extractor and a regex based extractor. It worked well until a few of the invoice layouts were altered, so we had to add more regex patterns. For some reason, the regex pattern using the “UiPath.IntelligentOCR.Activities.DataExtractation.RegexBasedExtractor” is not working as expected.

For example, the pattern in the regex editor below works fine against the test text but not against the actual PDF.

A snapshot of the PDF is below.

Any help is much appreciated.

Thanks,
Chintan

Hi @chintan.patel
I think the problem may be related to spacing. Check that too.


Hello @NIVED_NAMBIAR, yes, I have already checked the spacing, but the problem is not related to spaces. I guess it may be related to the text version of the PDF, which does not match what is displayed on the PDF.

i.e. the text version of the PDF

What @NIVED_NAMBIAR said is correct; it may just be a spacing problem. Use this lookbehind regex: (?<= Due $)\d+

Hi @copy_writes - If I understood you correctly, I have changed my regex pattern to ((?<= Due\s*$)((\d+.\d{2}))), but it still didn’t work.

For some reason, it is picking up every value in the PDF that contains digits, i.e. the Validation Station displays the result below after the data extraction activity.
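
A side note on the patterns above, offered as a rough sketch rather than a confirmed diagnosis: in most regex flavours an unescaped $ is an end-of-line anchor and an unescaped . matches any character, so both usually need a backslash when they are meant literally. The Python snippet below uses made-up sample strings (not the actual invoice text) to illustrate the difference; Python's re module treats these two characters the same way here, but the behaviour inside the UiPath extractor has not been verified.

```python
import re

# Hypothetical text, NOT the real invoice; used only to illustrate escaping.
sample = "Subtotal $100.00\nAmount Due $123.45\n"

# Unescaped "$" is an anchor, not a dollar sign, so "Due $123" is never matched.
unescaped = re.search(r"Due\s*$\d+", sample)

# Escaping "$" and "." makes them literal; the group captures only the amount.
escaped = re.search(r"Due\s*\$\s*(\d[\d,]*\.\d{2})", sample)

# Unescaped "." matches any character, so a plain run of digits also matches.
loose = re.findall(r"\d+.\d{2}", "Qty 12345")

print(unescaped)         # None
print(escaped.group(1))  # 123.45
print(loose)             # ['12345']
```

The last line hints at why a loosely written pattern can pick up unrelated numbers: without a literal decimal point, any sequence of four or more digits satisfies it.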

@chintan.patel Yes, you have to write the pattern against what is actually in the text version of the PDF, or else the regex won’t work.
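
To make that concrete, here is a minimal sketch of checking a pattern against the extracted text instead of the rendered page. The extracted string is invented for illustration (label and value on separate lines, with a non-breaking space); the real text layer has to be inspected in the same way before writing the pattern.

```python
import re

# Hypothetical text layer: the label and the value are split across lines
# and the label contains a non-breaking space. The real PDF may differ.
extracted = "Amount\u00a0Due\n$ 1,234.56\nThank you"

# repr() exposes hidden characters such as \xa0 and \n.
print(repr(extracted))

# \s+ and \s* tolerate spaces, non-breaking spaces, and line breaks.
pattern = re.compile(r"Amount\s+Due\s*\$\s*([\d,]+\.\d{2})")
match = pattern.search(extracted)
amount = match.group(1) if match else None

print(amount)  # 1,234.56
```

Allowing any whitespace between the label and the value is a design choice that trades some strictness for robustness when the text layer breaks lines differently from the visible layout.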

But may I ask why you are choosing regex for Amount Due? This can be easily captured with the Form Extractor or the Intelligent Form Extractor.

Hello @prasath17
What is the Intelligent Form Extractor? I had to choose regex because the position of the text “Amount Due” is not fixed on the form; it changes depending on the number of lines.

@chintan.patel - If the Amount Due position is not fixed, then Regex based extractor and Intelligent form extractor won’t work. In that case, you have to go with ML Extractor.

So there are no other alternatives?

@chintan.patel ML Extractor. If it’s an invoice, you can add the Invoice endpoint, which will extract the amount due.

I had ML, but it’s not cost-effective. Also, it doesn’t extract everything I need, so I would have to invest in my own ML endpoint. Anyway, thanks for your help.