Firstly, I am new to UiPath, so please pardon me if I am not using the right terminology.
We have just deployed a project to parse PDF invoices via the data extractor, which includes a position-based extractor and a regex-based extractor. It worked well until a few of the invoice layouts were altered, so we had to add more regex patterns. For some reason, the regex pattern using “UiPath.IntelligentOCR.Activities.DataExtractation.RegexBasedExtractor” is not working as expected.
For example, the pattern below works fine against the test text in the regex editor, but not against the actual PDF.
Hello @NIVED_NAMBIAR, yes, I have already checked the spacing, but the problem is not related to the spaces. I guess it may be related to the text version of the PDF, which does not align with what is displayed on the PDF.
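One way to check this guess is to dump the extracted text with every hidden character made visible, and then compare it with what the page shows. The Python sketch below is only an illustration outside UiPath: the sample string, the pattern, and the helper names `show_hidden` and `normalize_spaces` are all made up for the example, and the no-break space and line break in the sample are assumptions about what a PDF text layer might contain.

```python
import re

def show_hidden(text: str) -> str:
    """Return text with every character outside printable ASCII made visible."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isprintable():
            out.append(ch)
        else:
            out.append(f"<U+{ord(ch):04X}>")
    return "".join(out)

def normalize_spaces(text: str) -> str:
    """Collapse any run of whitespace (including Unicode spaces) to one space."""
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical text layer: a no-break space (U+00A0) and a line break
# sit where the rendered page appears to show ordinary spaces.
extracted = "Amount\u00a0Due\n$1,234.56"

# Hypothetical pattern that works on hand-typed test text with plain spaces.
pattern = r"Amount Due \$([\d,]+\.\d{2})"

print(show_hidden(extracted))
print(re.search(pattern, extracted))
print(re.search(pattern, normalize_spaces(extracted)).group(1))
```

If the dump shows characters that are not plain spaces, the pattern can fail on the real document even though it passes on text typed into the editor by hand.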
Hello @prasath17
What is the intelligent form extractor? I had to choose regex because the position of the text “Amount Due” is not fixed on the form; it changes depending on the number of lines.
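To illustrate what I mean by the label moving around, here is a rough Python sketch of a label-anchored pattern that does not care how many line items come before it. The invoice strings, the optional colon and dollar sign, and the name `find_amount_due` are assumptions made up for the example, not taken from our real invoices or from the UiPath activity.

```python
import re

# Label-anchored pattern: \s+ / \s* instead of literal spaces, so the label
# and the value may be separated by spaces, tabs, or a line break.
AMOUNT_DUE = re.compile(
    r"Amount\s+Due\s*:?\s*\$?\s*([\d,]+\.\d{2})",
    re.IGNORECASE,
)

def find_amount_due(text):
    """Return the amount after the 'Amount Due' label, or None if absent."""
    match = AMOUNT_DUE.search(text)
    return match.group(1) if match else None

# Hypothetical invoices with a different number of line items,
# so the label appears at a different place in each one.
short_invoice = "Item A 10.00\nAmount Due: $10.00\n"
long_invoice = "Item A 10.00\nItem B 5.50\nItem C 4.50\nAmount  Due\n$20.00\n"

print(find_amount_due(short_invoice))
print(find_amount_due(long_invoice))
```

The pattern searches the whole text for the label, so it only breaks if the label or the number format itself changes.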
@chintan.patel - If the Amount Due position is not fixed, then the Regex Based Extractor and the Intelligent Form Extractor won’t work. In that case, you have to go with the ML Extractor.
I had ML, but it’s not cost-effective. Also, it doesn’t scan everything I need, so I will have to invest in my own ML endpoint. Anyway, thanks for your help.