Which is the better extractor (Regex Extractor or Form Based Extractor)?

Hi,

I have a requirement to scan 1,000 vendor invoices daily and pass the extracted values on for further processing.

The problem is that we have currently implemented the solution with the Regex Extractor, so whenever a new vendor is added or a template changes, we have to write new regex patterns for that vendor. We have approximately 1,000 vendors at the moment, and more are likely to be added in the future.
We are also seeing poor accuracy with regex: I would estimate it at about 50%.
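
To make the maintenance problem concrete, here is a minimal sketch of per-vendor regex extraction in plain Python. It is not how UiPath's Regex Extractor is configured; the vendor names, field names and patterns are hypothetical. It shows the two failure modes described above: an unknown vendor yields nothing until someone writes patterns for it, and a small wording change in a known vendor's template silently breaks a field.

```python
import re

# Hypothetical per-vendor pattern table: each vendor needs its own regexes
# because labels and layouts differ between vendors.
VENDOR_PATTERNS = {
    "Acme Corp": {
        "invoice_number": r"Invoice No[.:]\s*([A-Z0-9-]+)",
        "total": r"Total Due[:\s]*\$?([\d,]+\.\d{2})",
    },
    "Globex": {
        "invoice_number": r"Inv#\s*(\d+)",
        "total": r"Amount Payable\s+([\d,]+\.\d{2})",
    },
}


def extract_fields(vendor: str, text: str) -> dict:
    """Return field -> value, or None for a field whose pattern does not match."""
    patterns = VENDOR_PATTERNS.get(vendor)
    if patterns is None:
        # New vendor: nothing can be extracted until new patterns are written.
        return {}
    results = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        results[field] = match.group(1) if match else None
    return results


print(extract_fields("Acme Corp", "Invoice No: INV-1001\nTotal Due: $1,250.00"))
# {'invoice_number': 'INV-1001', 'total': '1,250.00'}
print(extract_fields("Acme Corp", "Invoice Number: INV-1001\nTotal Due: $1,250.00"))
# {'invoice_number': None, 'total': '1,250.00'}  (label changed, pattern no longer matches)
print(extract_fields("Initech", "Invoice Number 42\nTotal 99.00"))
# {}  (vendor has no patterns yet)
```

With roughly 1,000 vendors, this table has to be written, tested and kept in sync with every template change, which is where both the maintenance effort and the accuracy loss come from.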

I would like to know whether I can use the Form Based Extractor instead of Regex, or which of the two extractors is better. I need to present a proposal to my organization, so any help would be really appreciated. Thanks
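
For a proposal, an accuracy figure like the 50% above is more convincing when every extractor is measured the same way on the same sample. One common option, sketched here as an assumption rather than anything UiPath provides, is field-level accuracy: hand-check the values for a sample of invoices, run each extractor on that sample, and count the share of fields that match exactly. The sample data below is made up for illustration.

```python
def field_accuracy(predicted: list, expected: list) -> float:
    """Fraction of (document, field) pairs whose extracted value exactly
    matches the hand-checked value. Missing or wrong fields count as errors."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must cover the same documents")
    total = 0
    correct = 0
    for pred, gold in zip(predicted, expected):
        for field, gold_value in gold.items():
            total += 1
            if pred.get(field) == gold_value:
                correct += 1
    return correct / total if total else 0.0


# Hand-checked values for a small sample of invoices (made-up example data)
expected = [
    {"invoice_number": "INV-1001", "total": "1,250.00"},
    {"invoice_number": "5521", "total": "980.50"},
]
# Output of one extractor on the same sample: one field was not found
predicted = [
    {"invoice_number": "INV-1001", "total": "1,250.00"},
    {"invoice_number": "5521", "total": None},
]
print(field_accuracy(predicted, expected))  # 0.75
```

Running the same function on the output of each candidate extractor gives numbers that can be compared directly.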


Hi @Rajesh_Babu1

If there are more than 1,000 different vendor templates, you can use Document Understanding with AI Center to train an ML model, which will give higher accuracy when extracting the data.

The other extractors (Regex, Form and Intelligent Form) will work, but with lower accuracy, and the format of the extracted data may not match.

For 1,000+ different formats, you can use AI Center to add documents in the future and retrain the model; after 2-3 training runs, the model can be retrained automatically.

Thanks


Hi @suraj.setty

Thank you for your feedback. Instead of the ML Extractor, can I go with the Form Based Extractor? We have some limitations on using the ML Extractor in our organization. Also, between Form Based and Regex, which one is better? Most people suggest that Regex gives more accuracy, but I need strong arguments to convince my customer, since we are going to change the entire design. Any suggestions, please?

If your documents are structured, you can use the Form Extractor, but wherever you have unstructured documents and different types of PDF, the best approach is the Regex Based Extractor.

Yes, Regex is better than Form Based, because the Form Based Extractor can only be used on structured data with constant fields. With Regex you can specify all the fields, but whenever you add a new document type to the existing ones, you have to add new regex patterns each time.

If the template is the same for all vendors and the fields are static, you can try the Form Extractor.
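
To see why a form-based approach depends on a fixed layout, here is a simplified sketch of the idea: OCR words come with page coordinates, and a template defines one fixed region per field. This is not the UiPath Form Extractor API; the `Word`/`Region` classes, the template and the coordinates are hypothetical. If a vendor moves a field even slightly, the word falls outside its region and the field comes back empty.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge in page units (hypothetical OCR output)
    y: float  # top edge in page units


@dataclass
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


# Hypothetical template: one fixed region per field, defined once per layout.
TEMPLATE = {
    "invoice_number": Region(400, 50, 550, 70),
    "total": Region(400, 700, 550, 720),
}


def extract_with_template(words: list, template: dict) -> dict:
    result = {}
    for field, region in template.items():
        # Words inside the region, ordered top to bottom, then left to right
        hits = sorted((w for w in words if region.contains(w)), key=lambda w: (w.y, w.x))
        result[field] = " ".join(w.text for w in hits) or None
    return result


same_layout = [Word("INV-1001", 420, 60), Word("1,250.00", 430, 710), Word("Thanks", 50, 800)]
print(extract_with_template(same_layout, TEMPLATE))
# {'invoice_number': 'INV-1001', 'total': '1,250.00'}

moved_layout = [Word("INV-1001", 420, 120), Word("1,250.00", 430, 710)]
print(extract_with_template(moved_layout, TEMPLATE))
# {'invoice_number': None, 'total': '1,250.00'}
```

With many vendors on different layouts, each layout needs its own template, so this approach fits best when the invoices genuinely share one structure.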

Thanks.
