Which extractor is fastest in Document Understanding?

Dear All,

We have used the Intelligent Form Extractor. It serves the purpose very well, but we found it a bit slow even after removing the Validation Station. I am guessing the ML Extractor would also be slow, since it has to compare the data with the endpoints. Which would be the fastest extractor?

Also, is there a precedence to the order of the extractors used, or is it just left to right?

Best Regards,

Hi @preeti.thukaram1

No, the ML Extractor is much faster than the Intelligent Form Extractor. If you want to extract data from different types of PDF templates, you can go with the ML Extractor.

I think your PDF template stays the same.

The Intelligent Form Extractor is used to extract data from handwritten documents. If you have computer-generated PDF documents and want to extract data from them, you can simply go ahead with the Form Extractor.

We can even use all the extractors at the same time by configuring each extractor and its confidence percentage.

We need to choose the extractors based on the scenario.

I hope this helps you understand the extractors.


Hi @preeti.thukaram1

You can also try the Regex Extractor.


Thanks for your response. We have a combination of Intelligent Form, ML and Regex extractors (as the documents can be pretty unstructured and we do not have a Data Manager license to improve the ML Extractor’s learning). Noted that the ML Extractor is faster. The only way, then, is to set thresholds and find the right combination of extractors.
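
To make the "set thresholds and combine extractors" idea concrete, here is a minimal Python sketch of one possible precedence rule: walk the extractors in a fixed order and accept the first value whose confidence meets that extractor's minimum. This is only an illustration. The function name `pick_field`, the extractor labels and the numbers are all made up, and the thread does not confirm that Document Understanding resolves fields exactly this way.

```python
def pick_field(candidates, min_confidence):
    """Return (extractor, value) from the first candidate that meets its threshold.

    candidates: list of (extractor_name, value, confidence) in priority order.
    min_confidence: dict mapping extractor_name -> minimum accepted confidence.
    Both structures are hypothetical, chosen only for this illustration.
    """
    for extractor, value, confidence in candidates:
        # Extractors without a configured threshold must be fully confident.
        if value is not None and confidence >= min_confidence.get(extractor, 1.0):
            return extractor, value
    return None  # nothing was confident enough, e.g. send to manual validation


# Made-up results for a single field, in priority order.
candidates = [
    ("regex", None, 0.0),                 # pattern did not match
    ("ml", "INV-001", 0.62),              # below its threshold
    ("intelligent_form", "INV-001", 0.91),
]
thresholds = {"regex": 0.99, "ml": 0.80, "intelligent_form": 0.70}

print(pick_field(candidates, thresholds))  # ('intelligent_form', 'INV-001')
```

With a rule like this, the order of the extractors only matters for fields where more than one of them clears its threshold, so tuning the thresholds per extractor is what decides which one wins in practice.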


We have used the Intelligent Form Extractor together with the ML Extractor, as the ML Extractor did not cover all the fields we need to map. Since our invoices come in different formats, we had to go with templates. With loads of documents coming in (we tried a maximum of 16 docs), the extraction took a lot of time.

You can train a custom ML extractor to add the fields you need. For invoices and similar documents, I would recommend using it if you have more than 20 or so different variants.

If you want to speed it up you can, in theory, do multiple requests at once, but usually DU is not used for online processing. What’s the use case for which you need this?

Invoice processing. We are currently doing a PoC on Community Edition. A combination of the Intelligent Form Extractor and the ML Extractor is giving good results. It's the time that we are concerned about, and we are looking for options to reduce it. If there is a way to optimize the workflow and reduce extraction time, please let us know.

Processing time is dominated by digitization and extraction. Neither of these is something you have control over as a developer. What you can do is run the process unattended and use Action Center for validation so you don’t care about waiting, or you can try to use a Parallel For Each to run multiple documents in parallel (although I wouldn’t do more than a few at a time).
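
For anyone who wants to see the "a few at a time" idea outside of Studio, here is a rough Python sketch of bounded parallelism. It is not UiPath code: `extract_document` is a hypothetical stand-in that just sleeps to simulate a slow digitization and extraction call, and the limit of 3 is an arbitrary example value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def extract_document(path):
    """Hypothetical stand-in for digitizing and extracting one document."""
    time.sleep(0.1)  # simulate a slow, I/O-bound service call
    return {"document": path, "status": "extracted"}


def extract_all(paths, max_parallel=3):
    """Process documents concurrently, never more than max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() yields results in the same order as the input paths.
        return list(pool.map(extract_document, paths))


if __name__ == "__main__":
    docs = [f"invoice_{i}.pdf" for i in range(6)]
    start = time.perf_counter()
    results = extract_all(docs)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} documents in {elapsed:.2f}s")
```

Keeping the worker count small is the point: the waiting time overlaps across documents, but the extraction service is not hit with every document at once.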

Using Parallel For Each did reduce the extraction time :)
