We have used the Intelligent Form Extractor. It serves the purpose very well, but we found it a bit slow even after removing the Validation Station. I am guessing the ML Extractor would also be slow to compare the data with the endpoints… Which would be the fastest extractor?
Also, is there a precedence to the order of the extractors used, or is it just left to right?
No, the ML Extractor is much faster than the Intelligent Form Extractor. If you want to extract data from different types of PDF templates, then you can go with the ML Extractor.
I think your PDF template remains the same.
The Intelligent Form Extractor is used to extract data from handwritten documents. If you have a computer-generated PDF document and want to extract data from it, you can simply go ahead with the Form Extractor.
We can even use all the extractors at the same time by configuring each extractor and its confidence percentage.
We need to choose extractors based on the scenario.
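To make the idea of combining extractors with confidence percentages concrete, here is a minimal Python sketch. It is not UiPath code. It assumes, purely for illustration, that extractors are tried in the order listed and that the first result at or above that extractor's threshold is accepted; this thread does not confirm that the product works this way. The extractor functions, field names, values and thresholds are all made up.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical shapes: an extractor returns (value, confidence) or None.
Result = Optional[Tuple[str, float]]
Extractor = Callable[[str, str], Result]


def first_confident_result(
    document: str,
    field: str,
    extractors: List[Tuple[str, Extractor, float]],
) -> Optional[Tuple[str, str, float]]:
    """Try extractors in listed order; return the first result that
    meets that extractor's minimum confidence, or None if none does."""
    for name, extract, min_confidence in extractors:
        result = extract(document, field)
        if result is None:
            continue  # this extractor does not handle the field
        value, confidence = result
        if confidence >= min_confidence:
            return name, value, confidence
    return None


# Stand-in extractors with hard-coded outputs (assumptions, not real APIs).
def form_extractor(document: str, field: str) -> Result:
    return {"invoice_no": ("INV-001", 0.95)}.get(field)


def ml_extractor(document: str, field: str) -> Result:
    return {"invoice_no": ("INV-001", 0.70), "total": ("120.50", 0.85)}.get(field)


def regex_extractor(document: str, field: str) -> Result:
    return {"total": ("120.50", 0.60), "po_number": ("PO-77", 0.90)}.get(field)


# Listed order plays the role of "left to right"; thresholds are per extractor.
PIPELINE = [
    ("form", form_extractor, 0.90),
    ("ml", ml_extractor, 0.80),
    ("regex", regex_extractor, 0.50),
]

for field in ("invoice_no", "total", "po_number", "vendor"):
    print(field, first_confident_result("invoice.pdf", field, PIPELINE))
```

With a scheme like this, raising a threshold pushes more fields down to the later extractors, which is why the thresholds and the order have to be tuned together.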
I hope the above helps you understand the extractors.
Thanks for your response. We have a combination of Intelligent Form, ML and Regex extractors (as the documents can be pretty unstructured and we do not have a Data Manager license to improve the ML Extractor’s learning). Noted that the ML Extractor is faster. The only way, then, is to set thresholds and find the right combination of extractors.
We have used both the Intelligent Form Extractor and the ML Extractor, as the ML Extractor did not cover all the fields we need to map. Since we have invoices in different formats, we had to go with templates. With loads of documents coming in (we tried a maximum of 16 docs), the extraction took a lot of time.
You can train a custom ML Extractor to add the fields you need. For invoices and similar documents, I would recommend using it if you have more than 20 or so different variants.
If you want to speed it up you can, in theory, do multiple requests at once, but usually DU is not used for online processing. What’s the use case for which you need this?
Invoice processing. We are currently doing a PoC on the Community Edition. A combination of the Intelligent Form Extractor and the ML Extractor is giving good results. It's the time that we are concerned about, and we are looking for options to reduce it. If there is a way to optimize the code and reduce extraction time, please let us know.
Thanks
Processing time is dominated by the Digitization and Extraction steps. Neither of these is something you have control over as a developer. What you can do is run the process unattended and use Action Center for validation, so you don’t care about waiting, or you can try to use a Parallel For Each to run multiple documents in parallel (although I wouldn’t do more than a few at a time).
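For anyone who wants to see the "a few at a time" idea outside of a workflow, here is a rough Python sketch of bounded parallelism. It is only an analogy for Parallel For Each, not UiPath code: `extract_document` is a hypothetical stand-in that sleeps to simulate waiting on a remote service, and the limit of 3 is an arbitrary example value.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def extract_document(path: str) -> Dict[str, str]:
    """Hypothetical stand-in for digitizing and extracting one document."""
    time.sleep(0.1)  # simulates time spent waiting on a remote service
    return {"path": path, "status": "extracted"}


def extract_all(paths: List[str], max_parallel: int = 3) -> List[Dict[str, str]]:
    """Process documents with at most max_parallel in flight at once.

    pool.map returns results in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(extract_document, paths))


if __name__ == "__main__":
    documents = [f"invoice_{i}.pdf" for i in range(16)]
    for result in extract_all(documents):
        print(result["path"], result["status"])
```

Whether this actually shortens the total run depends on how the digitization and extraction services handle concurrent requests, which is why keeping the limit small is the safer starting point.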