Which extractor is fastest in document understanding

preeti.thukaram1 · January 19, 2022, 9:38pm

Dear All,

We have used the Intelligent Form Extractor. It serves the purpose very well, but we found it a bit slow even after removing the Validation Station. I am guessing the ML Extractor would also be slow, since it has to compare the data with the endpoints. Which would be the fastest extractor?

Also, is there a precedence to the order of extractors used, or is it just left to right?

Best Regards,
Preeti

Robinnavinraj_S · January 20, 2022, 1:48am

Hi @preeti.thukaram1

No, the ML Extractor is much faster than the Intelligent Form Extractor. If you want to extract data from different types of PDF templates, you can go with the ML Extractor.

I think your PDF template remains the same.

The Intelligent Form Extractor is used to extract handwritten documents. If you have a computer-generated PDF document and want to extract data from it, you can simply go ahead with the Form Extractor.

We can even use all the extractors at the same time by configuring the extractors and their confidence percentages.

We need to choose extractors based on the scenario.

Hope the above helps you understand the extractors.

Thanks
Robin

Sudharsan_Ka · January 20, 2022, 2:37am

Hi @preeti.thukaram1

You can also try the Regex Extractor.

Regards
Sudharsan

preeti.thukaram1 · January 20, 2022, 9:15am

Thanks for your response. We have a combination of the Intelligent Form, ML and Regex extractors (as the documents can be pretty unstructured and we do not have a Data Manager license to improve the ML Extractor's learning). Noted that the ML Extractor is faster. The only way, then, is to set thresholds and find the right combination of extractors.
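To make the threshold idea concrete, here is a rough sketch in plain Python (not UiPath code). It assumes extractors are tried in a fixed priority order and that the first one whose confidence meets its own threshold wins; whether the Data Extraction Scope resolves fields exactly this way is an assumption on my part. The extractor names, values and threshold numbers are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-field result: (extractor name, extracted value, confidence in [0, 1]).
Result = Tuple[str, Optional[str], float]

def pick_value(results: List[Result], thresholds: Dict[str, float]) -> Optional[Result]:
    """Return the first result, in priority order, that meets its extractor's threshold."""
    for name, value, confidence in results:
        # An extractor with no configured threshold is only accepted at full confidence.
        if value is not None and confidence >= thresholds.get(name, 1.0):
            return (name, value, confidence)
    return None  # no extractor was confident enough; the field stays empty

# Illustrative thresholds and results for one field, listed in priority order.
thresholds = {"IntelligentForm": 0.8, "ML": 0.6, "Regex": 0.5}
results = [
    ("IntelligentForm", "INV-001", 0.55),  # below 0.8, skipped
    ("ML", "INV-0001", 0.72),              # meets 0.6, selected
    ("Regex", "INV-0001", 0.90),           # never reached
]
print(pick_value(results, thresholds))
```

With a setup like this, tuning means raising or lowering each threshold until the extractor that is usually right for a field is the one that gets picked.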

Rajeena_M · January 20, 2022, 10:05am

We have used the Intelligent Form Extractor and the ML Extractor, as the ML Extractor did not cover all the fields we need to map. Since we have invoices in different formats, we had to go with templates. With loads of documents coming in (we tried a maximum of 16 docs), the extraction took a lot of time.

alpaca · January 20, 2022, 12:12pm

You can train a custom ML extractor to add the fields you need. For invoices etc., I would recommend using it if you have more than 20 or so different variants.

If you want to speed it up you can, in theory, do multiple requests at once, but usually DU is not used for online processing. What’s the use case for which you need this?

preeti.thukaram1 · January 20, 2022, 2:20pm

Invoice processing. We are currently doing a PoC on Community Edition. A combination of the Intelligent Form Extractor and the ML Extractor is giving good results. It's the time that we are concerned about, and we are looking for options to reduce it. If there is a way to optimize the code and reduce extraction time, please let us know.
Thanks

alpaca · January 20, 2022, 4:32pm

Extraction time is dominated by Digitization and the Extraction. Neither of these is something you have control over as a developer. What you can do is run the process unattended and use Action Center for validation so you don’t care about waiting, or you can try to use a Parallel For Each to run multiple documents in parallel (although I wouldn’t do more than a few at a time).
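The "few at a time" idea looks roughly like this in plain Python. This is only a sketch of bounded parallelism, not UiPath code: `extract_document` is a hypothetical stand-in for digitizing and extracting one document, and the limit of 3 is just an example value.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def extract_document(path: str) -> Dict[str, str]:
    """Hypothetical stand-in for digitization + extraction of a single document."""
    time.sleep(0.1)  # simulates waiting on a remote digitization/extraction call
    return {"path": path, "status": "extracted"}

def extract_all(paths: List[str], max_parallel: int = 3) -> List[Dict[str, str]]:
    """Process documents concurrently, but never more than max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in the same order as the input paths.
        return list(pool.map(extract_document, paths))

paths = [f"invoice_{i}.pdf" for i in range(6)]
for result in extract_all(paths, max_parallel=3):
    print(result["path"], result["status"])
```

Because most of the time is spent waiting on the service, overlapping a handful of requests cuts the total wall-clock time, while the cap keeps you from flooding the endpoint.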

Rajeena_M · January 21, 2022, 1:08am

Using Parallel For Each did reduce the extraction time.

system · January 26, 2022, 9:27am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
