Train Extractors and Classifiers for Machine Learning Extraction on Invoices


I’m trying to extract values from invoices that come in different languages and formats. I tried Machine Learning Extraction, but it couldn’t capture all the values I want. After some research I found the Train Extractors and Classifiers scopes. In older topics people say this can’t be done yet, but those topics are old. Is there any news about it? Can I get all the values I want from invoices?

Any tips are welcome.

@loginerror What do you think about it?

Hi @ercanebiler

I think the currently available model can only find a fixed set of fields.
It is supposed to be improved in future versions, though.

Maybe @Ioana_Gligan will be able to help here a bit :slight_smile:

@loginerror I talked with her, actually. She said we can’t get table values with the regex extractor, only the other values. However, I couldn’t get any of the other values with the regex extractor either.
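
For reference, here is a minimal sketch in plain Python of what regex-based extraction of non-table fields usually looks like. It is not the UiPath regex extractor itself; the field names, the patterns, and the sample text are all assumptions made up for illustration, and real invoices in other languages or layouts would need their own patterns.

```python
import re

# Hypothetical patterns for three header fields. These are illustrative only
# and would need tuning for each invoice layout and language.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"Date\s*[:\-]?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.IGNORECASE
    ),
    # Captures the amount as written (e.g. "1250,50"); no number normalisation.
    "total": re.compile(r"Total\s*[:\-]?\s*(\d[\d.,]*\d|\d)", re.IGNORECASE),
}


def extract_fields(text):
    """Return the first match for each field, or None when nothing matches."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up OCR output, not taken from a real invoice.
sample = "Invoice No: INV-2020-001\nDate: 12.03.2020\nTotal: 1250,50"
fields = extract_fields(sample)
print(fields)
```

A field comes back as `None` when its pattern finds nothing, which is often the case when the label on the invoice is worded differently or is in another language than the pattern expects.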

Indeed, currently the machine learning models are pretrained.

If you have an ML model that you built yourself and would like to train, the scopes you found are perfect for it. For now, though, you have to provide your own custom training activity for that model.

Hope this helps,


What do you think about ABBYY Flexicapture? Is it more successful than the others?

Hello @ercanebiler,

The fit of one extraction method or another can only be discussed on a case-by-case basis. Unfortunately, I cannot tell whether one extractor is better than another; it all depends on what you need to extract, what the documents look like, what quality they have, how much variability there is, etc.