I was looking through the documentation for the Keyword Based Classifier and its trainer:
- Keyword Based Classifier
- and so on…
However, I could not find which algorithms are used for classification and for training. The documentation only gives an overview.
I would like to understand the algorithms so that I can anticipate the risk of misclassification and judge how stable the results will be.
Several questions arise, such as:
- how is each keyword weighted?
- how do these weights change after a single training run?
- how many documents need to be processed before the training is sufficient?
- if we define a general document category (like invoice), the titles / selected keywords can vary a lot depending on the invoice provider. How does a higher volume (hence more frequent training) of one format affect the classification confidence for the other formats?
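To make the last question concrete, here is a toy Python sketch of what I am worried about. It is purely my own assumption and not the actual classifier: I pretend each keyword's weight is simply the number of times it was confirmed during training, and that confidence is the share of the total learned weight matched by a document. The function names, the keywords and the scoring rule are all hypothetical.

```python
from collections import Counter

# Hypothetical model, NOT the real algorithm: weight = confirmation count.

def train(samples):
    """Count how often each keyword was confirmed for one category."""
    counts = Counter()
    for keywords in samples:
        counts.update(set(keywords))
    return counts

def confidence(counts, keywords):
    """Share of the total learned weight matched by the document's keywords."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[k] for k in set(keywords)) / total

# Two invoice formats from different providers (made-up keywords).
format_a = ["invoice", "amount due"]
format_b = ["facture", "total ttc"]

# Same category trained with equal volumes, then with format A dominating.
balanced = train([format_a] * 5 + [format_b] * 5)
skewed = train([format_a] * 95 + [format_b] * 5)

print(confidence(balanced, format_b))  # 0.5
print(confidence(skewed, format_b))    # 0.05
```

Under this assumed scheme, the confidence for the low-volume format drops from 0.5 to 0.05 only because the other format was trained more often. I would like to know whether the real trainer behaves like this or normalizes the weights in some way.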
It would be great to have more details on how the keyword classifier and its trainer work. More questions will probably arise; this topic is meant as a starting point for a deeper understanding.