I am using a Keyword Based Classifier for my documents (let's say they are bank confirmations of payments).
Many similar keywords are mentioned across the documents.
Based on that, I have defined the set of keywords that we need to use.
With the few documents that I have, I have noticed that the confidence level is around 50-60%.
First question: if I add more keywords, does this mean that the confidence level is going to rise?
What should I do if the documents are below my confidence threshold?
Adding more keywords to your classifier may increase the confidence level, because it can help the classifier differentiate between document types. This is most likely when the new keywords are specific to one document type; keywords that appear in several types add little. The effect also depends on the characteristics of your documents and on how your classifier is implemented.
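To illustrate why the kind of keyword matters more than the number, here is a minimal sketch of a toy keyword scorer. It is not how the Keyword Based Classifier computes its confidence; the scoring rule (share of all keyword hits that belong to the winning type), the function name `keyword_confidence`, and the sample keywords are assumptions made up for this example.

```python
def keyword_confidence(text, keywords_by_type):
    """Toy scorer (assumed rule, not the product's): confidence is the
    share of all keyword hits that belong to the best-matching type."""
    lowered = text.lower()
    # Count how many keywords of each document type occur in the text.
    hits = {
        doc_type: sum(1 for kw in keywords if kw.lower() in lowered)
        for doc_type, keywords in keywords_by_type.items()
    }
    total = sum(hits.values())
    if total == 0:
        return None, 0.0  # no keyword matched at all
    best = max(hits, key=hits.get)
    return best, hits[best] / total


if __name__ == "__main__":
    text = ("Payment confirmation: the transfer to the beneficiary "
            "account was executed.")
    # Hypothetical keyword sets; "account" is shared by both types.
    base = {
        "payment_confirmation": ["payment", "confirmation", "account"],
        "account_statement": ["account", "statement", "balance"],
    }
    print(keyword_confidence(text, base))

    # Adding keywords that only occur in payment confirmations helps.
    specific = {
        "payment_confirmation": base["payment_confirmation"] + ["transfer", "beneficiary"],
        "account_statement": base["account_statement"],
    }
    print(keyword_confidence(text, specific))

    # Adding the same keyword to both types dilutes the score.
    shared = {
        "payment_confirmation": base["payment_confirmation"] + ["executed"],
        "account_statement": base["account_statement"] + ["executed"],
    }
    print(keyword_confidence(text, shared))
```

Under this toy rule, type-specific keywords raise the score while keywords shared between types lower it, so it is worth checking each new keyword against all document types before adding it.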
If your classifier is not achieving the desired confidence level, there are a few things you can try:
Collect more training data: One potential issue could be that you do not have enough training data to accurately classify your documents. In this case, you may want to try collecting more examples of each type of document and using them to train your classifier.
Fine-tune your classifier: If your classifier is a trained model with tunable parameters, another option is to adjust them to better fit your dataset. For example, you may want to try different values for hyperparameters such as the learning rate or the regularization strength.
Use different features: You may also want to try using different features to represent your documents. For example, instead of just using keywords, you could try using a combination of keywords and n-grams, or you could try using word embeddings such as word2vec or GloVe.
Use a different classifier: If the above approaches do not work, you may want to consider using a different classifier altogether. There are many different types of classifiers to choose from, and some may be more suited to your specific dataset than others.
Ultimately, the best approach will depend on the specifics of your dataset and the goals of your classification task. It may be helpful to try a few different approaches and see which one works best for your needs.
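On the question of what to do with documents that fall below the confidence threshold, a common pattern is to accept confident results automatically and send the rest to a person for review. The sketch below shows only that routing logic; the `ClassificationResult` type, the `route` function, and the 0.7 threshold are hypothetical names and values chosen for illustration, not part of any product API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClassificationResult:
    # Hypothetical container for one classifier output.
    document_id: str
    document_type: Optional[str]  # None when nothing matched
    confidence: float             # 0.0 to 1.0


def route(
    results: List[ClassificationResult], threshold: float = 0.7
) -> Tuple[List[ClassificationResult], List[ClassificationResult]]:
    """Split results into (accepted, needs_review) by confidence."""
    accepted, needs_review = [], []
    for result in results:
        if result.document_type is not None and result.confidence >= threshold:
            accepted.append(result)
        else:
            # Below threshold or unclassified: hand over to a human.
            needs_review.append(result)
    return accepted, needs_review


if __name__ == "__main__":
    sample = [
        ClassificationResult("doc-1", "payment_confirmation", 0.85),
        ClassificationResult("doc-2", "payment_confirmation", 0.55),
        ClassificationResult("doc-3", None, 0.0),
    ]
    accepted, needs_review = route(sample)
    print([r.document_id for r in accepted])
    print([r.document_id for r in needs_review])
```

The corrections made during manual review can then serve as additional examples when you revisit the keyword sets or retrain.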
I have defined the Keyword extractor and it is working fine now.
Now I want to use the Intelligent Keyword Extractor.
I have used a few documents for each document type to train the model.
Now when I run the workflow, I get an exception:
Classify Document Scope: Index was outside the bounds of the array.
I don't fully understand this, because there is no array in the Classify Document Scope at all, and this exception only started happening once I began using the Intelligent Keyword Extractor.