General Questions regarding Classify Document Scope

Background: We are in the process of building a POC for a healthcare company that wants a robot to help them determine additional diagnoses based on their current diagnosis. Most of this is done post mortem for insurance purposes, so we decided to use Classify Document Scope to determine these diagnoses.

Since I am very new to Document Understanding, I was hoping to have some questions answered regarding Classify Document Scope.
– What is the best way to use Classifier Manage Learning? Single line? Multiple lines?
– What is the difference between using one line and using multiple lines?
– What is the difference in using multiple words on one line, vs separate lines for each?

TIA!

@Ioana_Gligan

I assume you are using the Keyword Based Classifier? In that case, for each document type you can define one or more keyword sets (I assume this is what you mean by lines). A document is classified as a certain type if ANY of the keyword sets for that type matches the document; then, of all matching types, the one with the highest confidence is selected. Within a keyword set, ALL keywords must be found in the document for that document type to be a valid candidate.

Example: My document is “The quick brown fox jumps over the lazy dog” and my possible document types are “document about fox” and “document about bear”.

For document type “document about fox” I define the following keyword sets:

  1. “fox”
  2. “brown”, “fox”
  3. “red”, “fox”

For document type “document about bear” I define the following keyword sets:

  1. “agile”, “bear”
  2. “bear”

When I classify the document, keyword sets 1 and 2 for “document about fox” will match the document, but 3 won’t. Neither of the keyword sets for “document about bear” matches the document, so the type reported is “document about fox”.
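
The ANY/ALL rule above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not how the Keyword Based Classifier is actually implemented: the function names are made up, matching is assumed to be a case-insensitive substring check, and the confidence scoring used to pick between several matching types is left out because its details are not covered here.

```python
def matching_keyword_sets(document, keyword_sets):
    """Return the 1-based indices of keyword sets whose keywords ALL occur in the document.

    Assumption: a keyword "occurs" if it is a case-insensitive substring of the text.
    """
    text = document.lower()
    return [
        index
        for index, keywords in enumerate(keyword_sets, start=1)
        if all(keyword.lower() in text for keyword in keywords)
    ]


def candidate_types(document, types):
    """Return {type name: matching set indices} for every type where ANY keyword set matches."""
    candidates = {}
    for type_name, keyword_sets in types.items():
        matched = matching_keyword_sets(document, keyword_sets)
        if matched:  # one matching keyword set is enough to make the type a candidate
            candidates[type_name] = matched
    return candidates


document = "The quick brown fox jumps over the lazy dog"
types = {
    "document about fox": [["fox"], ["brown", "fox"], ["red", "fox"]],
    "document about bear": [["agile", "bear"], ["bear"]],
}

candidates = candidate_types(document, types)
print(candidates)
```

With the fox and bear keyword sets from the example, only “document about fox” comes back as a candidate, through its keyword sets 1 and 2.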

PS: The above is specific to the Keyword Based Classifier. Future classifiers, such as the Intelligent Keyword Classifier (which you can now test in preview), will have completely different setups.


Yes, I was using Keyword Classifier. My apologies.