Document Classification

Hi,

I’ve been trying to classify two types of documents using Taxonomy Manager, Digitize Document, Tesseract OCR, Classify Document Scope, Keyword Based Classifier, Validation Station, Train Classifiers Scope and Keyword Based Classifier Trainer.
I cannot use ABBYY because I do not have access to it.

I start by creating the taxonomy in the Taxonomy Manager like this:


The Company document type is set up the same way as the Firm one: it has a Title and a Name property.

Then I created an empty JSON file with just [ ] inside it.
I loop through my list of documents and digitize each one.

Then I add a Classify Document Scope and, inside it, a Keyword Based Classifier.
There I set the path to the JSON file I created earlier and configure the classifiers as such:

Then I present the Validation Station.

Then I add a Train Classifiers Scope and, inside it, a Keyword Based Classifier Trainer.
There I set the path to the JSON file I created earlier and configure the classifiers as such:

I run the robot and validate each file.

But when I finish, the JSON file I created is still empty.
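
To rule out that I was simply looking at the wrong file, the "still empty" check can also be done outside the robot. This is only a rough Python sketch: the helper names are made up for illustration, and I am assuming nothing about the format the trainer writes beyond it being valid JSON.

```python
import json
from pathlib import Path


def create_empty_learning_file(path):
    """Write an empty JSON array, i.e. the '[ ]' starting state described above."""
    Path(path).write_text("[]", encoding="utf-8")


def learning_file_is_empty(path):
    """Return True if the file is blank or holds an empty JSON array/object.

    Assumption: whatever the trainer writes is valid JSON. This does not
    validate any particular schema, it only detects 'nothing was learned'.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return True
    data = json.loads(text)
    return isinstance(data, (list, dict)) and len(data) == 0
```

Calling learning_file_is_empty on the exact path configured in the Keyword Based Classifier Trainer, right after the run, would show whether the trainer wrote anything to that file at all.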

Can anyone explain what I’m doing wrong, or whether what I’m trying to do is just not the way this tool is meant to be used?
All help is appreciated.

Hi, I am having the same issue. Did you find a solution?

No, not yet. I will post it if I find one. I am not actively looking for a solution at the moment.

Hello,

I have the same question. Does this issue have a resolution as of July 2020?

Presuming it doesn’t, I read the documentation for the Train Classifiers Scope at this location,

It does not mention an additional input named Human Validated Classification Data, which I’ve highlighted in my setup.

The question is: if the Robot is given Human Validated Data, does it also need some kind of classification information so that it can assign the updates from the Human Validated Data to the right classification groups?
