Keywords and Classify Document Scope

My problem is this: I’ve built a Document Understanding workflow, but it isn’t finding the right keywords when classifying documents.

For example, I have a document type for which I’ve given the following keywords in the Classify Document Scope:
“invoice”, “air”
The following words are found in the document that is being processed:
aircraft, invoice
What happens is that the Classify Document Scope finds the word “invoice” but not “air”, because it only matches words that are exactly “air”, not words that merely contain “air”.
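To make the difference concrete, here is a minimal Python sketch (plain Python, not UiPath code) contrasting whole-word matching, which is how the classifier behaves in this case, with substring matching. The function names `whole_word_match` and `substring_match` are hypothetical, chosen only for illustration.

```python
import re

def whole_word_match(text: str, keyword: str) -> bool:
    # Split the text into word tokens and require an exact token match
    tokens = re.findall(r"\w+", text.lower())
    return keyword.lower() in tokens

def substring_match(text: str, keyword: str) -> bool:
    # True if the keyword appears anywhere, even inside a longer word
    return keyword.lower() in text.lower()

text = "Invoice for aircraft parts"
for kw in ["invoice", "air"]:
    print(kw, whole_word_match(text, kw), substring_match(text, kw))
# invoice True True
# air False True
```

With whole-word matching, “air” is never found in “aircraft”, which is exactly the behaviour described above.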

Is there a way around this? Can the classifier be made to find a keyword when it is only part of a word in the document?

This post might be helpful: Intelligent OCR Regex Based Extractor Not Returning Values - #3 by btc653

I’m not 100% sure how you would implement it (yet), but I imagine you would use some sort of wildcard or regex to match “aircraft”.
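As a rough illustration of the wildcard/regex idea, the sketch below (plain Python, not a UiPath activity) treats a keyword like `air*`: `\b` anchors the match at the start of a word and `\w*` lets the word continue. The function name `find_prefix_matches` is hypothetical.

```python
import re

def find_prefix_matches(text: str, prefix: str) -> list:
    # \b = start of a word, \w* = "wildcard" for the rest of the word
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    return pattern.findall(text)

print(find_prefix_matches("Invoice for Aircraft parts, shipped by air", "air"))
# ['Aircraft', 'air']
```

Because of the leading `\b`, “repair” would not match “air”; dropping the `\b` would allow matches anywhere inside a word, at the cost of more false positives.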

@wdag This part of DU is the most complex. Differentiating documents with classifiers takes time, because you have to explore all the possible words that are unique to each document type.

  1. Get the text out of the document with Digitize Document and check whether the keywords are present.

  2. Add keywords until the execution trail shows that Classify Document Scope identifies the document.
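Since partial matching is not available inside Classify Document Scope, one possible workaround is to run your own check on the text from step 1. The Python sketch below is only an illustration of that logic, not UiPath code: `DOC_TYPES`, its entries, and `classify_by_partial_keywords` are hypothetical. It picks the document type whose keywords all appear at the start of some word, preferring types with more keywords (on a tie, the first type listed wins).

```python
import re
from typing import Dict, List, Optional

# Hypothetical document types and keywords; replace with your own taxonomy
DOC_TYPES: Dict[str, List[str]] = {
    "air_invoice": ["invoice", "air"],
    "sea_invoice": ["invoice", "vessel"],
}

def classify_by_partial_keywords(text: str, doc_types: Dict[str, List[str]]) -> Optional[str]:
    """Return the type whose keywords all occur as word prefixes, else None."""
    lowered = text.lower()
    best_type, best_hits = None, 0
    for doc_type, keywords in doc_types.items():
        # Count keywords found at the start of a word, e.g. "air" in "aircraft"
        hits = sum(
            1 for kw in keywords
            if re.search(r"\b" + re.escape(kw.lower()), lowered)
        )
        # Require every keyword; prefer types with more keywords
        if hits == len(keywords) and hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type

print(classify_by_partial_keywords("INVOICE no. 123 - aircraft spare parts", DOC_TYPES))
# air_invoice
```

The same logic would have to be rebuilt in whatever form your workflow supports; this sketch only shows the matching rule.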

Note: I have always used whole words for keyword classification.

Yeah, I thought that was how it worked. I was just hoping otherwise.
Especially since the OCR sometimes randomly reads one word as three separate words if the scan is even slightly poor.

The problem is that I don’t know all the possible whole words that could appear, but I do know they will always contain the partial word I wanted the Classify Document Scope to find.
Using whole words means many more documents will have to be run through the workflow before it works. Ah well, so be it.

Unfortunately, partial token matches are not supported at the moment, so “aircraft” will not match “air”.

@tudor.serban, these activities belong to the IntelligentOCR package. Is this package production ready without ABBYY?

Yes it is.