I need help as soon as possible because this process entered UAT this week and goes to the Prod environment next week.
Basically, I have only one valid form that needs to be sent to Action Center (just this one page of my PDF file; everything else that is not this form is junk). I have my taxonomy and my classifiers, and I am using the Forms AI extractor.
The problem is that the classifier detects some of that “junk” as a valid form, which means it is sending these “junk” pages to Action Center too.
I saw that those “junk” pages contain some of the same words as my valid form (as far as I know, the Intelligent Keyword Classifier works from words the files have in common), so I am guessing this is why the bot picks up these “junk” pages. Right now, I decide whether to upload a page to AC by checking its Confidence level, but there are valid forms at 68% and “junk” pages at 95% Confidence, so I think this approach is not going to work.
What can I do to prevent this behavior? Is this a good scenario for the Intelligent Keyword Classifier, or should I switch to the other classifier? aptp_ver160310.pdf (58.6 KB)
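To illustrate why a Confidence cutoff alone cannot separate these pages, here is a minimal Python sketch. The two scores are the ones quoted above; the page labels and the function name `accepted_pages` are made up for the example, and this is not UiPath code.

```python
# Hypothetical pages: (label, classifier confidence), using the figures quoted above.
pages = [("valid_form", 0.68), ("junk_page", 0.95)]

def accepted_pages(pages, threshold):
    """Return the labels of pages whose confidence meets the threshold."""
    return [label for label, confidence in pages if confidence >= threshold]

# Try every cutoff from 0.00 to 1.00 and keep those that accept
# the valid form while rejecting the junk page.
separating = [t / 100 for t in range(101)
              if accepted_pages(pages, t / 100) == ["valid_form"]]
print(separating)  # []
```

Because the junk page scores higher than the valid form, any cutoff low enough to keep the form also keeps the junk, so some second signal is needed.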
Hi @GFlores
Are there no other keywords that let you differentiate between junk and valid pages?
The only other classifier is the Machine Learning Classifier; you need to create a classifier model in AI Center to use it. It supports Invoices, Receipts, Utility Bills, and Purchase Orders (for these document types you don't have to train the model).
If you have other document types, you need to train the model in AI Center yourself.
I had already tried the ML/AI approach for the extractor in AI Center (before I knew about Forms AI), but honestly I couldn’t implement it; it was too complex to understand. On top of that, we are already in UAT and go to the Prod environment next week, which is why I am a little desperate.
I am attaching an empty example of the form (APTP) that we need to detect. The problem is that almost all of the words on it also appear in the “junk” pages. The only word I know to be exclusive to this page is “APTP” (at the bottom of the form), but it is often so small that it cannot be read correctly. That’s why I implemented logic based on the Confidence level instead of specific words.
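One possible way to combine the two signals described above is to look for “APTP” with a tolerance for OCR misreads and use Confidence only as a secondary check. The following is a rough Python sketch, not UiPath code: the function names, the `0.75` similarity cutoff, and the `0.5` minimum confidence are all assumptions chosen for illustration, and the page text is assumed to be available as a plain string. The same idea would have to be ported to the workflow's own expressions.

```python
from difflib import SequenceMatcher

MARKER = "APTP"  # the only word the thread says is exclusive to the valid form

def has_marker(ocr_text, min_ratio=0.75):
    """True if any token in the OCR text is close enough to MARKER.

    min_ratio=0.75 is an assumed cutoff; for a 4-letter token it
    tolerates one misread character (for example "AP7P").
    """
    for token in ocr_text.upper().split():
        token = token.strip(".,:;()[]")
        if SequenceMatcher(None, MARKER, token).ratio() >= min_ratio:
            return True
    return False

def should_send_to_action_center(ocr_text, confidence, min_confidence=0.5):
    """Require the marker AND a modest confidence (0.5 is an assumed value)."""
    return has_marker(ocr_text) and confidence >= min_confidence

# A valid form with a misread marker and low confidence is still accepted...
print(should_send_to_action_center("Name Date AP7P", 0.68))          # True
# ...while a high-confidence junk page without the marker is rejected.
print(should_send_to_action_center("Name Address Signature", 0.95))  # False
```

The similarity cutoff would need tuning against real scans: too low and ordinary words start to match, too high and badly read markers are missed.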