Document Understanding: The best document type exclusion model

azeem_rosli · January 16, 2023, 1:17pm

Hi,

Current situation:
I’m using the Intelligent Keyword Classifier to classify main documents (mandatory: invoice OR credit note), supporting documents (mandatory if an invoice is present), and other documents.

Requirement:
I would like to skip data extraction for other documents but throw an error for a new document (so that I can retrain).

Suggested solutions:
Document variation is very wide because there are so many vendors, so we’ve thought of a few solutions:

1. Create a single ‘Others’ model in the taxonomy, then classify all of the other documents under it. Throw an error for any document that falls below the confidence threshold.
2. Create several ‘Others’ models in the taxonomy, then categorise each identified kind of other document and classify them accordingly. Throw an error for any document that falls below the confidence threshold.

Question:
What are the possible drawbacks of these solutions? Or what would be the best solution for this requirement?

Lahiru.Fernando · January 23, 2023, 6:06pm

Hello @azeem_rosli

I have come across the same scenario in one of our projects.
Similar to yours, I also had a wide range of documents that fall under “Other”.

What we did is your first option.
We created an entry in the taxonomy for “Other Documents” and used both the Intelligent Keyword Classifier and the Keyword Classifier. The right combination may change with the complexity of the documents. We trained the models with a large number of documents that were provided to us, and we used the Keyword Classifier to specify keywords we had identified in most of the documents that fall under “Other”.

This way, we were able to classify all of those under one category. If you want separate “Other” categories for different kinds of documents, you will need to train each of them. If that is not needed, you can classify everything under one type. The only drawback is that you will not be able to tell the different kinds of other documents apart, because there is just one “Other” category.
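
For illustration, here is a minimal Python sketch of the routing that a single “Other” category plus a confidence threshold implies. It is not UiPath code: `Classification`, `route`, `UnknownDocumentError`, the type names, and the 0.7 threshold are all hypothetical placeholders, and sending supporting documents to extraction is an assumption as well. In a real workflow the same decisions would be made on the classification results returned by the classifier.

```python
from dataclasses import dataclass

# All names and values below are hypothetical, chosen only for this sketch.
OTHER_TYPE = "Other Documents"
KNOWN_TYPES = {"Invoice", "Credit Note", "Supporting Document", OTHER_TYPE}
CONFIDENCE_THRESHOLD = 0.7  # assumed value; tune it on your own documents


@dataclass
class Classification:
    document_type: str
    confidence: float


class UnknownDocumentError(Exception):
    """Signals a document that should be reviewed and used for retraining."""


def route(result: Classification) -> str:
    """Return "extract" or "skip", or raise for a likely new document."""
    # Low confidence is treated as "probably a new kind of document".
    if result.confidence < CONFIDENCE_THRESHOLD:
        raise UnknownDocumentError(
            f"confidence {result.confidence:.2f} is below the threshold"
        )
    # A type outside the taxonomy is also treated as new.
    if result.document_type not in KNOWN_TYPES:
        raise UnknownDocumentError(f"unexpected type: {result.document_type}")
    # Confidently classified "Other" documents skip data extraction.
    if result.document_type == OTHER_TYPE:
        return "skip"
    return "extract"
```

With several “Other” categories instead of one, only the `OTHER_TYPE` check would change, becoming a membership test against a set of types.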

Hope this helps.

azeem_rosli · January 23, 2023, 10:16pm

Hi @Lahiru.Fernando

Thanks for the response. I’ve been wondering: in your case, is there a possibility that the Intelligent Keyword Classifier would misclassify a main document as an ‘Others’ document?

Lahiru.Fernando · January 24, 2023, 10:51am

Hi… @azeem_rosli

We had that situation during the initial phases. However, we trained on thousands of documents, and now it works just fine. We also recently stopped the auto-training because the model no longer needs it.
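
One way to check this on your side is to keep a set of documents with known types and measure how often a main document ends up in “Other”. The Python sketch below assumes you can export pairs of (true type, predicted type); the function name and the type names are hypothetical and not part of any UiPath API.

```python
# Hypothetical type names, used only for this sketch.
MAIN_TYPES = {"Invoice", "Credit Note"}
OTHER_TYPE = "Other Documents"


def main_as_other_rate(pairs):
    """Share of main documents that were predicted as "Other".

    pairs: iterable of (true_type, predicted_type) tuples.
    """
    main_total = 0
    main_as_other = 0
    for true_type, predicted_type in pairs:
        if true_type in MAIN_TYPES:
            main_total += 1
            if predicted_type == OTHER_TYPE:
                main_as_other += 1
    # Avoid dividing by zero when the set contains no main documents.
    if main_total == 0:
        return 0.0
    return main_as_other / main_total
```

If this rate stays near zero as new vendors are added, further training is probably adding little.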

azeem_rosli · January 24, 2023, 12:10pm

Alright. Great feedback. I’ll try to train them using a lot of documents. Thanks.
