"Please select evidence for the field" message in Document Understanding Framework

Hello,

I’ve been trying to build a process using the Document Understanding Framework. I’ve recreated my templates several times, and yet every time I run the process I get this error.

I have swapped the Microsoft and Tesseract OCRs. Also, the Form Extractor uses this API endpoint for which I use the Key from the Orchestrator Services tab.

image

I’ve also confirmed that the file I’m trying to extract information from is less than 50 KB, which is way below the limit of this endpoint.

Where am I going wrong?

thanks!

@AndyMenon Check whether the Digitize Document activity returns the text properly. Also check the output of the Classify Document Scope activity: if the count of the classification results is greater than 0, you can continue with the Validation Station and exporting the document.

You can also try a different OCR engine and check whether it gets the data. If you’re using OmniPage OCR, try different values of the Profile property. Start with the Scan profile; if that doesn’t work, change it and check again whether it gets the data.
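
The checks suggested above can be sketched in a few lines of Python. This is only an illustration of the order of the checks, not UiPath code: `PipelineState`, `next_step`, and the returned step names are hypothetical stand-ins for the outputs of the Digitize Document and Classify Document Scope activities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineState:
    """Hypothetical stand-in for the outputs of the two activities."""
    document_text: str = ""  # text produced by digitization
    classification_results: List[str] = field(default_factory=list)


def next_step(state: PipelineState) -> str:
    """Decide what to look at next, in the order suggested in the post."""
    # 1. No text at all: the OCR engine or its profile is the first suspect.
    if not state.document_text.strip():
        return "retry_ocr"
    # 2. Text exists but nothing was classified: look at the classifier setup.
    if len(state.classification_results) == 0:
        return "fix_classifier"
    # 3. Both checks pass: continue to extraction, validation, and export.
    return "extract_and_validate"
```

For example, a state with digitized text but an empty classification result list returns `"fix_classifier"`, which points at the classifier configuration rather than the OCR engine.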

@supermanPunch

I checked the document text output. It contains JSON content consisting of box positions and the words. The snapshot of the output document text is below. Since it contains position data, I thought I could use the Position-based Extractor. But the Position-based Extractor is not available in my environment, despite installing the required packages and following the instructions on this forum thread (where I have posted a third problem):

I have built two projects from scratch using two different PDF documents thinking that the documents might be the problem. But in both cases the project ends up with the error message.

Document Text output confirms words extracted with position information:

Required Packages Installed after selecting Pre-Release Option:

Position-based Extractor not available in Studio!

Thanks for your help!

Andy

@AndyMenon May I know which activity’s output you used to get that JSON data? As far as I know, the data should actually be the raw data from the PDF file you are using. :sweat_smile:

Yep. I sent you a snapshot of the DOM that contains the word+position info. But the DocumentText also contains the raw data as seen from my debugger here below.

@AndyMenon Ok. The data looks to be extracted. What about the Classification Results?

Excellent timing! :smiley:
I was debugging that at this very moment! It is blank!

image

A related problem: I see only one classifier in my environment, but the forum announcement shows a screenshot with multiple classifiers. Comparison below:

My Studio:

UiPath DUF Forum Page: New UiPath Document Understanding features have been released!

Can you send me your project by removing all sensitive data? I want to run a comparison of the project dependencies to see where this is running off the road for me.

Thanks

I was able to get it to work, but I do not know if this is the default way to go. But here are the results:

Output Excel Export :

Swapped out the Microsoft Engine for the Tesseract which has a default Scale of 2.

Replaced the Keyword based Classifier with the Intelligent Keyword Classifier and trained it by clicking on the Manage Learning link and providing the trainer with an input PDF to generate the json file.

Once I did that it pretty much worked.

Now the question remains why the Keyword Based Classifier refuses to work! :thinking:

Thoughts?

@AndyMenon How many keywords have you provided in the Keyword Based Classifier? How many pages does your PDF contain? The Intelligent Keyword Classifier is available starting from the latest release. I was able to get the Document Understanding Framework to work by using OmniPage OCR with a Scan profile and by using keywords that appear consistently in every document of that type.

I’m not sure if the Intelligent Keyword Classifier is really needed for your data extraction :sweat_smile:
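
To illustrate why keywords that stay constant across a document type matter, here is a toy keyword matcher in Python. It is not how UiPath’s Keyword Based Classifier is implemented; the function name `classify_by_keywords`, the document types, and the sample keywords are all made up for the example.

```python
from typing import Dict, List, Optional


def classify_by_keywords(
    text: str, keywords_by_type: Dict[str, List[str]]
) -> Optional[str]:
    """Return the document type whose keywords match the text most often.

    Matching is a case-insensitive substring test. Returns None when no
    keyword of any type appears in the text, which corresponds to an
    empty classification result.
    """
    lowered = text.lower()
    best_type: Optional[str] = None
    best_hits = 0
    for doc_type, keywords in keywords_by_type.items():
        # Count how many of this type's keywords occur in the document text.
        hits = sum(1 for kw in keywords if kw.lower() in lowered)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type


# Hypothetical keyword sets and document text, for illustration only.
keywords = {
    "invoice": ["Invoice Number", "Bill To"],
    "receipt": ["Receipt No"],
}
sample_text = "INVOICE NUMBER: 123\nBill to: ACME Corp"
result = classify_by_keywords(sample_text, keywords)
```

With no keywords configured, or with keywords that never occur in the digitized text, the function returns `None`, which mirrors the blank classification result seen earlier in the thread.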

I have one page.
There are about 15 artifacts on the page and I’m extracting about 10 of them.

I was able to configure the KW based Classifier by using the “Manage Learning” link of this activity.

I posted the steps in another thread a few hours ago (link below). Is this the right way to use this Classifier?

I’m asking because I watched a couple of YouTube videos, and they showed a blank JSON file being given as input to this Classifier. They did not say anything about adding keywords manually. I came up with the steps after reading through the UiPath documentation for this Classifier.

@AndyMenon I think the methods that you used are proper. So if you remove the Intelligent Keyword Classifier now, is it still unable to classify the document?

It works now. This is how my Classifier setup looks without the Intelligent KW Classifier. One thing I did is that I added more keywords than the number of artifacts I’m extracting. Therefore, I have to update my template to see how the newly added keywords will help in pulling more information out of the document.

Thanks for your help with this! :slight_smile:

Hi @supermanPunch,

Is this how a Form Extractor appears on your end?
Or should I be using a different extractor to extract the data?

image

@AndyMenon Actually, the appearance doesn’t matter. Since you are using an updated version, it appears in that form now, and the parameters are correct. I don’t think there have been any other updates to the Form Extractor. You’ll have to use Manage Templates and create templates to define the fields that you want to extract for each document type.

Awesome! I did run a test for the extra fields I needed. Sure enough, it works and as shown below, the 3 new details have been extracted as expected.

:+1:
