Thanks for your responses. Unfortunately I still face the same issue. I hereby attach the screenshots of my configuration. Please let me know whether I need to update something.
Initially I used the latest packages in my process. Since that threw the error, I downgraded the packages to lower versions, but I'm still facing the same issue.
Always use the latest version.
May I know the answers to the questions below?
Can you run the program in debug mode and check what values you get for the variables below?
File
DocText (check whether it is null)
DocObjectModel (check whether this is null)
Taxonomy (check for null)
AutoExtractionResult (check for null)
Also, check whether the robot has access to the “DocTask” bucket in the Default folder. What I mean is that the robot should be in the Default folder.
Hi Lahiru,
Thanks for your response. The robot is in the Default folder and the above variables are not null.
One question: I have used a normal process for Document Understanding. Is there any rule stating that the Create Validation station action must be used in an Orchestrator process? If so, could this be the reason for the issue?
There is no need to create the DU flow in an Orchestrator process. You can develop it in a normal process flow. You only need to create an Orchestrator process if you are using Action Center or persistence activities.
Hi @Lahiru.Fernando, I have a general question about the Intelligent OCR Form Extractor. Is it capable of extracting text embedded inside a logo or a stamp in a PDF document?
Hi @Lahiru.Fernando, I tried it with a sample logo that has the organization name embedded inside it. It was not able to extract the text.
Please see the screenshot below. The text inside the logo is not highlighted when I draw a boundary around it. OCR can extract the text as a separate activity, but I wanted to know whether IntelligentOCR has that built in.
Thanks for your response. I still face the same issue. Should I check any other configuration, or is there any other way I can resolve this issue?
I have been facing this issue for more than a week. Can you help me with this?
Can you send me your workflow solution so I can have a look? If possible, let me know what sort of documents you are processing too… If they are invoices, I have some samples with me, but if they are a different kind, I need to know.
If the model is not retrained for the second run (to automatically detect/extract values), what is the advantage of adding human validation stations? They require a human every time, and we can't even use this in an unattended bot.
Any leads on this, please?
The model is indeed trained when you have not only collected the data, but also curated it (making sure the humans did not make mistakes, etc.).
You can put in validation logic before the human validation step, to decide whether human review is needed or not.
You don’t always need an attended bot, as the Validation Station integrates into Action Center in Orchestrator, in a long-running workflow, so you can use pretty much any robot…
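To illustrate the "validation logic before the human validation step" idea, here is a minimal Python sketch. It is not UiPath code: the `ExtractedField` type, the `needs_human_review` function, and the 0.8 threshold are all hypothetical names and values chosen for illustration. The assumption is that each extracted field carries a confidence score between 0 and 1, and that some fields are mandatory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractedField:
    """Hypothetical stand-in for one extracted field and its confidence."""
    name: str
    value: Optional[str]
    confidence: float  # assumed to be in the range 0.0 to 1.0


def needs_human_review(fields: List[ExtractedField],
                       required: List[str],
                       min_confidence: float = 0.8) -> bool:
    """Return True when a document should be routed to human validation."""
    by_name = {f.name: f for f in fields}
    # Rule 1: every required field must be present and non-empty.
    for name in required:
        field = by_name.get(name)
        if field is None or not field.value:
            return True
    # Rule 2: no extracted field may fall below the confidence threshold.
    return any(f.confidence < min_confidence for f in fields)


fields = [
    ExtractedField("InvoiceNumber", "INV-1", 0.95),
    ExtractedField("Total", "100.00", 0.62),
]
if needs_human_review(fields, required=["InvoiceNumber", "Total"]):
    print("Send to validation")
else:
    print("Continue unattended")
```

With rules like these, only the documents that fail a check go to a human, and the rest can continue without review. The threshold and the list of required fields are business decisions and would need tuning on real documents.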
Can you elaborate on this one if possible? How can we retrain DU models without AI Fabric, given that there is no extractor trainer activity available in Studio like there is for classifiers?
Yes, there is, but its purpose for now is to collect the data to be used for training.
It does not automatically trigger retraining with each new validated document.
This is why you would need to go to AI Fabric and start the training from there…
I’m working on Document Understanding and I have the following question:
We can use the Machine Learning Extractor either with one of the provided endpoints and an API key, or with an ML Skill.
Both provide the same fields to extract, so what is the difference between these two methods?
I’m assuming that when we use the ML Extractor with only the endpoint and API key, we cannot retrain the model.
But if we use the ML Extractor with an ML Skill, we can retrain it using the ML Extractor Trainer. Please correct me if I’m wrong; I’m getting really confused about the use of these two methods.
Thanks.
Hello, could you please let me know how to use Document Understanding for rotated pages and for pages that are not in the correct order? For example, I have a PDF file with 3 documents in it: the first with 3 pages, the second with 5 pages and the third with 1 page. However, the file starts with 2 pages of the first document, then one page from the second document, then the third page of the first document, and so on. So they are not classified correctly, and the Classification Station does not let me change the order of the pages. I also have another example where the pages are rotated upside down, are not digitized correctly, and none of the pages is classified.
Thank you!
From my experience, you cannot do anything if the documents are skewed or the skew angle of the characters is bad. The only way is to get good-quality documents with a higher DPI.
For Classification and Extraction, you definitely need a business rule that states the initial and final pages of each document.
My recommendation would be to split the documents using keyword matching and then process them through the DU architecture.
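As a rough idea of what splitting by keyword matching could look like, here is a small Python sketch. It is not UiPath code, and it assumes you already have the digitized text of each page as a string. The function name `group_pages_by_keyword` and the keyword lists are hypothetical and only for illustration.

```python
from typing import Dict, List, Tuple


def group_pages_by_keyword(
    page_texts: List[str],
    keywords_by_doc: Dict[str, List[str]],
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Assign each page index to the first document type whose keyword it contains."""
    groups: Dict[str, List[int]] = {doc: [] for doc in keywords_by_doc}
    unmatched: List[int] = []
    for index, text in enumerate(page_texts):
        lowered = text.lower()
        for doc, keywords in keywords_by_doc.items():
            if any(keyword.lower() in lowered for keyword in keywords):
                groups[doc].append(index)
                break
        else:
            # No keyword matched: leave the page for manual handling.
            unmatched.append(index)
    return groups, unmatched


# Hypothetical page texts for a file whose pages are interleaved.
pages = [
    "Invoice No. 1001",
    "Invoice line items",
    "Purchase Order 77",
    "Invoice totals",
    "",
]
keywords = {"invoice": ["invoice"], "purchase_order": ["purchase order"]}
groups, unmatched = group_pages_by_keyword(pages, keywords)
print(groups)
print(unmatched)
```

Each group of page indices can then be written out as its own file and sent through classification and extraction separately. Note that this only groups pages by document type and keeps them in file order; restoring the page order inside a document would need an extra rule, for example reading a page number from the text.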
You are absolutely right: the public endpoints have a generic model behind them that you, as a customer, cannot retrain or change in any way. On the other hand, if you host your own model in AI Fabric, you can retrain it, add or remove fields, do transfer learning, etc.