The IntelligentOCR package lets users process documents in their workflows. It ships with some out-of-the-box functionality, as well as the framework required for building your own document classification and data extraction components.
To run the workflows, you must add your own ApiKey for Document Understanding from https://platform.uipath.com.
EDIT1: Action Center Integration Sample
Here is a sample workflow that runs an end-to-end document understanding process and uses Action Center integration for human validation: SampleDUActionCenterIntegration-New.zip (620.8 KB)
ORIGINAL POSTING: Sample Document Understanding Basic Usage
Here is a sample workflow that performs:
digitization, using the OmniPage OCR engine available in UiPath,
document classification, using the Keyword Based Classifier,
data extraction, using both the Regex Based Extractor and the Machine Learning Extractor available for processing Invoices and Receipts,
data validation, using the Present Validation Station attended activity, and
classifier training, for the Keyword Based Classifier.
Please note that the Taxonomy (the list of document types and their associated fields) is editable using the Taxonomy Manager wizard (available in the wizard ribbon once the IntelligentOCR package is installed). DocumentProcessing_IntelligentOCR300.zip (956.5 KB)
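To make the classification and extraction steps listed above more concrete, here is a minimal Python sketch of the general idea behind keyword-based classification and regex-based extraction. It is not how the UiPath activities are implemented: the `KEYWORDS` and `PATTERNS` tables, the field names and the functions `classify` and `extract` are all hypothetical and only stand in for what a taxonomy and its configured activities would provide.

```python
import re

# Hypothetical taxonomy: document type -> keywords (illustrative only).
KEYWORDS = {
    "Invoice": ["invoice", "bill to", "amount due"],
    "Receipt": ["receipt", "cash", "change due"],
}

# Hypothetical regex patterns per document type and field.
PATTERNS = {
    "Invoice": {"InvoiceNumber": r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)"},
    "Receipt": {"Total": r"Total\s*:?\s*(\d+\.\d{2})"},
}


def classify(text):
    """Return the document type whose keywords occur most often, or None."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(keyword) for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract(text, doc_type):
    """Apply each field's regex; fields without a match are set to None."""
    fields = {}
    for name, pattern in PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    sample = "INVOICE\nInvoice No: A123\nBill to: ACME\nAmount due: 10.00"
    doc_type = classify(sample)
    print(doc_type, extract(sample, doc_type))
```

In the real workflow the digitized text comes from the OCR step, and the extracted values would then be shown to a human for validation before being used.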
Can we process PDFs with more than 2 pages? I found that some documents contain more than 2 pages. Can we process all the pages using this extractor?
Can we make the model learn using the classifiers and learning activities mentioned in your workflow?
For example, if it is trained once, the next time it sees the same file it should give the actual outputs (Trained + Original). @Ioana_Gligan
Yes, model training / retraining can be enabled using the Train Classifiers Scope and Train Extractors Scope. Make sure to check out the documentation on how to write your own training activity for an extractor or classifier.
For training extractors, do we need to write our own extractor activities? I ask because I don’t see a trainable extractor activity.
The Machine Learning Extractor is already trained for a set of fields, and for the Regex Based Extractor we have to write the expressions ourselves. Which activities can we train under the Train Extractors Scope?
I am trying to execute the workflow that you attached.
But it shows me an unresolved activity after installing the IntelligentOCR activities,
i.e. after Load Taxonomy…
Kindly suggest which other package needs to be installed along with the IntelligentOCR activities.
Could you provide a screenshot?
Normally you can right click on the broken dependency and select “Repair” for it to be corrected.
You should also remember to open the project from the project.json file, which will read all the dependencies required to run it and download them automatically.
To train extractors, you currently have to build your own.
The machine learning extractor is pre-trained and does not expose the re-training capability at this moment.
If you have an in-house algorithm capable of learning, it is very easy to enable the feedback loop, but you do have to write your own training activity.
We will keep you posted when any out of the box options appear.
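As a rough illustration of the feedback loop mentioned above, here is a minimal Python sketch of an "algorithm capable of learning" that improves from human-validated results. It only shows the idea: the class `AnchorLearningExtractor` and its `train` and `extract` methods are hypothetical, and a real training activity would have to be written against the IntelligentOCR framework as described in the documentation.

```python
class AnchorLearningExtractor:
    """Toy learner: remembers which word precedes each validated field value."""

    def __init__(self):
        # field name -> word seen immediately before the validated value
        self.anchors = {}

    def train(self, text, validated_fields):
        """Learn from one document and its human-validated field values."""
        tokens = text.split()
        for field, value in validated_fields.items():
            if value in tokens:
                index = tokens.index(value)
                if index > 0:
                    self.anchors[field] = tokens[index - 1]

    def extract(self, text):
        """Return the word following each learned anchor, if present."""
        tokens = text.split()
        result = {}
        for field, anchor in self.anchors.items():
            if anchor in tokens:
                index = tokens.index(anchor)
                if index + 1 < len(tokens):
                    result[field] = tokens[index + 1]
        return result


if __name__ == "__main__":
    extractor = AnchorLearningExtractor()
    # Validated data, as it might come back from a human validation step.
    extractor.train("Invoice A1 Total: 42.50", {"Total": "42.50"})
    print(extractor.extract("Invoice B2 Total: 99.10"))
```

The point of the sketch is the split between an extraction path and a training path that consumes validated data; what the learner stores is entirely up to the algorithm.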
Hello, @loginerror
Thanks for the help!
May I ask: do the Taxonomy editor and the Keyword Based Classifier support Cyrillic?
To put it simply: do the IntelligentOCR activities support Cyrillic?
Thanks.
IntelligentOCR is language agnostic. You can define document types in Cyrillic using the Taxonomy Manager, and they should be displayed properly in all wizards and in the Validation Station. The Keyword Based Classifier is language and alphabet agnostic, as long as the text is representable in UTF-8.
I have to warn you, though, that Digitize Document is optimized for left-to-right, top-to-bottom writing and works best for languages written in the Latin alphabet. We will be optimizing this for other languages and alphabets as well.
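If you want to sanity-check that your Cyrillic keywords are representable in UTF-8, a quick Python sketch like the following could help. The helper names and the sample keyword are made up for illustration and have nothing to do with the UiPath activities themselves; the matching shown is a plain case-insensitive substring check, not the classifier's actual logic.

```python
def is_utf8_representable(text):
    """Return True if the string can be encoded as UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def contains_keyword(document_text, keyword):
    """Case-insensitive substring match that also works for Cyrillic."""
    return keyword.casefold() in document_text.casefold()


if __name__ == "__main__":
    keyword = "счет-фактура"  # hypothetical Cyrillic keyword
    print(is_utf8_representable(keyword))
    print(contains_keyword("СЧЕТ-ФАКТУРА № 15", keyword))
```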
I tried the IntelligentOCR package and trained it on 5 invoices. Is there a way to remove the Present Validation Station step after training a few times? Can I reuse the learning file, without the validation step, for new invoices in the same format?
Hello!
How do I create the .json file for the Keyword Based Classifier activity? Is it created from the UiPath interface, like the taxonomy file from the taxonomy editor?
Thanks.
I’m sorry, I found the solution to my question:
"The activity does not automatically create a file at the specified location. A best practice is to create an empty .JSON file at that location."
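For anyone who wants to script that step, here is a minimal Python sketch that creates the empty file if it is missing. The function name and the example path are hypothetical, and the sketch takes "empty" literally as a zero-byte file, since the quote does not say whether any initial JSON content is expected.

```python
from pathlib import Path


def ensure_learning_file(path):
    """Create a zero-byte file at `path` if none exists; never overwrite."""
    learning_file = Path(path)
    learning_file.parent.mkdir(parents=True, exist_ok=True)
    if not learning_file.exists():
        learning_file.touch()
    return learning_file


if __name__ == "__main__":
    # Hypothetical location; use whatever path the classifier is pointed at.
    print(ensure_learning_file("DocumentProcessing/keyword_learning.json"))
```

An existing file is left untouched, so any learning data already stored in it is kept.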