How to use the IntelligentOCR Package

The IntelligentOCR package lets you process documents in your workflows. It ships with some out-of-the-box functionality, as well as the framework required for building your own document classification and data extraction components.

Here is a sample workflow that performs:

  • digitization, using the OmniPage OCR engine available in UiPath,
  • document classification, using the Keyword Based Classifier,
  • data extraction, using both the Regex Based Extractor and the Machine Learning Extractor (available for processing Invoices and Receipts),
  • data validation, using the Present Validation Station attended activity, and
  • classifier training, for the Keyword Based Classifier.
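To make the classification and extraction steps above more concrete, here is a minimal Python sketch that runs on text that has already been digitized. It is not how the UiPath activities are implemented; the keyword lists, field names, and regex patterns are all hypothetical and only illustrate the general idea of keyword-based classification followed by regex-based extraction.

```python
import re

# Hypothetical keyword lists per document type (not taken from the sample workflow).
KEYWORDS = {
    "Invoice": ["invoice", "amount due"],
    "Receipt": ["receipt", "cash"],
}

# Hypothetical regex per field, in the spirit of a regex-based extractor.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*(\w+)", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*([0-9]+(?:\.[0-9]{2})?)", re.IGNORECASE),
}


def classify(text):
    """Return the document type whose keywords occur most often, or None."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(keyword) for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract(text):
    """Return every field whose pattern matches the text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


sample = "INVOICE No. A123\nAmount due in 30 days\nTotal: 99.50"
doc_type = classify(sample)
fields = extract(sample)
print(doc_type, fields)
```

In the real workflow a human would then confirm or correct these results in the validation step before they are used for training.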

Please note that the Taxonomy (the list of document types and their associated fields) can be edited with the Taxonomy Manager wizard, which appears in the wizard ribbon once the IntelligentOCR package is installed.

To run the workflow, you must add your own ApiKey for Invoices from https://platform.uipath.com.

DocumentProcessing_IntelligentOCR300.zip (956.5 KB)

Looking forward to hearing your feedback!

Ioana

@Ioana_Gligan

Can we process PDFs with more than 2 pages? Some of my documents have more than 2 pages. Can this extractor process all of them?

Not with the Community Edition, which is limited to documents of at most two pages.

Thank you @Ioana_Gligan
Do you have a sample like this one that uses the “Extract semi-structured document” activity? I can’t make it work…

Can we make the model learn using the classifiers and training activities in your workflow?
For example, if it is trained once on a file, the next run on the same file should give the actual outputs (trained + original).
@Ioana_Gligan

Hello @Roboz,

Yes, model training / retraining can be enabled using the Train Classifiers Scope and Train Extractors Scope. Make sure to check out the documentation on how to write your own training activity for an extractor or classifier.

Thanks,

Ioana

Hello @Ioana_Gligan,
Where can we find the documentation on how to train and retrain?

Thanks,
Vincent

Hey @Ioana_Gligan

For training extractors, do we need to write our own extractor activities? I ask because I don’t see a trainable extractor activity.
The Machine Learning Extractor is already trained for a set of fields, and the Regex Based Extractor has to be coded by hand. Which activities can we train under the Train Extractors Scope?

Hi,

I’m trying to execute the workflow you attached.
But it shows me an unresolved activity after installing the IntelligentOCR activities,
i.e. after Load Taxonomy…
Kindly suggest which other packages need to be installed along with the IntelligentOCR activities.

Hi @Abhishek14

Could you provide a screenshot?
Normally you can right-click on the broken dependency and select “Repair” to correct it.

You should also remember to open the project from the project.json file, which will read all the dependencies required to run it and download them automatically.
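If you want to check which packages a project expects before opening it, a small script can list them. This is only a sketch: it assumes that project.json keeps its packages in a top-level "dependencies" object mapping package names to versions, and the demo file contents below (including the version number) are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def list_dependencies(project_json_path):
    """Return the package name -> version mapping declared in project.json."""
    # utf-8-sig reads files both with and without a byte order mark.
    data = json.loads(Path(project_json_path).read_text(encoding="utf-8-sig"))
    # Assumption: dependencies live under a top-level "dependencies" object.
    return data.get("dependencies", {})


# Demo with a throwaway file; the version shown here is invented.
with tempfile.TemporaryDirectory() as tmp:
    project_file = Path(tmp) / "project.json"
    project_file.write_text(
        json.dumps(
            {
                "name": "Demo",
                "dependencies": {"UiPath.IntelligentOCR.Activities": "[3.0.0]"},
            }
        ),
        encoding="utf-8",
    )
    deps = list_dependencies(project_file)

for package, version in deps.items():
    print(f"{package} {version}")
```

Any package in that list that is missing from your feeds is a likely cause of an unresolved activity.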

I went through all the errors and installed all the required packages,
but I am unable to find the Machine Learning Extractor package in Manage Packages.

Hey @Abhishek14

Add the beta feed to your package manager and search for the package in that feed. Also enable the pre-release option and you’ll find it there :slightly_smiling_face:

Hello @Lahiru.Fernando and @vmariejeanne,

In order to train extractors, you currently have to build your own :slight_smile:

The machine learning extractor is pre-trained and does not expose the re-training capability at this moment.

If you have an in-house algorithm capable of learning, it is very easy to enable the feedback loop, but you do have to write your own training activity.
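As a rough illustration of what such a feedback loop looks like, here is a toy Python classifier that learns word counts from documents a human has already confirmed. It is not a UiPath training activity and does not use the IntelligentOCR framework; the class and method names are hypothetical, and a real training activity would wrap whatever learning algorithm you have in-house.

```python
import re
from collections import Counter, defaultdict


class LearningKeywordClassifier:
    """Toy classifier that improves from human-confirmed documents."""

    def __init__(self):
        # document type -> how often each word was seen in confirmed documents
        self.word_counts = defaultdict(Counter)

    @staticmethod
    def _tokens(text):
        return re.findall(r"\w+", text.lower())

    def train(self, text, confirmed_type):
        """Feedback step: record the words of a validated document."""
        self.word_counts[confirmed_type].update(self._tokens(text))

    def classify(self, text):
        """Return the type whose learned words best cover the text, or None."""
        tokens = self._tokens(text)
        scores = {
            doc_type: sum(counts[token] for token in tokens)
            for doc_type, counts in self.word_counts.items()
        }
        if not scores:
            return None
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None


classifier = LearningKeywordClassifier()
before = classifier.classify("Invoice 42 total")  # nothing learned yet

# A human confirms the document types, and the confirmations are fed back.
classifier.train("Invoice number 17 total due", "Invoice")
classifier.train("Receipt paid in cash", "Receipt")
after = classifier.classify("Invoice 42 total")
print(before, after)
```

The important part is the shape of the loop: classify, let a human validate, then pass the validated result back into `train`.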

We will keep you posted when any out of the box options appear.

Thanks,

Ioana

Hello, @Ioana_Gligan!
Is the API key taken from Orchestrator? What if I only have an Attended robot license, without Orchestrator?

Hi @Foertsch

To get the API key, please navigate to the Licenses tab of your Cloud Account (not your Orchestrator instance).

Hello, @loginerror
Thanks for the help! :slight_smile:
Can I ask: do the Taxonomy editor and the Keyword Based Classifier support Cyrillic?
To put it simply: do the IntelligentOCR activities support Cyrillic? :slight_smile:
Thanks.

Hello @Foertsch,

IntelligentOCR is language agnostic. You can define document types in Cyrillic using the Taxonomy Manager, and they should be properly displayed in all wizards and in the Validation Station. The Keyword Based Classifier is language and alphabet agnostic… as long as the text is representable in UTF-8.
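To show why keyword matching does not depend on the alphabet, here is a small Python sketch with Cyrillic document types and keywords. It is an illustration only, not the Keyword Based Classifier itself; the document types and keywords are made up.

```python
# Hypothetical Cyrillic document types and keywords.
KEYWORDS = {
    "Накладная": ["накладная"],
    "Договор": ["договор", "соглашение"],
}


def classify(text):
    """Return the first document type with a keyword found in the text."""
    # casefold() gives case-insensitive matching for non-Latin scripts too.
    folded = text.casefold()
    for doc_type, keywords in KEYWORDS.items():
        if any(keyword in folded for keyword in keywords):
            return doc_type
    return None


sample = "ТОВАРНАЯ НАКЛАДНАЯ № 15"
# The text survives a UTF-8 round trip unchanged.
round_tripped = sample.encode("utf-8").decode("utf-8")
result = classify(round_tripped)
print(result)
```

Matching is plain substring comparison on Unicode text, so the same logic works for any script that UTF-8 can encode.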

I have to warn you, though, that Digitize Document is optimized for left-to-right, top-to-bottom writing, and works best for Latin languages… We will be optimizing this for other languages / alphabets as well.

Ioana

I tried the IntelligentOCR package and trained it on 5 invoices. Is there a way to remove the Present Validation Station step after training a few times? Can I reuse the learning file, without the validation step, for new invoices in the same format?

@loginerror
@Ioana_Gligan
@alexcabuz

Hello!
How do I create the .json file for the Keyword Based Classifier activity? Is it created from the UiPath interface, like a taxonomy file from the Taxonomy editor?
Thanks.

I’m sorry, I found the solution to my question:
"The activity does not automatically create a file at the specified location. A best practice is to create an empty .JSON file at that location."
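If you prefer to create that file from a script, here is a minimal Python sketch. It assumes "empty" means a zero-byte file, and the folder and file names are hypothetical. An existing file is left untouched so that anything already learned is not wiped.

```python
import tempfile
from pathlib import Path


def ensure_empty_json_file(path):
    """Create a zero-byte file at path unless a file already exists there."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.touch()
    return target


# Demo in a temporary folder; in practice, pass the path you give the activity.
with tempfile.TemporaryDirectory() as tmp:
    learning_file = ensure_empty_json_file(Path(tmp) / "learning" / "keywords.json")
    created = learning_file.exists()
    size = learning_file.stat().st_size

    # Calling it again must not overwrite existing content.
    learning_file.write_text('{"trained": true}', encoding="utf-8")
    ensure_empty_json_file(learning_file)
    preserved = learning_file.read_text(encoding="utf-8")

print(created, size, preserved)
```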
