How to use the IntelligentOCR Package

The IntelligentOCR package lets you add document processing to your workflows. It ships with some out-of-the-box functionality, as well as the framework required for building your own document classification and data extraction components.

:exclamation: To run the workflows, you must add your own ApiKey for Document Understanding from https://platform.uipath.com.

EDIT1: Action Center Integration Sample
Here is a sample workflow that runs end-to-end document understanding processing and uses the Action Center integration for human validation:
SampleDUActionCenterIntegration-New.zip (620.8 KB)

ORIGINAL POSTING: Sample Document Understanding Basic Usage
Here is a sample workflow that performs:

  • digitization, using the OmniPage OCR engine available in UiPath,
  • document classification, using the Keyword Based Classifier,
  • data extraction, using both the Regex Based Extractor and the Machine Learning Extractor available for processing Invoices and Receipts,
  • data validation, using the Present Validation Station attended activity, and
  • classifier training, for the Keyword Based Classifier.

Please note that the Taxonomy (the list of document types and their associated fields) is editable using the Taxonomy Manager wizard (available in the wizard ribbon once the IntelligentOCR package is installed).
DocumentProcessing_IntelligentOCR300.zip (956.5 KB)
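
As a rough illustration of the classification and extraction steps listed above, here is a minimal Python sketch of the general idea behind keyword-based classification and regex-based extraction. It is not the IntelligentOCR implementation: the `TAXONOMY` structure, the keywords, the field names, and the regular expressions are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical taxonomy: document type -> keywords and per-field regexes.
# None of these names or patterns come from the IntelligentOCR package.
TAXONOMY = {
    "Invoice": {
        "keywords": ["invoice", "bill to"],
        "fields": {"InvoiceNumber": r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*(\w+)"},
    },
    "Receipt": {
        "keywords": ["receipt", "cashier"],
        "fields": {"Total": r"Total\s*[:\-]?\s*(\d+\.\d{2})"},
    },
}


def classify(text):
    """Return the document type whose keywords occur most often, or None."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(keyword) for keyword in spec["keywords"])
        for doc_type, spec in TAXONOMY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract(text, doc_type):
    """Apply each field's regex; fields without a match map to None."""
    results = {}
    for field, pattern in TAXONOMY[doc_type]["fields"].items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        results[field] = match.group(1) if match else None
    return results
```

In this sketch, digitized text is first passed to `classify`, and the returned type decides which field patterns `extract` applies. In the real workflow a human would then review the results in the Validation Station.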

USEFUL RESOURCES:

Looking forward to hearing your feedback!

Ioana

26 Likes

@Ioana_Gligan

Can we process PDFs with more than 2 pages? I found that some documents contain more than 2 pages. Can we process all the pages using this extractor?

1 Like

Not with the Community edition, which is limited to documents of at most two pages.

1 Like

Thank you @Ioana_Gligan
Do you have a sample like this one that uses the “Extract semi-structured document” activity? I can’t make it work…

2 Likes

Can we make the model learn using the classifiers and training activities mentioned in your workflow?
For example, if it is trained once, then the next time it sees the same file it should give the actual outputs (trained + original).
@Ioana_Gligan

3 Likes

Hello @Roboz,

Yes, model training / retraining can be enabled using the Train Classifiers Scope and Train Extractors Scope. Make sure to check out the documentation on how to write your own training activity for an extractor or classifier.

Thanks,

Ioana

2 Likes

Hello @Ioana_Gligan,
Where can we find the documentation on how to train and retrain?

Thanks,
Vincent

3 Likes

Hey @Ioana_Gligan

For training extractors, do we need to write our own extractor activities? I ask because I don’t see a trainable extractor activity.
The Machine Learning Extractor is already trained for a set of fields, and the Regex Based Extractor we have to code ourselves. Which activities can we train under the Train Extractors Scope?

2 Likes

Hi,

I’m trying to execute the workflow that you attached.
But it shows me an unresolved activity after installing the IntelligentOCR activities,
i.e. after Load Taxonomy…
Kindly suggest which other packages need to be installed along with the IntelligentOCR activities.

1 Like

Hi @Abhishek14

Could you provide a screenshot?
Normally you can right-click the broken dependency and select “Repair” to correct it.

You should also remember to open the project from the project.json file, which will read all the dependencies required to run it and download them automatically.
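
If you want to check which packages a project expects before opening it, a small script can list them. This is only a sketch: it assumes that project.json stores its packages in a top-level "dependencies" object, and the helper name `list_dependencies` is made up for the example.

```python
import json
from pathlib import Path


def list_dependencies(project_dir):
    """Return the 'dependencies' mapping from project.json, or {} if absent.

    Assumption: dependencies live in a top-level "dependencies" object.
    """
    project_file = Path(project_dir) / "project.json"
    # utf-8-sig also accepts files that start with a byte order mark.
    with project_file.open(encoding="utf-8-sig") as handle:
        project = json.load(handle)
    return project.get("dependencies", {})


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for package, version in list_dependencies(folder).items():
        print(f"{package} {version}")
```

Run it with the project folder as its argument and compare the output against the packages installed in Studio.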

1 Like

I went through all the errors and installed all the required packages,
but I am unable to find the Machine Learning Extractor package in Manage Packages.

1 Like

Hey @Abhishek14

Add the beta feed to your package manager and search for the package in that feed. Also enable the pre-release option and you’ll find it there :slightly_smiling_face:

3 Likes

Hello @Lahiru.Fernando and @vmariejeanne,

In order to train extractors, you currently have to build your own :slight_smile:

The machine learning extractor is pre-trained and does not expose the re-training capability at this moment.

If you have an in-house algorithm capable of learning, it is very easy to enable the feedback loop, but you do have to write your own training activity.

We will keep you posted when any out of the box options appear.
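
To make the feedback loop idea concrete, here is a toy Python sketch of an algorithm "capable of learning": a classifier that records the words of human-validated documents and uses those counts for later predictions. It only illustrates the concept. The class and method names are hypothetical, and this is not how a UiPath training activity is written.

```python
from collections import Counter, defaultdict


class LearningKeywordClassifier:
    """Toy classifier that learns word counts per document type."""

    def __init__(self):
        # document type -> Counter of words seen in validated documents
        self.word_counts = defaultdict(Counter)

    def train(self, text, validated_type):
        """Feedback step: record the words of a human-validated document."""
        self.word_counts[validated_type].update(text.lower().split())

    def classify(self, text):
        """Return the type whose learned words overlap most, or None."""
        words = text.lower().split()
        scores = {
            doc_type: sum(counts[word] for word in words)
            for doc_type, counts in self.word_counts.items()
        }
        if not scores or max(scores.values()) == 0:
            return None
        return max(scores, key=scores.get)
```

Each call to `train` plays the role of the training step that runs after validation, so predictions improve as more validated documents are fed back.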

Thanks,

Ioana

5 Likes

Hello, @Ioana_Gligan!
Is the API key taken from Orchestrator? What if I only have an Attended robot license, without Orchestrator?

1 Like

Hi @Foertsch

To get the API key, please navigate to the Licenses tab of your Cloud Account (not your Orchestrator instance).

2 Likes

Hello, @loginerror
Thanks for the help! :slight_smile:
Can I ask whether the Taxonomy editor and the Keyword Based Classifier support Cyrillic?
To put it simply: do the IntelligentOCR activities support Cyrillic? :slight_smile:
Thanks.

1 Like

Hello @Foertsch,

IntelligentOCR is language agnostic. You can define document types in Cyrillic using the Taxonomy Manager, and they should be displayed properly in all wizards and in the Validation Station. The Keyword Based Classifier is language and alphabet agnostic, as long as the text is representable in UTF-8.

I have to warn you, though, that Digitize Document is optimized for left-to-right, top-to-bottom writing and works best for Latin-script languages. We will be optimizing it for other languages and alphabets as well.
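
To illustrate the point about keyword matching being alphabet agnostic for UTF-8 text, here is a small Python sketch of the general idea. It is not the Keyword Based Classifier itself; the `KEYWORDS` table and the `matching_types` helper are invented for the example.

```python
import unicodedata

# Hypothetical keyword table mixing Cyrillic and Latin document types.
KEYWORDS = {
    "Счёт-фактура": ["счёт-фактура"],
    "Invoice": ["invoice"],
}


def normalize(text):
    """Normalize to NFC and case-fold so comparisons ignore case."""
    return unicodedata.normalize("NFC", text).casefold()


def matching_types(text):
    """Return every document type with at least one keyword in the text."""
    haystack = normalize(text)
    return [
        doc_type
        for doc_type, keywords in KEYWORDS.items()
        if any(normalize(keyword) in haystack for keyword in keywords)
    ]


# Round trip through UTF-8 bytes, as the text would be stored on disk.
sample = "СЧЁТ-ФАКТУРА № 42"
decoded = sample.encode("utf-8").decode("utf-8")
print(matching_types(decoded))
```

Case folding and Unicode normalization are design choices of this sketch, not documented behavior of the package.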

Ioana

2 Likes

I tried using the IntelligentOCR package and trained it on 5 invoices. Is there a way to remove the Present Validation Station step after training a few times? Can I re-use the learning file, without the validation step, for new invoices with the same format?

@loginerror
@loana_Gligan
@alexcabuz

1 Like

Hello!
How do I create the .json file for the Keyword Based Classifier activity? Is it created from the UiPath interface, like a taxonomy file from the Taxonomy editor?
Thanks.

1 Like

I’m sorry, I found the solution to my question:
"The activity does not automatically create a file at the specified location. A best practice is to create an empty .JSON file at that location."
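
If you would rather script that step than create the file by hand, a minimal Python sketch could look like this. It assumes that "empty" means a zero-byte file, which the quoted note does not spell out, and the helper name `ensure_learning_file` is made up.

```python
from pathlib import Path


def ensure_learning_file(path):
    """Create an empty file at `path` if none exists; never overwrite one.

    Assumption: "empty" means zero bytes, not an empty JSON object.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # touch() creates the file if missing and leaves existing content alone.
    target.touch(exist_ok=True)
    return target
```

Calling it again on an existing learning file is safe, because the content is left untouched.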

2 Likes