Intelligent OCR - How to use / Tutorial

Hello guys,
I’m looking for a robot that can categorise PDFs (which can be multi-page) by format and then extract the data according to that format. If the format is not recognised, the robot should perform machine learning on the new format and save it for future scanned documents.

I’ve heard of the IntelligentOCR package and tried to use it, but it is quite difficult to use without any prior knowledge.

Can someone help me? Is there a tutorial or any documentation on how to use it?


Hi @vmariejeanne

There are indeed some intelligent possibilities, but I am not sure how well these will suit you so far.

Please go over this category first:

It contains a bunch of articles about the usage of these new activity packs.

Then, there is a bunch of examples and documentation on the IntelligentOCR package, see this post as an example:

I believe it might not yet be possible to auto-correct based on the input data, but you can for sure set up your files to be processed based on the provided templates.


Hello @vmariejeanne

Please have a look over this updated sample workflow here:

It might help you get on the right track for your use case!


Hello @loana_Gligan and @loginerror,
I’ve seen the workflow and it’s great, but I’m wondering about something: there is a validation station. Is this activity used for the machine learning, or will it pop up for every scanned invoice?

I tried running the robot with invoices of the same format in my input folder, and the validation station pops up every time.

What I need is to perform OCR on all the different types of invoices and, if the format is unknown to the bot, have it show the validation station to learn this new pattern,
or save all invoices of unknown type in a separate folder so that a human can handle the machine learning later on.

Thanks for your help.

Hello @vmariejeanne,

You will need to write your own logic around whether to show the validation station or not.

Please note that currently the machine learning extractor does not expose training capabilities for the community edition.

If you use a limited number of invoice formats for now, you can write some basic logic (are all fields I need extracted? any missing? etc), and actually test on those invoice types, and decide if you want to show the validation station or not.
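
As a rough illustration of that basic check, here is a minimal sketch in plain Python. It is not UiPath code: the field names and the dictionary of extracted values are assumptions made up for the example, and in a real workflow the same condition would presumably sit in an If activity that inspects the extraction results.

```python
# Hypothetical field names; replace them with the fields your process needs.
REQUIRED_FIELDS = ["invoice_number", "invoice_date", "total_amount"]


def missing_fields(extracted, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in `extracted`."""
    missing = []
    for name in required:
        value = extracted.get(name)
        # A field counts as missing if it is absent or only whitespace.
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


def needs_validation(extracted, required=REQUIRED_FIELDS):
    """Show the validation station only when a required field is missing."""
    return len(missing_fields(extracted, required)) > 0
```

For example, `needs_validation({"invoice_number": "INV-1"})` is true because the date and total are missing, so that document would go to the validation station.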

If you have unknown invoice formats, then I strongly recommend discussing with the business whether accuracy is critical.

If some errors are allowed, then I would test whether certain important fields have values and, if they do, not show the validation station. This is for cases where a certain degree of mistakes is acceptable, as it can sometimes be cheaper to correct the few wrong cases than to validate all cases.
If no errors are allowed, then I would ALWAYS show the validation station. This is because even if the extractor returns the right value (from the right place), there might be OCR issues that you would not otherwise see (like a zero identified as the letter O) but that you will want to correct.
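
The two cases above can be sketched as a single decision function. This is only an illustration in plain Python, not part of the IntelligentOCR package: `extracted`, `important_fields` and `errors_allowed` are hypothetical names, and the extraction result is assumed to be a simple mapping from field name to value.

```python
def show_validation_station(extracted, important_fields, errors_allowed):
    """Decide whether a human should review this document.

    extracted        -- mapping of field name to extracted value (assumed shape)
    important_fields -- field names the business cares about (hypothetical)
    errors_allowed   -- True if the business accepts occasional mistakes
    """
    if not errors_allowed:
        # No errors tolerated: always review, since OCR mistakes such as
        # a zero read as the letter O cannot be detected automatically here.
        return True
    for name in important_fields:
        value = extracted.get(name)
        if value is None or str(value).strip() == "":
            # An important field has no value: ask a human.
            return True
    # All important fields have values and some errors are acceptable.
    return False
```

With `errors_allowed=True`, a document whose important fields all have values skips the validation station; with `errors_allowed=False`, every document is shown regardless of what was extracted.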

Hope this helps,