How to use the IntelligentOCR Package

paulo.kurihara · April 29, 2020, 6:50pm

Hi @Ioana_Gligan and @warren_lee,

I’ve been facing the same problem when using the Create Document Validation Action activity.

Unexpected character encountered while parsing value: <. Path ‘’, line 0, position 0.

Have you managed to solve this problem?

Thanks!

Ioana_Gligan · April 30, 2020, 10:51am

hello @paulo.kurihara,

Any chance you could share the failing workflow and your Studio version with me? I would like to try to reproduce the issue.

ioana

Ioana_Gligan · April 30, 2020, 10:53am

hello @marton.szaboo, and welcome to the community!

The confidence computation is a complex algorithm that keeps track of which words are found, where they are in the document, when a certain keyword or set of keywords has been added to the learning, and how many times a keyword has been reinforced. That is why it is growing.
You will notice that if the classifier makes a mistake and you correct the document type in the Validation Station (if you have more than one document type in there), new entries appear in the learning content.

Hope this helps,

Ioana
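Editorial sketch: the reply above says the learning content records which keywords were seen and how often each was reinforced, and that it grows when a classification is corrected. The toy Python class below illustrates only that general idea. It is not the IntelligentOCR algorithm (which, per the reply, also tracks word positions and timing); the class name, method names and scoring formula are all hypothetical.

```python
from collections import defaultdict


class ToyKeywordLearner:
    """Toy illustration only; NOT the actual IntelligentOCR classifier."""

    def __init__(self):
        # doc_type -> keyword -> number of times the keyword was reinforced
        self.learning = defaultdict(lambda: defaultdict(int))

    def reinforce(self, doc_type, words):
        """Record a confirmed or corrected classification for doc_type."""
        for word in {w.lower() for w in words}:
            self.learning[doc_type][word] += 1

    def confidence(self, doc_type, words):
        """Share of the learned reinforcement weight matched by the words."""
        keywords = self.learning.get(doc_type)
        if not keywords:
            return 0.0
        present = {w.lower() for w in words}
        matched = sum(count for kw, count in keywords.items() if kw in present)
        return matched / sum(keywords.values())


learner = ToyKeywordLearner()
learner.reinforce("invoice", ["Invoice", "Total", "VAT"])
print(round(learner.confidence("invoice", ["invoice", "total", "due"]), 2))  # 0.67

# A second validated invoice reinforces two keywords; the learning data grows
# and the score for the same document changes.
learner.reinforce("invoice", ["invoice", "total"])
print(round(learner.confidence("invoice", ["invoice", "total", "due"]), 2))  # 0.8
```

In this sketch, every validation adds to the counters, which is one plausible reading of why the learning file keeps growing.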

paulo.kurihara · April 30, 2020, 3:02pm

Hi @Ioana_Gligan,

Since I’m a new user and can’t attach files here, I’ve uploaded the zipped failing workflow to my Google Drive.

The link is: IntelligentOCR.zip - Google Drive

The Studio version is 2019.4.4, Community edition.

Since it always fails, the ContinueOnError property of the Create Document Validation Action activity is enabled; you will need to disable it in order to see the exception it returns.

Thanks!

OsoDormilon · May 1, 2020, 10:44am

Hello @Iona, thanks for your great work here

I am just getting started with document understanding. I would like to ask you about the following things that you said:

“The machine learning extractor is pre-trained and does not expose the re-training capability at this moment.”

Is there any out of the box new option? If not, do you have any idea when it will appear?

“In order to train extractors, you currently have to build your own”.

What do you think would be the best approach to tackle this? Connecting our workflow to an Azure/AWS machine learning instance?

All the best,

OsoDormilon · May 1, 2020, 10:52am

Hi again forum,

I have downloaded UiPath Studio Pro 2020.4.0.beta1731 Community and updated all packages, including prereleases; there are currently no errors there. However, I have a missing activity that gives the following error:

Could not find member ‘SkipServerSideOCR’ in type ‘http://schemas.uipath.com/workflow/activities/documentunderstanding-ml:MachineLearningExtractor’. Row: 194, Column: 522

Any idea how to fix this issue?

Thank you in advance to all for your support!

All the best,

Ioana_Gligan · May 1, 2020, 3:33pm

Hello @OsoDormilon,

I updated the archive - that error should go away now… sorry about that!

Ioana

Ioana_Gligan · May 1, 2020, 3:46pm

Hello @paulo.kurihara,

I think the Studio version is the issue - 19.4 will be out of support in a couple of months… why don’t you switch to the preview channel in Studio (main menu / help / right side bar / switch to Preview), or install the latest Community?

Please let me know if this works once you set the persistence flag in project settings, and try it out on the latest version!

Thank you,

Ioana

OsoPolar · May 2, 2020, 4:48pm

Hi,

The project is giving some warnings because of the deprecated UiPath.MachineLearningExtractor package. I would suggest updating the projects available at the current link and updating their packages.

Thanks!

OsoPolar · May 2, 2020, 4:59pm

I was referring to the DocumentProcessing_IntelligentOCR300 project.

Cheers,

paulo.kurihara · May 4, 2020, 6:50pm

Hi @Ioana_Gligan,

Alright, I’ll give it a shot and let’s see what happens.

Thank you!

paulo.kurihara · May 5, 2020, 8:17pm

Hello @Ioana_Gligan,

I’ve noticed that the Digitize Document activity has a property that forces it to read the document with OCR (the ForceApplyOCR property). Now I’m wondering whether it is possible to do the opposite: force it to read a PDF file natively, just like the Read PDF Text activity does. Sometimes the result of reading a PDF with OCR is not good enough to capture all the information we need to extract, and most of the PDFs I’m working with don’t really need OCR, since they have extractable text.

So, basically, my question is whether it is possible to use Digitize Document as if it were Read PDF Text, in order to avoid OCR when it is not needed.

Thank you!

marton.szaboo · May 6, 2020, 1:24pm

Thank you, I’ve played around with it a lot, and with your help I understand it now.

warren_lee · May 7, 2020, 4:44am

Hi Ioana, @paulo.kurihara,

Did you manage to look into this issue?

I tried again today and it’s still the same for me, so I’m just wondering if you have found anything that might have caused this.

This is the sample workflow I’m using (it has an ML extractor pointing to the receipt ML endpoint, without my API key in it, of course):

SampleDUActionCenterIntegration_Forum.zip (648.0 KB)

Ioana_Gligan · May 7, 2020, 2:31pm

Hey @paulo.kurihara,

The Digitize Document activity does not apply OCR by default. If a PDF can be read natively, it is. OCR is applied only if a page is covered by too many images, if native reading returns no text, or if one of a couple of other conditions is met.
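Editorial sketch: the per-page decision described above can be pictured roughly as follows. This is a minimal Python illustration, not the activity's implementation; `PageInfo`, `needs_ocr` and the `MAX_IMAGE_COVERAGE` threshold are hypothetical, and the "couple of other conditions" mentioned in the reply are not modeled.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    native_text: str       # text obtained by reading the PDF page natively
    image_coverage: float  # fraction of the page area covered by images (0.0 to 1.0)


# Hypothetical threshold; the value the activity really uses is not stated in the thread.
MAX_IMAGE_COVERAGE = 0.5


def needs_ocr(page: PageInfo, force_apply_ocr: bool = False) -> bool:
    """Return True if OCR would be applied to this page under the assumed rules."""
    if force_apply_ocr:
        # Mirrors the ForceApplyOCR property: always use OCR.
        return True
    if not page.native_text.strip():
        # Native reading returned no text, so OCR is the only option.
        return True
    # Pages dominated by images fall back to OCR even if some text is readable.
    return page.image_coverage > MAX_IMAGE_COVERAGE


print(needs_ocr(PageInfo("Invoice 123", 0.1)))  # False
print(needs_ocr(PageInfo("", 0.0)))             # True
print(needs_ocr(PageInfo("Invoice 123", 0.9)))  # True
```

Under these assumed rules, a page with readable text but a large background image would still go through OCR, which matches the behavior some posters in this thread describe.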

Ioana_Gligan · May 7, 2020, 2:32pm

Hi @warren_lee,

What Studio version are you using? Are you on the preview channel using the latest version?

paulo.kurihara · May 7, 2020, 7:07pm

Hi @warren_lee and @Ioana_Gligan,

My Studio is now on version 2020.4.0, and it actually works now!

Thank you!

warren_lee · May 8, 2020, 7:40am

Hi Ioana and @paulo.kurihara,

I’m also on version 2020.4.0, and I finally figured out what the issue was!

Interestingly, it appears to be linked to the Orchestrator API endpoint configured on my robot.

My bot was connected to Orchestrator via the latest Community endpoint when I experienced the issue:

https://cloud.uipath.com

I then tried disconnecting my bot and reconnecting it using:

https://platform.uipath.com

This did not work either and produced the same error, BUT >>

I then connected it using:

https://platform.uipath.com/{my specific service}/{my specific tenant}

This full URL appears to resolve the issue. It makes me think that, for some reason, either Studio or this specific activity does not pass in the service or tenant information, so the activity cannot perform its background API operations against my Orchestrator.

I say it is specific to this activity because my bot has always been able to connect to Orchestrator fine, and other Orchestrator operations have been working well.

Interesting observation though …
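Editorial sketch: the error quoted at the top of the thread complains about a `<` at position 0, which is what a JSON parser typically reports when it is handed HTML or XML instead of JSON. Assuming that is what happened here (the thread does not confirm it), the Python helpers below show how one might build the full service/tenant URL and check what kind of body came back. Both function names and the sample service and tenant values are hypothetical.

```python
import json


def build_orchestrator_url(base_url: str, service: str, tenant: str) -> str:
    """Join base URL, service and tenant into the full URL form from the post."""
    parts = [base_url.rstrip("/"), service.strip("/"), tenant.strip("/")]
    return "/".join(parts)


def describe_response_body(body: str) -> str:
    """Classify a response body as 'json', 'markup' or 'other'."""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # HTML/XML, e.g. a login or error page returned instead of API data.
        return "markup"
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return "other"
    return "json"


# Placeholder service and tenant names, for illustration only.
print(build_orchestrator_url("https://platform.uipath.com/", "myservice", "mytenant"))
print(describe_response_body("<!DOCTYPE html><html></html>"))
print(describe_response_body('{"ok": true}'))
```

A "markup" result for a call that should return JSON would be consistent with the request reaching a generic web page rather than the tenant-specific API.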

mosima.selota · May 8, 2020, 11:31pm

@OsoDormilon @Ioana_Gligan I’m not sure whether there was a response about the train extractor activity for machine learning. I have been looking for a solution and was hoping UiPath would release something for this, but I haven’t seen anything yet. Any suggestions on how to go about this?

matheus · May 11, 2020, 6:31pm

Hi @Ioana_Gligan,

I have the same problem. My PDF can be read natively without any issue, but it contains some non-text images such as a logo and backgrounds, and because of them the PDF is always read with OCR and the result is very messy.

It would be better if there were an option to force extraction as text. Is there a way to do it?
