Digitize Document - An unexpected error has occurred

Scenario:

I am running a process that checks invoices in XML and PDF format. If the process does not find the right information in the XML, it checks the PDF. I am using the new “Digitize Document” activity with Google OCR. Throughout development it worked like a charm. However, I am now deploying the process to the test environment, and the “Digitize Document” activity fails constantly with the message “Digitize Document: An unexpected error has occurred”.
The error message is not very descriptive and I am having trouble finding where the problem lies. Please help :pray:

Steps to reproduce:

Read PDF documents with the Digitize Document activity and Google OCR.

Current Behavior:

The activity throws the error "An unexpected error has occurred".

Expected Behavior:

The activity reads the PDF with Google OCR and outputs the text that was found.

Studio/Robot/Orchestrator Version: 2018.4.1

Update: It seems that the UiPath.Vision executable keeps crashing, which causes this activity to fail.
Any help on this? @ovi, @loginerror

This is the error that I get:

OCR Failed with error System.Exception at source UiPath.IntelligentOCR.Activities with message An unexpected error has occurred

Can you share your XAML file?

Hi, @venkat4u
Unfortunately, I can't share the workflow for security reasons. I will attach screenshots of the activities and properties I am using. I have tried both Google OCR and Microsoft OCR.

With Google, the activity sometimes runs really slowly and can take 15 minutes or more…

The first log message is when the pdf is downloaded and the second is when the activity is finished processing. When it takes this amount of time, the system times out and logs out resulting in a failed transaction.

With Microsoft OCR and Digitize Document, the OCR just keeps crashing: after processing 180 invoices, the activity had failed 47 times. I have attached filtered logs from Orchestrator: Orchestrator Log.xlsx (11.8 KB)

This is the activity and the properties I am using:

[three screenshots of the activity and its properties]

Let me know if you need any other information

I have the same problem. “Digitize Document: An unexpected error has occurred”.
I use the Google Cloud OCR engine, and the log says that the request payload is too large, above 10 MB.
I think 10 MB is the limit for the OCR API.
But this is weird, because the PDF that fails is small and many larger PDFs work fine.

My workaround is to wrap the activity in a Try Catch and use UiPath’s ‘Read PDF with OCR’ if it fails.
Digitize Document seems to give a better reading, but it is also clearly bugged.
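
The Try Catch fallback described above can be sketched outside of Studio as follows. This is only an illustration of the pattern in Python, not UiPath code: `read_with_fallback`, `digitize_document`, and `read_pdf_with_ocr` are hypothetical stand-ins for the workflow and the two activities.

```python
from typing import Callable, Sequence


def read_with_fallback(path: str, readers: Sequence[Callable[[str], str]]) -> str:
    """Try each reader in order and return the first successful result."""
    errors = []
    for reader in readers:
        try:
            return reader(path)
        except Exception as exc:  # broad on purpose, like a generic Catch block
            errors.append(f"{reader.__name__}: {exc}")
    # Every reader failed: surface all collected messages for the logs.
    raise RuntimeError("all readers failed: " + "; ".join(errors))


# Hypothetical stand-ins for the two activities (not real UiPath APIs).
def digitize_document(path: str) -> str:
    raise RuntimeError("An unexpected error has occurred")


def read_pdf_with_ocr(path: str) -> str:
    return f"text of {path}"


print(read_with_fallback("test.pdf", [digitize_document, read_pdf_with_ocr]))
```

Collecting the error messages instead of swallowing them keeps the original failure visible in the logs even when the fallback succeeds or fails later.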

Error:
UiPath.SmartData.Digitization.Tokenization.TokenizationException: test.pdf ----> System.Exception: Error performing OCR: GoogleCloudErrorInvalidResponse Request payload size exceeds the limit: 10485760 bytes.
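
The limit quoted in the error, 10485760 bytes, is exactly 10 * 1024 * 1024. Note that it applies to the request payload, which is not necessarily the same size as the file on disk. A minimal sketch of why a file under the limit could still be rejected, assuming (this is not confirmed anywhere in this thread) that the image data is base64-encoded before it is sent; `exceeds_limit` is a hypothetical helper:

```python
import base64

# Value quoted in the error message; equals 10 * 1024 * 1024.
LIMIT_BYTES = 10_485_760


def exceeds_limit(payload: bytes, limit: int = LIMIT_BYTES) -> bool:
    """Return True if the payload is larger than the limit."""
    return len(payload) > limit


raw = bytes(8 * 1024 * 1024)     # 8 MiB stand-in for rendered image data
encoded = base64.b64encode(raw)  # base64 emits 4 bytes for every 3 input bytes

print(exceeds_limit(raw))      # False
print(exceeds_limit(encoded))  # True
```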

This may be a really late reply to your question, but I just hit the same issue and started experimenting. Luckily, I solved it by changing the scale property of the OCR engine; I think it has something to do with the image quality of your document. The scale value enlarges the document for better recognition, but keep in mind that the right value may vary with the document type: if it is too large, it can also lead to misreading similar-looking characters such as 0 and O.

Cheers! :wink:

Hi friends, I found an easy solution for the above error: just update the UiPath.IntelligentOCR.Activities
and UiPath.MachineLearningExtractor.Activities packages, then take the Templateless Receipt/Invoice Extractor API key from Platform.uipath.com and paste it into the ML Extractor.

This is really late, but I think that the Digitize Document activity doesn’t allow for scaling in the OCR engine.

I had set scale to 3, and got the same error message. When I removed that and used the default scale for the OCR engine, the error message went away.
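
One plausible explanation, which is an assumption and not confirmed in this thread, is that the scale factor multiplies both the width and the height of the page image, so the pixel count grows with the square of the scale. A rough sketch of that arithmetic, using a hypothetical page size of 1240 x 1754 pixels (roughly A4 at 150 dpi) and a hypothetical helper `scaled_pixels`:

```python
def scaled_pixels(width: int, height: int, scale: float) -> int:
    """Pixel count after scaling both dimensions by `scale`."""
    return round(width * scale) * round(height * scale)


base_w, base_h = 1240, 1754  # hypothetical page, roughly A4 at 150 dpi
base = scaled_pixels(base_w, base_h, 1)

for scale in (1, 2, 3):
    px = scaled_pixels(base_w, base_h, scale)
    print(f"scale {scale}: {px} pixels, {px / base:.1f}x the default")
```

Under that assumption, scale 3 produces nine times as many pixels as the default, which would make it much easier to run into a payload size limit.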

Well, in my case digitizing works like a charm for PDF documents, whereas it shows this ‘Digitize Document: An unexpected error has occurred’ message when I try it on JPG documents.

It seems that Digitize Document doesn’t support a scale set to a high value. It can work with a value of 2, but generates the error with 3.
Start with the default value and see whether the results are satisfactory.
With Intelligent OCR, you can now also use the OCR from the server, which is more precise.

Duude. So much testing, but updating the packages did the trick! :joy: