How to deal with an OCR issue in the Digitize Document activity

I am creating a workflow with the new receipt/invoice ML models, but I am running into issues with the Digitize Document activity and certain PDFs. The workflow works well for most PDFs, but for some of them the activity fails when it tries to return the document text. I have tried switching the OCR engine between ABBYY, Microsoft, and Tesseract, but some documents still fail with every engine.

Should I be catching errors for documents that cannot be read with OCR, or is this a deeper problem?

I found one related question here, but I am adding more detail in case it helps find an answer.

I would appreciate any help! Thanks!

Here is the full error message and a screenshot of the error:

RemoteException wrapping System.Exception: An unexpected error has occurred ---> RemoteException wrapping System.AggregateException: One or more errors occurred. --->
RemoteException wrapping UiPath.SmartData.Digitization.Tokenization.TokenizationException: 1845.pdf --->
RemoteException wrapping System.Exception: Error performing OCR: Unable to initialize Microsoft engine MicrosoftErrorCreateEngine --->
RemoteException wrapping System.ServiceModel.FaultException: Unable to initialize Microsoft engine MicrosoftErrorCreateEngine
at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.TaskCreator.<>c__DisplayClass7_0`1.<CreateGenericTask>b__0(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.Vision.VisionClient.<ScrapeAsync>d__5.MoveNext()
--- End of inner exception stack trace ---
at UiPath.Vision.VisionClient.<ScrapeAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.Vision.UiImage.<ScrapeOCRAsync>d__26.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.Core.Activities.OCREngineActivity.<ExecuteAsync>d__49.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at UiPath.Shared.Activities.AsyncTaskCodeActivityImplementation.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at UiPath.Shared.Activities.AsyncTaskCodeActivity`1.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity`1.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.IntelligentOCR.Digitization.ExtendedOcrEngineActivityWrapper.<Execute>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.OCR.OcrTokenizer.<TokenizeAsync>d__12.MoveNext()
--- End of inner exception stack trace ---
at UiPath.SmartData.Digitization.OCR.OcrTokenizer.<TokenizeAsync>d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.PDF.PdfTokenizer.<TokenizePageImage>d__15.MoveNext()
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task`1.get_Result()
at UiPath.SmartData.Digitization.ContentTokenizer.<>c__DisplayClass12_0.<TokenizePage>b__0(Task`1 t)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.DocumentDigitizer.d__14.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.IntelligentOCR.Activities.Digitization.DigitizeDocument.d__35.MoveNext()
--- End of inner exception stack trace ---
at UiPath.Shared.Activities.AsyncTaskNativeImplementation.BookmarkResumptionCallback(NativeActivityContext context, Object value)
at UiPath.Shared.Activities.AsyncTaskNativeActivity.BookmarkResumptionCallback(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkCallbackWrapper.Invoke(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkWorkItem.Execute(ActivityExecutor executor, BookmarkM

I ended up working with UiPath support and an ML engineer on this issue, and we found a workaround and a fix. According to support, my Windows 7 machine has compatibility problems with the newest IntelligentOCR packages and their OCR engines.

The ML engineer pointed me to an alternate OCR engine that seems to work for all of the invoices I was having problems with: the UiPath.OmniPage.Activities package, which provides the OmniPage OCR engine. This workaround was enough to get me working again.
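For anyone processing a batch of documents, a general pattern (not a UiPath API) that touches on the original question about catching errors is to try several OCR engines in order for each file and record the files that fail on every engine, so one unreadable PDF does not stop the whole run. This only contains the failures; it does not fix an engine that cannot initialize on the machine. The sketch is in Python purely for illustration, assuming each engine is a plain callable that returns text or raises; `digitize_with_fallback`, `process_batch`, and the stub engines are hypothetical names. In a UiPath workflow the equivalent would presumably be built from Try Catch activities around Digitize Document.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical: an "engine" is any callable that takes a file path and returns
# the extracted text, raising an exception when OCR fails.
OcrEngine = Callable[[str], str]


def digitize_with_fallback(path: str, engines: Sequence[Tuple[str, OcrEngine]]) -> Tuple[str, str]:
    """Try each engine in order and return (engine_name, text) from the first success."""
    errors: List[str] = []
    for name, engine in engines:
        try:
            return name, engine(path)
        except Exception as exc:  # remember why this engine failed, then try the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"All OCR engines failed for {path}: " + "; ".join(errors))


def process_batch(
    paths: Sequence[str], engines: Sequence[Tuple[str, OcrEngine]]
) -> Tuple[Dict[str, Tuple[str, str]], Dict[str, str]]:
    """Digitize every file, collecting failures instead of aborting the whole batch."""
    results: Dict[str, Tuple[str, str]] = {}
    failed: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = digitize_with_fallback(path, engines)
        except RuntimeError as exc:
            failed[path] = str(exc)
    return results, failed


# Stub engines for demonstration only; they stand in for real OCR calls.
def broken_engine(path: str) -> str:
    raise RuntimeError("Unable to initialize Microsoft engine")


def working_engine(path: str) -> str:
    return f"text of {path}"


if __name__ == "__main__":
    ok, bad = process_batch(["1845.pdf"], [("Microsoft", broken_engine), ("OmniPage", working_engine)])
    print(ok)   # {'1845.pdf': ('OmniPage', 'text of 1845.pdf')}
    print(bad)  # {}
```

Logging the `failed` dictionary at the end gives a list of documents to review by hand rather than a crashed run.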

The other option was to upgrade my machine to Windows 10, since some components the OCR engines need are no longer supported on Windows 7. I will be upgrading to Windows 10 soon, so I will see whether that fixes the issue. Here is the support summary:

Issue Description:
Digitize Document activity throws an error for some PDFs.

Resolution:
Please consider upgrading to Windows 10, where the Microsoft OCR module is pre-installed.
Also, please contact the UiPath Forum for assistance until the packages are officially supported.
