Error while running OCR on multiple documents using Parallel For Each

I am trying to digitize more than 20 documents with OCR inside a Parallel For Each, since Action Center doesn't work in a normal For Each.
While digitizing, I get an unexpected error when the number of documents is large. Is there a limit on running OCR in parallel, or is this error due to some other reason? I have tried UiPath OCR, Google OCR, OmniPage, and Tesseract; all throw the same error.

Please find below the error message:
RemoteException wrapping System.Exception: An unexpected error has occurred —> RemoteException wrapping System.AggregateException: One or more errors occurred. —> RemoteException wrapping UiPath.SmartData.Digitization.Tokenization.TokenizationException: 394_CUCF00048_394_3216649_AM3.pdf_Page_1 —> RemoteException wrapping System.Exception: Error performing OCR: OperationCancelled —> RemoteException wrapping System.ServiceModel.FaultException: OperationCancelled
at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.TaskCreator.<>c__DisplayClass7_01.<CreateGenericTask>b__0(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory1.FromAsyncCoreLogic(IAsyncResult iar, Func2 endFunction, Action1 endAction, Task1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.Vision.VisionClient.<ScrapeAsync>d__5.MoveNext()
--- End of inner exception stack trace ---
at UiPath.OCR.Activities.UiPathDocumentOCR.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity1.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.IntelligentOCR.Digitization.ExtendedOcrEngineActivityWrapper.d__2.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.OCR.OcrTokenizer.d__7.MoveNext()
— End of inner exception stack trace —
at UiPath.SmartData.Digitization.OCR.OcrTokenizer.d__7.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.PDF.PdfTokenizer.d__13.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.PDF.PdfTokenizer.d__11.MoveNext()
— End of inner exception stack trace —
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task1.get_Result()
at UiPath.SmartData.Digitization.ContentTokenizer.<>c__DisplayClass8_0.b__0(Task1 t)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask2.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.DocumentDigitizer.<>c__DisplayClass16_0`1.<g__AttachContinuation|0>d.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.DocumentDigitizer.d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.IntelligentOCR.Activities.Digitization.DigitizeDocument.d__34.MoveNext()
— End of inner exception stack trace —
at UiPath.Shared.Activities.AsyncTaskNativeImplementation.BookmarkResumptionCallback(NativeActivityContext context, Object value)
at UiPath.Shared.Activities.AsyncTaskNativeActivity.BookmarkResumptionCallback(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkCallbackWrapper.Invoke(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)

Hello @anto.santhosh,

The issue is indeed with digitization, and it is usually caused by running out of memory. Because of this, you might get different exceptions at different processing stages, all because no more memory can be allocated.

Do you have any chance to try this out on a machine with more available memory, to see if the number of documents “before the crash” also increases?

We are thinking about ways to automatically handle parallel digitizations or at least have a more graceful failure - it is a work in progress.

Hope this helps,

Ioana

I am using a laptop with a Ryzen 5 Pro processor and 16 GB of RAM.
What configuration would be better for running more than 20 documents in parallel?

I guess it depends on your documents :) how long they are, how many of them need OCR… It is not a CPU issue, or at least I haven't seen anything pointing in that direction; it is a memory issue alone.

Yes, I just crashed my process as well by force-applying OCR on a set of 70 documents, even though they are one-pagers. You will see in Task Manager that many Vision processes get created, and that memory usage keeps going up…

Until we do find a way to limit the number of parallel OCR processes happening within a workflow, I recommend either digitizing the documents in a For Each (and storing the results until you need them; all of them are serializable), or processing one document per process run…
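For readers who want to see the shape of the "digitize in a For Each and store the results" idea, here is a minimal sketch. It is written in Python purely as an illustration and is not a UiPath workflow: `digitize_sequentially`, `fake_digitize`, and the JSON cache layout are all hypothetical names and choices, and the real digitization output would come from the Digitize Document activity.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def digitize_sequentially(
    files: Iterable[Path],
    digitize: Callable[[Path], dict],
    cache_dir: Path,
) -> Dict[str, Path]:
    """Digitize one file at a time and persist each result as JSON.

    `digitize` is a hypothetical stand-in for the real OCR step; it must
    return something JSON-serializable.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    stored: Dict[str, Path] = {}
    for f in files:
        target = cache_dir / (f.name + ".json")
        if not target.exists():
            # Only one OCR call is in flight at any time, so memory use
            # does not grow with the number of documents.
            result = digitize(f)
            target.write_text(json.dumps(result), encoding="utf-8")
        stored[f.name] = target
    return stored


def fake_digitize(path: Path) -> dict:
    # Hypothetical placeholder for the OCR engine call.
    return {"file": path.name, "text": "page text for " + path.stem}
```

Because results are written to disk, a rerun after a crash skips the files that were already digitized, and later stages can load the stored output when they need it.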

Sorry I don’t have any better suggestions at the moment.

After changing the degree of parallelism to 1, OCR works fine with more documents and without the memory error.
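As a rough illustration of what capping the degree of parallelism does, the sketch below limits how many work items run at once. It is plain Python, not UiPath; `run_bounded` and `ConcurrencyProbe` are hypothetical helpers, and the cap stands in for whatever limit the workflow engine applies.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Context manager that records how many callers were active at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


def run_bounded(items, work, max_parallel):
    """Run `work` over `items` with at most `max_parallel` calls in flight.

    Results come back in input order. With max_parallel=1 this behaves
    like a plain sequential loop.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(work, items))
```

With a cap of 1 the peak memory is that of a single OCR call; raising the cap trades memory for throughput, so a sensible value depends on the documents and the machine.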

However, I now get an error in the next stage, which is the Intelligent Keyword Classifier.
Please let me know what this error means.

RemoteException wrapping System.InvalidOperationException: The activity ‘Intelligent Keyword Classifier’ with ID 257 threw or propagated an exception while being canceled. —> RemoteException wrapping System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.Dictionary2.Insert(TKey key, TValue value, Boolean add)
at UiPath.SmartData.DocumentClassification.Utils.KeywordUtils.<FilterKeywords>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<FindDocumentTypes>d__15.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<FindDocumentType>d__14.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<GetBestPageMatchAsFirstPage>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<ExtractDocumentTypes>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at UiPath.IntelligentOCR.Activities.DocumentClassification.IntelligentKeywordClassifier.<ExecuteAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at UiPath.Shared.Activities.AsyncTaskCodeActivityImplementation.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at UiPath.IntelligentOCR.Activities.DocumentClassification.ClassifierAsyncTaskActivity.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
--- End of inner exception stack trace ---
at System.Collections.Generic.Dictionary2.Insert(TKey key, TValue value, Boolean add)
at UiPath.SmartData.DocumentClassification.Utils.KeywordUtils.d__11.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__15.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__13.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__9.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at UiPath.IntelligentOCR.Activities.DocumentClassification.IntelligentKeywordClassifier.d__20.MoveNext()
— End of stack trace from previous location where exception was thrown —
at UiPath.Shared.Activities.AsyncTaskCodeActivityImplementation.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at UiPath.IntelligentOCR.Activities.DocumentClassification.ClassifierAsyncTaskActivity.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)

If I run it again, I get an error in the Machine Learning Extractor instead of the Intelligent Keyword Classifier.

RemoteException wrapping System.InvalidOperationException: The activity ‘Machine Learning Extractor’ with ID 340 threw or propagated an exception while being canceled. —> RemoteException wrapping System.AggregateException: One or more errors occurred. —> RemoteException wrapping System.Threading.Tasks.TaskCanceledException: A task was canceled.

--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task1.get_Result()
at UiPath.Shared.Activities.AsyncTaskCodeActivityImplementation.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at UiPath.DocumentUnderstanding.ML.Activities.ExtractorAsyncTaskActivity.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
— End of inner exception stack trace —

How many files are you processing in parallel?

You might still be running out of memory at these stages (or at any other stage inside the Parallel For Each).

If you do need to process a lot of files in the same process, maybe you can do the processing upfront in a For Each, store the data, and then keep only the "wait for" step in the Parallel For Each. I know it is not the ideal solution, but it might help to some extent.
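The two-phase idea above can be sketched roughly as follows. This is an illustrative Python outline, not a UiPath workflow: `process_in_two_phases`, `heavy_step`, and `light_wait_step` are hypothetical names, where the heavy step stands for the memory-hungry work (digitization, classification, extraction) and the light step stands for waiting on a human validation task.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_two_phases(files, heavy_step, light_wait_step, max_parallel=8):
    """Run the heavy work sequentially, then the waiting step in parallel.

    heavy_step(name) -> data            memory-intensive, one at a time
    light_wait_step(name, data) -> out  cheap, mostly waiting, safe to overlap
    """
    # Phase 1: a plain loop, so only one heavy call holds memory at a time.
    prepared = [(name, heavy_step(name)) for name in files]

    # Phase 2: the waiting steps overlap, which is where parallelism is
    # actually needed and costs very little memory.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {
            name: pool.submit(light_wait_step, name, data)
            for name, data in prepared
        }
        return {name: fut.result() for name, fut in futures.items()}
```

The trade-off is latency: nothing reaches the waiting phase until every file has been through the heavy phase, which is why this is a workaround rather than an ideal design.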

Hi Ioana,

I’m running into the same issue. Could you please provide an example of a workflow that does the processing upfront in a for loop and uses a wait for in a parallel for each?

Hi @Ioana_Gligan,

Looks like this is still an issue. Are there any new workarounds or fixes available? I'm currently facing this at a client and would love to get it resolved with minimal code impact, if possible.
Any thoughts - @alexcabuz, @Andra_Buica, @Jeremy_Tederry @MVP2021

Regards,
Priya

Hi @PD2,
A possible workaround would be to handle such errors programmatically in the workflow. A good starting point for this is the Document Understanding Process available in Studio, in the Templates category (see the image below). There is also a user guide inside the process to help with the setup.

Thank you @AdiPopa for the prompt response. I noticed that the new template, which is in prerelease, has a user guide with a section that describes the design for DU processes. Is this the new recommended approach?

The One-Job-Per-File Approach
Document Understanding processes will not run as batch jobs. Instead, an individual job starts for each file to be processed. This approach is used for both Attended and Unattended implementations.

Since there was no prior documentation on design approaches for DU, we built our DUFramework with a Parallel For Each around all of the DU steps, based on the training provided and help from a few experts. Are there recommended hardware/software requirements when using Parallel For Each loops?
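For comparison with the Parallel For Each design, the one-job-per-file approach quoted above might look something like this in outline. The sketch is in Python and only illustrates the pattern; `dispatch` and `run_one_job` are hypothetical names, and an in-memory `queue.Queue` stands in for whatever queue or job trigger the real deployment uses.

```python
import queue


def dispatch(files, job_queue):
    """Dispatcher: add one work item per file, then return the item count."""
    for f in files:
        job_queue.put({"file": f})
    return job_queue.qsize()


def run_one_job(job_queue, process_file):
    """One job: take a single item, process it, and report the outcome.

    Returns None when the queue is empty. A failure is captured in the
    result instead of being raised, so it cannot affect other files.
    """
    try:
        item = job_queue.get_nowait()
    except queue.Empty:
        return None
    try:
        output = process_file(item["file"])
        return {"file": item["file"], "status": "ok", "output": output}
    except Exception as exc:
        return {"file": item["file"], "status": "failed", "error": str(exc)}
    finally:
        job_queue.task_done()
```

Each job holds the memory for one document only, and a crash or exception on one file leaves the remaining items untouched in the queue.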