Error while running OCR on multiple documents using Parallel For Each

I am trying to digitize more than 20 documents with OCR inside a Parallel For Each, since Action Center doesn't work in a normal For Each.
While digitizing, I get an unexpected error when there are many documents. Is there a limit on running OCR in parallel, or is this error due to some other reason? I have tried UiPath OCR, Google OCR, OmniPage, and Tesseract; all throw the same error.

Please find below the error message:
RemoteException wrapping System.Exception: An unexpected error has occurred —> RemoteException wrapping System.AggregateException: One or more errors occurred. —> RemoteException wrapping UiPath.SmartData.Digitization.Tokenization.TokenizationException: 394_CUCF00048_394_3216649_AM3.pdf_Page_1 —> RemoteException wrapping System.Exception: Error performing OCR: OperationCancelled —> RemoteException wrapping System.ServiceModel.FaultException: OperationCancelled
at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.TaskCreator.<>c__DisplayClass7_01.<CreateGenericTask>b__0(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory1.FromAsyncCoreLogic(IAsyncResult iar, Func2 endFunction, Action1 endAction, Task1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.Vision.VisionClient.<ScrapeAsync>d__5.MoveNext()
--- End of inner exception stack trace ---
at UiPath.OCR.Activities.UiPathDocumentOCR.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity1.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.IntelligentOCR.Digitization.ExtendedOcrEngineActivityWrapper.d__2.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.OCR.OcrTokenizer.d__7.MoveNext()
— End of inner exception stack trace —
at UiPath.SmartData.Digitization.OCR.OcrTokenizer.d__7.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.PDF.PdfTokenizer.d__13.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.PDF.PdfTokenizer.d__11.MoveNext()
— End of inner exception stack trace —
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task1.get_Result()
at UiPath.SmartData.Digitization.ContentTokenizer.<>c__DisplayClass8_0.b__0(Task1 t)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask2.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.DocumentDigitizer.<>c__DisplayClass16_0`1.<g__AttachContinuation|0>d.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.Digitization.DocumentDigitizer.d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.IntelligentOCR.Activities.Digitization.DigitizeDocument.d__34.MoveNext()
— End of inner exception stack trace —
at UiPath.Shared.Activities.AsyncTaskNativeImplementation.BookmarkResumptionCallback(NativeActivityContext context, Object value)
at UiPath.Shared.Activities.AsyncTaskNativeActivity.BookmarkResumptionCallback(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkCallbackWrapper.Invoke(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)

Hello @anto.santhosh,

The issue is indeed with digitization, and it is usually caused by running out of memory. Because of this, you might get different exceptions at different processing stages, all because no more memory can be allocated.

Do you have any chance to try this out on a machine with more available memory, to see if the number of documents “before the crash” also increases?

We are thinking about ways to automatically handle parallel digitizations or at least have a more graceful failure - it is a work in progress.

Hope this helps,

Ioana


I am using a laptop with a Ryzen 5 Pro processor and 16 GB of RAM.
What configuration would be better for running more than 20 documents in parallel?

I guess it depends on your documents: how long they are, how many of them need OCR, and so on. It is not a CPU issue, or at least I haven’t seen anything pointing in that direction. It is a memory issue alone.

Yes, I just crashed my process as well by force-applying OCR to a set of 70 documents, even though they are one-pagers. You will see in Task Manager that many Vision processes get created, and that memory usage keeps climbing.

Until we find a way to limit the number of parallel OCR processes within a workflow, I recommend either digitizing the documents in a regular For Each (and storing the results until you need them; all of them are serializable), or processing one document per process run.

Sorry I don’t have any better suggestions at the moment.
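To illustrate the idea of capping how many OCR jobs run at once, here is a minimal Python sketch. It is not UiPath code: `fake_digitize`, `digitize_all`, and `ConcurrencyProbe` are hypothetical names made up for this example, and the sleep merely stands in for a memory-heavy OCR call. It only shows the general pattern, assuming each in-flight job holds memory until it finishes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Records how many workers were active at the same time (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


def fake_digitize(path, probe):
    # Hypothetical stand-in for a memory-heavy OCR call on one document.
    with probe:
        time.sleep(0.01)
        return f"text of {path}"


def digitize_all(paths, max_parallel, probe):
    # max_workers caps how many documents are in flight at once.
    # max_parallel=1 behaves like a plain sequential For Each.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(lambda p: fake_digitize(p, probe), paths))


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    docs = [f"doc_{i}.pdf" for i in range(20)]
    results = digitize_all(docs, max_parallel=3, probe=probe)
    print(len(results), "documents digitized; peak concurrency:", probe.peak)
```

The point of the cap is that peak memory then scales with `max_parallel` rather than with the total number of documents.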


After changing the Degree of Parallelism to 1, OCR works fine with more documents, without the memory error.

However, I now get an error in the next stage, the Intelligent Keyword Classifier.
Please let me know what this error means.

RemoteException wrapping System.InvalidOperationException: The activity ‘Intelligent Keyword Classifier’ with ID 257 threw or propagated an exception while being canceled. —> RemoteException wrapping System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.Dictionary2.Insert(TKey key, TValue value, Boolean add)
at UiPath.SmartData.DocumentClassification.Utils.KeywordUtils.<FilterKeywords>d__11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<FindDocumentTypes>d__15.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<FindDocumentType>d__14.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<GetBestPageMatchAsFirstPage>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.<ExtractDocumentTypes>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at UiPath.IntelligentOCR.Activities.DocumentClassification.IntelligentKeywordClassifier.<ExecuteAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at UiPath.Shared.Activities.AsyncTaskCodeActivityImplementation.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at UiPath.IntelligentOCR.Activities.DocumentClassification.ClassifierAsyncTaskActivity.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
--- End of inner exception stack trace ---
at System.Collections.Generic.Dictionary2.Insert(TKey key, TValue value, Boolean add)
at UiPath.SmartData.DocumentClassification.Utils.KeywordUtils.d__11.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__15.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__14.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__13.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at UiPath.SmartData.DocumentClassification.Classification.KeywordVectorClassification.VectorSpaceClassifier.d__9.MoveNext()
— End of stack trace from previous location where exception was thrown —
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at UiPath.IntelligentOCR.Activities.DocumentClassification.IntelligentKeywordClassifier.d__20.MoveNext()
— End of stack trace from previous location where exception was thrown —
at UiPath.Shared.Activities.AsyncTaskCodeActivityImplementation.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at UiPath.IntelligentOCR.Activities.DocumentClassification.ClassifierAsyncTaskActivity.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)

If I run it again, I get an error in the Machine Learning Extractor instead of the Keyword Classifier.

RemoteException wrapping System.InvalidOperationException: The activity ‘Machine Learning Extractor’ with ID 340 threw or propagated an exception while being canceled. —> RemoteException wrapping System.AggregateException: One or more errors occurred. —> RemoteException wrapping System.Threading.Tasks.TaskCanceledException: A task was canceled.

--- End of inner exception stack trace ---

at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task1.get_Result()
at UiPath.Shared.Activities.AsyncTaskCodeActivityImplementation.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at UiPath.DocumentUnderstanding.ML.Activities.ExtractorAsyncTaskActivity.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)
— End of inner exception stack trace —

How many files are you processing in parallel?

You might still be running out of memory at these stages (or at any other stage in the Parallel For Each).

If you do need to process a lot of files in the same process, maybe you can do the processing upfront in a For Each, store the data, and then use only the Wait For activity in the Parallel For Each. I know it is not the ideal solution, but it might help to some extent.
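As a rough illustration of that two-phase idea, here is a minimal Python sketch. It is not a UiPath workflow: `heavy_process`, `wait_for_validation`, and `run` are hypothetical names, and the JSON round trip only stands in for "store the data", on the assumption that the stored results are serializable. The heavy work runs one document at a time; only the cheap waiting step runs in parallel.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


def heavy_process(path):
    # Hypothetical stand-in for the memory-heavy steps
    # (digitize, classify, extract) for a single document.
    return {"file": path, "fields": {"length": len(path)}}


def wait_for_validation(stored_json):
    # Hypothetical stand-in for waiting on a human validation task.
    # It only reloads the stored result, so it needs little memory.
    record = json.loads(stored_json)
    time.sleep(0.01)
    record["validated"] = True
    return record


def run(paths):
    # Phase 1: sequential loop, one document in memory at a time;
    # each result is serialized and kept for later.
    stored = [json.dumps(heavy_process(p)) for p in paths]

    # Phase 2: only the lightweight waits run in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(stored))) as pool:
        return list(pool.map(wait_for_validation, stored))


if __name__ == "__main__":
    for record in run(["invoice_1.pdf", "invoice_2.pdf"]):
        print(record["file"], record["validated"])
```

The trade-off is that the heavy phase no longer benefits from parallelism, but its memory use stays roughly constant regardless of how many documents are queued.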

Hi Ioana,

I’m running into the same issue. Could you please provide an example of a workflow that does the processing upfront in a for loop and uses a wait for in a parallel for each?