How to speed up the OCR and Digitize Document part

mtu · February 10, 2020, 9:26pm

I have created a workflow for invoice AI processing and am trying to get it ready for our AP folks to use. Everything seems to work pretty well, but certain invoices slow the workflow down so much that the wait is too long without some kind of loading screen for the user. Some PDF invoices have a second page full of small-print terms, which I have found slows things down dramatically. Most invoices only take 5-10 seconds. This workflow will be an attended automation that the AP folks run to read and process all the invoices in a specified folder.

Is there a way to make this process faster for the end-user?

mmcruzRPA · February 24, 2020, 9:51am

@mtu
Have you increased the scale of the OCR?

mtu · February 24, 2020, 4:13pm

The scale was not the problem. PDFs with large amounts of small text were slowing down the OCR step in many cases. I ended up using Orchestrator Queues and Transactions after getting help from UiPath. I created one workflow that runs as an unattended job: it processes all invoices in a folder, does the OCR and Machine Learning parts, and places the extracted results into a queue. A second workflow then gets those transactions from the queue and lets the user quickly validate each extracted result. This was the only decent solution I found to speed up the ML invoice processing for the user.
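For anyone who wants the shape of that split outside of Studio, here is a rough sketch in plain Python. It is not UiPath code: `slow_extract`, `unattended_stage` and `attended_stage` are made-up names, and an in-memory `queue.Queue` stands in for an Orchestrator queue. It only illustrates the idea of doing the slow work up front and leaving the user with the fast part.

```python
import json
import queue


def slow_extract(path):
    """Hypothetical stand-in for the OCR + ML extraction step."""
    return {"file": path, "fields": {"total": "0.00"}}


def unattended_stage(paths, work_queue):
    """Run the slow extraction for every invoice and queue the results."""
    for path in paths:
        result = slow_extract(path)
        # Queue items carry plain strings, so serialize before adding.
        work_queue.put(json.dumps(result))


def attended_stage(work_queue):
    """Pull finished results off the queue for quick human validation."""
    validated = []
    while not work_queue.empty():
        item = json.loads(work_queue.get())
        # A real flow would present a validation screen here.
        validated.append(item["file"])
    return validated


invoice_queue = queue.Queue()  # stand-in for an Orchestrator queue
unattended_stage(["a.pdf", "b.pdf"], invoice_queue)
print(attended_stage(invoice_queue))
```

The user only ever runs the second stage, so the wait they see no longer depends on how long the OCR takes for any one invoice.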

mmcruzRPA · February 24, 2020, 4:14pm

OK, I only asked because the higher the OCR scale, the slower the process will be.
Well done.

mmcruzRPA · February 26, 2020, 12:04pm

Hey @mtu!
How did you pass the extraction results into the queue so that you could use the Validation Station in the second workflow?

mtu · February 26, 2020, 1:13pm

You need to convert the object into a JSON string using the Newtonsoft.Json library in order to send it to the queue:

JsonConvert.SerializeObject(extractionResults)

Then deserialize it in the other workflow to turn the string back into an actual ExtractionResult object:

JsonConvert.DeserializeObject(of ExtractionResult)(jsonExtractionResult)
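The same round trip can be sketched in plain Python to show what is going on. This is only an illustration: `FieldResult` is a hypothetical stand-in and not the UiPath ExtractionResult type, and the helper names are made up. The point is that the object becomes a string for the queue and is rebuilt as a typed object on the other side.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FieldResult:  # hypothetical stand-in, not the UiPath ExtractionResult type
    name: str
    value: str
    confidence: float


def to_queue_payload(fields):
    """Serialize a list of results into one JSON string for a queue item."""
    return json.dumps([asdict(f) for f in fields])


def from_queue_payload(payload):
    """Rebuild typed results from the JSON string read off the queue."""
    return [FieldResult(**d) for d in json.loads(payload)]


original = [FieldResult("InvoiceNumber", "INV-001", 0.97)]
restored = from_queue_payload(to_queue_payload(original))
print(restored == original)
```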

mmcruzRPA · February 26, 2020, 2:13pm

Thank you very much!!!

