How to speed up the OCR and Digitize Document step

I have created a workflow for AI invoice processing and am trying to get it ready for our AP team to use. Everything works pretty well, but certain invoices slow the workflow down to the point where the wait is too long without some kind of loading screen for the user. Some PDF invoices have a second page full of small-print terms and conditions, which I have found slows things down dramatically. Most invoices take only 5-10 seconds. The workflow will be an attended automation that the AP team runs to read and process all the invoices in a specified folder.

Is there a way to make this process faster for the end-user?

@mtu
Have you increased the scale of the OCR?

The scale was not the problem. PDFs with large amounts of small text were slowing down the OCR process in many cases. I ended up using Orchestrator Queues and Transactions after getting help from UiPath. I created one workflow that processes all invoices in a folder to do the OCR and Machine Learning parts in an unattended job, then places the extracted results into a queue. A second workflow then gets those transactions from the queue and allows the user to quickly validate each extracted result. This was the only decent solution to speed up the ML invoice processing.
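
For anyone who wants to see the shape of that split outside UiPath, here is a minimal sketch in plain Python. It is only an illustration of the pattern, not the actual workflows: `queue.Queue` stands in for the Orchestrator queue, and `extract_invoice`, `unattended_stage`, and `attended_stage` are hypothetical names made up for this example.

```python
import json
import queue

def extract_invoice(path):
    # Hypothetical stand-in for the slow OCR + ML extraction step.
    return {"file": path, "fields": {}}

def unattended_stage(paths, q):
    """Runs without the user: extract every invoice and enqueue the result."""
    for path in paths:
        result = extract_invoice(path)
        # Queue items are stored as JSON strings.
        q.put(json.dumps(result))

def attended_stage(q, validate):
    """Runs with the user: only the fast validation work is left."""
    validated = []
    while not q.empty():
        item = json.loads(q.get())
        validated.append(validate(item))
    return validated

invoice_queue = queue.Queue()
unattended_stage(["a.pdf", "b.pdf"], invoice_queue)

# The lambda is a placeholder for the user confirming each result.
approved = attended_stage(invoice_queue, lambda item: {**item, "validated": True})
```

The point of the design is that the slow extraction has already finished by the time the user opens the second stage, so the wait per invoice no longer depends on how much small text the PDF contains.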

OK, I only asked because the higher the OCR scale, the slower the process will be.
Well done.

Hey @mtu!
How did you pass the extraction results through the queue so that you could use the Validation Station in the second workflow?

You need to convert the object into a JSON string using the Newtonsoft.Json library so that it can be sent to the queue:

JsonConvert.SerializeObject(extractionResults)

Then deserialize it in the other workflow to turn the string back into an actual ExtractionResult object:

JsonConvert.DeserializeObject(Of ExtractionResult)(jsonExtractionResult)
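
The same round trip can be sketched in plain Python to show why the target type matters on the way back. This is only an illustration: the `ExtractionResult` dataclass below is a simplified, hypothetical stand-in for the real UiPath type, and the field names and sample values are invented for the example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ExtractionResult:
    # Hypothetical, simplified stand-in for the real extraction result type.
    document_id: str
    fields: dict

def to_queue_payload(result):
    """Serialize the object to a JSON string that a queue item can carry."""
    return json.dumps(asdict(result))

def from_queue_payload(payload):
    """Rebuild a typed object; without the type you only get a plain dict."""
    return ExtractionResult(**json.loads(payload))

# Sample values are made up for illustration.
original = ExtractionResult(document_id="invoice-001", fields={"total": "12.50"})
payload = to_queue_payload(original)
restored = from_queue_payload(payload)
```

Here `payload` is an ordinary string, which is what travels through the queue, and `restored` is a typed object again, which is what the validation step needs.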

Thank you very much!!!
