Failed Train Pipeline Activity: Out of Memory Exception

Hi fellow UiPath developers, I need urgent assistance finding a solution for training our current pipeline.

We serve quite a large dataset for our client and process more than 1500 document types a day. Our train pipeline has run successfully to date, although each run is taking significantly longer to complete.

We have now encountered the error message: Pipeline failed due to Infra Issue

The log message reads: Deployment failed with out of memory exception
… 31400
… 31500

I have since attempted to retrain the model, but it takes even longer to run through the pipeline and I get the same “Deployment Failed with out of memory exception” error.

After re-attempting the failed job, it returns the following error message: Unknown error occurred in Job Deployment for jobName: f0a7bebe-6928-44cf-97c5-b2506556a859

Could I get some insight and advice on what the issue could be? I have also looked at our dataset, and the extracted text from our documents is OK.

I have also attached the logs of the failed pipeline run and of the failed re-attempted run.
1-58842a6e-3739-415e-b36f-5ffc2621e1ed_Out of Memory Exception.txt (4.6 KB)
2-f0a7bebe-6928-44cf-97c5-b2506556a859 Unexpected Error-Re-attempted Run.txt (12.6 KB)

Hi @Mgevisa_Khoza

What model are you using for this? Is this one of the OOB models?
Are you trying to train with or without GPU?

Jeremy

Hi @Jeremy_Tederry, thanks for the feedback on such short notice.

We are using an out-of-the-box training model: Document Classifier.

We have been running training and evaluation pipelines with this package successfully for close to 3 months now.

We are not running on a GPU. We are using 4 AI Robots, which are limited to CPU.

I know that an AI Robot is able to run 2 separate ML skills simultaneously. However, is it possible to get the AI Robots to train the pipeline at the same time as well?

I hope the information provided is adequate for you to provide a possible answer to our problem.

Hi @Mgevisa_Khoza

Are you on-prem or on SaaS AI Center?
One AI Robot is either serving skills or training a pipeline; it can’t do both at the same time. However, the error that you get is not a license error, it’s a hardware (RAM) error, so that shouldn’t be the issue here.

@alexcabuz any idea here?

Hi @Jeremy_Tederry, we are using Orchestrator Cloud.

The current set-up is that one robot is doing the pipeline training as it is scheduled to run.