In AI Center, I am trying to train my ML model for the third time. I added a new set of documents, labelled the data, created a dataset and exported it. I then created a pipeline, which completed successfully, and tried to deploy the ML model and update the ML skill. However, my ML package has been stuck in Deploying status for the past few days (since last Friday, 1st March). The status never changes to Failed, so I cannot identify the issue. Can anyone help me figure out what is wrong with the ML package deployment? Is it normal for a package to take this long to deploy?
For the ML package, the logs show that validation started and succeeded, but the package is still in Deploying status. For the pipeline, the logs show that the run was successful. Under the ML skill I don't see any logs for the latest deployment, but the ML skill has a Streaming Log tab, and there I can see one message:
“968357b9-fad6-41db-aa8b-a73934368ed2-21-4-6c85b9c5f9-zjzlq:Warning ==> 0/3 nodes are available: 1 Insufficient memory, 1 node(s) had untolerated taint {nvidia.com/gpu: present}, 1 node(s) had untolerated taint {task.mining/cpu: present}. preemption: 0/3 nodes are available: 1 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.”
In the logs I can see the messages for the pipeline and the ML package. For the ML skill I can see "MLSkill ICC_Billing_Invoice MLPackage v#23.10.0 Deployment Started" and "MLSkill ICC_Billing_Invoice MLPackage v#23.10.0 Deployment Failed Attempt: 1", with the same error message as in the streaming log above.