Unable to Run Pipeline: Failure Reason ML_Package_ISSUE

Dear UiPath Community,
I am unable to run a training pipeline in AI Center. The Train only run fails with the error "Pipeline failed due to ML Package Issue". The pipeline status history, the error details, and the log output with the full traceback are below for reference:
Train only of ML_Package_AB 24.4.4.0 launched - Run ce021a9e-5d6a-41d1-9dfe-245f2b87893d
Train only of ML_Package_AB 24.4.4.0 started - Run ce021a9e-5d6a-41d1-9dfe-245f2b87893d
Train only of ML_Package_AB 24.4.4.0 scheduled - Run ce021a9e-5d6a-41d1-9dfe-245f2b87893d
Train only of ML_Package_AB 24.4.4.0 failed - Run ce021a9e-5d6a-41d1-9dfe-245f2b87893d

Error Details : Pipeline failed due to ML Package Issue

2024-11-20 17:18:40,722 - root:configure_options:199 - INFO: Document type purchase_orders language: en
2024-11-20 17:18:40,722 - root:configure_options:202 - INFO: No fields with section items found, turning off EOL task…
2024-11-20 17:18:40,722 - root:configure_options:218 - INFO: Training with random slices: True
2024-11-20 17:18:40,722 - root:configure_options:219 - INFO: Sample by size: True
2024-11-20 17:18:40,939 - root:init:101 - INFO: Average Mixed Precision Training disabled …
2024-11-20 17:18:40,939 - root:create_processor:56 - INFO: Loading LayoutLMv3 processor from HuggingFace …
2024-11-20 17:18:41,031 - root:init:48 - INFO: Finetuning from base model: multi_task_base…
2024-11-20 17:18:41,031 - root:load:212 - INFO: Loading model…
2024-11-20 17:18:41,539 - root:load:218 - INFO: Model epoch 65
2024-11-20 17:18:41,540 - root:create_network:146 - INFO: Building network …
2024-11-20 17:18:41,540 - root:init:16 - INFO: Creating new network …
2024-11-20 17:18:42,865 - root:init:22 - INFO: Number of parameters: 126610434
2024-11-20 17:18:43,099 - root:load_network:252 - INFO: Missing weights
2024-11-20 17:18:43,099 - root:load_network:253 - INFO: []
2024-11-20 17:18:43,099 - root:load_network:254 - INFO: Unexpected weights
2024-11-20 17:18:43,099 - root:load_network:255 - INFO: ['network_tag.bert.embeddings.position_ids']
2024-11-20 17:18:43,109 - root:create_network:146 - INFO: Building network …
2024-11-20 17:18:43,109 - root:init:16 - INFO: Creating new network …
2024-11-20 17:18:44,254 - root:init:22 - INFO: Number of parameters: 125332356
2024-11-20 17:18:44,254 - root:create_network:149 - INFO: Finetuning from base network multi_task_base…
2024-11-20 17:18:44,261 - root:import_from:95 - INFO: {'new': ['kundennummer', 'auftrag', 'kunde'], 'reuse': [], 'delete': ['po-number', 'date', 'client-name', 'client-address', 'vendor-name', 'vendor-address', 'shipping-name', 'shipping-address', 'billing-name', 'billing-address', 'payment-terms', 'delivery-by-date', 'discount', 'net-amount', 'tax-amount', 'tax-rate', 'total-amount', 'line-number', 'description', 'product-code', 'delivery-date', 'unit-measure', 'unit-price', 'quantity', 'line-net-amount', 'line-tax-rate', 'line-tax-amount', 'line-amount', 'client-vat-no']}
2024-11-20 17:18:44,366 - root:create_optimizer:161 - INFO: Creating AdamW optimizer
2024-11-20 17:18:44,367 - root:create_scheduler:179 - INFO: Building scheduler…
2024-11-20 17:18:44,370 - root:init:63 - INFO: Model creation done. Contains tasks ['tag/purchase_orders']
2024-11-20 17:18:44,370 - root:init:64 - INFO: Model allocated to device(s): cuda:0
2024-11-20 17:18:44,370 - root:init:65 - INFO: world size: 1
2024-11-20 17:18:44,379 - root:train:263 - INFO: Creating torch datasets …
2024-11-20 17:18:44,382 - root:init_for_train:689 - INFO: Task tag/purchase_orders train: True all samples 19 actual samples 19
2024-11-20 17:18:44,383 - root:init_for_train:689 - INFO: Task tag/purchase_orders train: False all samples 0 actual samples 0
2024-11-20 17:18:44,383 - root:train:294 - INFO: batch size: 4
2024-11-20 17:18:44,383 - root:train:295 - INFO: dataloader workers: 4
2024-11-20 17:18:44,383 - root:train:305 - INFO: Training for 100 epochs…
2024-11-20 17:18:44,383 - root:train:306 - INFO: Training Set Size: 19 samples
2024-11-20 17:18:44,383 - root:train:307 - INFO: Test Set Size: 0 samples
2024-11-20 17:18:44,384 - root:train:317 - INFO: Training one epoch with frozen backbone to initialize heads…
2024-11-20 17:18:46,618 - root:_log_to_console:894 - INFO: Epoch 001 [TRAIN][loss_tag/all: 0.0724][acc_tag/all: 0.0000][score/all: 0.0000][loss/all: 0.0724][lr: 0.003000]
2024-11-20 17:18:46,717 - root:_log_to_console:894 - INFO: Epoch 001 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:18:46,720 - root:train:326 - INFO: Training for 99 epochs …
2024-11-20 17:18:50,603 - root:_log_to_console:894 - INFO: Epoch 002 [TRAIN][loss_tag/all: 0.0472][acc_tag/all: 0.0625][score/all: 0.0625][loss/all: 0.0472][lr: 0.000032]
2024-11-20 17:18:50,610 - root:_log_to_console:894 - INFO: Epoch 002 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:18:54,469 - root:_log_to_console:894 - INFO: Epoch 003 [TRAIN][loss_tag/all: 0.0143][acc_tag/all: 0.3135][score/all: 0.3135][loss/all: 0.0143][lr: 0.000032]
2024-11-20 17:18:54,474 - root:_log_to_console:894 - INFO: Epoch 003 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:18:58,330 - root:_log_to_console:894 - INFO: Epoch 004 [TRAIN][loss_tag/all: 0.0058][acc_tag/all: 0.5500][score/all: 0.5500][loss/all: 0.0058][lr: 0.000032]
2024-11-20 17:18:58,335 - root:_log_to_console:894 - INFO: Epoch 004 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:02,220 - root:_log_to_console:894 - INFO: Epoch 005 [TRAIN][loss_tag/all: 0.0019][acc_tag/all: 0.7528][score/all: 0.7528][loss/all: 0.0019][lr: 0.000032]
2024-11-20 17:19:02,224 - root:_log_to_console:894 - INFO: Epoch 005 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:06,111 - root:_log_to_console:894 - INFO: Epoch 006 [TRAIN][loss_tag/all: 0.0011][acc_tag/all: 0.8623][score/all: 0.8623][loss/all: 0.0011][lr: 0.000032]
2024-11-20 17:19:06,117 - root:_log_to_console:894 - INFO: Epoch 006 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:06,117 - UiPath_core.trainer_run:write:11 - INFO: Epoch 00006: reducing learning rate of group 0 to 1.0000e-05.
2024-11-20 17:19:09,999 - root:_log_to_console:894 - INFO: Epoch 007 [TRAIN][loss_tag/all: 0.0117][acc_tag/all: 0.8024][score/all: 0.8024][loss/all: 0.0117][lr: 0.000010]
2024-11-20 17:19:10,003 - root:_log_to_console:894 - INFO: Epoch 007 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:13,886 - root:_log_to_console:894 - INFO: Epoch 008 [TRAIN][loss_tag/all: 0.0009][acc_tag/all: 0.8734][score/all: 0.8734][loss/all: 0.0009][lr: 0.000010]
2024-11-20 17:19:13,890 - root:_log_to_console:894 - INFO: Epoch 008 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:17,769 - root:_log_to_console:894 - INFO: Epoch 009 [TRAIN][loss_tag/all: 0.0004][acc_tag/all: 0.9367][score/all: 0.9367][loss/all: 0.0004][lr: 0.000010]
2024-11-20 17:19:17,774 - root:_log_to_console:894 - INFO: Epoch 009 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:21,656 - root:_log_to_console:894 - INFO: Epoch 010 [TRAIN][loss_tag/all: 0.0020][acc_tag/all: 0.9374][score/all: 0.9374][loss/all: 0.0020][lr: 0.000010]
2024-11-20 17:19:21,662 - root:_log_to_console:894 - INFO: Epoch 010 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:25,887 - root:_log_to_console:894 - INFO: Epoch 011 [TRAIN][loss_tag/all: 0.0044][acc_tag/all: 0.8388][score/all: 0.8388][loss/all: 0.0044][lr: 0.000010]
2024-11-20 17:19:25,893 - root:_log_to_console:894 - INFO: Epoch 011 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:29,796 - root:_log_to_console:894 - INFO: Epoch 012 [TRAIN][loss_tag/all: 0.0006][acc_tag/all: 0.9194][score/all: 0.9194][loss/all: 0.0006][lr: 0.000010]
2024-11-20 17:19:29,801 - root:_log_to_console:894 - INFO: Epoch 012 [TEST ][loss_tag/all: nan][acc_tag/all: nan][score/all: nan][loss/all: nan]
2024-11-20 17:19:29,801 - UiPath_core.trainer_run:write:11 - INFO: Epoch 00012: reducing learning rate of group 0 to 3.1623e-06.
2024-11-20 17:19:29,801 - root:train:336 - INFO: Stopping training at epoch 12 after 12 epochs without improvement.
2024-11-20 17:19:29,801 - root:train:341 - INFO: Training complete. Score -1000000.0000 Epoch 0
2024-11-20 17:19:29,899 - root:train:95 - ERROR: No best model saved. Try training for more epochs or add more data to your training set.
NoneType: None
2024-11-20 17:19:29,899 - UiPath_core.training_plugin:model_run:189 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: [Errno 2] No such file or directory: '/workspace/model/microservice/models/multi_task_base/network.p'
2024-11-20 17:19:29,917 - UiPath_core.trainer_run:main:103 - ERROR: Training Job failed, error: [Errno 2] No such file or directory: '/workspace/model/microservice/models/multi_task_base/network.p'
Traceback (most recent call last):
  File "/model/bin/UiPath_core/trainer_run.py", line 98, in main
    wrapper.run()
  File "/workspace/model/microservice/training_wrapper.py", line 65, in run
    return self.training_plugin.model_run()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/bin/UiPath_core/training_plugin.py", line 205, in model_run
    raise ex
  File "/model/bin/UiPath_core/training_plugin.py", line 181, in model_run
    self.run_train_only()
  File "/model/bin/UiPath_core/training_plugin.py", line 268, in run_train_only
    score = self.train_model(self.local_dataset_directory)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/model/bin/UiPath_core/training_plugin.py", line 131, in train_model
    response = self.model.train(directory)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/model/microservice/train.py", line 48, in train
    tag_model_train.train(self.opt, self.dataset)
  File "/workspace/model/microservice/document_understanding/semistructured/train.py", line 96, in train
    raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), os.path.join(mymodel.folder, "network.p"))
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/model/microservice/models/multi_task_base/network.p'
2024-11-20 17:19:29,918 - UiPath_core.trainer_run:main:110 - INFO: Job was not successful.
2024-11-20 17:19:29,918 - root:_handle_upload_pending_files:55 - INFO: Uploading pending files.

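I believe the missing network.p file is the issue. The log reports a test set of 0 samples, every TEST score is nan, and training stops with "No best model saved", so the network.p file the pipeline expects under models/multi_task_base does not exist.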
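
Here is a minimal sketch of why nan test scores could leave no best model behind. It assumes the trainer keeps a checkpoint only when the test score beats the best one so far. That is my assumption, not UiPath's actual code, and pick_best_epoch and INITIAL_BEST are hypothetical names. The starting value is taken from the "Score -1000000.0000 Epoch 0" line in the log.

```python
import math

# Hypothetical stand-in for the trainer's best-model bookkeeping. The real
# UiPath code is not visible in the log, so this is only an assumption.
INITIAL_BEST = -1_000_000.0  # mirrors "Score -1000000.0000 Epoch 0" in the log


def pick_best_epoch(test_scores):
    """Return (best_score, best_epoch), keeping a score only if it beats the best so far."""
    best_score, best_epoch = INITIAL_BEST, 0
    for epoch, score in enumerate(test_scores, start=1):
        # Every ordered comparison with nan is False, so a nan score
        # never counts as an improvement and nothing would be saved.
        if score > best_score:
            best_score, best_epoch = score, epoch
    return best_score, best_epoch


# Twelve epochs of nan test scores, as in the log (Test Set Size: 0 samples).
print(pick_best_epoch([math.nan] * 12))        # (-1000000.0, 0)

# Made-up scores standing in for a run with a non-empty test set.
print(pick_best_epoch([0.10, 0.55, 0.87, 0.80]))  # (0.87, 3)
```

If the trainer works roughly like this, the run would need a non-empty test set before any epoch could be recorded as the best one.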

Regards,
Azeem
