Training Pipeline failed for the TPOTAutoMLRegression ML Package

I just started working with the AI Center and am currently testing out various packages. I have a small data set of training data and testing data but the training pipeline keeps failing with the following error message:
Error Details : Pipeline failed due to ML Package Issue

call fit() first.

Does anyone know what is causing this error?
Attached here are my training data and testing data:
Training Data:
image

Testing Data:
image

The full Error:
Train only of HomePricesPrediction 1.0 scheduled - Run 619cbf5c-b7c7-408a-b594-ebe6e6b93b87
Train only of HomePricesPrediction 1.0 launched - Run 619cbf5c-b7c7-408a-b594-ebe6e6b93b87
Train only of HomePricesPrediction 1.0 started - Run 619cbf5c-b7c7-408a-b594-ebe6e6b93b87
Train only of HomePricesPrediction 1.0 failed - Run 619cbf5c-b7c7-408a-b594-ebe6e6b93b87

Error Details : Pipeline failed due to ML Package Issue

call fit() first.
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 608, in __call__
    return self.func(*args, **kwargs)
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/aicenter/.local/lib/python3.6/site-packages/stopit/utils.py", line 145, in wrapper
    result = func(*args, **kwargs)
  File "/home/aicenter/.local/lib/python3.6/site-packages/tpot/gp_deap.py", line 417, in _wrapped_cross_val_score
    cv_iter = list(cv.split(features, target, groups))
  File "/home/aicenter/.local/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 333, in split
    .format(self.n_splits, n_samples))
ValueError: Cannot have number of splits n_splits=5 greater than the number of samples: n_samples=4.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aicenter/.local/lib/python3.6/site-packages/tpot/base.py", line 711, in fit
    per_generation_function=self._check_periodic_pipeline
  File "/home/aicenter/.local/lib/python3.6/site-packages/tpot/gp_deap.py", line 227, in eaMuPlusLambda
    population[:] = toolbox.evaluate(population)
  File "/home/aicenter/.local/lib/python3.6/site-packages/tpot/base.py", line 1321, in _evaluate_individuals
    for sklearn_pipeline in sklearn_pipeline_list[chunk_idx:chunk_idx + chunk_size])
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/parallel.py", line 1017, in __call__
    self.retrieve()
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/parallel.py", line 909, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/aicenter/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 562, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
ValueError: Cannot have number of splits n_splits=5 greater than the number of samples: n_samples=4.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aicenter/.local/lib/python3.6/site-packages/uipath_core/trainer_run.py", line 85, in main
    wrapper.run()
  File "/microservice/training_wrapper.py", line 57, in run
    return self.training_plugin.model_run()
  File "/home/aicenter/.local/lib/python3.6/site-packages/uipath_core/training_plugin.py", line 147, in model_run
    raise e
  File "/home/aicenter/.local/lib/python3.6/site-packages/uipath_core/training_plugin.py", line 139, in model_run
    self.run_train_only()
  File "/home/aicenter/.local/lib/python3.6/site-packages/uipath_core/training_plugin.py", line 208, in run_train_only
    self.train_model(self.local_dataset_directory)
  File "/home/aicenter/.local/lib/python3.6/site-packages/uipath_core/training_plugin.py", line 112, in train_model
    self.model.train(directory)
  File "/microservice/train.py", line 39, in train
    self.model = self.build_model(X, y, self.artifacts_directory)
  File "/microservice/train.py", line 58, in build_model
    pipeline_optimizer.fit(X, y)
  File "/home/aicenter/.local/lib/python3.6/site-packages/tpot/base.py", line 742, in fit
    raise e
  File "/home/aicenter/.local/lib/python3.6/site-packages/tpot/base.py", line 733, in fit
    self._update_top_pipeline()
  File "/home/aicenter/.local/lib/python3.6/site-packages/tpot/base.py", line 811, in _update_top_pipeline
    raise RuntimeError('A pipeline has not yet been optimized. Please call fit() first.')
RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
2022-04-07 04:40:15,881 - uipath_core.trainer_run:main:73 - INFO: Starting training job…
2022-04-07 04:40:16,842 - uipath_core.logs.upload_log_service:upload_logs_file:87 - INFO: Retry Training Triggered:
2022-04-07 04:40:16,853 - uipath_core.storage.azure_storage_client:download:106 - INFO: Dataset from bucket folder training-1fc90d1a-b983-45f7-a8ac-f316937245f5/94276d17-5ee6-4eec-a01a-1954d4343328/229c2d02-aae5-4bb5-ab52-b47d4c8af620 with size 1 downloaded successfully
2022-04-07 04:40:16,853 - uipath_core.training_plugin:train_model:110 - INFO: Start model training…
2022-04-07 04:40:16,854 - uipath_core.training_plugin:initialize_model:104 - INFO: Start model initialization…
2022-04-07 04:40:16,854 - uipath_core.training_plugin:initialize_model:107 - INFO: Model initialized successfully
2022-04-07 04:40:19,418 - uipath_core.training_plugin:model_run:146 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: A pipeline has not yet been optimized. Please call fit() first.
2022-04-07 04:40:19,419 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: A pipeline has not yet been optimized. Please call fit() first.
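
A note for readers of the traceback above: the last line ("Please call fit() first.") is only the final link in a chain of exceptions, and the first error in that chain is the ValueError about n_splits=5 and n_samples=4. The following is a minimal sketch of how such a chain can be walked back to its first error in plain Python. The names `root_cause` and `fit_stand_in` are hypothetical helpers written for this illustration; they are not part of TPOT or of the UiPath package.

```python
def root_cause(exc):
    """Follow an exception chain back to the first error that was raised."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        # __cause__ is set by "raise ... from ..."; __context__ is set when an
        # exception is raised while another one is being handled.
        earlier = exc.__cause__ or exc.__context__
        if earlier is None:
            break
        exc = earlier
    return exc


def fit_stand_in():
    """Hypothetical stand-in that mimics the shape of the failure above."""
    try:
        raise ValueError(
            "Cannot have number of splits n_splits=5 greater than "
            "the number of samples: n_samples=4."
        )
    except ValueError:
        # Raising inside the handler links the ValueError via __context__.
        raise RuntimeError(
            "A pipeline has not yet been optimized. Please call fit() first."
        )


try:
    fit_stand_in()
except RuntimeError as err:
    first = root_cause(err)
    print(type(first).__name__, "-", first)
```

In the sketch, the printed error is the ValueError rather than the RuntimeError, which mirrors how the real log should be read: from the first traceback down, not from the last line up.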

Hi @pmilanraajp

Welcome to the Community.

Could you let us know whether the status of the ML Package is “Deployed” or “Undeployed”?

If it is “Undeployed”, please create an ML Package and an ML Skill together. Once the ML Skill is deployed, the ML Package will automatically be deployed.

Thanks.


It says it's deployed, but the status is failed.

Hi @pmilanraajp

The ML Skill failed to be created. Can you please create an ML Skill along with the ML Package? The status of the ML Package should then be “Deployed” and the status of the ML Skill should be “Available”.

Once you have these statuses, you can retrain the model.

Hope this works,

Thanks.


My ML Skill keeps failing to be deployed, and my ML Package's status is still Undeployed. Do I have to wait for the ML Package to be deployed first before creating the skill?

Try creating the ML Skill on the ML Package that has been created now, and check.

Thanks.

These are the steps that I am currently following:

  1. Create Project (Name: HomePricesPrediction)
  2. Go to ML Packages Tab - Out of the box Packages - Tabular Data - TPOTAutoMLRegression
  3. Name Package (Name: HomePricesPredictionPackage) - Status = Undeployed
  4. Go to ML Skills - Create New ML Skill (Name: HomePricesPredictionSkill, Package: HomePricesPredictionPackage, Major Version: 1, Minor Version: 0)

At this point, after some time, the status of the Skill should change to Deployed, right?

The status of the ML Skill should change to “Available”.

But the ML Skill keeps failing.


ML Logs:

Do I have to upload training data first?

@pmilanraajp ,

Basically, it depends on the ML Package being used.

Some ML Packages require datasets to be available already: you first need to run a training pipeline successfully and only then create the ML Skill.

For instance, if you try the above steps with the Invoices package or the Remittance Advices package, the ML Skill should become “Available” without any pipeline being created.

The reason may be that these packages are already trained and have a predefined schema.

So before creating the skill, I should upload training data and create a training pipeline, right?
That's what I tried in the beginning: I uploaded my training data and created a training pipeline.


However it keeps failing:

With this error message:
Error.txt (10.9 KB)

@pmilanraajp, have you specified the target_column environment variable when creating the training pipeline?

Yes, I have done so.
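
One way to double-check this locally before launching the pipeline is to confirm that the value given for target_column actually appears in the header row of the training file. The sketch below assumes the training data is a CSV file with a header row, which the thread does not state explicitly. The helper `has_target_column` and the sample columns are hypothetical and only for illustration.

```python
import csv
import io


def has_target_column(csv_file, target_column):
    """Return True if the CSV header row contains target_column."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    return target_column in [name.strip() for name in header]


# Hypothetical sample standing in for the uploaded training file.
sample = io.StringIO("area,bedrooms,price\n2600,3,550000\n3000,4,565000\n")
print(has_target_column(sample, "price"))
```

The comparison here is exact and case-sensitive. Whether the package itself is that strict is not confirmed in this thread, so treat a mismatch as something to investigate rather than as a certain cause of failure.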

@pmilanraajp Maybe the training is failing because there is too little data.

Try to get more data, then try creating only a training pipeline, not a full pipeline. There is also no need for a testing/evaluation pipeline.

Also, let us know how much data you currently have.
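
The traceback in the original post points the same way: cross-validation was attempted with n_splits=5 on n_samples=4. The sketch below is a simplified, pure-Python stand-in for that kind of check, written for illustration only. It is not scikit-learn's code, and `kfold_sizes` is a hypothetical helper. The default of 5 folds is taken from the error message in the log.

```python
def kfold_sizes(n_samples, n_splits=5):
    """Return the validation-fold sizes, or raise if there are too few samples."""
    if n_splits > n_samples:
        raise ValueError(
            "Cannot have number of splits n_splits={} greater than "
            "the number of samples: n_samples={}.".format(n_splits, n_samples)
        )
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds receive one additional sample each.
    return [base + 1 if i < extra else base for i in range(n_splits)]


for n in (4, 5, 12):
    try:
        print(n, kfold_sizes(n))
    except ValueError as err:
        print(n, "fails:", err)
```

With 4 rows the check fails, while 5 rows is the bare minimum that lets every fold hold one sample. Passing the check only means the split is possible; it says nothing about whether the resulting model is any good, so considerably more data is still advisable.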

Hi, sorry for the late reply. The problem I was facing was with my training data. There is a minimum requirement on the amount of training data and the number of data points needed to train the ML Package.

This part was shared with me by UiPath Support:

Please note that to run a training pipeline successfully, at least 25 documents and at least 10 samples of each labeled field in your dataset are strongly recommended. Otherwise, the pipeline throws the following error: Dataset Creation Failed.

Kindly refer to the below link to get a detailed information about training a pipeline and high performing models:
