Pipeline failed -

Hi Guys,

My pipeline as failed however I am unsure why,

this is the log:

2023-05-03 14:21:53,042 - uipath_core.trainer_run:main:74 - INFO: Starting training job…
2023-05-03 14:22:04,663 - matplotlib:_get_config_or_cache_dir:526 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-naa7jtbd because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2023-05-03 14:22:04,961 - matplotlib.font_manager:_load_fontmanager:1544 - INFO: generated new fontManager
2023-05-03 14:22:08,208 - uipath_core.storage.azure_storage_client:download:118 - INFO: Dataset from bucket folder training-5bc99bb3-2ab4-4254-9af4-ae0e9dc747e2/de458701-0062-47ab-86c9-47a7159b48f2/f6a4092f-e811-487a-bc7a-aede78d804d5 with size 17 downloaded successfully
2023-05-03 14:22:08,209 - uipath_core.training_plugin:train_model:129 - INFO: Start model training…
2023-05-03 14:22:08,210 - uipath_core.training_plugin:initialize_model:123 - INFO: Start model initialization…
2023-05-03 14:22:08,210 - root:initialize_package:195 - INFO: Using package type provided by runtime argument with value: du
2023-05-03 14:22:08,211 - root:initialize_package:204 - INFO: Initializing du package options …
2023-05-03 14:22:08,212 - root:_valid_doctype_folder_structure:92 - ERROR: schema.json is empty / does not exist for du dataset
2023-05-03 14:22:08,212 - uipath_core.training_plugin:model_run:189 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type du not valid, check that document type data is in dataset folder and follows folder structure.
2023-05-03 14:22:08,214 - uipath_core.trainer_run:main:91 - ERROR: Training Job failed, error: Document type du not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):
File “/model/bin/uipath_core/trainer_run.py”, line 86, in main
wrapper.run()
File “/workspace/model/microservice/training_wrapper.py”, line 64, in run
return self.training_plugin.model_run()
File “/model/bin/uipath_core/training_plugin.py”, line 205, in model_run
raise ex
File “/model/bin/uipath_core/training_plugin.py”, line 181, in model_run
self.run_train_only()
File “/model/bin/uipath_core/training_plugin.py”, line 268, in run_train_only
score = self.train_model(self.local_dataset_directory)
File “/model/bin/uipath_core/training_plugin.py”, line 131, in train_model
response = self.model.train(directory)
File “/model/bin/uipath_core/training_plugin.py”, line 119, in model
self.initialize_model()
File “/model/bin/uipath_core/training_plugin.py”, line 125, in initialize_model
self._model = train.Main()
File “/workspace/model/microservice/train.py”, line 21, in init
self.opt = package_util.initialize_package(args)
File “”, line 206, in initialize_package
File “”, line 144, in get_package_opt
File “”, line 78, in configure_pipeline_options
File “”, line 139, in configure_options
Exception: Document type du not valid, check that document type data is in dataset folder and follows folder structure.
2023-05-03 14:22:08,215 - uipath_core.trainer_run:main:98 - INFO: Job run stopped.
2023-05-03 14:22:53,425 - uipath_core.trainer_run:main:74 - INFO: Starting training job…
2023-05-03 14:22:59,127 - matplotlib:_get_config_or_cache_dir:526 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-840r57n1 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2023-05-03 14:22:59,421 - matplotlib.font_manager:_load_fontmanager:1544 - INFO: generated new fontManager
2023-05-03 14:23:02,589 - uipath_core.logs.upload_log_service:upload_logs_file:92 - INFO: Retry Training Triggered:
2023-05-03 14:23:02,882 - uipath_core.storage.azure_storage_client:download:118 - INFO: Dataset from bucket folder training-5bc99bb3-2ab4-4254-9af4-ae0e9dc747e2/de458701-0062-47ab-86c9-47a7159b48f2/f6a4092f-e811-487a-bc7a-aede78d804d5 with size 17 downloaded successfully
2023-05-03 14:23:02,882 - uipath_core.training_plugin:train_model:129 - INFO: Start model training…
2023-05-03 14:23:02,882 - uipath_core.training_plugin:initialize_model:123 - INFO: Start model initialization…
2023-05-03 14:23:02,883 - root:initialize_package:195 - INFO: Using package type provided by runtime argument with value: du
2023-05-03 14:23:02,883 - root:initialize_package:204 - INFO: Initializing du package options …
2023-05-03 14:23:02,885 - root:_valid_doctype_folder_structure:92 - ERROR: schema.json is empty / does not exist for du dataset
2023-05-03 14:23:02,885 - uipath_core.training_plugin:model_run:189 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type du not valid, check that document type data is in dataset folder and follows folder structure.
2023-05-03 14:23:02,887 - uipath_core.trainer_run:main:91 - ERROR: Training Job failed, error: Document type du not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):
File “/model/bin/uipath_core/trainer_run.py”, line 86, in main
wrapper.run()
File “/workspace/model/microservice/training_wrapper.py”, line 64, in run
return self.training_plugin.model_run()
File “/model/bin/uipath_core/training_plugin.py”, line 205, in model_run
raise ex
File “/model/bin/uipath_core/training_plugin.py”, line 181, in model_run
self.run_train_only()
File “/model/bin/uipath_core/training_plugin.py”, line 268, in run_train_only
score = self.train_model(self.local_dataset_directory)
File “/model/bin/uipath_core/training_plugin.py”, line 131, in train_model
response = self.model.train(directory)
File “/model/bin/uipath_core/training_plugin.py”, line 119, in model
self.initialize_model()
File “/model/bin/uipath_core/training_plugin.py”, line 125, in initialize_model
self._model = train.Main()
File “/workspace/model/microservice/train.py”, line 21, in init
self.opt = package_util.initialize_package(args)
File “”, line 206, in initialize_package
File “”, line 144, in get_package_opt
File “”, line 78, in configure_pipeline_options
File “”, line 139, in configure_options
Exception: Document type du not valid, check that document type data is in dataset folder and follows folder structure.
2023-05-03 14:23:02,887 - uipath_core.trainer_run:main:98 - INFO: Job run stopped.

Hi @Aki1111 ,

Could you let us know if the datasets are available for the Training ?

Also, Let us know if the datasets are available, What was the Configuration done for the Pipeline Creation ?

1 Like

Hi @Aki1111

This happens when you don’t indicate the correct directory in the dataset at the creation of the pipeline.

As the ML log suggests data set folder structure has some issues, you might have to re run the pipeline with proper dataset directory indication.

Please indicate the dataset directory like this & try it again:

Your DataSet > Exports > Then select the exported directory from labeling.

Hope this helps,
Best Regards.

1 Like

Hi,

So I created the data labeling session and imported the data for data labeling however I did not export it because of this,

image

@Aki1111

In order to label & export the data from the data labeling session to dataset, you need to label atleast 10 pages of data (10 labels for the field you have).

It l is recommended to import more data in here. Then you will be able to label & export them.

Best Regards.

@Aki1111 ,

As mentioned in the Error, we do have to provide enough samples where a field value is present. Each of the fields should be labelled in atleast 10-15 samples of documents. Also, I believe the query was already discussed and a detailed explanation along with the documentation was already provided earlier in one of your topics :

1 Like

Hi okay thanks for that,

can I click on all labelled? Because I don’t have alot of data.

@Aki1111

You can select current search results & export to AI Center. After the export is completed, the same will be reflected in your dataset.

Best Regards.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.