Failure when trying to Create new Packages via Pipeline


I looked at the failure log and am pasting it here for reference.

2022-07-06 00:58:49,757 - uipath_core.trainer_run:main:73 - INFO: Starting training job…
2022-07-06 00:58:53,178 - matplotlib:_get_config_or_cache_dir:484 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-_2c0rldt because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2022-07-06 00:58:54,001 - matplotlib.font_manager:_load_fontmanager:1443 - INFO: generated new fontManager
2022-07-06 00:58:56,783 - uipath_core.storage.azure_storage_client:download:112 - INFO: Dataset from bucket folder training-49aa85ea-2b71-479f-aae1-db1d6c2f3371/9877f90d-d8f6-4f8b-95b6-872d6e8ef9b7/ca150876-0c82-455d-81d1-9e082800ce2b/export/invoice_healthcare_1_22-07-05T212300 with size 44 downloaded successfully
2022-07-06 00:58:56,783 - uipath_core.training_plugin:train_model:114 - INFO: Start model training…
2022-07-06 00:58:56,783 - uipath_core.training_plugin:initialize_model:108 - INFO: Start model initialization…
2022-07-06 00:58:56,785 - root:initialize_package:145 - INFO: Using package type provided by runtime argument with value: invoices
2022-07-06 00:58:56,785 - root:initialize_package:154 - INFO: Initializing invoices package options …
2022-07-06 00:58:56,787 - root:configure_options:158 - INFO: Document type invoices language: en
2022-07-06 00:58:56,787 - root:initialize_tokenizer:49 - INFO: Loading cached BERT tokenizer at /workspace/model/microservice/bert-base-multilingual-uncased_tokenizer
2022-07-06 00:58:56,872 - root:initialize_package:159 - INFO: System-Level Configuration:
2022-07-06 00:58:56,872 - root:initialize_package:160 - INFO: ATen/Parallel:
at::get_num_threads() : 3
at::get_num_interop_threads() : 2
OpenMP 201511 (a.k.a. OpenMP 4.5)
omp_get_max_threads() : 3
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 3
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
std::thread::hardware_concurrency() : 4
Environment variables:
OMP_NUM_THREADS : 3
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

2022-07-06 00:58:56,873 - root:configure_options:158 - INFO: Document type invoices language: en
2022-07-06 00:58:56,873 - uipath_core.training_plugin:initialize_model:111 - INFO: Model initialized successfully
2022-07-06 00:58:56,873 - root:log_data_version_info:13 - INFO: =========Data version information=========
2022-07-06 00:58:56,886 - root:log_data_version_info:17 - WARNING: Unknown data version:
2022-07-06 00:58:56,887 - root:log_data_version_info:17 - INFO: ==========================================
2022-07-06 00:58:56,888 - root:preprocess_data:603 - INFO: Creating dataset for document type invoices…
2022-07-06 00:58:57,146 - root:preprocess_data:605 - INFO: Doctype invoices Statistics:
2022-07-06 00:58:57,146 - root:preprocess_data:608 - INFO:
Extraction fields:
tag = 9129
tag[billing-name] = 50
tag[invoice-no] = 10
tag[total] = 10
tag[due-date] = 10

Subsets:
subset[TEST] = 10

2022-07-06 00:59:12,167 - root:preprocess_data:676 - INFO: train: (0, 15) pages
2022-07-06 00:59:12,167 - root:preprocess_data:677 - INFO: test: (0, 15) pages
2022-07-06 00:59:12,168 - root:preprocess_dataset:49 - ERROR: Dataset preprocess Failed
Traceback (most recent call last):
File “”, line 48, in preprocess_dataset
File “”, line 143, in init
File “”, line 31, in init
File “”, line 678, in preprocess_data
AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set
2022-07-06 00:59:12,171 - uipath_core.training_plugin:model_run:150 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset preprocess Failed
2022-07-06 00:59:12,176 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: Dataset preprocess Failed
Traceback (most recent call last):
File “”, line 48, in preprocess_dataset
File “”, line 143, in init
File “”, line 31, in init
File “”, line 678, in preprocess_data
AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/model/bin/uipath_core/trainer_run.py”, line 85, in main
wrapper.run()
File “/workspace/model/microservice/training_wrapper.py”, line 64, in run
return self.training_plugin.model_run()
File “/model/bin/uipath_core/training_plugin.py”, line 151, in model_run
raise e
File “/model/bin/uipath_core/training_plugin.py”, line 143, in model_run
self.run_train_only()
File “/model/bin/uipath_core/training_plugin.py”, line 212, in run_train_only
self.train_model(self.local_dataset_directory)
File “/model/bin/uipath_core/training_plugin.py”, line 116, in train_model
self.model.train(directory)
File “/workspace/model/microservice/train.py”, line 36, in train
self.process_data()
File “/workspace/model/microservice/train.py”, line 69, in process_data
self.trainer.preprocess_dataset()
File “”, line 49, in preprocess_dataset
Exception: Dataset preprocess Failed
2022-07-06 00:59:47,362 - uipath_core.trainer_run:main:73 - INFO: Starting training job…
2022-07-06 00:59:50,793 - matplotlib:_get_config_or_cache_dir:484 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-2d6j2p4z because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2022-07-06 00:59:51,569 - matplotlib.font_manager:_load_fontmanager:1443 - INFO: generated new fontManager
2022-07-06 00:59:52,830 - uipath_core.logs.upload_log_service:upload_logs_file:87 - INFO: Retry Training Triggered:
2022-07-06 00:59:53,952 - uipath_core.storage.azure_storage_client:download:112 - INFO: Dataset from bucket folder training-49aa85ea-2b71-479f-aae1-db1d6c2f3371/9877f90d-d8f6-4f8b-95b6-872d6e8ef9b7/ca150876-0c82-455d-81d1-9e082800ce2b/export/invoice_healthcare_1_22-07-05T212300 with size 44 downloaded successfully
2022-07-06 00:59:53,953 - uipath_core.training_plugin:train_model:114 - INFO: Start model training…
2022-07-06 00:59:53,953 - uipath_core.training_plugin:initialize_model:108 - INFO: Start model initialization…
2022-07-06 00:59:53,954 - root:initialize_package:145 - INFO: Using package type provided by runtime argument with value: invoices
2022-07-06 00:59:53,954 - root:initialize_package:154 - INFO: Initializing invoices package options …
2022-07-06 00:59:53,955 - root:configure_options:158 - INFO: Document type invoices language: en
2022-07-06 00:59:53,955 - root:initialize_tokenizer:49 - INFO: Loading cached BERT tokenizer at /workspace/model/microservice/bert-base-multilingual-uncased_tokenizer
2022-07-06 00:59:54,028 - root:initialize_package:159 - INFO: System-Level Configuration:
2022-07-06 00:59:54,029 - root:initialize_package:160 - INFO: ATen/Parallel:
at::get_num_threads() : 3
at::get_num_interop_threads() : 2
OpenMP 201511 (a.k.a. OpenMP 4.5)
omp_get_max_threads() : 3
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 3
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
std::thread::hardware_concurrency() : 4
Environment variables:
OMP_NUM_THREADS : 3
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

2022-07-06 00:59:54,030 - root:configure_options:158 - INFO: Document type invoices language: en
2022-07-06 00:59:54,030 - uipath_core.training_plugin:initialize_model:111 - INFO: Model initialized successfully
2022-07-06 00:59:54,030 - root:log_data_version_info:13 - INFO: =========Data version information=========
2022-07-06 00:59:54,044 - root:log_data_version_info:17 - WARNING: Unknown data version:
2022-07-06 00:59:54,044 - root:log_data_version_info:17 - INFO: ==========================================
2022-07-06 00:59:54,045 - root:preprocess_data:603 - INFO: Creating dataset for document type invoices…
2022-07-06 00:59:54,277 - root:preprocess_data:605 - INFO: Doctype invoices Statistics:
2022-07-06 00:59:54,277 - root:preprocess_data:608 - INFO:
Extraction fields:
tag = 9129
tag[billing-name] = 50
tag[invoice-no] = 10
tag[total] = 10
tag[due-date] = 10

Subsets:
subset[TEST] = 10

2022-07-06 01:00:09,418 - root:preprocess_data:676 - INFO: train: (0, 15) pages
2022-07-06 01:00:09,418 - root:preprocess_data:677 - INFO: test: (0, 15) pages
2022-07-06 01:00:09,418 - root:preprocess_dataset:49 - ERROR: Dataset preprocess Failed
Traceback (most recent call last):
File “”, line 48, in preprocess_dataset
File “”, line 143, in init
File “”, line 31, in init
File “”, line 678, in preprocess_data
AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set
2022-07-06 01:00:09,422 - uipath_core.training_plugin:model_run:150 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset preprocess Failed
2022-07-06 01:00:09,427 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: Dataset preprocess Failed
Traceback (most recent call last):
File “”, line 48, in preprocess_dataset
File “”, line 143, in init
File “”, line 31, in init
File “”, line 678, in preprocess_data
AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/model/bin/uipath_core/trainer_run.py”, line 85, in main
wrapper.run()
File “/workspace/model/microservice/training_wrapper.py”, line 64, in run
return self.training_plugin.model_run()
File “/model/bin/uipath_core/training_plugin.py”, line 151, in model_run
raise e
File “/model/bin/uipath_core/training_plugin.py”, line 143, in model_run
self.run_train_only()
File “/model/bin/uipath_core/training_plugin.py”, line 212, in run_train_only
self.train_model(self.local_dataset_directory)
File “/model/bin/uipath_core/training_plugin.py”, line 116, in train_model
self.model.train(directory)
File “/workspace/model/microservice/train.py”, line 36, in train
self.process_data()
File “/workspace/model/microservice/train.py”, line 69, in process_data
self.trainer.preprocess_dataset()
File “”, line 49, in preprocess_dataset
Exception: Dataset preprocess Failed

Please let me know how to fix it.

This worked! Thank you, @SrenivasanKanna.
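
For anyone who lands here with the same AssertionError: the statistics in the pasted log list only `subset[TEST] = 10` and no other subset. One reading of that (an inference from the log, not something confirmed in this thread) is that every exported document was assigned to the test subset, leaving nothing for training. Below is a minimal Python sketch for spotting this pattern in a pasted log. The function names and the regular expression are illustrative assumptions, not part of any UiPath tooling.

```python
import re

# Matches statistics lines such as "subset[TEST] = 10" (format taken from the log above).
SUBSET_LINE = re.compile(r"^\s*subset\[(?P<name>[^\]]+)\]\s*=\s*(?P<count>\d+)\s*$")


def subset_counts(log_text):
    """Return {subset_name: document_count} for every subset line found in the log."""
    counts = {}
    for line in log_text.splitlines():
        match = SUBSET_LINE.match(line)
        if match:
            counts[match.group("name")] = int(match.group("count"))
    return counts


def only_test_subset(log_text):
    """True when the log lists at least one non-empty subset and all of them are TEST."""
    counts = subset_counts(log_text)
    non_empty = {name for name, n in counts.items() if n > 0}
    return bool(non_empty) and non_empty <= {"TEST"}


# Excerpt copied from the "Subsets:" section of the log in this thread.
sample = """Subsets:
subset[TEST] = 10
"""

print(subset_counts(sample))     # {'TEST': 10}
print(only_test_subset(sample))  # True
```

If the check returns True, reviewing how the documents were split between subsets before the dataset was exported would be a reasonable first thing to look at.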
