Failure creating training pipeline

I am building a model to extract data from documents using the Document Understanding package. Whenever I create a pipeline to train the model, it fails with a data preprocessing error. I have double-checked my data labeling and the steps below; please let me know if I am doing anything wrong.

  1. Import the documents and uncheck “Make this an evaluation set”
  2. When selecting the dataset during pipeline creation, select the exported dataset

Even after double-checking those steps, I keep getting the same error:

Train only of Form24Extractor_Preview 22.6.1-preview.0 scheduled - Run 3841daf8-58ab-4a38-b696-fea7ed433b7b
Train only of Form24Extractor_Preview 22.6.1-preview.0 launched - Run 3841daf8-58ab-4a38-b696-fea7ed433b7b
Train only of Form24Extractor_Preview 22.6.1-preview.0 started - Run 3841daf8-58ab-4a38-b696-fea7ed433b7b
Train only of Form24Extractor_Preview 22.6.1-preview.0 failed - Run 3841daf8-58ab-4a38-b696-fea7ed433b7b

Error Details : Pipeline failed due to ML Package Issue

2022-11-02 02:40:24,848 - uipath_core.trainer_run:main:73 - INFO: Starting training job…
2022-11-02 02:40:28,737 - matplotlib:_get_config_or_cache_dir:484 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-t2u3kjwl because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2022-11-02 02:40:29,508 - matplotlib.font_manager:_load_fontmanager:1443 - INFO: generated new fontManager
2022-11-02 02:40:32,614 - uipath_core.storage.azure_storage_client:download:115 - INFO: Dataset from bucket folder training-d4872c60-f69f-4e21-8a48-55a30feba0e9/f8516607-490b-4ee1-be17-c2fcc3aa4ed6/e957ea1c-3191-4aca-b049-961dac2c5dda/export/Form24SSMExport_22-11-01T072540 with size 38 downloaded successfully
2022-11-02 02:40:32,614 - uipath_core.training_plugin:train_model:116 - INFO: Start model training…
2022-11-02 02:40:32,614 - uipath_core.training_plugin:initialize_model:110 - INFO: Start model initialization…
2022-11-02 02:40:32,615 - root:initialize_package:145 - INFO: Using package type provided by runtime argument with value: du
2022-11-02 02:40:32,615 - root:initialize_package:154 - INFO: Initializing du package options …
2022-11-02 02:40:32,618 - root:configure_options:107 - INFO: Training with random slices: False
2022-11-02 02:40:32,618 - root:configure_options:108 - INFO: Sample by size: False
2022-11-02 02:40:32,618 - root:configure_options:141 - INFO: Determining dataset language for document type du…
2022-11-02 02:40:32,647 - root:configure_options:144 - INFO: Document type du language: en
2022-11-02 02:40:32,647 - root:initialize_package:159 - INFO: System-Level Configuration:
2022-11-02 02:40:32,647 - root:initialize_package:160 - INFO: ATen/Parallel:
at::get_num_threads() : 3
at::get_num_interop_threads() : 2
OpenMP 201511 (a.k.a. OpenMP 4.5)
omp_get_max_threads() : 3
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 3
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
std::thread::hardware_concurrency() : 4
Environment variables:
OMP_NUM_THREADS : 3
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

2022-11-02 02:40:32,648 - root:configure_options:107 - INFO: Training with random slices: False
2022-11-02 02:40:32,648 - root:configure_options:108 - INFO: Sample by size: False
2022-11-02 02:40:32,648 - root:configure_options:144 - INFO: Document type du language: en
2022-11-02 02:40:32,648 - uipath_core.training_plugin:initialize_model:113 - INFO: Model initialized successfully
2022-11-02 02:40:32,648 - root:log_data_version_info:13 - INFO: =========Data version information=========
2022-11-02 02:40:32,664 - root:log_data_version_info:17 - WARNING: Unknown data version:
2022-11-02 02:40:32,664 - root:log_data_version_info:17 - INFO: ==========================================
2022-11-02 02:40:32,664 - root:preprocess_data:575 - INFO: Creating dataset for document type du…
2022-11-02 02:40:32,697 - root:preprocess_data:577 - INFO: Doctype du Statistics:
2022-11-02 02:40:32,697 - root:preprocess_data:580 - INFO:
Extraction fields:
tag = 5287
tag[companyname] = 28
tag[brn] = 11

Subsets:
subset[TEST] = 6

2022-11-02 02:40:32,698 - root:create_processor:43 - INFO: Loading LayoutLMV2 processor from HuggingFace …
2022-11-02 02:40:38,266 - root:preprocess_data:649 - INFO: train: (0, 16) pages
2022-11-02 02:40:38,266 - root:preprocess_data:650 - INFO: test: (0, 16) pages
2022-11-02 02:40:38,266 - root:preprocess_dataset:50 - ERROR: Dataset preprocess Failed
Traceback (most recent call last):
File “”, line 49, in preprocess_dataset
File “”, line 147, in init
File “”, line 35, in init
File “”, line 651, in preprocess_data
AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set
2022-11-02 02:40:38,269 - uipath_core.training_plugin:model_run:152 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset preprocess Failed
2022-11-02 02:40:38,274 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: Dataset preprocess Failed
Traceback (most recent call last):
File “”, line 49, in preprocess_dataset
File “”, line 147, in init
File “”, line 35, in init
File “”, line 651, in preprocess_data
AssertionError: Training and / or validation set is empty, verify that training / validation split is correctly set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/model/bin/uipath_core/trainer_run.py”, line 85, in main
wrapper.run()
File “/workspace/model/microservice/training_wrapper.py”, line 64, in run
return self.training_plugin.model_run()
File “/model/bin/uipath_core/training_plugin.py”, line 153, in model_run
raise e
File “/model/bin/uipath_core/training_plugin.py”, line 145, in model_run
self.run_train_only()
File “/model/bin/uipath_core/training_plugin.py”, line 214, in run_train_only
self.train_model(self.local_dataset_directory)
File “/model/bin/uipath_core/training_plugin.py”, line 118, in train_model
self.model.train(directory)
File “/workspace/model/microservice/train.py”, line 36, in train
self.process_data()
File “/workspace/model/microservice/train.py”, line 69, in process_data
self.trainer.preprocess_dataset()
File “”, line 50, in preprocess_dataset
Exception: Dataset preprocess Failed

Is there any way to overcome this?

Hi @Dharsyana_Rao_a_l_Selladu ,

Are you using a Preview version of the ML Package?

No, just the Document Understanding package.

@Dharsyana_Rao_a_l_Selladu ,

The base version has a Preview suffix, hence the assumption.

Additionally, could you send a screenshot of the pipeline configuration?

The configuration is as shown in the image.

@Dharsyana_Rao_a_l_Selladu ,

I believe you are using the package marked in yellow rather than the one marked in red?

Could you check whether you can create a training pipeline successfully with the other package as well?

If it also fails with the other package, we will most likely need to check what was done in data labelling.

Thanks so much for the help. I found the issue: I had run the training pipeline with a very small training set (6 documents). After increasing it to 23 documents, I was able to train the model successfully.
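For anyone hitting the same AssertionError, here is a rough sketch of how the "Subsets:" statistics in the pipeline log could be checked before re-running. It only parses lines of the form `subset[NAME] = N`, as they appear in the log above. The function names are hypothetical, and treating "only a TEST subset is listed" as the warning sign is my inference from this single log, not documented behaviour of the package.

```python
import re

# Matches statistics lines such as "subset[TEST] = 6" from the pipeline log.
SUBSET_RE = re.compile(r"subset\[(\w+)\]\s*=\s*(\d+)")


def subset_counts(log_text):
    """Return {subset name: document count} for every subset line in the log."""
    return {name.upper(): int(count) for name, count in SUBSET_RE.findall(log_text)}


def only_test_subset(log_text):
    """True if no subset other than TEST holds documents (assumed warning sign)."""
    counts = subset_counts(log_text)
    return not any(n > 0 for name, n in counts.items() if name != "TEST")


if __name__ == "__main__":
    sample = "Subsets:\nsubset[TEST] = 6\n"
    print(subset_counts(sample))
    print(only_test_subset(sample))
```

In the failing run above, the log lists only `subset[TEST] = 6`, which fits the "Training and / or validation set is empty" message; adding more labelled documents (23 in my case) resolved it.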
