I am using the Machine Learning Extractor Trainer activity. I created a new dataset in AI Center and selected that dataset in the activity. The process runs normally and creates a task in Action Center; once I validate it, the dataset gets populated with the results.
There are three folders in the dataset:
After gathering some data, I tried creating a pipeline, selecting the same ML package I had used and the dataset that was populated by the extractor trainer. When I try to run it, it fails with this error:
2024-03-27 13:22:00,306 - UiPath_core.trainer_run:main:83 - INFO: Starting training job…
2024-03-27 13:22:02,390 - matplotlib:_get_config_or_cache_dir:531 - WARNING: Matplotlib created a temporary cache directory at /tmp/matplotlib-sl9apgxs because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-03-27 13:22:02,585 - matplotlib.font_manager:_load_fontmanager:1547 - INFO: generated new fontManager
2024-03-27 13:22:07,371 - UiPath_core.storage.azure_storage_client:download:118 - INFO: Dataset from bucket folder training-601ae173-d2b1-4e33-8604-bf25ec42dc37/3819bc3a-7100-4ecb-ac25-bff691aac53e/f258e63c-2120-4581-9d01-747d778f5f51 with size 84 downloaded successfully
2024-03-27 13:22:07,372 - UiPath_core.training_plugin:train_model:129 - INFO: Start model training…
2024-03-27 13:22:07,372 - UiPath_core.training_plugin:initialize_model:123 - INFO: Start model initialization…
2024-03-27 13:22:07,372 - root:initialize_package:208 - INFO: Using package type provided by runtime argument with value: invoices
2024-03-27 13:22:07,372 - root:initialize_package:217 - INFO: Initializing invoices package options …
2024-03-27 13:22:07,373 - root:_valid_doctype_folder_structure:101 - ERROR: schema.json is empty / does not exist for invoices dataset
2024-03-27 13:22:07,373 - UiPath_core.training_plugin:model_run:189 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
2024-03-27 13:22:07,376 - UiPath_core.trainer_run:main:100 - ERROR: Training Job failed, error: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):
  File "/model/bin/UiPath_core/trainer_run.py", line 95, in main
    wrapper.run()
  File "/workspace/model/microservice/training_wrapper.py", line 65, in run
    return self.training_plugin.model_run()
  File "/model/bin/UiPath_core/training_plugin.py", line 205, in model_run
    raise ex
  File "/model/bin/UiPath_core/training_plugin.py", line 181, in model_run
    self.run_train_only()
  File "/model/bin/UiPath_core/training_plugin.py", line 268, in run_train_only
    score = self.train_model(self.local_dataset_directory)
  File "/model/bin/UiPath_core/training_plugin.py", line 131, in train_model
    response = self.model.train(directory)
  File "/model/bin/UiPath_core/training_plugin.py", line 119, in model
    self.initialize_model()
  File "/model/bin/UiPath_core/training_plugin.py", line 125, in initialize_model
    self._model = train.Main()
  File "/workspace/model/microservice/train.py", line 22, in __init__
    self.opt = package_util.initialize_package(args)
  File "", line 219, in initialize_package
  File "", line 151, in get_package_opt
  File "", line 78, in configure_pipeline_options
  File "", line 161, in configure_options
Exception: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
2024-03-27 13:22:07,376 - UiPath_core.trainer_run:main:107 - INFO: Job run stopped.
Am I missing any steps in the procedure? Isn’t the dataset in the correct structure?
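The first ERROR line in the log says that schema.json is empty or does not exist for the dataset, so one thing that can be checked independently of the pipeline is whether a downloaded copy of the dataset contains such a file at all. Below is a minimal sketch of that check in Python. It assumes the dataset has been downloaded to a local folder; the function names are hypothetical and not part of any UiPath package, and because I do not know where exactly the trainer expects schema.json to sit, it simply searches the whole folder tree.

```python
from pathlib import Path


def find_schema_files(dataset_dir):
    """Return a list of (path, size_in_bytes) for every schema.json under dataset_dir."""
    root = Path(dataset_dir)
    # rglob walks the folder tree recursively; sorting keeps the output stable.
    return [(p, p.stat().st_size) for p in sorted(root.rglob("schema.json"))]


def has_usable_schema(dataset_dir):
    """True if at least one non-empty schema.json exists anywhere under dataset_dir."""
    return any(size > 0 for _, size in find_schema_files(dataset_dir))
```

Calling has_usable_schema() on the local copy of the dataset returns False both when no schema.json exists and when every copy found is 0 bytes, which are the two cases the error message names. find_schema_files() lists the matches with their sizes, so it also shows where in the folder tree the file sits, if it is there at all.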
