Importance of fine tune folder

Can someone please explain the importance of the fine-tune folder of a dataset in AI Center?

Regards
Anusha

Hello Anusha,

UiPath lets you fine-tune an ML model, and files that are validated in Action Center can be used as fine-tuning datasets.

The reason for fine-tuning, or continuously retraining, an ML model is to improve its accuracy over time. If many documents end up in Action Center because of low confidence or missing values, that indicates the ML model is not picking something up and needs more training. This is why the documents validated in Action Center are fed into the fine-tuning loop.
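To make that signal concrete, here is a minimal Python sketch (not UiPath code) that estimates what share of documents would be sent for human validation. The field layout, the `value` and `confidence` keys, and the 0.8 threshold are all assumptions made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical threshold; use whatever your own workflow applies.
CONFIDENCE_THRESHOLD = 0.8

Fields = Dict[str, Optional[dict]]


def needs_validation(fields: Fields, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True if any field is missing or was extracted with low confidence."""
    for field in fields.values():
        if field is None or field.get("value") in (None, ""):
            return True  # missing value
        if field.get("confidence", 0.0) < threshold:
            return True  # low confidence
    return False


def validation_rate(documents: List[Fields], threshold: float = CONFIDENCE_THRESHOLD) -> float:
    """Fraction of documents that would need human validation."""
    if not documents:
        return 0.0
    flagged = sum(needs_validation(doc, threshold) for doc in documents)
    return flagged / len(documents)


# Made-up extraction results for three purchase orders.
docs = [
    {"po_number": {"value": "PO-1", "confidence": 0.95},
     "total": {"value": "10.00", "confidence": 0.91}},
    {"po_number": {"value": "PO-2", "confidence": 0.55},
     "total": {"value": "12.50", "confidence": 0.93}},
    {"po_number": {"value": "", "confidence": 0.99}, "total": None},
]
print(f"{validation_rate(docs):.0%} of documents would need human validation")
```

A rate that stays high over time is the kind of indicator described above: the model is likely missing something and could benefit from retraining on the validated documents.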

Hope this answers your question. You can go to the following resource pages to learn more.

Thank you for your reply.

When running the training pipeline a second time, after Validation Station, do we need to select the folder that contains both the export and fine-tune folders?

When I select the folder that contains both, I get the following error:

2022-10-27 17:35:20,185 - uipath_core.trainer_run:main:73 - INFO: Starting training job…
2022-10-27 17:35:22,953 - matplotlib:_get_config_or_cache_dir:500 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-x49w1_eb because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2022-10-27 17:35:23,671 - matplotlib.font_manager:_load_fontmanager:1624 - INFO: generated new fontManager
2022-10-27 17:35:25,866 - uipath_core.storage.azure_storage_client:download:115 - INFO: Dataset from bucket folder training-90fc3aab-7aa3-4585-bd2b-95aadb4b163f/a83e136c-9494-44dd-95dc-dbf6a0301771/64294cbe-eb99-49ba-bafe-5ac1d3ea4955 with size 79 downloaded successfully
2022-10-27 17:35:25,866 - uipath_core.training_plugin:train_model:120 - INFO: Start model training…
2022-10-27 17:35:25,866 - uipath_core.training_plugin:initialize_model:114 - INFO: Start model initialization…
2022-10-27 17:35:25,867 - root:initialize_package:146 - INFO: Using package type provided by runtime argument with value: purchase_orders
2022-10-27 17:35:25,867 - root:initialize_package:155 - INFO: Initializing purchase_orders package options …
2022-10-27 17:35:25,868 - root:_valid_doctype_folder_structure:81 - ERROR: schema.json is empty / does not exist for purchase_orders dataset
2022-10-27 17:35:25,868 - uipath_core.training_plugin:model_run:175 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.
2022-10-27 17:35:25,872 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):
  File "/model/bin/uipath_core/trainer_run.py", line 85, in main
    wrapper.run()
  File "/workspace/model/microservice/training_wrapper.py", line 64, in run
    return self.training_plugin.model_run()
  File "/model/bin/uipath_core/training_plugin.py", line 191, in model_run
    raise ex
  File "/model/bin/uipath_core/training_plugin.py", line 167, in model_run
    self.run_train_only()
  File "/model/bin/uipath_core/training_plugin.py", line 251, in run_train_only
    self.train_model(self.local_dataset_directory)
  File "/model/bin/uipath_core/training_plugin.py", line 122, in train_model
    response = self.model.train(directory)
  File "/model/bin/uipath_core/training_plugin.py", line 110, in model
    self.initialize_model()
  File "/model/bin/uipath_core/training_plugin.py", line 116, in initialize_model
    self._model = train.Main()
  File "/workspace/model/microservice/train.py", line 21, in __init__
    self.opt = package_util.initialize_package(args)
  File "", line 157, in initialize_package
  File "", line 117, in get_package_opt
  File "", line 66, in configure_training_options
  File "", line 120, in configure_options
Exception: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.
2022-10-27 17:36:09,400 - uipath_core.trainer_run:main:73 - INFO: Starting training job…
2022-10-27 17:36:12,162 - matplotlib:_get_config_or_cache_dir:500 - WARNING: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-clh4whfk because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2022-10-27 17:36:12,862 - matplotlib.font_manager:_load_fontmanager:1624 - INFO: generated new fontManager
2022-10-27 17:36:13,670 - uipath_core.logs.upload_log_service:upload_logs_file:87 - INFO: Retry Training Triggered:
2022-10-27 17:36:14,953 - uipath_core.storage.azure_storage_client:download:115 - INFO: Dataset from bucket folder training-90fc3aab-7aa3-4585-bd2b-95aadb4b163f/a83e136c-9494-44dd-95dc-dbf6a0301771/64294cbe-eb99-49ba-bafe-5ac1d3ea4955 with size 79 downloaded successfully
2022-10-27 17:36:14,954 - uipath_core.training_plugin:train_model:120 - INFO: Start model training…
2022-10-27 17:36:14,954 - uipath_core.training_plugin:initialize_model:114 - INFO: Start model initialization…
2022-10-27 17:36:14,954 - root:initialize_package:146 - INFO: Using package type provided by runtime argument with value: purchase_orders
2022-10-27 17:36:14,954 - root:initialize_package:155 - INFO: Initializing purchase_orders package options …
2022-10-27 17:36:14,956 - root:_valid_doctype_folder_structure:81 - ERROR: schema.json is empty / does not exist for purchase_orders dataset
2022-10-27 17:36:14,956 - uipath_core.training_plugin:model_run:175 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.
2022-10-27 17:36:14,959 - uipath_core.trainer_run:main:90 - ERROR: Training Job failed, error: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):
  File "/model/bin/uipath_core/trainer_run.py", line 85, in main
    wrapper.run()
  File "/workspace/model/microservice/training_wrapper.py", line 64, in run
    return self.training_plugin.model_run()
  File "/model/bin/uipath_core/training_plugin.py", line 191, in model_run
    raise ex
  File "/model/bin/uipath_core/training_plugin.py", line 167, in model_run
    self.run_train_only()
  File "/model/bin/uipath_core/training_plugin.py", line 251, in run_train_only
    self.train_model(self.local_dataset_directory)
  File "/model/bin/uipath_core/training_plugin.py", line 122, in train_model
    response = self.model.train(directory)
  File "/model/bin/uipath_core/training_plugin.py", line 110, in model
    self.initialize_model()
  File "/model/bin/uipath_core/training_plugin.py", line 116, in initialize_model
    self._model = train.Main()
  File "/workspace/model/microservice/train.py", line 21, in __init__
    self.opt = package_util.initialize_package(args)
  File "", line 157, in initialize_package
  File "", line 117, in get_package_opt
  File "", line 66, in configure_training_options
  File "", line 120, in configure_options
Exception: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.

Hi, you shouldn't just click the export; you should select the folder you exported from AI Center. Make sure the dataset you select has these subfolders inside it.

Your error says "Exception: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.", so I'm guessing it has to do with the dataset.
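The first ERROR line in a log like this is often more specific than the final exception. In the log posted above, it points at schema.json. Here is a small Python sketch for pulling that line out. The regular expression is my own reading of the line format in the posted log, not an official UiPath format, and `first_error` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed line shape: "<timestamp> - <module:function:line> - <LEVEL>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<source>\S+) - (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def first_error(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (source, message) of the first ERROR line, or None if there is none."""
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            return match.group("source"), match.group("msg")
    return None


# Three lines copied from the log in this thread.
sample = """\
2022-10-27 17:35:25,867 - root:initialize_package:155 - INFO: Initializing purchase_orders package options …
2022-10-27 17:35:25,868 - root:_valid_doctype_folder_structure:81 - ERROR: schema.json is empty / does not exist for purchase_orders dataset
2022-10-27 17:35:25,868 - uipath_core.training_plugin:model_run:175 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.
"""
print(first_error(sample))
```

For this log, the first error is the schema.json message rather than the generic "Document type purchase_orders not valid" exception, which narrows down what to check in the dataset folder.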

  1. Check that your folder structure matches the one shown above.
  2. Make sure you're training on the cumulative labelled dataset that you exported most recently, which includes your fine-tune documents.
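Since the log complains that schema.json is empty or does not exist, one quick check before starting the pipeline is to look for that file in a local copy of the dataset. This is only a sketch: it searches the folder recursively because I am not assuming where exactly the file should live, and the folder names in the demo are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def find_valid_schemas(dataset_dir: str) -> List[Path]:
    """Return every schema.json under dataset_dir that is non-empty and parses as JSON."""
    valid = []
    for path in sorted(Path(dataset_dir).rglob("schema.json")):
        try:
            if path.stat().st_size > 0:
                json.loads(path.read_text(encoding="utf-8"))
                valid.append(path)
        except (OSError, ValueError):
            # Unreadable, wrongly encoded, or not valid JSON: treat as not usable.
            pass
    return valid


# Demo on a throwaway folder; "my_dataset" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "my_dataset"
    dataset.mkdir()

    (dataset / "schema.json").write_text("", encoding="utf-8")  # empty file
    print("usable schemas (empty file):", len(find_valid_schemas(tmp)))

    (dataset / "schema.json").write_text('{"fields": []}', encoding="utf-8")
    print("usable schemas (valid JSON):", len(find_valid_schemas(tmp)))
```

If the function returns an empty list for the folder you selected in the pipeline, the dataset probably has not been exported in the structure the training package expects.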

Thank you for your reply.

I selected the folder that contains both export and fine-tune when creating the pipeline.

But I am getting the error below:

Exception: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):

Hi @anusha2 ,
Are you using Document Manager, and have you labelled the datasets in Document Manager?

As I understand it, the Export feature in Document Manager (Data Manager) needs to be scheduled so that the data from the fine-tune folder and the data previously exported from Document Manager are combined into one larger dataset, which is then placed in the export folder.
For more details, check the documentation below, which describes the Data Manager Scheduled Export feature:

To wrap up, you could try the following:

  1. When creating the pipeline, select the export directory, and also set Auto-Retraining to true as described in the document above. Then check whether you still get the same error.

Note that because the Document Manager export was not done, the new data might not be used for training.

Let us know the results.

Can you please explain how to select the fine-tune folder without scheduling?