Train Invoice ML Model with Validation Station data

I'm trying to retrain the out-of-the-box Invoices ML model in my on-prem AI Center using the training data generated by the Validation Station in Action Center.
This is the data structure generated by the Validation Station:

[image: data structure generated by the Validation Station]

The Invoice Trainer activity automatically uploads it from Studio to a dataset, inside a fine-tune folder.

I'm not using the Data Manager at any step of the process, since I'm trying to retrain an out-of-the-box model and did not customize my labelling. Still, the pipeline is failing with the log below. What am I doing wrong?

Train only of Invoices 8.0 launched - Run 260f0253-58fc-42ca-9eda-cfdc4f787f76
Train only of Invoices 8.0 scheduled - Run 260f0253-58fc-42ca-9eda-cfdc4f787f76
Train only of Invoices 8.0 started - Run 260f0253-58fc-42ca-9eda-cfdc4f787f76
Train only of Invoices 8.0 failed - Run 260f0253-58fc-42ca-9eda-cfdc4f787f76

Error Details : Pipeline failed due to ML Package Issue

2021-12-09 16:31:01,462 - UiPath_core.trainer_run:main:66 - INFO: Starting training job…
2021-12-09 16:31:07,699 - UiPath_core.storage.local_storage_client:download:113 - INFO: Dataset from bucket folder training-7d8e37df-0ca0-4f11-bff6-37660dcfa5ee/53d39051-e17a-4b8a-bd75-a153260c534e/440fba9a-f7d1-4179-9e09-1409bd1faf85 with size 3 downloaded successfully
2021-12-09 16:31:07,700 - UiPath_core.training_plugin:train_model:109 - INFO: Start model training…
2021-12-09 16:31:07,700 - UiPath_core.training_plugin:initialize_model:103 - INFO: Start model initialization…
2021-12-09 16:31:07,702 - root:_valid_doctype_folder_structure:63 - ERROR: images/ directory does not exist / is empty for {‘name’: ‘invoices’, ‘folder’: ‘’, ‘language’: ‘en’, ‘dataset’: {‘account_name’: None, ‘folder’: ‘’, ‘path’: ‘/microservice/dataset’, ‘dataloader_workers’: 0, ‘vocabulary_padding_id’: 0, ‘vocabulary_unknown_id’: 1, ‘text_pp_remove_symbols’: False, ‘text_pp_lemmatization’: False, ‘text_pp_remove_stop_words’: False, ‘word_embedding’: ‘unknown_id’, ‘max_words’: 10000, ‘max_image_size’: [300, 300], ‘date_format_classifier_data’: [‘receipts’, ‘invoices’, ‘invoices_au’, ‘invoices_india’, ‘utility_bills’, ‘purchase_orders’, ‘invoices_japan’, ‘unknown’], ‘replace_patterns’: [‘date’, ‘number’, ‘checkbox’], ‘doctype2id’: {}, ‘clftask2id’: {}, ‘id2clftask’: {}, ‘clf_tasks_by_doctype’: defaultdict(<class ‘list’>, {})}, ‘path’: ‘/microservice/dataset/’, ‘split’: ‘/microservice/dataset/split.csv’, ‘schema’: ‘/microservice/dataset/schema.json’} dataset
2021-12-09 16:31:07,702 - UiPath_core.training_plugin:model_run:145 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
2021-12-09 16:31:07,709 - UiPath_core.trainer_run:main:81 - ERROR: Training Job failed, error: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/trainer_run.py”, line 76, in main
wrapper.run()
File “/microservice/training_wrapper.py”, line 57, in run
return self.training_plugin.model_run()
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 146, in model_run
raise e
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 138, in model_run
self.run_train_only()
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 207, in run_train_only
self.train_model(self.local_dataset_directory)
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 111, in train_model
self.model.train(directory)
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 99, in model
self.initialize_model()
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 105, in initialize_model
self._model = train.Main()
File “/microservice/train.py”, line 24, in init
self.opt = self.get_options()
File “/microservice/train.py”, line 105, in get_options
opt = preprocess.configure_options(opt)
File “”, line 99, in configure_options
Exception: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
2021-12-09 16:31:16,296 - UiPath_core.trainer_run:main:66 - INFO: Starting training job…
2021-12-09 16:31:16,296 - UiPath_core.trainer_run:main:66 - INFO: Starting training job…
2021-12-09 16:31:22,290 - UiPath_core.logs.upload_log_service:upload_logs_file:56 - INFO: Retry Training Triggered:
2021-12-09 16:31:22,412 - UiPath_core.storage.local_storage_client:download:113 - INFO: Dataset from bucket folder training-7d8e37df-0ca0-4f11-bff6-37660dcfa5ee/53d39051-e17a-4b8a-bd75-a153260c534e/440fba9a-f7d1-4179-9e09-1409bd1faf85 with size 3 downloaded successfully
2021-12-09 16:31:22,412 - UiPath_core.training_plugin:train_model:109 - INFO: Start model training…
2021-12-09 16:31:22,412 - UiPath_core.training_plugin:initialize_model:103 - INFO: Start model initialization…
2021-12-09 16:31:22,414 - root:_valid_doctype_folder_structure:63 - ERROR: images/ directory does not exist / is empty for {‘name’: ‘invoices’, ‘folder’: ‘’, ‘language’: ‘en’, ‘dataset’: {‘account_name’: None, ‘folder’: ‘’, ‘path’: ‘/microservice/dataset’, ‘dataloader_workers’: 0, ‘vocabulary_padding_id’: 0, ‘vocabulary_unknown_id’: 1, ‘text_pp_remove_symbols’: False, ‘text_pp_lemmatization’: False, ‘text_pp_remove_stop_words’: False, ‘word_embedding’: ‘unknown_id’, ‘max_words’: 10000, ‘max_image_size’: [300, 300], ‘date_format_classifier_data’: [‘receipts’, ‘invoices’, ‘invoices_au’, ‘invoices_india’, ‘utility_bills’, ‘purchase_orders’, ‘invoices_japan’, ‘unknown’], ‘replace_patterns’: [‘date’, ‘number’, ‘checkbox’], ‘doctype2id’: {}, ‘clftask2id’: {}, ‘id2clftask’: {}, ‘clf_tasks_by_doctype’: defaultdict(<class ‘list’>, {})}, ‘path’: ‘/microservice/dataset/’, ‘split’: ‘/microservice/dataset/split.csv’, ‘schema’: ‘/microservice/dataset/schema.json’} dataset
2021-12-09 16:31:22,414 - UiPath_core.training_plugin:model_run:145 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
2021-12-09 16:31:22,416 - UiPath_core.trainer_run:main:81 - ERROR: Training Job failed, error: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
Traceback (most recent call last):
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/trainer_run.py”, line 76, in main
wrapper.run()
File “/microservice/training_wrapper.py”, line 57, in run
return self.training_plugin.model_run()
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 146, in model_run
raise e
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 138, in model_run
self.run_train_only()
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 207, in run_train_only
self.train_model(self.local_dataset_directory)
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 111, in train_model
self.model.train(directory)
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 99, in model
self.initialize_model()
File “/home/aifabric/.local/lib/python3.8/site-packages/UiPath_core/training_plugin.py”, line 105, in initialize_model
self._model = train.Main()
File “/microservice/train.py”, line 24, in init
self.opt = self.get_options()
File “/microservice/train.py”, line 105, in get_options
opt = preprocess.configure_options(opt)
File “”, line 99, in configure_options
Exception: Document type invoices not valid, check that document type data is in dataset folder and follows folder structure.
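
For anyone who wants to reproduce the check locally: the log complains that images/ does not exist or is empty directly under the dataset path, and the configuration it prints also points at split.csv and schema.json in the same folder. Below is a minimal sketch of that check against a downloaded copy of the dataset folder. The function name check_dataset_folder is hypothetical, and treating split.csv and schema.json as required is an assumption based only on the paths shown in the log, not on the trainer's actual code.

```python
import tempfile
from pathlib import Path


def check_dataset_folder(root):
    """Return a list of problems found in a local copy of the dataset folder.

    Only the images/ check mirrors the error message in the log. The two
    file checks are an assumption drawn from the paths printed in the log.
    """
    root = Path(root)
    problems = []

    images = root / "images"
    if not images.is_dir():
        problems.append("images/ directory does not exist")
    elif not any(images.iterdir()):
        problems.append("images/ directory is empty")

    # Assumption: these two files are expected next to images/.
    for name in ("split.csv", "schema.json"):
        if not (root / name).is_file():
            problems.append(f"{name} is missing")

    return problems


# Demo on a throwaway folder that contains schema.json but no images/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "schema.json").write_text("{}")
    for problem in check_dataset_folder(tmp):
        print(problem)
```

Running this against the folder selected as the pipeline's input dataset should show whether images/ sits at the top level of that folder or one level deeper, for example inside the fine-tune folder.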

Hi

I would still like to know the answer to this thread. Has there been any update?

Thanks
