Training run pipeline for Purchase Orders fails with "Dataset Creation Failed"

Hi,

I am getting the error below when creating a training pipeline. I am using the out-of-the-box Purchase Orders model. Earlier I got an error saying that the schema.json file was not found. I found out that this can happen if the project is zipped and then unzipped on Windows; the issue went away after I created another project from scratch. Then I got an error saying that latest was not found, so I created a latest folder and that error went away too. Now I am getting the error below.

2020-09-12 09:26:14,542 - main:main:69 - INFO: Starting training job…
2020-09-12 09:26:14,543 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 0.00010085105895996094
2020-09-12 09:26:14,548 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 4.76837158203125e-05
2020-09-12 09:26:14,756 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: upload : 0.11476397514343262
2020-09-12 09:26:14,943 - wrapper.gcp_storage_client:download:86 - INFO: Dataset from bucket folder training-6be660f5-d47e-4657-93ba-089e51dd374d/0aed828e-48c5-467b-a1c8-9507efa4b146/27a1f771-d311-4218-9cb6-52ebf5efb554 with size 5 downloaded successfully
2020-09-12 09:26:14,943 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: download : 0.39600515365600586
2020-09-12 09:26:14,944 - wrapper.training_wrapper:train_model:94 - INFO: Start model training…
2020-09-12 09:26:14,944 - wrapper.training_wrapper:initialize_model:88 - INFO: Start model initialization…
2020-09-12 09:26:14,945 - wrapper.training_wrapper:initialize_model:91 - INFO: Model initialized successfully
2020-09-12 09:26:14,946 - root:preprocess_data:318 - INFO: Create Dataset
2020-09-12 09:26:14,946 - root:_train:115 - ERROR: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’
2020-09-12 09:26:14,947 - wrapper.training_wrapper:run:143 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset Creation Failed
2020-09-12 09:26:14,951 - main:main:78 - ERROR: Training Job failed, error: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “trainer_run.py”, line 73, in main
wrapper.run()
File “/training/wrapper/training_wrapper.py”, line 144, in run
raise e
File “/training/wrapper/training_wrapper.py”, line 136, in run
self.run_train_only()
File “/training/wrapper/training_wrapper.py”, line 205, in run_train_only
self.train_model(self.local_dataset_directory)
File “/training/wrapper/training_wrapper.py”, line 96, in train_model
self.model.train(directory)
File “/training/train.py”, line 24, in train
train_local._train(self.opt, self.df_train, self.df_test)
File “/training/extraction/model_tag/train.py”, line 117, in _train
raise Exception(“Dataset Creation Failed”)
Exception: Dataset Creation Failed
2020-09-12 09:26:24,692 - main:main:69 - INFO: Starting training job…
2020-09-12 09:26:24,692 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 8.630752563476562e-05
2020-09-12 09:26:24,694 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 4.00543212890625e-05
2020-09-12 09:26:24,798 - wrapper.upload_log_service:upload_logs_file:52 - INFO: Retry Training Triggered:
2020-09-12 09:26:24,948 - wrapper.gcp_storage_client:download:86 - INFO: Dataset from bucket folder training-6be660f5-d47e-4657-93ba-089e51dd374d/0aed828e-48c5-467b-a1c8-9507efa4b146/27a1f771-d311-4218-9cb6-52ebf5efb554 with size 5 downloaded successfully
2020-09-12 09:26:24,948 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: download : 0.2542102336883545
2020-09-12 09:26:24,948 - wrapper.training_wrapper:train_model:94 - INFO: Start model training…
2020-09-12 09:26:24,949 - wrapper.training_wrapper:initialize_model:88 - INFO: Start model initialization…
2020-09-12 09:26:24,950 - wrapper.training_wrapper:initialize_model:91 - INFO: Model initialized successfully
2020-09-12 09:26:24,951 - root:preprocess_data:318 - INFO: Create Dataset
2020-09-12 09:26:24,951 - root:_train:115 - ERROR: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’
2020-09-12 09:26:24,952 - wrapper.training_wrapper:run:143 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset Creation Failed
2020-09-12 09:26:24,953 - main:main:78 - ERROR: Training Job failed, error: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “trainer_run.py”, line 73, in main
wrapper.run()
File “/training/wrapper/training_wrapper.py”, line 144, in run
raise e
File “/training/wrapper/training_wrapper.py”, line 136, in run
self.run_train_only()
File “/training/wrapper/training_wrapper.py”, line 205, in run_train_only
self.train_model(self.local_dataset_directory)
File “/training/wrapper/training_wrapper.py”, line 96, in train_model
self.model.train(directory)
File “/training/train.py”, line 24, in train
train_local._train(self.opt, self.df_train, self.df_test)
File “/training/extraction/model_tag/train.py”, line 117, in _train
raise Exception(“Dataset Creation Failed”)
Exception: Dataset Creation Failed
2020-09-12 09:26:53,100 - main:main:69 - INFO: Starting training job…
2020-09-12 09:26:53,101 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 9.679794311523438e-05
2020-09-12 09:26:53,104 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 5.91278076171875e-05
2020-09-12 09:26:53,204 - wrapper.upload_log_service:upload_logs_file:52 - INFO: Retry Training Triggered:
2020-09-12 09:26:53,393 - wrapper.gcp_storage_client:download:86 - INFO: Dataset from bucket folder training-6be660f5-d47e-4657-93ba-089e51dd374d/0aed828e-48c5-467b-a1c8-9507efa4b146/27a1f771-d311-4218-9cb6-52ebf5efb554 with size 5 downloaded successfully
2020-09-12 09:26:53,394 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: download : 0.2896537780761719
2020-09-12 09:26:53,394 - wrapper.training_wrapper:train_model:94 - INFO: Start model training…
2020-09-12 09:26:53,394 - wrapper.training_wrapper:initialize_model:88 - INFO: Start model initialization…
2020-09-12 09:26:53,395 - wrapper.training_wrapper:initialize_model:91 - INFO: Model initialized successfully
2020-09-12 09:26:53,396 - root:preprocess_data:318 - INFO: Create Dataset
2020-09-12 09:26:53,397 - root:_train:115 - ERROR: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’
2020-09-12 09:26:53,398 - wrapper.training_wrapper:run:143 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset Creation Failed
2020-09-12 09:26:53,398 - main:main:78 - ERROR: Training Job failed, error: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “trainer_run.py”, line 73, in main
wrapper.run()
File “/training/wrapper/training_wrapper.py”, line 144, in run
raise e
File “/training/wrapper/training_wrapper.py”, line 136, in run
self.run_train_only()
File “/training/wrapper/training_wrapper.py”, line 205, in run_train_only
self.train_model(self.local_dataset_directory)
File “/training/wrapper/training_wrapper.py”, line 96, in train_model
self.model.train(directory)
File “/training/train.py”, line 24, in train
train_local._train(self.opt, self.df_train, self.df_test)
File “/training/extraction/model_tag/train.py”, line 117, in _train
raise Exception(“Dataset Creation Failed”)
Exception: Dataset Creation Failed
2020-09-12 09:27:08,767 - main:main:69 - INFO: Starting training job…
2020-09-12 09:27:08,767 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 9.489059448242188e-05
2020-09-12 09:27:08,770 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: list_blobs : 4.482269287109375e-05
2020-09-12 09:27:08,872 - wrapper.upload_log_service:upload_logs_file:52 - INFO: Retry Training Triggered:
2020-09-12 09:27:09,000 - wrapper.gcp_storage_client:download:86 - INFO: Dataset from bucket folder training-6be660f5-d47e-4657-93ba-089e51dd374d/0aed828e-48c5-467b-a1c8-9507efa4b146/27a1f771-d311-4218-9cb6-52ebf5efb554 with size 5 downloaded successfully
2020-09-12 09:27:09,000 - wrapper.utils:_retries:20 - INFO: Total time taken to execute func: download : 0.23051238059997559
2020-09-12 09:27:09,001 - wrapper.training_wrapper:train_model:94 - INFO: Start model training…
2020-09-12 09:27:09,001 - wrapper.training_wrapper:initialize_model:88 - INFO: Start model initialization…
2020-09-12 09:27:09,002 - wrapper.training_wrapper:initialize_model:91 - INFO: Model initialized successfully
2020-09-12 09:27:09,003 - root:preprocess_data:318 - INFO: Create Dataset
2020-09-12 09:27:09,003 - root:_train:115 - ERROR: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’
2020-09-12 09:27:09,004 - wrapper.training_wrapper:run:143 - ERROR: Training failed for pipeline type: TRAIN_ONLY, error: Dataset Creation Failed
2020-09-12 09:27:09,005 - main:main:78 - ERROR: Training Job failed, error: Dataset Creation Failed
Traceback (most recent call last):
File “/training/extraction/model_tag/train.py”, line 113, in _train
df_train, df_test = preprocess.preprocess_data(opt)
File “/training/extraction/model_tag/preprocess.py”, line 319, in preprocess_data
errors, report = download_data.generate_tag_report(opt[“dataset”][“path”])
File “/training/extraction/webapp_tagger/download_data.py”, line 365, in generate_tag_report
for line in doc[“words”]:
KeyError: ‘words’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “trainer_run.py”, line 73, in main
wrapper.run()
File “/training/wrapper/training_wrapper.py”, line 144, in run
raise e
File “/training/wrapper/training_wrapper.py”, line 136, in run
self.run_train_only()
File “/training/wrapper/training_wrapper.py”, line 205, in run_train_only
self.train_model(self.local_dataset_directory)
File “/training/wrapper/training_wrapper.py”, line 96, in train_model
self.model.train(directory)
File “/training/train.py”, line 24, in train
train_local._train(self.opt, self.df_train, self.df_test)
File “/training/extraction/model_tag/train.py”, line 117, in _train
raise Exception(“Dataset Creation Failed”)
Exception: Dataset Creation Failed

The schema.json file was downloaded from the ‘configuring data manager’ page. This is a bit urgent, so a prompt response would be really appreciated.

Thanks and Regards,

Kolitha

Hi @alexcabuz

Kolitha is a friend of mine, and he is getting the above error saying “Dataset Creation Failed”.
Any thoughts on why he is getting this error?
He says that he has configured the dataset with the training and latest folders.

Also, he has another problem: the pipelines he creates with the Run Now option never get executed and always remain in the “Queued” status.

Please help :slight_smile:

Regarding “the Pipelines he creates with Run Now state never gets executed, and always remains in the ‘Queued’ status”: did you check the ML Logs?

I am facing the same issue here: a “Dataset Creation Failed” error is thrown when running a test pipeline. I was able to run the pipeline successfully and deploy ML skills earlier today. Any solutions?
@Jeremy_Tederry

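It seems there is an issue with “words”: the traceback shows a KeyError raised when the training code reads doc[“words”] in generate_tag_report.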
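
One way to narrow this down locally is to scan a copy of the dataset for JSON files that lack a top-level "words" key, since that is the lookup that fails in the traceback. The sketch below is only a rough check under assumptions: the function name find_docs_missing_key and the idea that the dataset is a folder of JSON files are my own guesses, not something confirmed by the product documentation.

```python
import json
from pathlib import Path


def find_docs_missing_key(dataset_dir, key="words"):
    """Return (path, reason) pairs for JSON files that lack a top-level `key`.

    Hypothetical helper: the folder layout and the meaning of the key are
    assumptions based only on the traceback (doc["words"]).
    """
    problems = []
    for path in sorted(Path(dataset_dir).rglob("*.json")):
        try:
            with open(path, encoding="utf-8") as fh:
                doc = json.load(fh)
        except (OSError, ValueError) as exc:
            # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
            problems.append((str(path), f"unreadable: {exc.__class__.__name__}"))
            continue
        # Only a JSON object (dict) can carry the key the training code expects
        if not isinstance(doc, dict) or key not in doc:
            problems.append((str(path), f"no '{key}' key"))
    return problems


if __name__ == "__main__":
    import sys

    for file_path, reason in find_docs_missing_key(sys.argv[1]):
        print(f"{file_path}: {reason}")
```

Files that are not document annotations (schema.json, for example) may be reported as well, so the output is a list of candidates to look at rather than a definitive list of broken files.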

Check that the file you originally trained the model on has the same structure as the new training file; the new file cannot contain any new columns.
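
If the training files happen to be CSV files with a header row (an assumption; the thread does not say which format is meant), the column comparison could be sketched roughly like this. The names read_header and compare_columns are made up for illustration.

```python
import csv


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh), [])


def compare_columns(original_csv, new_csv):
    """Return (added, removed) column names of new_csv relative to original_csv."""
    original = read_header(original_csv)
    new = read_header(new_csv)
    added = [col for col in new if col not in original]
    removed = [col for col in original if col not in new]
    return added, removed
```

An empty added list would mean the new file introduces no columns that the original training file did not have.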