Training Document Understanding > Invoices on AI Fabric: "ERROR: Training Job failed, error: Training and / or test set is empty, verify that training / test split is correctly set in split.csv"

So I’m still struggling to use Data Manager together with Document Understanding > Invoices on AI Fabric.

When I try to run a pipeline using the data exported from Data Manager, it keeps failing. This is part of the log:

2020-12-15 08:21:12,369 - uipath_core.training_plugin:process_data_model:113 - INFO: Start process model data…
2020-12-15 08:21:12,369 - uipath_core.training_plugin:initialize_model:95 - INFO: Start model initialization…
2020-12-15 08:21:12,374 - uipath_core.training_plugin:initialize_model:98 - INFO: Model initialized successfully
2020-12-15 08:21:12,374 - root:preprocess_data:379 - INFO: Create Dataset
2020-12-15 08:21:12,377 - root:preprocess_data:381 - INFO: Dataset invoices Statistics:
2020-12-15 08:21:16,792 - root:preprocess_data:423 - INFO: train: (1, 11)
2020-12-15 08:21:16,793 - root:preprocess_data:424 - INFO: test: (0, 11)
2020-12-15 08:21:16,793 - uipath_core.training_plugin:process_data:413 - ERROR: Failed to process data, error: Training and / or test set is empty, verify that training / test split is correctly set in split.csv
2020-12-15 08:21:16,793 - uipath_core.training_plugin:model_run:137 - ERROR: Training failed for pipeline type: FULL_TRAINING, error: Training and / or test set is empty, verify that training / test split is correctly set in split.csv
2020-12-15 08:21:16,795 - uipath_core.trainer_run:main:81 - ERROR: Training Job failed, error: Training and / or test set is empty, verify that training / test split is correctly set in split.csv
Traceback (most recent call last):
  File "/home/aifabric/.local/lib/python3.7/site-packages/uipath_core/trainer_run.py", line 76, in main
    wrapper.run()
  File "/microservice/training_wrapper.py", line 57, in run
    return self.training_plugin.model_run()
  File "/home/aifabric/.local/lib/python3.7/site-packages/uipath_core/training_plugin.py", line 138, in model_run
    raise e
  File "/home/aifabric/.local/lib/python3.7/site-packages/uipath_core/training_plugin.py", line 126, in model_run
    self.run_full_training()
  File "/home/aifabric/.local/lib/python3.7/site-packages/uipath_core/training_plugin.py", line 156, in run_full_training
    self.process_data()
  File "/home/aifabric/.local/lib/python3.7/site-packages/uipath_core/training_plugin.py", line 414, in process_data
    raise e
  File "/home/aifabric/.local/lib/python3.7/site-packages/uipath_core/training_plugin.py", line 411, in process_data
    self.process_data_model(self.local_dataset_directory)
  File "/home/aifabric/.local/lib/python3.7/site-packages/uipath_core/training_plugin.py", line 114, in process_data_model
    self.model.process_data(directory)
  File "/microservice/train.py", line 47, in process_data
    self.df_train, self.df_test = preprocess.preprocess_data(self.opt, split=True)
  File "/microservice/extraction/model_tag/preprocess.py", line 425, in preprocess_data
    assert len(df_train) > 0 and len(df_test) > 0, "Training and / or test set is empty, verify that training / test split is correctly set in split.csv"
AssertionError: Training and / or test set is empty, verify that training / test split is correctly set in split.csv
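
For context, the assertion above only fires when one of the two splits comes out empty after split.csv is applied, and the statistics earlier in the log show train: (1, 11) but test: (0, 11), so it is the test set that ends up with zero rows. A quick way to see how the split file is being interpreted is something like the sketch below (a rough illustration only; the column name "split" and the labels "train" / "test" are assumptions, so adjust them to whatever your exported split.csv actually contains):

    import pandas as pd

    # Load the split file from the exported dataset and count documents per split.
    # NOTE: the column name "split" and the labels "train" / "test" are assumptions
    # for illustration; check the real header of your split.csv and adjust.
    df = pd.read_csv("split.csv")
    print(df)  # should show one row per labelled document

    counts = df["split"].value_counts()
    print(counts)

    n_train = counts.get("train", 0)
    n_test = counts.get("test", 0)
    assert n_train > 0 and n_test > 0, (
        "Training and / or test set is empty, verify that training / test "
        "split is correctly set in split.csv"
    )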

Split.csv:
[screenshot of split.csv contents]

What I did:

  • AI Fabric: Create an ML package using Document Understanding > Invoices
  • Data Manager: Import the schema from here.
  • Import & label 1 document for training (just using 1 document to check if the process works)
  • Import & label 1 (other) document for testing
  • Export to zip (Data Manager automatically creates split.csv and the subfolders latest and images here, so I don’t understand how it can be wrong. Split.csv is populated with the 2 documents, and everything seems in order as far as I can tell; see the sketch after this list.)
  • Unzip the folder
  • AI Fabric: Upload the folder to Datasets
  • Create a new pipeline using this new dataset
  • Wait… and error
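
A quick structural check of the unzipped export before uploading might look something like the sketch below (purely illustrative; exported_dataset is a placeholder for the unzipped folder, and the subfolder names latest and images are the ones mentioned above):

    import os

    # Rough sanity check of the unzipped Data Manager export before uploading it
    # to AI Fabric as a dataset. "exported_dataset" is a placeholder path; the
    # subfolder names below are the ones the export created in my case.
    export_dir = "exported_dataset"

    for name in ("split.csv", "latest", "images"):
        path = os.path.join(export_dir, name)
        print(f"{name}: {'present' if os.path.exists(path) else 'MISSING'}")

    # Print the raw split file so the header and the per-document split
    # assignments can be checked by eye.
    with open(os.path.join(export_dir, "split.csv")) as f:
        print(f.read())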

What am I doing wrong?


Hi @Whynotrobot,

Are you planning to create a new field, or correcting the current field?

Actually, you need to keep at least 20 documents; only then will your pipeline succeed.

Thanks,
Amaresan.P