Troubleshooting TypeError: 'list indices must be integers or slices, not str' when Training NER Model with JSON Data

Hello everyone,

I’m currently encountering an issue while training a Named Entity Recognition (NER) model using JSON data in UiPath’s AI Center. Specifically, I’m getting a TypeError with the message ‘list indices must be integers or slices, not str’. I believe the problem lies in how I’m accessing elements of a list within my code, but I’m having trouble identifying the exact cause.

Here’s a brief overview of what I’m doing: I have JSON files containing text data for leave requests, along with labeled entities such as names, dates, and reasons. I’m attempting to train a NER model using this data, but I keep encountering the aforementioned error.

I’ve checked the format of my JSON files, and they seem to be correctly structured with “text” and “entities” fields. However, I suspect there may be an issue with how I’m parsing or processing the JSON data in my code.

If anyone has encountered a similar issue or has expertise in training NER models with JSON data in UiPath’s AI Center, I would greatly appreciate any insights or suggestions you could provide. Thank you in advance for your help!

This is the error log:

2024-04-03 14:56:19,979 - uipath_core.trainer_run:main:83 - INFO:  Starting training job...
2024-04-03 14:56:20,108 - matplotlib:_get_config_or_cache_dir:484 - WARNING:  Matplotlib created a temporary config/cache directory at /tmp/matplotlib-qre6sgwr because the default path (/home/aicenter/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-04-03 14:56:20,427 - matplotlib.font_manager:_load_fontmanager:1443 - INFO:  generated new fontManager
2024-04-03 14:56:22,184 - uipath_core.storage.azure_storage_client:download:118 - INFO:  Dataset from bucket folder training-aee14565-bca2-495f-8d45-f3cee36aeef7/670a117d-bba2-4a8d-8b59-6f8f572bb359/85ad1b6b-caa6-4a18-b051-7f7248edfc06 with size 1 downloaded successfully
2024-04-03 14:56:22,184 - uipath_core.training_plugin:download_dataset:513 - WARNING:  Deleting already existing folder name: /data/dataset/test
2024-04-03 14:56:22,211 - uipath_core.storage.azure_storage_client:download:118 - INFO:  Dataset from bucket folder training-aee14565-bca2-495f-8d45-f3cee36aeef7/670a117d-bba2-4a8d-8b59-6f8f572bb359/49c8f670-5f56-4169-9a74-b971874fa645 with size 1 downloaded successfully
2024-04-03 14:56:22,211 - uipath_core.training_plugin:process_data_model:151 - INFO:  Start process model data...
2024-04-03 14:56:22,211 - uipath_core.training_plugin:initialize_model:124 - INFO:  Start model initialization...
2024-04-03 14:56:22,350 - uipath_core.training_plugin:initialize_model:127 - INFO:  Model initialized successfully
2024-04-03 14:56:22,350 - root:get_data_processor:40 - INFO:  Using Json Data Processor ...
2024-04-03 14:56:22,350 - root:process_files:24 - INFO:  Processing JSON files
2024-04-03 14:56:22,350 - root:process_files:28 - INFO:  Reading NER_Training.json ...
2024-04-03 14:56:22,351 - uipath_core.training_plugin:process_data:529 - ERROR:  Failed to process data, error: list indices must be integers or slices, not str
2024-04-03 14:56:22,351 - uipath_core.training_plugin:model_run:179 - ERROR:  Training failed for pipeline type: FULL_TRAINING, error: list indices must be integers or slices, not str
2024-04-03 14:56:22,386 - uipath_core.trainer_run:main:100 - ERROR:  Training Job failed, error: list indices must be integers or slices, not str
Traceback (most recent call last):
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/trainer_run.py", line 95, in main
    wrapper.run()
  File "/microservice/training_wrapper.py", line 58, in run
    return self.training_plugin.model_run()
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 195, in model_run
    raise ex
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 167, in model_run
    self.run_full_training()
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 212, in run_full_training
    self.process_data()
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 530, in process_data
    raise e
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 527, in process_data
    self.process_data_model(self.local_dataset_directory)
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 152, in process_data_model
    response = self.model.process_data(directory)
  File "/microservice/train.py", line 67, in process_data
    self.df_train, self.df_test = preprocess.preprocess_data(self.opt, split=True)
  File "<frozen aicenter.ner.preprocess>", line 52, in preprocess_data
  File "<frozen aicenter.ner.data_processors.base_data_processor>", line 24, in process_data
  File "<frozen aicenter.ner.data_processors.json_data_processor>", line 29, in process_files
  File "<frozen aicenter.ner.data_processors.json_data_processor>", line 42, in process_file
  File "<frozen aicenter.ner.data_processors.base_data_processor>", line 30, in read_field
TypeError: list indices must be integers or slices, not str
2024-04-03 14:56:22,386 - uipath_core.trainer_run:main:107 - INFO:  Job run stopped.

Hi,

Would it be possible for you to share your JSON? You can substitute dummy data for anything confidential.

I assume your JSON is a list of dictionaries that is being accessed with a string key instead of an integer index. You might want to look at the code and see where that happens.

Note: this is based on my Python experience, not on AI Center.
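
To illustrate the guess above: in Python, this exact message appears when a list is indexed with a string key. The sketch below uses made-up data, since the real NER_Training.json was not shared; the field names "text" and "entities" come from the original post, and `describe_top_level` is a hypothetical helper, not part of AI Center.

```python
import json

# Made-up sample: the top level is a LIST of records, not a single object.
raw = '[{"text": "Dear Sarah, I need leave.", "entities": []}]'
data = json.loads(raw)

try:
    data["text"]  # indexing a list with a string key raises TypeError
except TypeError as err:
    message = str(err)
    print(message)

# Indexing the list first, then the dictionary, works.
first_text = data[0]["text"]
print(first_text)


def describe_top_level(obj):
    """Return a short description of the JSON top-level shape (hypothetical helper)."""
    if isinstance(obj, dict):
        return "object with keys: " + ", ".join(sorted(obj))
    if isinstance(obj, list):
        return "list of %d item(s)" % len(obj)
    return type(obj).__name__


print(describe_top_level(data))
```

Running a check like `describe_top_level` on the training file shows quickly whether its top level is a list or an object, which is the difference this error message points at.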

@JyotBuch ,

Please read the instructions for the NER out-of-the-box ML package. The description section clearly states: “This preview model allows you to bring your own dataset tagged with entities you want to extract. The training and evaluation datasets need to be in CoNLL format.” Please prepare your data as described there.

Thanks. For more assistance, you can contact me.
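
A minimal sketch of such a conversion, under stated assumptions: each JSON record is assumed to have a "text" string and an "entities" list of `{"start", "end", "label"}` character offsets (the actual file layout was never posted), tokens are split on whitespace only, and the `token -X- _ tag` column layout is copied from the CoNLL snippet posted later in this thread. `to_conll` is a hypothetical helper, not an AI Center function.

```python
def to_conll(text, entities):
    """Convert one record to CoNLL-style lines using whitespace tokens.

    `entities` is ASSUMED to be a list of dicts with character offsets:
    {"start": int, "end": int, "label": str}.
    """
    lines = []
    pos = 0
    previous_entity = None  # index of the entity the previous token belonged to
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        current_entity = None
        for i, ent in enumerate(entities):
            # A token is assigned to the first entity whose span it overlaps.
            if start < ent["end"] and end > ent["start"]:
                current_entity = i
                break
        if current_entity is None:
            tag = "O"
        elif current_entity == previous_entity:
            tag = "I-" + entities[current_entity]["label"]
        else:
            tag = "B-" + entities[current_entity]["label"]
        previous_entity = current_entity
        lines.append("%s -X- _ %s" % (token, tag))
    return lines


sample_text = "Dear Sarah Johnson I need leave"
sample_entities = [{"start": 5, "end": 18, "label": "Name"}]
for line in to_conll(sample_text, sample_entities):
    print(line)
```

Whitespace tokenization keeps punctuation attached to words, so a real conversion may need a proper tokenizer.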

Thanks for the response. I will make the changes and post an update on this topic soon!

Hey, being an AI Center newbie myself, it’s tough to find out exactly where the code fails. I’ll try to resolve this and post an update here.

Thanks for the response.

Hi,

I converted my training and validation data to CoNLL format and tried training the model with the new training set. While it trains successfully, it fails during the validation step.

The new error is as follows:


** RUN LOG **
2024-04-05 15:11:49,824 - uipath_core.trainer_run:main:83 - INFO:  Starting training job...
2024-04-05 15:11:50,017 - matplotlib:_get_config_or_cache_dir:484 - WARNING:  Matplotlib created a temporary config/cache directory at /tmp/matplotlib-89zv9k7k because the default path (/home/aicenter/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-04-05 15:11:50,337 - matplotlib.font_manager:_load_fontmanager:1443 - INFO:  generated new fontManager
2024-04-05 15:11:52,485 - uipath_core.storage.azure_storage_client:download:118 - INFO:  Dataset from bucket folder training-35d1b076-a7fd-4a9d-9aa2-169f146a2340/4771e077-3e92-498f-9db5-f5143fe177fa/dc7c1431-a35b-4846-a0d6-3d1452d3eff7 with size 1 downloaded successfully
2024-04-05 15:11:52,485 - uipath_core.training_plugin:download_dataset:513 - WARNING:  Deleting already existing folder name: /data/dataset/test
2024-04-05 15:11:52,501 - uipath_core.storage.azure_storage_client:download:118 - INFO:  Dataset from bucket folder training-35d1b076-a7fd-4a9d-9aa2-169f146a2340/4771e077-3e92-498f-9db5-f5143fe177fa/803dcbc6-fd65-4564-808a-740ef6c3d6b3 with size 1 downloaded successfully
2024-04-05 15:11:52,501 - uipath_core.training_plugin:process_data_model:151 - INFO:  Start process model data...
2024-04-05 15:11:52,501 - uipath_core.training_plugin:initialize_model:124 - INFO:  Start model initialization...
2024-04-05 15:11:52,502 - uipath_core.training_plugin:initialize_model:127 - INFO:  Model initialized successfully
2024-04-05 15:11:52,502 - root:get_data_processor:37 - INFO:  Using Label Studio Data Processor ...
2024-04-05 15:11:52,502 - root:process_files:25 - INFO:  Reading export_training_F.txt ...
2024-04-05 15:11:52,514 - root:process_files:25 - INFO:  Reading export_validation_F.txt ...
2024-04-05 15:11:53,193 - root:detect_language:99 - INFO:   language detected : en
2024-04-05 15:11:53,194 - root:preprocess_data:71 - INFO:  Train shape  : (45, 3)
2024-04-05 15:11:53,194 - root:preprocess_data:71 - INFO:  Test shape  : (10, 3)
2024-04-05 15:11:53,194 - root:print_data_statistics:77 - INFO:  Training data statistics
2024-04-05 15:11:53,194 - root:print_data_statistics:78 - INFO:  Number of documents used to train model: 45
2024-04-05 15:11:53,195 - root:print_data_statistics:82 - INFO:  Name : 45
2024-04-05 15:11:53,195 - root:print_data_statistics:82 - INFO:  nge : 35
2024-04-05 15:11:53,195 - root:print_data_statistics:82 - INFO:  Reason : 45
2024-04-05 15:11:53,195 - root:print_data_statistics:83 - INFO:  Test data statistics
2024-04-05 15:11:53,195 - root:print_data_statistics:84 - INFO:  Number of documents used to evaluate model : 10
2024-04-05 15:11:53,196 - root:print_data_statistics:84 - INFO:  Name : 10
2024-04-05 15:11:53,196 - root:print_data_statistics:84 - INFO:  nge : 10
2024-04-05 15:11:53,196 - root:print_data_statistics:84 - INFO:  Reason : 10
2024-04-05 15:11:53,196 - uipath_core.training_plugin:process_data_model:154 - INFO:  Model process data successfully with response None
2024-04-05 15:11:53,196 - uipath_core.training_plugin:evaluate_model:140 - INFO:  Start model evaluation...
2024-04-05 15:11:53,196 - root:load:212 - INFO:  Loading model...
2024-04-05 15:11:53,196 - uipath_core.training_plugin:trigger_full_training_and_publish_model:543 - ERROR:  Failed to evaluate untrained model with error: [Errno 2] No such file or directory: '/microservice/models/default/model.p'
2024-04-05 15:11:53,196 - uipath_core.training_plugin:train_model:130 - INFO:  Start model training...
2024-04-05 15:11:53,196 - root:load_tokenizer:178 - INFO:  Loading bert-base-uncased tokenizer from the work_dir path
2024-04-05 15:11:53,675 - root:__init__:68 - INFO:  Removing folder /microservice/models/default
2024-04-05 15:11:53,676 - root:__init__:20 - INFO:  Loading model from the model path
2024-04-05 15:11:56,773 - root:post_create_network:136 - INFO:  Enabling multi_gpu setting
2024-04-05 15:11:56,776 - root:create_optimizer:129 - INFO:  Creating AdamW optimizer
2024-04-05 15:11:56,777 - root:create_scheduler:190 - INFO:  Building scheduler...
2024-04-05 15:11:56,777 - root:train:392 - INFO:  Training for 5 epochs
2024-04-05 15:12:28,678 - root:save:206 - INFO:  Saving model...
2024-04-05 15:12:29,243 - root:checkpoint:381 - INFO:  Best score 0.5265
2024-04-05 15:12:29,243 - root:train:403 - INFO:  epoch 1 loss_train 0.8648 loss_test 0.3335 lr 0.0001 score_train 0.0250 score_test 0.5265
2024-04-05 15:12:56,731 - root:save:206 - INFO:  Saving model...
2024-04-05 15:12:57,575 - root:checkpoint:381 - INFO:  Best score 0.7074
2024-04-05 15:12:57,576 - root:train:403 - INFO:  epoch 2 loss_train 0.2790 loss_test 0.1320 lr 0.0001 score_train 0.4519 score_test 0.7074
2024-04-05 15:13:24,387 - root:save:206 - INFO:  Saving model...
2024-04-05 15:13:25,231 - root:checkpoint:381 - INFO:  Best score 0.9091
2024-04-05 15:13:25,232 - root:train:403 - INFO:  epoch 3 loss_train 0.1420 loss_test 0.0633 lr 0.0001 score_train 0.6314 score_test 0.9091
2024-04-05 15:13:52,318 - root:save:206 - INFO:  Saving model...
2024-04-05 15:13:53,229 - root:checkpoint:381 - INFO:  Best score 0.9182
2024-04-05 15:13:53,230 - root:train:403 - INFO:  epoch 4 loss_train 0.0821 loss_test 0.0353 lr 0.0001 score_train 0.8373 score_test 0.9182
2024-04-05 15:14:19,877 - root:train:403 - INFO:  epoch 5 loss_train 0.0496 loss_test 0.0249 lr 0.0001 score_train 0.9046 score_test 0.9091
2024-04-05 15:14:19,877 - root:train:403 - INFO:  Training done. Best Test accuracy 0.9182 epoch 4
2024-04-05 15:14:19,900 - uipath_core.training_plugin:train_model:134 - INFO:  Model trained successfully with response None
2024-04-05 15:14:19,901 - uipath_core.training_plugin:save_model:157 - INFO:  Start model save...
2024-04-05 15:14:19,901 - uipath_core.training_plugin:save_model:160 - INFO:  Model save successful with response None
2024-04-05 15:14:19,901 - uipath_core.training_plugin:evaluate_model:140 - INFO:  Start model evaluation...
2024-04-05 15:14:19,901 - root:load:212 - INFO:  Loading model...
2024-04-05 15:14:19,918 - root:__init__:20 - INFO:  Loading model from the model path
2024-04-05 15:14:24,612 - uipath_core.training_plugin:trigger_full_training_and_publish_model:578 - ERROR:  The model was trained successfully but failed on the evaluation step

The following is a snippet of the data in CoNLL format.

-DOCSTART- -X- O
Dear -X- _ O
Sarah -X- _ B-Name
Johnson -X- _ I-Name
, -X- _ O
I -X- _ O
am -X- _ O
writing -X- _ O
to -X- _ O
inform -X- _ O
you -X- _ O
about -X- _ O
my -X- _ O
intention -X- _ O
to -X- _ O
take -X- _ O
a -X- _ O
personal -X- _ O
leave -X- _ O
of -X- _ O
absence -X- _ O
from -X- _ O
August -X- _ B-Date Range
10th -X- _ I-Date Range
to -X- _ I-Date Range
August -X- _ I-Date Range
25th. -X- _ I-Date Range
This -X- _ O
leave -X- _ O
is -X- _ O
necessary -X- _ O
for -X- _ O
personal -X- _ B-Reason
reasons. -X- _ I-Reason
I -X- _ O
have -X- _ O
made -X- _ O
arrangements -X- _ O
to -X- _ O
ensure -X- _ O
that -X- _ O
my -X- _ O
responsibilities -X- _ O
are -X- _ O
covered -X- _ O
during -X- _ O
my -X- _ O
absence. -X- _ O
Please -X- _ O
let -X- _ O
me -X- _ O
know -X- _ O
if -X- _ O
there -X- _ O
are -X- _ O
any -X- _ O
further -X- _ O
steps -X- _ O
I -X- _ O
need -X- _ O
to -X- _ O
take -X- _ O
to -X- _ O
formalize -X- _ O
this -X- _ O
leave -X- _ O
request. -X- _ O
Thank -X- _ O
you -X- _ O
for -X- _ O
your -X- _ O
understanding. -X- _ O
Regards -X- _ O
, -X- _ O
Michael -X- _ O

Do let me know if any other information is needed, thanks for the help in advance!
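
One detail in the run log above may be worth checking: the statistics list an entity called `nge`, while the snippet uses the label `Date Range`. A possible explanation (an assumption, since the AI Center parser is not visible) is that each line is split on whitespace, so the space inside `Date Range` adds a fifth column and the last column becomes just `Range`; dropping a two-character prefix from `Range` would give `nge`. The sketch below flags such lines. `check_conll_line` is a hypothetical helper, and the four-column layout is taken from the snippet.

```python
def check_conll_line(line):
    """Return a problem description for one CoNLL line, or None if it looks fine.

    Assumes four whitespace-separated columns (token, -X-, _, tag), as in the
    snippet above. This is a hypothetical check, not AI Center's own parser.
    """
    fields = line.split()
    if not fields or fields[0] == "-DOCSTART-":
        return None  # blank separator or document marker
    if len(fields) != 4:
        return "expected 4 columns, got %d" % len(fields)
    tag = fields[-1]
    if tag != "O" and not tag.startswith(("B-", "I-")):
        return "unexpected tag %r" % tag
    return None


sample = [
    "Sarah -X- _ B-Name",
    "August -X- _ B-Date Range",  # space inside the label
    "August -X- _ B-Date_Range",  # same label written without a space
]
for line in sample:
    print(repr(line), "->", check_conll_line(line))

# What a naive whitespace split sees as the last column, and what is left
# after dropping the first two characters:
last = "August -X- _ B-Date Range".split()[-1]
print(last, last[2:])
```

If this reading is right, renaming the label so it has no space (for example `Date_Range` or `DateRange`) in both the training and validation files would be a cheap thing to try.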

@JyotBuch ,

Try the following options and check:

  1. Create only a training pipeline, not a full pipeline.
  2. Do not enable GPU.

Thanks…

Hey, thanks for the suggestion.

I used the training-only pipeline, and it ran successfully without the GPU.

However, the validation run failed with the following error:

2024-04-08 11:12:08,394 - uipath_core.trainer_run:main:100 - ERROR: Training Job failed, error: not enough values to unpack (expected 2, got 1)
Traceback (most recent call last):
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/trainer_run.py", line 95, in main
    wrapper.run()
  File "/microservice/training_wrapper.py", line 58, in run
    return self.training_plugin.model_run()
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 195, in model_run
    raise ex
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 169, in model_run
    self.run_evaluation_only()
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 228, in run_evaluation_only
    score = self.evaluate_model(self.local_dataset_directory, "evaluation")
  File "/home/aicenter/.local/lib/python3.8/site-packages/uipath_core/training_plugin.py", line 141, in evaluate_model
    response = self.model.evaluate(directory)
  File "/microservice/train.py", line 57, in evaluate
    metrics = mymodel.evaluate(self.df_test)
  File "<frozen aicenter.ner.model>", line 184, in evaluate
  File "/home/aicenter/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "<frozen aicenter.ner.model>", line 99, in predict
  File "/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 8839, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 727, in apply
    return self.apply_standard()
  File "/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 851, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 867, in apply_series_generator
    results[i] = self.f(v)
  File "<frozen aicenter.ner.model>", line 99, in <lambda>
  File "<frozen aicenter.ner.model>", line 129, in predict_row
  File "<frozen aicenter.ner.model>", line 146, in predict_spans
ValueError: not enough values to unpack (expected 2, got 1)
2024-04-08 11:12:08,397 - uipath_core.trainer_run:main:107 - INFO: Job run stopped.
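
The `ValueError` in this traceback is what Python raises when a sequence with one element is unpacked into two names. The code inside `predict_spans` is not visible, so the following is only a guess at the kind of pattern involved: splitting a tag such as `B-Name` on `-` yields two parts, while a tag with no hyphen yields one. `split_tag` and `split_tag_safe` are hypothetical helpers, not AI Center functions.

```python
def split_tag(tag):
    """Naive split: fails for tags that contain no hyphen."""
    prefix, label = tag.split("-")
    return prefix, label


def split_tag_safe(tag):
    """Tolerant split: tags without a hyphen get an empty label."""
    prefix, _, label = tag.partition("-")
    return prefix, label


print(split_tag("B-Name"))

try:
    split_tag("Range")  # no hyphen, so split() returns a single element
except ValueError as err:
    message = str(err)
    print(message)

print(split_tag_safe("Range"))
```

Since the packaged model code cannot be edited, the practical use of this is diagnostic: it suggests looking for tags in the data that do not have the `B-`/`I-` shape the model expects.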

The output returned in UiPath Studio is as follows:

@"
{
  ""code"": ""InternalServerError"",
  ""message"": ""Prediction Failed"",
  ""stacktrace"": null,
  ""trace_id"": ""c398ae46-3292-4098-9285-2d492e4664e3"",
  ""reason"": ""
  {
    \""message\"": \""not enough values to unpack (expected 2,
    got 1)\"",
    \""stacktrace\"": \""  File \\\""/microservice/main.py\\\"",
    line 32,
    in predict\\n    df = self.mymodel.predict(df)\\n  File \\\""/home/aicenter/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py\\\"",
    line 27,
    in decorate_context\\n    return func(*args,
    **kwargs)\\n  File \\\""<frozen aicenter.ner.model>\\\"",
    line 99,
    in predict\\n  File \\\""/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/frame.py\\\"",
    line 8839,
    in apply\\n    return op.apply().__finalize__(self,
    method=\\\""apply\\\"")\\n  File \\\""/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/apply.py\\\"",
    line 727,
    in apply\\n    return self.apply_standard()\\n  File \\\""/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/apply.py\\\"",
    line 851,
    in apply_standard\\n    results,
    res_index = self.apply_series_generator()\\n  File \\\""/home/aicenter/.local/lib/python3.8/site-packages/pandas/core/apply.py\\\"",
    line 867,
    in apply_series_generator\\n    results[i] = self.f(v)\\n  File \\\""<frozen aicenter.ner.model>\\\"",
    line 99,
    in <lambda>\\n  File \\\""<frozen aicenter.ner.model>\\\"",
    line 129,
    in predict_row\\n  File \\\""<frozen aicenter.ner.model>\\\"",
    line 146,
    in predict_spans\\nValueError: not enough values to unpack (expected 2,
    got 1)\""
  }""
 
}"

I have also checked the format of the validation file; it is the same as the training file.

Thanks in advance.

@JyotBuch ,

I think you may not be aware of how to use the NER model once the pipeline has been created successfully, so please go through this video. It might answer your query.

Thanks…