ML Skill not working - NLTK error

Hi,

I’m implementing a custom package with AI Fabric. In the preprocessing stage I use the nltk library to tokenize sentences and remove stop words. I download the required NLTK packages from within my Python code.

I can both train and evaluate pipelines on AI Fabric, but when I create an ML Skill to use within UiPath I get the following error when testing it with an input string:

{
"code": "InternalServerError",
"message": "\n**********************************************************************\n Resource \u001b[93mpunkt\u001b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \u001b[31m>>> import nltk\n >>> nltk.download('punkt')\n \u001b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \u001b[93mtokenizers/punkt/PY3/english.pickle\u001b[0m\n\n Searched in:\n - '/home/aifabric/nltk_data'\n - '/usr/local/nltk_data'\n - '/usr/local/share/nltk_data'\n - '/usr/local/lib/nltk_data'\n - '/usr/share/nltk_data'\n - '/usr/local/share/nltk_data'\n - '/usr/lib/nltk_data'\n - '/usr/local/lib/nltk_data'\n - ''\n**********************************************************************\n",
"stacktrace": " File \"/home/aifabric/.local/lib/python3.8/site-packages/uipath_core/auth/auth_processing_filter.py\", line 76, in _decorator\n return fn(*args, **kwargs)\n File \"/home/aifabric/.local/lib/python3.8/site-packages/uipath_core/logs/logger_mvc.py\", line 42, in _decorator\n return fn(*args, **kwargs)\n File \"/home/aifabric/.local/lib/python3.8/site-packages/uipath_core/auth/auth_processing_filter.py\", line 22, in _decorator\n return fn(*args, **kwargs)\n File \"/home/aifabric/.local/lib/python3.8/site-packages/uipath_core/plugin.py\", line 112, in _model_predict\n output = self.model.predict(processed_data)\n File \"/microservice/main.py\", line 20, in predict\n X_vector, _ = _create_vector(None, 'test', self.vectorizer, skill_input)\n File \"/microservice/utils/clean_data.py\", line 208, in _create_vector\n X = _initial_clean(X_data)\n File \"/microservice/utils/clean_data.py\", line 75, in _initial_clean\n new_emails.append(_clean_text(emails[i]))\n File \"/microservice/utils/clean_data.py\", line 57, in _clean_text\n words = word_tokenize(new_text)\n File \"/home/aifabric/.local/lib/python3.8/site-packages/nltk/tokenize/__init__.py\", line 129, in word_tokenize\n sentences = [text] if preserve_line else sent_tokenize(text, language)\n File \"/home/aifabric/.local/lib/python3.8/site-packages/nltk/tokenize/__init__.py\", line 106, in sent_tokenize\n tokenizer = load(\"tokenizers/punkt/{0}.pickle\".format(language))\n File \"/home/aifabric/.local/lib/python3.8/site-packages/nltk/data.py\", line 752, in load\n opened_resource = _open(resource_url)\n File \"/home/aifabric/.local/lib/python3.8/site-packages/nltk/data.py\", line 877, in _open\n return find(path_, path + [\"\"]).open()\n File \"/home/aifabric/.local/lib/python3.8/site-packages/nltk/data.py\", line 585, in find\n raise LookupError(resource_not_found)\nLookupError: \n**********************************************************************\n Resource \u001b[93mpunkt\u001b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \u001b[31m>>> import nltk\n >>> nltk.download('punkt')\n \u001b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \u001b[93mtokenizers/punkt/PY3/english.pickle\u001b[0m\n\n Searched in:\n - '/home/aifabric/nltk_data'\n - '/usr/local/nltk_data'\n - '/usr/local/share/nltk_data'\n - '/usr/local/lib/nltk_data'\n - '/usr/share/nltk_data'\n - '/usr/local/share/nltk_data'\n - '/usr/lib/nltk_data'\n - '/usr/local/lib/nltk_data'\n - ''\n**********************************************************************\n",
"trace_id": null
}

How can I resolve this?

Hi @ben_mi

This happens because AI Fabric does not allow outbound calls, and that is exactly what nltk is trying to do here (download data from outside). To solve it, you need to include the NLTK data in the ML Package that you upload.

So inside your ML Package, create a folder (for example, nltk_data) and download the punkt and stopwords packages locally into this folder using these commands:

import nltk

nltk.download('punkt', download_dir="<MLPackagedirectory>/nltk_data")
nltk.download('stopwords', download_dir="<MLPackagedirectory>/nltk_data")

Run them from any Python script on a machine that has internet access.
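If you want to repeat this step whenever you rebuild the package, the two download calls can be wrapped in a small helper. This is only a minimal sketch: `download_nltk_resources` and its `downloader` parameter are hypothetical names introduced here (the parameter exists so the helper can be exercised without network access), while `nltk.download(..., download_dir=...)` is the same call as above.

```python
import os


def download_nltk_resources(package_dir, resources=("punkt", "stopwords"), downloader=None):
    """Download NLTK resources into <package_dir>/nltk_data and return that folder.

    `downloader` is a hypothetical hook for testing; by default nltk.download is used.
    """
    if downloader is None:
        import nltk  # imported lazily so the helper itself has no hard dependency
        downloader = nltk.download

    target = os.path.join(package_dir, "nltk_data")
    os.makedirs(target, exist_ok=True)

    # nltk.download returns False when a resource could not be fetched.
    failed = [name for name in resources if not downloader(name, download_dir=target)]
    if failed:
        raise RuntimeError("Could not download NLTK resources: " + ", ".join(failed))
    return target
```

On your local machine you would call it once, for example `download_nltk_resources("<MLPackagedirectory>")`, before zipping and uploading the package.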

Then, in your main.py file (for example, as the first line of the __init__ function), include this line to add the new directory to the NLTK search path. It needs import os and import nltk at the top of the file:

nltk.data.path.append(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'nltk_data'))
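To show where that line could sit, here is a rough sketch of a main.py. It assumes the package exposes a `Main` class with `__init__` and `predict`, which matches the `/microservice/main.py` `predict` frame in the stack trace; the helper `register_local_nltk_data` and the simplified `predict` body are made up for illustration and are not your actual code.

```python
import os


def register_local_nltk_data(search_path, anchor_file):
    """Append the nltk_data folder next to `anchor_file` to an NLTK-style search path list."""
    data_dir = os.path.join(os.path.dirname(os.path.realpath(anchor_file)), "nltk_data")
    if data_dir not in search_path:
        search_path.append(data_dir)
    return data_dir


class Main:
    def __init__(self):
        import nltk  # imported here so the path is registered before any tokenizing

        # Equivalent to the nltk.data.path.append(...) line above.
        register_local_nltk_data(nltk.data.path, __file__)

    def predict(self, skill_input):
        # Hypothetical preprocessing only: tokenize and drop English stop words.
        from nltk.corpus import stopwords
        from nltk.tokenize import word_tokenize

        stop_words = set(stopwords.words("english"))
        words = [w for w in word_tokenize(skill_input) if w.lower() not in stop_words]
        return " ".join(words)
```

The important part is only that the path is registered before `word_tokenize` or `stopwords` is first used; after that, NLTK also searches the bundled folder in addition to the locations listed in the error message.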

Now you won’t need to download the data at runtime anymore. It will be available locally inside the package, so you won’t hit this issue.
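Before uploading, it can help to confirm that the data really ended up in the package folder. A small check along these lines might do it. It assumes the downloads were unpacked into the usual `tokenizers/punkt` and `corpora/stopwords` subfolders (the first matches the path in the error message); `missing_nltk_resources` is a hypothetical helper name.

```python
import os

# Assumed layout after the downloads were unpacked into nltk_data.
EXPECTED_RESOURCES = (
    os.path.join("tokenizers", "punkt"),
    os.path.join("corpora", "stopwords"),
)


def missing_nltk_resources(data_dir, expected=EXPECTED_RESOURCES):
    """Return the expected resource folders that are not present under `data_dir`."""
    return [rel for rel in expected if not os.path.isdir(os.path.join(data_dir, rel))]
```

An empty list from `missing_nltk_resources("<MLPackagedirectory>/nltk_data")` means both resources are bundled; anything else lists what still has to be downloaded.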

Jeremy

Thanks so much! That fixed it for me! :smiley:
