I tried training an ML Package in AI Fabric without using the dataset created by “Train Machine Learning Classifier”. Instead, I manually uploaded the documents as .pdf and .jpg files into a dataset I created. When I run a Full Pipeline on these files, I get the following error:
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/microservice/classification/text/preprocess.py", line 30, in _get_words_from_text_file
    parsed_text = file.read()
  File "/opt/conda/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
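A note on reading the traceback: the failing function, `_get_words_from_text_file`, opens each file as UTF-8 text, so the pipeline appears to expect text files rather than raw documents (this is inferred from the function name only, not from the package documentation). The byte 0x89 at position 0 is also the first byte of the PNG file signature, whereas JPEG files start with 0xFF 0xD8 and PDF files with `%PDF`, so at least one file in the dataset may be a PNG. Below is a minimal sketch for checking a local copy of the files before uploading. The function names `describe_file` and `scan_folder` are hypothetical helpers, not part of AI Fabric.

```python
from pathlib import Path

# Leading bytes (file signatures) of a few common binary formats.
SIGNATURES = {
    b"\x89PNG": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF": "PDF document",
}


def describe_file(path):
    """Classify a file by its signature, falling back to a UTF-8 decode check."""
    data = Path(path).read_bytes()
    # Check known signatures first, since a small PDF can be pure ASCII.
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown binary"


def scan_folder(folder):
    """Map each file name in a folder (non-recursive) to its detected type."""
    return {
        p.name: describe_file(p)
        for p in sorted(Path(folder).iterdir())
        if p.is_file()
    }
```

Calling `scan_folder` on the local folder that holds the uploaded documents returns a dictionary such as file name to detected type. Any entry that is not "text" would raise the same `UnicodeDecodeError` if read with a plain UTF-8 `file.read()`.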
How can I solve this?