AI Fabric Model Train Pipeline Failed

Hi Guys,

We are currently annotating invoice documents in Data Manager (a UiPath product) with some custom fields. When we run a training pipeline using the dataset exported from Data Manager, the pipeline ends with the status "Failed" every time we train on those schema sets.

a) Out-of-the-box models: Invoices, Document Understanding, and InvoicesIndia (we tried training the dataset with all of these models).
b) We have a Data Manager license.
c) Invoice data: provided by the client, in various languages and formats.

Error Message:

Docker Build failed with error com.spotify.docker.client.exceptions.DockerException: ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-w52yfvxw/gmpy2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-w52yfvxw/gmpy2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-qcmeqg7i
cwd: /tmp/pip-install-w52yfvxw/gmpy2/
Complete output (14 lines):
running bdist_wheel
running build
running build_ext
building 'gmpy2' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DWITHMPFR -DWITHMPC -I/usr/local/include/python3.6m -c src/gmpy2.c -o build/temp.linux-x86_64-3.6/src/gmpy2.o
In file included from src/gmpy2.c:426:
src/gmpy.h:252:12: fatal error: mpfr.h: No such file or directory
#include "mpfr.h"
^~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for gmpy2
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-w52yfvxw/pyodbc/setup.py'"'"'; __file__='"'"'/tmp/pip-install-w52yfvxw/pyodbc/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-s19hdx6o
cwd: /tmp/pip-install-w52yfvxw/pyodbc/
Complete output (14 lines):
running bdist_wheel
running build
running build_ext
building 'pyodbc' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPYODBC_VERSION=4.0.27 -I/usr/local/include/python3.6m -c src/buffer.cpp -o build/temp.linux-x86_64-3.6/src/buffer.o -Wno-write-strings
In file included from src/buffer.cpp:12:
src/pyodbc.h:56:10: fatal error: sql.h: No such file or directory
#include <sql.h>
^~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for pyodbc
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-w52yfvxw/gmpy2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-w52yfvxw/gmpy2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-gzhu_did/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/aifabric/.local/include/python3.6m/gmpy2
cwd: /tmp/pip-install-w52yfvxw/gmpy2/
Complete output (14 lines):
running install
running build
running build_ext
building 'gmpy2' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DWITHMPFR -DWITHMPC -I/usr/local/include/python3.6m -c src/gmpy2.c -o build/temp.linux-x86_64-3.6/src/gmpy2.o
In file included from src/gmpy2.c:426:
src/gmpy.h:252:12: fatal error: mpfr.h: No such file or directory
#include "mpfr.h"
^~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1

… (truncated).
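For what it's worth, the two fatal errors in the log are missing C headers, not dataset problems: gmpy2 fails because mpfr.h (MPFR library) is absent, and pyodbc fails because sql.h (unixODBC) is absent. On a Debian/Ubuntu-based image those headers would typically come from the packages below. The AI Fabric training image is not under user control, so treat this as a diagnostic hint rather than something you can apply yourself:

```shell
# Illustrative only: Debian/Ubuntu packages that provide the missing headers.
apt-get update
apt-get install -y libmpfr-dev libmpc-dev libgmp-dev   # mpfr.h, mpc.h for gmpy2
apt-get install -y unixodbc-dev g++                    # sql.h for pyodbc
```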

@Lahiru.Fernando @Palaniyappan @loginerror

Hi @Kesavaraj_K

Could it be that you have some sub-folders? Could you try removing them?

Hello @Kesavaraj_K

Did the export from Data Manager work properly?

Also, did you make sure you had values for all the fields you defined in Data Manager?
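As a quick sanity check for that, here is a small sketch (not UiPath tooling) that scans the exported annotations for fields left without values. It assumes, purely for illustration, that the export's dataset.json is a list of documents each carrying a "fields" dict mapping field name to value; the real Data Manager schema may differ:

```python
import json

def documents_missing_values(dataset_path):
    """Return {document name: [field names with empty values]}.

    Assumes an illustrative structure: a top-level list of documents,
    each with a "fields" dict of field name -> annotated value.
    """
    with open(dataset_path) as f:
        docs = json.load(f)
    problems = {}
    for doc in docs:
        missing = [name for name, value in doc.get("fields", {}).items() if not value]
        if missing:
            problems[doc.get("name", "<unnamed>")] = missing
    return problems
```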

Hi @loginerror,

The thing is that, for training the retrainable model (Invoices), I don't know exactly which files need to be uploaded to the dataset.

After exporting at the end of the annotation process in Data Manager, the zip file contains:

  1. schema.json
  2. split.csv
  3. a latest directory, which contains dataset.json for the annotated files
  4. an images directory, which contains the invoice images
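For reference, a small Python helper (hypothetical, not part of UiPath tooling) that checks an unzipped export contains the four items listed above before you upload it:

```python
import os

# The four top-level items the question lists in a Data Manager export.
REQUIRED = ["schema.json", "split.csv", "latest", "images"]

def validate_export(folder):
    """Return the expected top-level items missing from an unzipped export."""
    present = set(os.listdir(folder))
    return [name for name in REQUIRED if name not in present]
```

An empty return value means all four items are present.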

Which files need to be uploaded to the dataset for the Invoices training pipeline? And is there any format or structure that needs to be followed when creating the pipeline?

Thanks in Advance!

Hi @Lahiru.Fernando,

Exporting the dataset worked fine as far as I could see. No error or warning popped up :sweat_smile:

To answer your second question: yes, most documents had values for all fields; maybe one or two fields were missed.

On that note, we added one more field (Currency Type) to the taxonomy in Data Manager, which the out-of-the-box Invoices ML model does not have!

Does this affect the model? And how can we add more fields if I need to create a retrainable model?

Thanks in Advance!

Hello @Kesavaraj_K

Sorry for my late reply… I got busy with some other work!

I was using Data Manager a couple of days back to train the Invoices model, and it worked fine.
The dataset I used for training was the data generated by the Machine Learning Extractor Trainer activity. I zipped the output of that activity and imported it into Data Manager.

Once in Data Manager, I did the validations there and exported the results to Datasets. This exported data has several fields and files.

This exported folder is what I gave to the Train pipeline.

I don't think adding a field is a problem. However, if you have a highly customized model for invoices, I would recommend using the DocumentUnderstanding ML model, which you can find among the out-of-the-box models, instead of the Invoices model.

Also, please refer to the links below; they might come in handy…

Let me know if any of this helps, or we can try to connect sometime to check it out :slight_smile:

Hi @Lahiru.Fernando

Sorry for the Late Reply!

I tried the method you suggested; the DocumentUnderstanding ML model produced the same results.

This weekend we tried adding more documents to the dataset (300 invoice documents). It still failed with the same results.

We are checking for any DLL or code malfunction on the Data Manager side.

For now, that may be what is causing the issue. I'll update this thread if any method gives a solution.

Thanks for the support!

Hi, were you able to find a solution? Which version of the Invoices model did you use?

Hi @SherlinS

In my case, I had uploaded the files to the wrong path in the input dataset.

The correct way to upload and run the pipeline:

  1. Export the dataset from Data Manager. Please verify that the export was successful.

  2. Unzip the folder locally.

  3. Navigate to Automation Cloud → AI Fabric → the respective project → Datasets → the Upload Folder button in the top-right corner.

  4. After uploading, create a pipeline for the respective model (e.g. I used InvoicesIndia v4.0).

  5. After choosing the model version, choose the dataset that you uploaded.

This worked for me!
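Step 2 of the list above (unzipping locally before upload) can be sketched in Python like this; the helper name is mine, not UiPath's, and it simply extracts the export and reports its top-level entries so you can confirm you are uploading the right folder:

```python
import zipfile

def unzip_export(zip_path, dest):
    """Extract a Data Manager export zip and return its top-level entries."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # Collect the first path component of every archived file.
        return sorted({name.split("/")[0] for name in zf.namelist()})
```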

Regards,
Kesavaraj

Hi @Kesavaraj_K ,
Thank you for the update.