Why do AI Fabric pipelines get killed automatically after being in running status for 7 days?
By current design, pipelines are killed automatically after 7 days, even if they are still in the running state. This limit exists so that a pipeline that is stuck because of a backend issue does not keep users waiting indefinitely.
If the training dataset contains a large number of documents and the pipeline runs on CPU, training the Document Understanding model can take more than 7 days. By default, pipelines run for 150 epochs, and the training time per epoch increases with the number of documents. As a result, the pipeline can be killed before it completes.
To handle this scenario, reduce the number of epochs for the model by creating the environment variable ml_model.epochs in the AI Fabric application.
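As a rough aid for choosing a reduced epoch count, the arithmetic behind the 7-day limit can be sketched as below. This is only an illustrative calculation, not part of AI Fabric: the function names and the 10% safety margin are assumptions, and the time per epoch has to be measured from your own pipeline logs.

```python
import math

PIPELINE_LIMIT_HOURS = 7 * 24   # 7-day limit described in this article
DEFAULT_EPOCHS = 150            # default number of epochs described in this article


def max_hours_per_epoch(epochs=DEFAULT_EPOCHS, limit_hours=PIPELINE_LIMIT_HOURS):
    """Longest average epoch duration that still finishes within the limit."""
    if epochs <= 0:
        raise ValueError("epochs must be positive")
    return limit_hours / epochs


def max_epochs_within_limit(hours_per_epoch, limit_hours=PIPELINE_LIMIT_HOURS,
                            safety_margin=0.9):
    """Largest epoch count that fits in the limit.

    safety_margin is a hypothetical buffer (use only 90% of the limit by
    default) to leave room for non-training steps; adjust it as needed.
    """
    if hours_per_epoch <= 0:
        raise ValueError("hours_per_epoch must be positive")
    return math.floor(limit_hours * safety_margin / hours_per_epoch)


if __name__ == "__main__":
    # With the default 150 epochs, each epoch must average about 1.12 hours or less.
    print(round(max_hours_per_epoch(), 2))
    # If one epoch is observed to take 2 hours, this suggests an upper bound
    # for the value to set in ml_model.epochs.
    print(max_epochs_within_limit(2.0))
```

With the defaults above, an epoch that averages more than about 1.12 hours will not finish 150 epochs within 7 days, so the epoch count would need to be lowered accordingly.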