We ran the training pipeline on the same model and version (Remittance Advices 23.4.0). The first run had 80 documents and took 2.5 hours, while the second run had 500 documents and took 35 hours. I'm interested in understanding the relationship between the number of documents and the training time. In my case, it appears to grow much faster than linearly, almost exponentially. I would like to hear about your experience with this as well.
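To put concrete numbers on this, here is a quick sanity check on the two runs described above. The two-point power-law fit is my own back-of-the-envelope estimate, not something the pipeline reports:

```python
import math

# Observed runs from the post above (Remittance Advices 23.4.0 pipeline):
docs_a, hours_a = 80, 2.5
docs_b, hours_b = 500, 35.0

# If training time scaled linearly with document count,
# the 500-document run would have taken:
linear_estimate = hours_a * docs_b / docs_a  # 2.5 * (500 / 80) = 15.625 hours

# Fit a power law (time ~ docs**p) through the two observed points:
p = math.log(hours_b / hours_a) / math.log(docs_b / docs_a)

print(f"linear estimate: {linear_estimate:.1f} h (observed: {hours_b} h)")
print(f"power-law exponent: {p:.2f}")
```

The exponent comes out around 1.44, so the growth in these two runs is superlinear but well short of exponential; truly exponential scaling would mean each additional batch of documents multiplies the training time.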
The amount of data used to train a pipeline is a factor that significantly increases training time. Since the default epoch value is set to 100, other factors such as document complexity, digitization quality, and the number of labels also have an impact. Using a GPU will reduce the time, but it will consume more AI units depending on your use case.
Hope this helps,
So generally, when documents in both sets have similar complexity, number of labels, text length, etc., the training time should be more or less proportional to the number of documents?
The training time for a machine learning model can be influenced by various factors, including the number of documents, the complexity of the model, the hardware resources available, and the efficiency of the training algorithm.
In general, it is expected that training time increases as the number of documents increases. However, the relationship between the number of documents and training time can vary depending on the specific model, data, and infrastructure.
You are right. It's a safe bet to assume a roughly proportional relationship between the number of documents and the training time.
Training on an entirely new dataset takes comparatively longer than retraining on data the model has already seen, but the proportional relationship still holds in both cases.
Now I know everything I wanted to know.
Thank you, guys, for your help.