I’m experiencing issues when trying to retrain the generic Document Understanding out-of-the-box package in AI Fabric.
Here’s what I tried:
- I used the data labeling module to manually label all the data for each of my 10 training documents.
- Afterwards, I created a full pipeline run using the generic, retrainable DU package (version 4.0).
- I selected the folder created by Data Manager/Data Labeling as the input folder.
- I created a separate folder containing PDF documents for evaluation, and selected it as the evaluation dataset.
- After I started this pipeline, it ran for a couple of hours. In the logs, I can see that training successfully reaches 150 epochs. However, an error seems to occur during the evaluation step:
ValueError: max_df corresponds to < documents than min_df
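As far as I can tell, this is the same message that scikit-learn's text vectorizers (CountVectorizer, TfidfVectorizer) raise when the maximum document count derived from max_df is smaller than the minimum count derived from min_df. I am only assuming the DU package uses one of these internally. Below is a minimal plain-Python sketch of that check as I understand it; the function names and the max_df/min_df values are hypothetical and not taken from the package.

```python
from numbers import Integral


def doc_count_bounds(max_df, min_df, n_docs):
    """Turn max_df/min_df into absolute document counts.

    Integers are treated as absolute counts, floats as a fraction of n_docs.
    """
    max_count = max_df if isinstance(max_df, Integral) else max_df * n_docs
    min_count = min_df if isinstance(min_df, Integral) else min_df * n_docs
    return max_count, min_count


def check_df_bounds(max_df, min_df, n_docs):
    """Raise the message from my log when the two bounds contradict each other."""
    max_count, min_count = doc_count_bounds(max_df, min_df, n_docs)
    if max_count < min_count:
        raise ValueError("max_df corresponds to < documents than min_df")


# Hypothetical settings, NOT read from the DU package.
MAX_DF, MIN_DF = 0.5, 2

check_df_bounds(MAX_DF, MIN_DF, n_docs=10)  # 5.0 >= 2, so no error

try:
    check_df_bounds(MAX_DF, MIN_DF, n_docs=3)  # 1.5 < 2, so this raises
except ValueError as exc:
    print(exc)
```

If this assumption holds, the error would depend on how many documents reach the vectorizer, so a very small evaluation set might be enough to trigger it.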
I added the full .log file in this post. What could be the cause of this error? How can I fix it?
1f9eb25d-e633-44a8-9597-a02aa3590dfa.txt (150.1 KB)