Document Manager Max Page Count Reached

Is there a limit to the number of documents or pages that Document Manager can hold?

Issue Description: When attempting to import new documents into a data labeling session, the error API_ERROR_IMPORTMAXPAGECOUNTREACHEDERROR is displayed.

Resolution:

Document Manager currently has a 25,000-page limit per data labeling session. The page count must be reduced before new documents can be added to the dataset.
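As a rough illustration of this limit, the sketch below works out how many pages would need to be removed before an import fits. It is not part of Document Manager: the function name and parameters are hypothetical, and it assumes that a session holding exactly 25,000 pages is still allowed, which this article does not confirm.

```python
# Limit stated in this article; the name of the constant is illustrative.
MAX_PAGES_PER_SESSION = 25_000


def pages_to_free(current_pages: int, incoming_pages: int,
                  limit: int = MAX_PAGES_PER_SESSION) -> int:
    """Return how many pages must be removed before the import fits.

    Returns 0 when the import already fits. Assumes the limit is
    inclusive (exactly `limit` pages is acceptable).
    """
    if current_pages < 0 or incoming_pages < 0:
        raise ValueError("page counts must be non-negative")
    return max(0, current_pages + incoming_pages - limit)


if __name__ == "__main__":
    # Hypothetical session with 24,700 pages and a 1,000-page import.
    print(pages_to_free(24_700, 1_000))
```

Note that current_pages should cover every page in the session, including soft-deleted and evaluation documents, since those also count toward the limit.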

Note: It is recommended to take a backup of the dataset prior to removing documents.

Possible ways to reduce the dataset size while retaining the training data:

Method #1: Permanently delete any soft-deleted documents from the dataset that are no longer needed.

  • The example below shows how documents are soft-deleted and where they are listed:


  • The example below shows how to permanently delete ("hard-delete") documents:



Method #2: Move the evaluation data out of the training dataset and into a separate dataset.

Additional Notes:

  • Pages are not the same as documents. A single imported document may contain multiple pages.
  • A future improvement is planned to clearly display page numbers for the dataset so that there is visibility on how many pages are being consumed.
  • For now, to get a general idea of how many pages are currently in the data labeling session, in newer versions of Document Manager, open the Dataset Diagnostic Tab (as shown below).

On this page, a general idea of how many pages are currently in the dataset can be obtained. However, please note that this page count does not include soft-deleted subset (batch) documents or evaluation subset (batch) documents.

See the example below, which shows 17,709 pages in the current dataset. However, there were also ~4,000 pages in the evaluation subset and ~3,000 in the soft-deleted subset, so the session actually held roughly 24,700 pages, close to the 25,000-page limit.
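The arithmetic behind this example can be sketched as follows. This is a minimal illustration, not a Document Manager feature: the function and parameter names are hypothetical, and the evaluation and soft-deleted figures are the approximate values quoted above, so the result is only an estimate.

```python
def estimate_session_pages(diagnostics_pages: int,
                           evaluation_pages: int,
                           soft_deleted_pages: int) -> int:
    """Estimate total pages counted against the session limit.

    The diagnostics page count excludes the evaluation and soft-deleted
    subsets, so they are added back in here.
    """
    return diagnostics_pages + evaluation_pages + soft_deleted_pages


if __name__ == "__main__":
    limit = 25_000  # per-session limit stated in this article
    total = estimate_session_pages(17_709, 4_000, 3_000)
    print(f"Estimated pages in session: {total}")    # 24709
    print(f"Estimated headroom: {limit - total}")     # 291
```

With these figures, permanently deleting the ~3,000 soft-deleted pages would bring the estimate back down to roughly 21,700 pages.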

  • If the dataset is downloaded and a new dataset is created to split up the data and reduce the number of documents/pages, note that manually manipulating the files in the downloaded folder is not officially supported and will often corrupt the dataset. To break a large dataset into smaller files for uploading to a new dataset, please see Split Large Datasets For Importing.