🌟 [Public Preview] Delete Document Data via API

We’re excited to announce a much-awaited feature now available in Public Preview: the ability to delete runtime and validation data associated with your documents via API! :tada:

This is a significant step forward for all Document Understanding customers—especially those handling sensitive data in regulated industries like healthcare, banking, and insurance—who want more control over their document lifecycle and data privacy.

:light_bulb: What’s New?

A new DELETE API endpoint is available for removing all runtime data tied to a given documentId. This includes:

  • Digitization results (DOM, Text, Optimized PDFs)
  • Classification and extraction results
  • Validation and classification tasks (based on user input)

This means once you’ve extracted what you need from a document, you can clean up all associated data—without waiting 7 days for automatic expiration.

How It Works

To ensure security and control, only External Applications granted the new Du.Deletion.Api scope can access this endpoint. Make sure to add it to your requested scopes before authenticating.
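As a rough illustration of adding the scope, here is a minimal Python sketch that builds the form body for a token request. It assumes your External Application authenticates with the standard OAuth 2.0 client-credentials grant, which this post does not confirm. The helper name `build_token_form` and the placeholder scope `Du.Example.Scope` are hypothetical; only `Du.Deletion.Api` comes from this announcement.

```python
from urllib.parse import urlencode, parse_qs

DELETION_SCOPE = "Du.Deletion.Api"  # scope named in the announcement


def build_token_form(client_id: str, client_secret: str,
                     other_scopes: list[str]) -> str:
    """Return a URL-encoded form body that requests the deletion scope.

    Assumption: the client-credentials grant is used. The token endpoint
    itself is not covered here because the post does not name it.
    """
    # Keep the caller's scopes, append the deletion scope, drop duplicates
    # while preserving order.
    scopes = list(dict.fromkeys([*other_scopes, DELETION_SCOPE]))
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": " ".join(scopes),  # OAuth 2.0 scopes are space-separated
    })
```

The resulting string would be sent as the body of the token request to your identity endpoint; the returned access token then carries the deletion scope.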

You can now call:

DELETE /projects/{projectId}/document/{documentId}/

This initiates a deletion of all data linked to the document. You’ll receive a 202 Accepted response to confirm the request was received.
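A minimal Python sketch of building this request, using only the standard library. It assumes the DELETE method named under "What's New" and bearer-token authentication. The base URL is a hypothetical placeholder, since this post does not give the host or path prefix, and the helper names are illustrative.

```python
import urllib.request

# Hypothetical base URL; the real host and path prefix are not given in this post.
BASE_URL = "https://example.invalid/du/api"


def build_delete_request(project_id: str, document_id: str, token: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the document-deletion request."""
    url = f"{base_url}/projects/{project_id}/document/{document_id}/"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


def is_accepted(status_code: int) -> bool:
    """202 Accepted means the deletion was received, not that it has finished."""
    return status_code == 202
```

Passing the returned object to `urllib.request.urlopen` would send it. Because the deletion runs asynchronously, a 202 only tells you the request was received.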

To confirm the deletion, you can call:

GET /digitization/{documentId}

Once the deletion is complete, this endpoint will return a 404.
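Since the deletion is asynchronous, one way to confirm it is to poll the digitization endpoint until it returns 404. The sketch below is one possible approach, not an official client: `get_status` is a caller-supplied function (hypothetical) that performs GET /digitization/{documentId} and returns the HTTP status code, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_deleted(get_status: Callable[[], int],
                       attempts: int = 10,
                       delay_seconds: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the digitization lookup returns 404, or give up.

    Returns True once a 404 is seen, False if all attempts are used up.
    """
    for attempt in range(attempts):
        if get_status() == 404:
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Injecting `get_status` and `sleep` keeps the polling logic independent of any particular HTTP library and makes it easy to test without network access.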

The request also supports an optional input parameter:

  • forceDeleteValidationData: true or false (default)

This controls whether associated open validation/classification tasks should be deleted too.

  • If set to true, tasks will be deleted and inaccessible.
  • If set to false and open tasks exist, the API will return:
403 Forbidden
Cannot delete document with ID {documentId}, as it has open validation tasks – either complete the created tasks or set the forceDeleteValidationData option to “true”.

This gives you full control over how and when task-related data is removed.
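The behaviour described above could be handled in client code roughly as follows. This is a sketch under assumptions: the post does not say whether forceDeleteValidationData is sent as a query parameter or in the request body, so the actual call is left to a caller-supplied `send_delete(force)` function (hypothetical) that issues the request and returns the status code. `OpenTasksError` and `delete_document` are illustrative names.

```python
from typing import Callable


class OpenTasksError(Exception):
    """Raised when deletion is refused because open tasks exist."""


def delete_document(send_delete: Callable[[bool], int],
                    force_if_open_tasks: bool = False) -> bool:
    """Request deletion, optionally retrying with the force flag.

    send_delete(force) issues the deletion request with
    forceDeleteValidationData=force and returns the HTTP status code.
    Returns True if a forced retry was needed, False otherwise.
    """
    status = send_delete(False)  # try the default, non-forcing call first
    if status == 202:
        return False
    if status == 403:
        if not force_if_open_tasks:
            raise OpenTasksError("document has open validation tasks")
        status = send_delete(True)  # retry and delete the open tasks too
        if status == 202:
            return True
    raise RuntimeError(f"unexpected status {status}")
```

Trying without the flag first means open tasks are only removed when the caller has explicitly opted in, which matches the default of false.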

Why This Matters

We understand that in many use cases, especially when dealing with PII or PHI, retaining document data beyond its processing lifecycle is not acceptable.

This new capability helps you stay compliant with internal and external data governance policies by:

  • Giving you full control over when and how document data is deleted
  • Reducing data retention risks
  • Enabling on-demand cleanup after workflows are complete

We’d love to hear your feedback during the Public Preview phase! Let us know how this new capability fits into your document processing lifecycle and what we can improve. :raising_hands:

Happy automating! :robot:

— The Document Understanding Team


A much-awaited feature. I have been waiting for this for a long time. Thanks :slightly_smiling_face:
