24.11 Preview - Document Understanding updates

Hello community, it has been a long time since our last forum post - not because we weren’t working on exciting features, but precisely because we were!

I’m happy to present to you some of the updates that come with the 24.11 Preview Release in Document Understanding.

Generative extraction, optimized for complex documents - long and short, with various layouts and elements - we got it all!

It is no news that we provide generative extraction capabilities - for almost one year now, we have been extracting information from short and long documents with rather simple layouts. However, some documents with complex elements, such as certain floating callout boxes, have not been extracted accurately. We also wanted to leverage Retrieval-Augmented Generation to improve the results. To that end, I’m happy to share that we are releasing a new predefined project, available out-of-the-box for everyone: the Generative Predefined Project, which comes with 3 pretrained generative extraction models, each optimized for a particular use case:

  1. Long Document - Simple Layout → the default selection, as available today (renamed from Generative Extractor). Optimized for unstructured, long documents (longer than 20 pages). Uses RAG (Retrieval Augmented Generation) with GPT4 Turbo, where only the text is sent to the model.
  2. Long Document - Complex Layout → new option. Optimized for unstructured, long documents (longer than 20 pages), but with a better understanding of layouts. Uses RAG with GPT4o, where both the text and the images of the pages are sent to the model, for improved performance on complex elements.
  3. Short Document - Complex Layout → new option. Optimized for semi-structured, short documents (shorter than 20 pages). Does not use RAG; leverages GPT4o, where both the text and the images of the pages are sent to the model, for improved performance on complex elements.

The above 3 models allow users to process complex documents accurately and with minimal manual intervention, speeding up processing and reducing errors by using a model that is tailored to their use case.
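To make the choice between the three models concrete, here is a minimal sketch of the selection logic in Python. It is only an illustration, not part of the product: `GenerativeModel`, `pick_model`, and `PAGE_THRESHOLD` are hypothetical names. The post does not say what happens at exactly 20 pages, nor which model suits a short document with a simple layout, so the sketch assumes that 20 pages counts as short and that simple layouts fall back to the default model.

```python
from enum import Enum


class GenerativeModel(Enum):
    # Display names as listed in the post; the enum itself is hypothetical.
    LONG_SIMPLE = "Long Document - Simple Layout"
    LONG_COMPLEX = "Long Document - Complex Layout"
    SHORT_COMPLEX = "Short Document - Complex Layout"


# The post draws the line at 20 pages. Treating exactly 20 as "short"
# is an assumption of this sketch.
PAGE_THRESHOLD = 20


def pick_model(page_count: int, complex_layout: bool) -> GenerativeModel:
    """Suggest one of the three pretrained models for a document."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if not complex_layout:
        # Assumption: simple layouts use the default (text-only) model,
        # whatever the document length.
        return GenerativeModel.LONG_SIMPLE
    if page_count > PAGE_THRESHOLD:
        return GenerativeModel.LONG_COMPLEX
    return GenerativeModel.SHORT_COMPLEX


print(pick_model(120, complex_layout=True).value)
print(pick_model(4, complex_layout=True).value)
```

In practice the decision about whether a layout is "complex" is a judgment call per document type, so a helper like this would more likely be driven by a per-document-type setting than computed per file.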

The above are available in the APIs through v1.1, a newly created API version, as well as in Activities: Extract Document Data from the DocumentUnderstanding.Activities package, or the Document Understanding Project Extractor from the IntelligentOCR package.
They are part of a newly published Predefined Project called Generative Predefined, a modern project that adheres to the modern pricing and is discoverable and usable in the same way as the previously released Generative Extractor.

Support for Project Versions and Tags in Activities and APIs
When it comes to consuming custom Modern Projects, we provide the ability to publish project versions and assign deployment tags to them, so that users do not need to redeploy an automation after publishing a new project version that they want to consume.
In this release, we have enhanced both APIs and Activities with support for project versions and tags, so that when creating automations, one can either:

  1. select a precise, fixed project version to reference in automations, which will not change - it always references the same model snapshots
  2. or select a tag, which acts as a label for a project version - the tag may be moved to another version in the Modern Project without the requirement of updating automations, as these automatically reference the project version on which the tag is currently set.

The above would help automation developers consume a snapshot of a Document Understanding project in various environments, so one can clearly separate development from testing and production processes.
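The difference between pinning a version and following a tag can be sketched with a small in-memory model. This is purely illustrative and does not reflect the actual Document Understanding APIs or Activities: `ModernProject`, `publish`, `set_tag`, and `resolve` are hypothetical names, as is the `production` tag.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ModernProject:
    """Hypothetical stand-in for a Modern Project (not a UiPath API)."""

    versions: Set[int] = field(default_factory=set)
    tags: Dict[str, int] = field(default_factory=dict)

    def publish(self, version: int) -> None:
        # Publishing adds a new immutable snapshot.
        self.versions.add(version)

    def set_tag(self, tag: str, version: int) -> None:
        # A tag can be moved, but only onto a published version.
        if version not in self.versions:
            raise ValueError(f"version {version} has not been published")
        self.tags[tag] = version

    def resolve(self, *, version: Optional[int] = None,
                tag: Optional[str] = None) -> int:
        # An automation references either a fixed version or a tag.
        if (version is None) == (tag is None):
            raise ValueError("pass exactly one of version or tag")
        if version is not None:
            if version not in self.versions:
                raise ValueError(f"version {version} has not been published")
            return version
        if tag not in self.tags:
            raise KeyError(f"unknown tag: {tag}")
        return self.tags[tag]


project = ModernProject()
project.publish(1)
project.set_tag("production", 1)

pinned = project.resolve(version=1)         # fixed reference
before = project.resolve(tag="production")  # follows the tag

project.publish(2)
project.set_tag("production", 2)            # move the tag, automation untouched
after = project.resolve(tag="production")

print(pinned, before, after)  # 1 1 2
```

The automation that references the tag picks up version 2 without being redeployed, while the one pinned to version 1 keeps using the same snapshot.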


Great stuff as always, can’t wait to try the versioning/tagging :slight_smile:


Can this be used to extract tabular data?

Thanks,
Tim

Hello @flymemory ,

We don’t support explicit table extraction with this new feature. We are, however, working on a solution for tabular use cases, to be previewed soon - keep an eye out for our upcoming releases :slight_smile:
Monica

Yes and no @flymemory - yes, as it’s based on LLMs, which technically support this; no, as we don’t officially provide support for tabular extraction, which we are working on addressing with another feature, to be previewed soon :slight_smile: