Public Preview: New Fine-Tunable Extraction Model in Modern Projects
Hi everyone!
We’re excited to announce the Public Preview of our new fine-tunable extraction model, UiPath Helix Extractor 2.0, now available in Europe and the United States for Document Understanding Modern Projects. This release brings significant improvements in accuracy, confidence, robustness, and inference stability, along with a new metrics methodology that more accurately reflects real user effort.
This post provides everything you need to get started: what’s new, how to enable it, expected behavior, and known limitations.
Availability & Timelines
The new extraction model is available today in Europe and the US as part of Public Preview.
Please note:
- The new model is not available on Automation Suite
- We will share updates once the Automation Suite rollout timelines are finalized.
Naming Update
We are beginning a transition to a new model naming structure across Document Understanding. These names will roll out in the product UI first, followed by documentation and other materials during the preview period. In the product, the new model will appear as UiPath Helix Extractor 2.0. The existing DocPath model will appear as UiPath Helix Extractor 1. The older LayoutLM extraction model keeps the name Legacy model.
What’s New
The new extraction model is built on Qwen-VL, a next-generation vision–language architecture that jointly analyzes text and layout. This is a major shift from the text-only T5 model and enables:
- Better performance on complex, semi-structured documents
- Robust table extraction through spatial reasoning
- Improved generalization to unfamiliar layouts and non-OOB formats
- Native multilingual support*, including non-Latin languages (e.g., Chinese, Hebrew, Japanese)
New metrics methodology
We introduced a new scoring approach that evaluates tables at row and column level, not as a single field. This creates metrics that more accurately reflect true correction effort and more effectively highlight extraction errors that previously went unnoticed.
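To illustrate why row- and column-level scoring tracks correction effort more closely, here is a minimal sketch. It is not the actual metric formula used in the product; the function names, the exact-match comparison, and the sample table are all assumptions made for illustration.

```python
from typing import List

Table = List[List[str]]  # a table as rows of cell strings


def table_as_single_field_score(predicted: Table, expected: Table) -> float:
    """Whole-table view: the table counts as one field, fully right or fully wrong."""
    return 1.0 if predicted == expected else 0.0


def cell_level_score(predicted: Table, expected: Table) -> float:
    """Fraction of cells that match at the same row and column position.

    Missing or extra rows and cells count as errors, so the denominator
    is the larger of the two grids at each position.
    """
    n_rows = max(len(predicted), len(expected))
    total = 0
    correct = 0
    for r in range(n_rows):
        pred_row = predicted[r] if r < len(predicted) else []
        exp_row = expected[r] if r < len(expected) else []
        n_cols = max(len(pred_row), len(exp_row))
        for c in range(n_cols):
            total += 1
            if c < len(pred_row) and c < len(exp_row) and pred_row[c] == exp_row[c]:
                correct += 1
    return correct / total if total else 1.0


# Hypothetical invoice line items: one wrong quantity out of six cells.
expected = [["Widget", "2", "10.00"], ["Gadget", "1", "5.50"]]
predicted = [["Widget", "2", "10.00"], ["Gadget", "7", "5.50"]]

single = table_as_single_field_score(predicted, expected)
cells = cell_level_score(predicted, expected)
print(f"single-field score: {single:.2f}, cell-level score: {cells:.2f}")
```

In this made-up example the whole-table view reports a complete failure, while the cell-level view shows that a reviewer only needs to correct one cell out of six.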
Performance improvements
Accuracy
Across all tasks:
- 75.2% effective success rate (equal to or better than the previous model)
- Significant improvements on high-volume and complex document types
- Minor regressions remain on low-sample categories (<5 training docs); these are being addressed for GA
Confidence Levels
The new model increases average extraction confidence from 0.984 to 0.992.
Its confidence scores also correlate more strongly with true accuracy, which results in:
- Fewer documents routed to Action Center
- Higher straight-through automation rates
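The link between confidence and routing can be sketched as a simple threshold check. This is an assumption about how a workflow might be set up, not a description of the product's routing logic; the `ExtractedField` class, the 0.99 threshold, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # extraction confidence in the range 0.0 to 1.0


def needs_review(fields: List[ExtractedField], threshold: float) -> bool:
    """Send a document to human validation if any field falls below the threshold."""
    return any(f.confidence < threshold for f in fields)


def straight_through_rate(documents: List[List[ExtractedField]], threshold: float) -> float:
    """Share of documents that pass without human validation."""
    if not documents:
        return 0.0
    passed = sum(1 for fields in documents if not needs_review(fields, threshold))
    return passed / len(documents)


# Made-up documents, for illustration only.
docs = [
    [ExtractedField("total", "120.00", 0.995), ExtractedField("date", "2024-01-05", 0.997)],
    [ExtractedField("total", "88.10", 0.970), ExtractedField("date", "2024-02-11", 0.999)],
]
rate = straight_through_rate(docs, threshold=0.99)
print(f"straight-through rate: {rate:.0%}")
```

Raising confidence only helps if it is well calibrated: a threshold like this is safe to rely on when high confidence values reliably indicate correct extractions.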
Training Time
The new model:
- Trains much faster than Helix Extractor 1.0
- Is only slightly slower than the Legacy model
- Shows far more stable convergence behavior
Inference Latency
The new model:
- Has slightly higher median latency
- Has significantly lower tail latency
This means latency is more predictable, avoiding the large spikes seen in older models.
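For readers less familiar with the distinction, the sketch below compares median and tail (95th percentile) latency on two sets of sample timings. The numbers are invented to show the pattern and are not measurements of any model; the `percentile` helper is a plain nearest-rank implementation written for this example.

```python
from statistics import median
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, clamped to at least rank 1.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Invented request timings in seconds, for illustration only.
older_model = [0.8] * 18 + [9.0, 12.0]  # fast median, occasional large spikes
newer_model = [1.0] * 18 + [1.5, 2.0]   # slightly slower median, no spikes

for name, timings in (("older", older_model), ("newer", newer_model)):
    print(f"{name}: median={median(timings):.1f}s, p95={percentile(timings, 95):.1f}s")
```

A workflow sized around the 95th percentile rather than the median benefits from the second profile, even though its typical request is a little slower.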
How to Enable the New Model (Public Preview)
You can enable the model in two ways:
1. New Modern Project
During project creation, select Enable new extraction model.
2. Existing Modern Project
- Open Project Settings
- Enable New Helix Extractor 2.0
- Save
This triggers:
- New training for all document types
- Default use of the new model for future document types
Per-Document-Type Overrides
You can switch individual document types back to Legacy or Helix Extractor 1 if needed:
- Go to Document Type Manager
- Open Settings
- Choose the extractor version
- Save to trigger retraining
Pinned versions remain unchanged.
Special Case
Some document types could not be trained using Helix Extractor 1, and they still rely on the Legacy model:
- Financial Statements
- Invoices China
- Invoices Hebrew
- Invoices Japan
These document types are now supported by the new model, and enabling it will still trigger training for them.
Questions or Feedback?
This is a Public Preview, and your feedback is incredibly valuable.
Please post questions, issues, or insights directly in this thread so the DU engineering and product teams can follow up.
Looking forward to hearing what you build with the new model!

