How do you handle both scanned and native PDFs in one DU process?

Masuma_Khatun · August 3, 2025, 9:12am

I’m working on a document processing flow using Document Understanding. The challenge is that some PDFs are digital (native) and others are scanned images. Using a single extractor doesn’t give good results across both types. Has anyone built a DU pipeline that can handle both in one process without branching into two separate workflows?

Mir.Jasimuddin · August 3, 2025, 9:14am

Hey @Masuma_Khatun,
You can handle both scanned and native PDFs in one DU process without splitting workflows.

Here’s how:

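Use Digitize Document with OmniPage OCR. It works for both scanned and digital PDFs: with ApplyOcrOnPDF left on Auto, native text is read directly and OCR runs only where the PDF needs it.
In Data Extraction Scope, add both Form Extractor (for digital) and Machine Learning Extractor or Regex Based Extractor (for scanned).
Map extractors to specific fields using the Configure Extractors wizard, and set a minimum confidence for each one.
For each field, Data Extraction Scope tries the extractors in the order you configured and falls back to the next one when an extractor returns nothing or a result below its minimum confidence.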

No need to branch. Just configure the extractors smartly.
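
To make the per-field fallback idea concrete, here is a minimal Python sketch. It is not UiPath's API and does not call any UiPath component. The names (`ExtractorConfig`, `extract_field`, `template_extractor`, `regex_extractor`), the sample texts, and the confidence values are all hypothetical, chosen only to illustrate "try extractors in priority order, accept the first result that clears its threshold".

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Result:
    value: str
    confidence: float


# An extractor takes (field_name, document_text) and returns a Result or None.
Extractor = Callable[[str, str], Optional[Result]]


@dataclass
class ExtractorConfig:
    name: str
    run: Extractor
    min_confidence: float  # results below this are ignored


def extract_field(
    field: str, text: str, configs: List[ExtractorConfig]
) -> Optional[Tuple[str, Result]]:
    """Try extractors in priority order; return the first acceptable result."""
    for cfg in configs:
        result = cfg.run(field, text)
        if result is not None and result.confidence >= cfg.min_confidence:
            return cfg.name, result
    return None  # no extractor produced a usable value


# Hypothetical stand-in for a template-style extractor:
# only matches clean "Field: value" lines, as in a native PDF's text layer.
def template_extractor(field: str, text: str) -> Optional[Result]:
    prefix = f"{field}: "
    for line in text.splitlines():
        if line.startswith(prefix):
            return Result(line[len(prefix):].strip(), 0.95)
    return None


# Hypothetical stand-in for a regex-style extractor:
# tolerant of OCR noise (wrong case, wrong or missing punctuation).
def regex_extractor(field: str, text: str) -> Optional[Result]:
    pattern = rf"{re.escape(field)}\s*[:;.]?\s*(\S+)"
    match = re.search(pattern, text, re.IGNORECASE)
    return Result(match.group(1), 0.70) if match else None


# Priority order: template first, regex as the fallback.
configs = [
    ExtractorConfig("template", template_extractor, 0.90),
    ExtractorConfig("regex", regex_extractor, 0.60),
]

native_text = "Invoice Number: INV-001\nTotal: 120.00"
scanned_text = "invoice number; INV-002\ntotal 88.50"

native_hit = extract_field("Invoice Number", native_text, configs)
scanned_hit = extract_field("Invoice Number", scanned_text, configs)
```

In this sketch the native text is resolved by the first extractor, while the noisy scanned text falls through to the second one, all inside a single loop with no separate workflow per document type.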

Masuma_Khatun · August 3, 2025, 9:16am

@Mir.Jasimuddin,

Thank you for the solution.

system · August 6, 2025, 9:17am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
