How do you handle both scanned and native PDFs in one DU process?

I’m working on a document processing flow using Document Understanding. The challenge is that some PDFs are digital (native) and others are scanned images. Using a single extractor doesn’t give good results across both types. Has anyone built a DU pipeline that can handle both in one process without branching into two separate workflows?

Hey @Masuma_Khatun,
You can handle both scanned and native PDFs in one DU process without splitting workflows.

Here’s how:

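  • Use Digitize Document with an OCR engine such as OmniPage OCR. By default it reads the text layer of native PDFs directly and only runs OCR on pages that need it (scans), so one activity covers both types.
  • In Data Extraction Scope, add more than one extractor: for example Form Extractor for fixed-layout documents with clean text (usually the native PDFs), plus Machine Learning Extractor or RegEx Based Extractor for the noisier scanned ones.
  • Map extractors to specific fields in the Configure Extractors wizard, and set a minimum confidence per extractor.
  • For each field, the scope tries the extractors in the order you configured and keeps the first result that meets the minimum confidence, so the fallback from one extractor to the next happens on its own.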

No need to branch. The extractor order and the per-field confidence thresholds do the routing for you.
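If it helps to picture the fallback idea, here is a rough Python sketch of it. This is only a conceptual model, not UiPath code or any UiPath API: `Candidate`, `extract_field`, the two stub extractors, and the threshold values are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    value: str
    confidence: float  # 0.0 to 1.0


# Hypothetical extractor signature: field name in, Candidate or None out.
Extractor = Callable[[str], Optional[Candidate]]


def extract_field(
    field: str,
    chain: List[Tuple[str, Extractor, float]],
) -> Optional[Tuple[str, Candidate]]:
    """Try extractors in priority order; keep the first confident result."""
    for name, extractor, min_confidence in chain:
        candidate = extractor(field)
        if candidate is not None and candidate.confidence >= min_confidence:
            return name, candidate
    return None  # nothing met its threshold; the field stays empty


def form_extractor(field: str) -> Optional[Candidate]:
    # Stub: pretend the template did not match (e.g. a skewed scan).
    return None


def ml_extractor(field: str) -> Optional[Candidate]:
    # Stub: pretend the model found a value with moderate confidence.
    return Candidate(value="INV-1042", confidence=0.82)


# Assumed ordering and thresholds, for illustration only.
chain = [
    ("form", form_extractor, 0.90),
    ("ml", ml_extractor, 0.60),
]

winner = extract_field("invoice_number", chain)
print(winner)
```

In this toy run the form stub returns nothing, so the value comes from the second extractor. On a clean native PDF the first extractor would normally answer and the second would never be consulted for that field.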

@Mir.Jasimuddin,

Thank you for the solution.
