Read a native PDF file and return both docText and DOM

Hi all,

I have a native PDF file, so I want to use the Read PDF Text activity instead of Digitize Document. However, Read PDF Text outputs only docText, not the DOM, and the next step, Classify Document Scope, requires the DOM as an input.

Is there any way to read a native PDF and still get both docText and the DOM?


To get the DOM you need to use the Digitize Document activity, which gives you two outputs: the DOM and the document text.

Official Documentation:

Although the Digitize Document activity returns the DOM, it takes longer, and its text formatting is not as good as that of the Read PDF Text activity.

Yes, it’ll take some time to create the DOM.

If you are getting good results with Read PDF Text, how can you be getting poor results with Digitize Document?
Are you using OCR? If so, which one?

Sorry, you're right. It works. Thanks for your help.

Glad it worked! :+1:

Omni OCR should also give good results in this case.

If your query is resolved, mark a post as the solution to close the thread.