Note that this probably only works for an automation that uses Classification Station (or classification in Action Center). If you’re also using Validation Station (or a validation action), this probably won’t work, because data will be missing from the DOM.
We have an attended automation that takes a PDF, digitizes it, and presents Classification Station so the user can split the file up. Even for small PDFs the digitize step was taking a long time, and for some it was taking over an hour or just timing out. So I came up with a way to manually create the DOM object.
You will need the three text files that are attached here. Save them in the main project folder (or you’ll have to edit the paths in the Read Text File activities).
DOM Header.txt (88 Bytes)
DOM Page.txt (279 Bytes)
DOM Footer.txt (139 Bytes)
The basic steps are to read these three files and concatenate them into one string, repeating the DOM Page text once for each page in the PDF. Then we deserialize the string into a Document object.
The first step is to get PageCount using Get PDF Page Count.
Read the files, and substitute the file name into the header with the expression DOMHeaderText.Replace("||Filename||",SourceFile.Name)
Add the DOM Page text repeatedly, based on the number of pages in the PDF, using the expression SourceFileDOMText + DOMPageText.Replace("||PageNumber||",(CurrentItem-1).ToString)
Add the DOM Footer text to the end of SourceFileDOMText.
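For anyone who wants to see the string-building logic outside of Studio, here is a minimal Python sketch of the header, page and footer steps. The three template strings are hypothetical stand-ins, not the contents of the attached files, and the function name build_dom_text is made up for illustration. It also assumes the loop runs CurrentItem from 1 to PageCount, which is what the CurrentItem-1 in the expression suggests.

```python
# Hypothetical stand-ins for the three attached template files. The real
# contents live in DOM Header.txt, DOM Page.txt and DOM Footer.txt.
DOM_HEADER_TEXT = "[header for ||Filename||]"
DOM_PAGE_TEXT = "[page ||PageNumber||]"
DOM_FOOTER_TEXT = "[footer]"


def build_dom_text(header: str, page: str, footer: str,
                   filename: str, page_count: int) -> str:
    """Assemble header + one page block per PDF page + footer."""
    # Put the file name into the header, mirroring the Replace expression.
    dom_text = header.replace("||Filename||", filename)
    # Assumed loop: current_item runs 1..page_count, and the placeholder
    # receives current_item - 1, so page numbers come out zero-based.
    for current_item in range(1, page_count + 1):
        dom_text += page.replace("||PageNumber||", str(current_item - 1))
    # The footer closes the string.
    return dom_text + footer


if __name__ == "__main__":
    print(build_dom_text(DOM_HEADER_TEXT, DOM_PAGE_TEXT, DOM_FOOTER_TEXT,
                         "invoice.pdf", 3))
```

With the real template text in place of the stand-ins, the returned string should be what gets deserialized in the next step.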

Deserialize into a Document object using the expression JsonConvert.DeserializeObject(of Document)(SourceFileDOMText), and set the document text to a dummy value (it can’t be an empty string).

The DOM object and document text are then used in the Present Classification Station activity.


