Something I use for data extraction from PDF documents is the PreserveFormatting option of the Read PDF Text activity. It is great for extracting fields that have a special layout (for example, values separated by fixed spacing).
Unfortunately, when I want to use the Validation Station, I need to use the Digitize Document activity with an OCR engine (it doesn’t actually run OCR if the file is a native PDF).
However, Digitize Document has no “PreserveFormatting” option, so for some document structures it scrambles the numbers (their order is no longer correct, and so on).
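To make the ordering problem concrete, here is a small Python sketch. It is not how UiPath implements either activity; the word positions, `WORDS`, `stream_order_text`, and `layout_text` are all made up for illustration. It only shows why text read in storage order can scramble a table, while text rebuilt from positions keeps each row together.

```python
from collections import defaultdict

# Hypothetical word boxes: (text, column, row). They are listed in the order
# they might be stored in the file, which is column by column here and
# therefore not the visual reading order.
WORDS = [
    ("Item", 0, 0), ("Qty", 12, 0), ("Price", 20, 0),
    ("Widget", 0, 1), ("Gadget", 0, 2),   # first column
    ("2", 12, 1), ("5", 12, 2),           # second column
    ("10.00", 20, 1), ("7.50", 20, 2),    # third column
]


def stream_order_text(words):
    """Join words in storage order, ignoring their positions."""
    return " ".join(text for text, _col, _row in words)


def layout_text(words):
    """Rebuild lines from positions, padding with spaces to each column."""
    rows = defaultdict(list)
    for text, col, row in words:
        rows[row].append((col, text))

    lines = []
    for row in sorted(rows):
        line = ""
        for col, text in sorted(rows[row]):
            # Pad up to the word's column; keep at least one space
            # between neighbouring words on the same line.
            line = (line + " ").ljust(col) if line else " " * col
            line += text
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    print(stream_order_text(WORDS))
    print(layout_text(WORDS))
```

In the storage-order output, the quantities and prices are detached from their items. In the position-based output, each row stays on one line with its spacing, which is the property I rely on when parsing such fields.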
So I think a “PreserveFormatting” option for native PDFs would be a great feature to have.
For the moment, I have not found a workaround: I can’t feed the Create Document Validation activity extracted text that doesn’t come from Digitize Document (otherwise, it says the DOM doesn’t match the extracted text).