Classify Handwritten Document vs Digital (Same Form)

Hello all,
I wanted to pick everyone’s brain on a recent thought. Is it possible to classify a handwritten version of a document versus a digital one, where the form is the same and the only difference is that one is computer generated and the other was filled out by a person with a pen and then uploaded?

I was hoping to use the “Signature Detection” functionality in Form Extractor, thinking that if any of the signature fields returned “yes”, it would be safe to assume the document was handwritten. However, I tested this on a digital sample and it recognized the digital text as a signature. So I have a feeling the signature detection simply checks “is there something in this field?” rather than differentiating a signature from digital text.

Any feedback is greatly appreciated!

So you want to separate computer-generated documents from handwritten ones. One approach is to train a machine learning classifier on two labeled datasets, one of computer-generated documents and one of handwritten documents.
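To make the idea concrete, here is a minimal sketch of a two-class classifier in plain Python. It is a nearest-centroid model, chosen only because it is short; it is not a recommendation of a specific algorithm or a UiPath feature. The two features (stroke-width variation and baseline jitter) and all sample values are made up for illustration, and extracting such features from real scans is assumed to happen elsewhere.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*rows)]


def train(digital_rows, handwritten_rows):
    """Build one centroid per class from the two labeled datasets."""
    return {
        "digital": centroid(digital_rows),
        "handwritten": centroid(handwritten_rows),
    }


def predict(model, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical toy data: [stroke_width_variation, baseline_jitter].
# These numbers are invented for the example, not measured from real forms.
digital_samples = [[0.05, 0.02], [0.04, 0.03], [0.06, 0.01]]
handwritten_samples = [[0.40, 0.35], [0.55, 0.30], [0.45, 0.42]]

model = train(digital_samples, handwritten_samples)
print(predict(model, [0.50, 0.40]))
```

In practice you would need many labeled samples of each kind and a held-out set to check accuracy before trusting the result.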

Hi @ChristianVee ,

We would like to understand the document types that you are dealing with. When you mention computer-generated documents, is it a digital PDF, or a document from which we can manually copy content?

As for the other document type, the one with fields filled out by a person, is it entirely a scanned document or an image?

The reason for these questions is that if the content is digitally available to copy, as in a digital PDF, we should be able to classify the documents just by using the Read PDF Text activity.

If the output returns values, then it should be a digital PDF; if it doesn’t, then it must be a scanned PDF.
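The decision rule above can be sketched in a few lines of Python. This only illustrates the logic; it is not UiPath workflow code. The function name and the `min_chars` threshold are assumptions added here so that a few stray characters do not count as a real text layer, and `extracted_text` stands in for whatever string the text-reading step returns.

```python
def classify_by_text_layer(extracted_text, min_chars=20):
    """Label a PDF 'digital' if text extraction returned real content.

    extracted_text: the string returned by a plain text-extraction step
                    (no OCR), assumed to be empty for scanned pages.
    min_chars:      hypothetical threshold, tune it on your own samples.
    """
    # Drop all whitespace so blank lines and spaces do not count as content.
    visible = "".join(extracted_text.split())
    return "digital" if len(visible) >= min_chars else "scanned"


print(classify_by_text_layer("Name: Jane Doe\nDate: 2024-01-01"))
print(classify_by_text_layer("   \n  "))
```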

This may not be your scenario; if it is not, we would like to understand the case at hand in more detail.
