Difference In Results When An ML Model Is Used In Pre-Labelling And In ML Extractor

Why are different results observed when an ML Skill is used for data extraction in the ML Extractor activity and when the same ML Skill is used for pre-labelling on the same document?

When an ML Skill is run on a document through the ML Extractor, and the same ML Skill is used to pre-label that document, the results (the fields identified and the text extracted, as seen in Data Manager) can differ.

The difference comes from how OCR is applied. Data Manager applies OCR to all documents, whereas the Digitize Document activity applies OCR only if the document contains images (unless ForceApplyOCR is set to True). If the ML extraction results are not satisfactory and the results need to be consistent, OCR must be force-applied to documents in the DU process. This can be ensured by doing either of the following:

  1. Enable the ForceApplyOCR property in the Digitize Document activity.
     [Image: ForceOCR.jpg]
  2. Enable the UseServerSideOCR property on the ML Extractor activity.
     [Image: UseServerSideOCR.jpg]
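
The OCR behaviour described in this article can be summarised in a small Python sketch. This is only a simplified illustration of the rules stated above, not UiPath code: the function names and parameters are hypothetical and exist only for this example.

```python
# Simplified model of the OCR rules described in this article.
# All names below are hypothetical; they are not part of any UiPath API.

def data_manager_uses_ocr() -> bool:
    """Data Manager applies OCR to all documents."""
    return True


def workflow_uses_ocr(document_has_images: bool,
                      force_apply_ocr: bool = False,
                      use_server_side_ocr: bool = False) -> bool:
    """Whether the DU process extracts from OCR output.

    Digitize Document applies OCR only if the document contains images,
    unless ForceApplyOCR is True. Enabling the server-side OCR option on
    the ML Extractor is treated here as the second way to force OCR.
    """
    return document_has_images or force_apply_ocr or use_server_side_ocr


def results_expected_consistent(document_has_images: bool,
                                force_apply_ocr: bool = False,
                                use_server_side_ocr: bool = False) -> bool:
    """True when both paths work from OCR output."""
    return workflow_uses_ocr(
        document_has_images, force_apply_ocr, use_server_side_ocr
    ) == data_manager_uses_ocr()


if __name__ == "__main__":
    # A document without images and with default settings:
    # the two paths may disagree.
    print(results_expected_consistent(document_has_images=False))
    # The same document with ForceApplyOCR enabled.
    print(results_expected_consistent(document_has_images=False,
                                      force_apply_ocr=True))
```

In this model, a document without images is the only case where the two paths differ by default, which is why either option above removes the mismatch.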


Read more on