Separate values of a multi-value field that appear back to back on the same line are labeled as a single value

In Document Manager, when the values of a multi-value field appear side by side on the same line, they are labeled as a single value instead of as separate values. How should such values be labeled?

  • Document Manager currently cannot tell whether word tokens that sit side by side belong to a single value or to separate values, so multiple values are extracted as one. The effect can be more pronounced for field types other than text. For instance, if the field is numeric, all of the values are combined and formatted into a single number.

  • As a workaround, label the values that appear side by side on the same line as a single value and use those labels for training. When the ML model extracts data, check the extracted value for separators such as commas and split it accordingly during post-processing.
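
The post-processing split described in the workaround could look roughly like the sketch below. It is only an illustration, not part of Document Manager: the function name split_multi_value and the default separator list are hypothetical, and it assumes the combined value reaches your post-processing step as a plain text string (for example, from a text-type field, so that the values have not already been merged into a single number).

```python
import re


def split_multi_value(extracted, separators=(",", ";")):
    """Split one combined extracted string into separate values.

    `separators` is an assumption; adjust it to whatever delimiters
    actually appear between the values in your documents.
    """
    if not extracted:
        return []
    # Build a pattern that matches any one of the separators literally.
    pattern = "|".join(re.escape(sep) for sep in separators)
    parts = re.split(pattern, extracted)
    # Trim surrounding whitespace and drop empty fragments.
    return [part.strip() for part in parts if part.strip()]


# Example: three values that were labeled and extracted as one.
print(split_multi_value("INV-001, INV-002, INV-003"))
# → ['INV-001', 'INV-002', 'INV-003']
```

Splitting on commas is ambiguous when the values themselves can contain commas, such as numbers with thousands separators. In that case a different delimiter or a more specific pattern would be needed.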