ML Extractor mixing/merging rows when table spacing is too tight

I’m currently training an ML Extractor for invoice processing and I’ve run into a problem that I can’t seem to fix.

I have two types of invoices:


:check_mark: Type A – Works perfectly

These invoices have bigger gaps between each item row, and the ML Extractor handles them without any issues. Every row is detected cleanly.


:cross_mark: Type B – Problematic


This invoice has very tight spacing between item rows, and this is where the problem starts.
The ML Extractor keeps merging multiple lines into a single row, or mixes the fields between rows. Basically, the model can’t “see” where one row ends and the next begins.
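
To make the symptom concrete, here is a toy sketch of how a simple gap-based row grouper behaves on loose versus tight spacing. This is not how the ML Extractor or UiPath Document OCR works internally, which I don't know. The `group_rows` function, the `min_gap` parameter, and the pixel coordinates are all hypothetical, made up purely for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical OCR word box: (text, top_y, bottom_y) in pixels.
WordBox = Tuple[str, float, float]


def group_rows(words: List[WordBox], min_gap: float) -> List[List[str]]:
    """Group words into rows, starting a new row whenever the vertical gap
    between the current row's bottom edge and the next word's top edge
    is at least min_gap pixels."""
    rows: List[List[str]] = []
    row_bottom: Optional[float] = None
    for text, top, bottom in sorted(words, key=lambda w: w[1]):
        if row_bottom is None or top - row_bottom >= min_gap:
            rows.append([text])          # gap is large enough: new row
            row_bottom = bottom
        else:
            rows[-1].append(text)        # gap too small: same row
            row_bottom = max(row_bottom, bottom)
    return rows


# "Type A" style: 12 px text height, 10 px gap between item rows (made-up values).
loose = [("Widget", 0, 12), ("2", 0, 12), ("Gadget", 22, 34), ("5", 22, 34)]
# "Type B" style: same text height, only 2 px gap between item rows.
tight = [("Widget", 0, 12), ("2", 0, 12), ("Gadget", 14, 26), ("5", 14, 26)]

print(group_rows(loose, min_gap=4))  # [['Widget', '2'], ['Gadget', '5']]
print(group_rows(tight, min_gap=4))  # [['Widget', '2', 'Gadget', '5']]
print(group_rows(tight, min_gap=1))  # [['Widget', '2'], ['Gadget', '5']]
```

With the made-up threshold of 4 px, the tightly spaced rows collapse into one, which is the same kind of merging I see on Type B invoices.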


What I’ve tried so far

  • Added more training samples (including this problematic type)
  • Carefully re-labeled the table rows in Data Manager
  • Tried different bounding box shapes
  • Double-checked taxonomy

But the issue still only happens when the spacing between rows is very small.


What I’m hoping to get advice on

  • Is this a known limitation of the ML Extractor or the OCR (I’m using UiPath Document OCR)?
  • Any tricks for helping the model separate rows when they’re very close together?

If anyone has run into this before, I’d really appreciate your tips.
Thanks in advance! :folded_hands:

Hello @sks040826

Have you tried dragging the mouse over the whole line when annotating the invoices in DU?
How many of your training examples have this problem?

Regards
Soren

Hi,


This is how I label in DU. A total of 22 files are labelled, and all of them are the same kind of invoice.