Ensuring single ticket recognition in document understanding

I am trying to train a model on invoices and receipt tickets, but I ran into a thermal receipt that does not fit on a single page. When scanned, it spans three pages. How can I make sure the Document Understanding system recognizes it as a single ticket? That is, it should treat the header data as also belonging to the second and third pages and the footer data as also belonging to the first and second pages, and ideally it should avoid repeating lines when the operator scanning the document did not carefully exclude information already scanned on the previous page. Any advice or suggestions for tackling this situation?

Thanks!

Hi @Matias_Clemente.Arg,

Check this:

  • Process multi-page PDFs as one document

  • Train models with multi-page receipts

  • Define document-level fields (header/total) and table fields (line items across pages)
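As a rough illustration of the document-level vs. table-field split, here is a minimal Python sketch of merging per-page results into one ticket. `PageResult` and `merge_pages` are hypothetical names made up for this example, not part of the Document Understanding API. It assumes you already have the extracted fields and line items for each scanned page.

```python
from dataclasses import dataclass, field


@dataclass
class PageResult:
    """Hypothetical per-page extraction output (not a UiPath type)."""
    fields: dict[str, str] = field(default_factory=dict)
    line_items: list[dict[str, str]] = field(default_factory=list)


def merge_pages(pages: list[PageResult]) -> PageResult:
    """Combine page-level results into a single document-level result."""
    merged = PageResult()
    for page in pages:
        for name, value in page.fields.items():
            # Document-level field: keep the first non-empty value found,
            # so a header on page 1 and a total on page 3 both survive.
            if value and name not in merged.fields:
                merged.fields[name] = value
        # Table field: line items are concatenated in page order.
        merged.line_items.extend(page.line_items)
    return merged
```

With this shape, the header and total apply to the whole ticket regardless of which page they were printed on, and the line items form one table.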

Hi @Matias_Clemente.Arg

Consistent data labelling is important during training. Consider the points below:

For header information that you expect to appear on multiple pages, if you want the value taken from page 1, make sure you always label the header information on page 1, consistently throughout. (The same applies to other repeating information, excluding tables.)

Repeating table rows need to be handled in post-processing (after extraction).

Include variations of documents in the training set (multiple formats, multi-page documents).

Hi @Matias_Clemente.Arg

I believe you are using a Custom ML Extractor. If not, you’ll need to create one.

  • Train it with multi-page samples where a single receipt spans across pages.
  • During labeling, make sure all pages are tagged as one document, so header/footer and line items are learned correctly.
  • For overlapping scans, you may need a small post-processing step to remove duplicate lines.
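The post-processing step for overlapping scans could look roughly like the Python sketch below. It is only an assumption about how you might do it, not a built-in Document Understanding feature. `stitch_pages` is a hypothetical helper that takes the extracted line-item text per page and drops the lines at the top of each page that repeat the end of what was already collected.

```python
def _norm(line: str) -> str:
    """Normalize whitespace and case so small OCR spacing differences still match."""
    return " ".join(line.split()).lower()


def stitch_pages(pages: list[list[str]]) -> list[str]:
    """Join per-page line lists, removing the overlap between consecutive pages."""
    merged: list[str] = []
    for page in pages:
        overlap = 0
        # Try the longest possible overlap first, then shrink.
        for k in range(min(len(merged), len(page)), 0, -1):
            tail = [_norm(x) for x in merged[-k:]]
            head = [_norm(x) for x in page[:k]]
            if tail == head:
                overlap = k
                break
        # Keep only the lines that were not already scanned.
        merged.extend(page[overlap:])
    return merged
```

Because only the boundary between pages is compared, identical lines elsewhere on the receipt (for example, the same product bought twice) are kept. One caveat: if a page genuinely ends with a line identical to the first line of the next page, this sketch would treat it as overlap, so you may want to check the result against the ticket total.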

This approach should help the model understand it as a single ticket.

Thank you!

Hello @iampnkz @Maheep_Tiwari @Rohit_More, thank you for your responses, and I extend my apologies for the delay in replying. Various matters diverted my focus from development.

Hi @Matias_Clemente.Arg,

No worries about the delay. Glad to help! If this worked for you, please consider marking it as a solution.

Thanks!
