I am attempting to train a model using invoices and receipt tickets, but I encountered a thermal receipt that does not fit onto a single page. If scanned, it would span three pages. How can I ensure that the Document Understanding system recognizes it as a single ticket? That is, it should interpret the header data as also belonging to the second and third pages, the footer data as also belonging to the first and second pages, and ideally avoid repeating lines if the operator scanning the document did not carefully exclude information already scanned on the previous page. Any advice or suggestions for tackling this situation?
Consistent data labelling is important during training. You can consider the points below:
For header information: if you expect it to appear on multiple pages and you want to take it from page 1, make sure you consistently label the header information on page 1 in every document. (This also applies to other repeating information, excluding tables.)
For repeating table rows: these need to be handled in post-processing (after extraction).
Include variations of documents in the training set (multiple formats, multi-page documents).
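To illustrate the post-processing point for repeated table rows, here is a minimal Python sketch, not something built into Document Understanding. It assumes the extracted line items are already available as one list of text rows per page; the function name `merge_pages` and the sample rows are hypothetical. It removes rows at the start of a page that exactly repeat the rows at the end of the previous pages, after normalizing whitespace and case.

```python
from typing import List


def _normalize(row: str) -> str:
    # Collapse whitespace and ignore case so small scan differences still match.
    return " ".join(row.split()).lower()


def merge_pages(pages: List[List[str]]) -> List[str]:
    """Concatenate per-page rows, dropping rows re-scanned at a page boundary.

    Hypothetical helper: `pages` holds the extracted line items of each page,
    in scan order.
    """
    merged: List[str] = []
    for rows in pages:
        merged_norm = [_normalize(r) for r in merged]
        rows_norm = [_normalize(r) for r in rows]
        overlap = 0
        # Look for the longest run where the end of what we already have
        # equals the beginning of the new page.
        for k in range(min(len(merged), len(rows)), 0, -1):
            if merged_norm[-k:] == rows_norm[:k]:
                overlap = k
                break
        merged.extend(rows[overlap:])
    return merged


pages = [
    ["Milk 1.20", "Bread 2.50", "Eggs 3.00"],
    ["Bread  2.50", "eggs 3.00", "Tea 4.10"],  # first two rows were scanned twice
    ["Tea 4.10", "Total 10.80"],
]
print(merge_pages(pages))
```

Note the limitation: if a receipt genuinely contains the same item twice in a row exactly at a page break, this exact-match approach would drop one of them, and OCR noise may require fuzzy matching instead of equality.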
Hello @iampnkz, @Maheep_Tiwari, @Rohit_More, thank you for your responses, and I apologize for the delay in replying. Various matters diverted my focus from development.