Extracting Data from Multiple Receipts on One Page

Hi guys,

I have a document that contains multiple scanned receipts on one page. How do I split that single page into however many receipts it contains, so that I can extract the data from all of them? With one receipt per page, I can easily extract the data using the pre-trained DU models and the Intelligent Keyword Classifier. But when a page holds multiple receipts, I can only extract data from one of them.

The documents aren’t standardized, so each document I receive may contain one or more scanned receipts. I need to extract the merchant name, date, and amount from every scanned receipt on the page.

Hey @ctl,

Preprocess the scanned page first: run OCR and split out each receipt region using anchors (e.g., logos or keywords like “Total”). Then loop through the extracted segments and pass each one as a separate document into the DU pipeline. For the extraction itself, use the Intelligent Form Extractor or the Regex Extractor with multiple regex rules to capture the merchant name, date, and amount.
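
To make the anchor-based split a bit more concrete, here is a rough Python sketch of the idea. It is not a UiPath activity. It assumes the receipts are stacked vertically in a single column, that the OCR output has already been reduced to text lines with top/bottom pixel coordinates, and that each receipt has a line containing the word "Total". The names `OcrLine`, `split_by_anchor`, and `crop_bounds` are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One OCR text line (hypothetical shape, not a UiPath type)."""
    text: str
    top: int      # y coordinate of the line's top edge, in pixels
    bottom: int   # y coordinate of the line's bottom edge, in pixels


# Whole-word match, so "Subtotal" does not count as an anchor.
ANCHOR = re.compile(r"\btotal\b", re.IGNORECASE)


def split_by_anchor(lines):
    """Group OCR lines into receipt segments, closing a segment at each anchor line."""
    segments = []
    current = []
    for line in sorted(lines, key=lambda l: l.top):
        current.append(line)
        if ANCHOR.search(line.text):
            segments.append(current)
            current = []
    if current:
        # Lines after the final anchor (e.g. a footer) join the last segment;
        # if no anchor was found at all, the whole page is one segment.
        if segments:
            segments[-1].extend(current)
        else:
            segments.append(current)
    return segments


def crop_bounds(segment, margin=10):
    """Return the (top, bottom) pixel range to crop for one receipt segment."""
    top = max(0, min(l.top for l in segment) - margin)
    bottom = max(l.bottom for l in segment) + margin
    return top, bottom
```

Each `(top, bottom)` range can then be used to crop the page image into one file per receipt before sending it through DU. Note the limitation: footer text printed below "Total" on a receipt in the middle of the page ends up in the next segment, and receipts placed side by side would need a column split first.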

Hi @ctl,

Try using IXP for this. Since IXP supports table-based extraction, you can write the prompt so that it instructs the model to map each receipt on the page to a separate row in the table.
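
Once each receipt comes back as its own row, the rows usually still need to be turned into typed values. The Python sketch below shows one way to do that. It assumes the table has been exported as a list of dictionaries with `merchant`, `date`, and `amount` string fields and that dates come back as `YYYY-MM-DD`. Both are assumptions for illustration, not the actual IXP output format, and `ReceiptRow` and `parse_rows` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class ReceiptRow:
    """One receipt per table row (hypothetical structure)."""
    merchant: str
    purchased_on: date
    amount: Decimal


def parse_rows(rows, date_format="%Y-%m-%d"):
    """Convert raw string rows into typed ReceiptRow records."""
    parsed = []
    for row in rows:
        parsed.append(
            ReceiptRow(
                merchant=row["merchant"].strip(),
                purchased_on=datetime.strptime(row["date"], date_format).date(),
                # Drop thousands separators before converting to Decimal.
                amount=Decimal(row["amount"].replace(",", "")),
            )
        )
    return parsed
```

A row with a malformed date or amount raises an exception here, which is a simple way to route that receipt to manual validation.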

Mark it as the solution if it helps.