How to extract multiple totals per pdf page

j_run · May 17, 2021, 11:08pm

I would like to know how to extract multiple subtotals per page from a PDF invoice. There is only one machine learning extractor for ‘total’, and the other extractors for numeric amounts are not picking up the other totals. I’ve tried using the utility bills endpoint, but the other numeric fields for utility bills don’t pick up the additional totals either.

Bobby_J · May 18, 2021, 12:39am

Not that I’ll be able to solve the problem, but what kind of data are we working with here?

scanned image of a print/fax
fully digital file, such as “print to pdf” (text can be natively selected with a cursor)
fully standard form (has structured/tagged input fields)

Additionally, do all the invoices look the same, or can they vary by page or by vendor?

The answer to your question will vary significantly depending on this. Ideally you have at least a digital file with a standardized format. If so, you may be able to read the PDF text and parse it via regex or other means. If not, you will need more complex solutions.
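
As a rough illustration of the regex route for the digital, standardized case, here is a minimal Python sketch. It assumes the page text has already been extracted into a string, and that every total sits on its own line with a label containing the word "total" followed by an amount with two decimals. The function name `extract_totals` and the sample text are made up for illustration and are not taken from any real invoice or from a UiPath activity.

```python
import re
from decimal import Decimal

# Assumption: the amount is the last thing on the line, with an optional "$",
# optional thousands separators, and exactly two decimal places.
AMOUNT = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})+|\d+)\.(\d{2})\s*$")


def extract_totals(page_text):
    """Return (label, amount) pairs for every line that mentions a total."""
    totals = []
    for line in page_text.splitlines():
        # Matches "Total", "Subtotal", "TOTAL DUE", and so on.
        if "total" not in line.lower():
            continue
        match = AMOUNT.search(line)
        if match is None:
            continue
        # Everything before the amount is treated as the label.
        label = line[:match.start()].strip(" :.\t")
        amount = Decimal(match.group(1).replace(",", "") + "." + match.group(2))
        totals.append((label, amount))
    return totals


# Hypothetical page text, only for demonstration.
sample = """Electric charges
Subtotal electric: $1,234.56
Gas charges
Subtotal gas   78.90
Total amount due: $1,313.46
Account number 12345
"""

for label, amount in extract_totals(sample):
    print(label, amount)
```

Something like this only holds up while the layout and wording stay stable. Scanned pages would need OCR first, and labels that differ between vendors would need extra patterns or a different approach.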

prasath17 · May 18, 2021, 1:01am

@j_run - You can try the Regex-based extractor…

j_run · May 18, 2021, 2:45pm

I did try using Regex at first, but was informed that the focus has to be on machine learning extractors, to account for invoices coming from different vendors in the future.

j_run · May 18, 2021, 2:48pm

The PDFs are scanned images of a printed utility bill. My understanding is that the format is not standardized, since the assumption is that invoices will also come from other vendors.
