Where to download real-world invoices for practice?

Hello! I want to practice document understanding. Does anybody know where I can download a large number (50+) of invoices (POs would be better) to do a “dummy” project on my own? Thanks

Hi @Ricardo_Aparicio,

I do not have a concrete answer for this. As in many other fields of study, getting real-life data is next to impossible. This is especially true for invoices and purchase orders, because company policies usually forbid sharing them.

All (ML-based) intelligent document parsing suppliers rely on your data having the three "V"s:
Velocity: the rate at which your data / new data is ingested (so that the model can be retrained)
Variety: the variety in your data, so that the algorithm sees more patterns during the learning phase and makes fewer errors on unseen data in production
Volume: enough data for the model to produce statistically significant metrics, and for you to plan how to scale your data pipeline

Sadly, the only place you get the above three is in a real-life project.

You can try to build your own synthetic data by using open source tools to create invoices, for example: Invoiced/invoice-generator-api on GitHub, a free API for generating invoice PDFs and e-Invoices.

But remember that a model trained this way is not realistic, because the data does not satisfy the “Variety” requirement. It may or may not recognize what it is parsing the next time it is given a new invoice. It will try, but with a low success rate.
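
If you go the synthetic route, one way to reduce (not remove) the variety gap is to randomize the wording, date format, and line items of each generated invoice, and to keep the ground-truth values next to the rendered text so you can score your parser later. Below is a rough sketch of that idea using only the Python standard library. Everything in it is an assumption made up for illustration: the vendor names, item names, field labels, and the `make_invoice` / `make_dataset` helpers are hypothetical and are not part of any tool mentioned in this thread. It produces plain text rather than PDFs.

```python
import random
from datetime import date, timedelta

# All names, labels, and formats below are invented for illustration only.
VENDORS = ["Acme Supplies", "Northwind Traders", "Globex Parts"]
ITEMS = ["Widget", "Gasket", "Bearing", "Cable", "Valve"]
LABELS = [  # different wording for the same fields, to mimic layout variety
    {"number": "Invoice No.", "date": "Date", "total": "Total"},
    {"number": "Inv #", "date": "Issued", "total": "Amount Due"},
    {"number": "Reference", "date": "Invoice Date", "total": "Grand Total"},
]
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]


def make_invoice(rng):
    """Return (text, truth): a rendered invoice and its ground-truth fields."""
    lines = []
    for _ in range(rng.randint(1, 5)):
        lines.append({
            "item": rng.choice(ITEMS),
            "qty": rng.randint(1, 20),
            # Prices are kept in integer cents to avoid float rounding drift.
            "unit_cents": rng.randint(100, 50000),
        })
    total_cents = sum(l["qty"] * l["unit_cents"] for l in lines)
    issued = date(2023, 1, 1) + timedelta(days=rng.randint(0, 364))

    truth = {
        "vendor": rng.choice(VENDORS),
        "number": str(rng.randint(1000, 99999)),
        "date": issued.isoformat(),  # normalized form, independent of layout
        "total_cents": total_cents,
        "lines": lines,
    }

    # Pick a random label set and date format for this invoice's "layout".
    labels = rng.choice(LABELS)
    date_text = issued.strftime(rng.choice(DATE_FORMATS))
    body = [
        truth["vendor"],
        f"{labels['number']}: {truth['number']}",
        f"{labels['date']}: {date_text}",
        "",
    ]
    for l in lines:
        body.append(f"{l['item']}  x{l['qty']}  @ {l['unit_cents'] / 100:.2f}")
    body.append(f"{labels['total']}: {total_cents / 100:.2f}")
    return "\n".join(body), truth


def make_dataset(n, seed=0):
    """Generate n (text, truth) pairs; the same seed gives the same dataset."""
    rng = random.Random(seed)
    return [make_invoice(rng) for _ in range(n)]
```

Calling `make_dataset(50)` gives 50 text/label pairs. This only varies a handful of templates that you wrote yourself, so it still falls well short of the variety in real invoices, but it at least keeps a parser from memorizing a single layout.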

Please do post how you went about this, and if you find any sources, please share them in this thread. I am sure there are many others facing the same challenge.

Good luck!