Document Understanding - Can it extract data from documents it has not seen before?

Hi everyone,

I haven’t used Document Understanding before and am working through the RPA academy courses at the moment.

The scenario is that we have thousands of invoices which are semi-structured but do not have a consistent layout. We would like to be able to extract the vendor name and total amount from each invoice.

The RPA academy is guiding me through the process of extracting the information from seemingly pre-defined document layouts, but my question is whether the robot can use machine learning or AI to extract the correct information from invoice layouts it has not seen before.

Thanks for any help

Are these Word documents? If so, and there are field labels, you can extract the entire document to a string and then split by the field name or whatever precedes the data you need to grab.
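
To make the split-by-label idea concrete, here is a rough sketch in Python, used purely for illustration; the same steps should work with whatever string functions your workflow offers. The helper name value_after_label, the sample text and the label names are all made up for this example, and it assumes each value sits on the same line as its label.

```python
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the rest of the line after the first occurrence of label."""
    # partition splits once: text before the label, the label itself, the rest
    _, found, rest = text.partition(label)
    if not found:
        return None  # the label does not appear in this document
    lines = rest.splitlines()
    if not lines:
        return None  # the label was the very last thing in the text
    # Drop the separator (colon, spaces, tabs) around the value
    value = lines[0].strip(" :\t")
    return value or None


# Hypothetical text, as it might look after extracting a document to a string
sample = "Vendor: Acme Ltd\nInvoice No: 1001\nTotal Amount: 1,250.00\n"

vendor = value_after_label(sample, "Vendor")
total = value_after_label(sample, "Total Amount")
print(vendor, total)
```

This only works when every document uses the same label text, so it will not cope with layouts where the labels differ or are missing.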

If the documents do vary drastically, there is an AI company we use called Automation Hero. We feed it invoices from several different companies (all differing in layout), and they use a machine learning model to extract particular information from the documents and send it back in whatever form we require. We use CSV.


Hey Jason,

These aren’t Word documents unfortunately. They’re all generated PDF documents (nothing scanned). The design varies between vendors (and there are many vendors) so I’m not sure we can rely on something preceding the information I’m looking for, nor on hard-coded labels. There has to be some kind of intelligence to this data extraction.

I’ll be looking into Automation Hero today, along with some other third-party solutions. Thanks for the suggestion.