Document Understanding - Can it extract data from documents it has not seen before?

rpa_jay · October 29, 2021, 8:29am

Hi everyone,

I haven’t used Document Understanding before and am working through the RPA academy courses at the moment.

The scenario is that we have thousands of invoices which are semi-structured but do not have a consistent layout. We would like to be able to extract the vendor name and total amount from each invoice.

The RPA academy is guiding me through extracting information from what seem to be pre-defined document layouts. My question is whether the robot can use machine learning or AI to extract the correct information from invoice layouts it has not seen before.

Thanks for any help

jasonbateman1991 · October 29, 2021, 8:58am

Are these Word documents? If so, and they have field labels, you can extract the entire document to a string and then split it on the field name, or on whatever precedes the data you need to grab.
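As a rough illustration of the split-on-label idea, here is a minimal Python sketch (not UiPath activities). It assumes the document text has already been extracted into a single string; the labels `Vendor:` and `Total:` and the sample invoice text are hypothetical. The last call shows the weakness of the approach: if a vendor uses different wording for the same field, the lookup finds nothing.

```python
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the text following `label` on the same line, or None if the label is absent."""
    for line in text.splitlines():
        if label in line:
            # Split once on the label and keep whatever follows it on that line.
            return line.split(label, 1)[1].strip() or None
    return None


# Hypothetical invoice text, as if already extracted from the document.
invoice_text = """ACME Supplies Ltd
Invoice No: 10234
Vendor: ACME Supplies Ltd
Total: 1,250.00
"""

print(value_after_label(invoice_text, "Vendor:"))      # ACME Supplies Ltd
print(value_after_label(invoice_text, "Total:"))       # 1,250.00
print(value_after_label(invoice_text, "Amount Due:"))  # None: label not present in this layout
```

The matching is a plain, case-sensitive substring check, so it only works when every document uses exactly the same label text.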

If the documents vary drastically, there is an AI company we use called Automation Hero. We feed it invoices from several different companies (all differing in layout), and it uses a machine learning model to extract particular information from the documents and send it back in whatever format we require; we use CSV.

regards

rpa_jay · October 29, 2021, 3:00pm

Hey Jason,

These aren’t Word documents, unfortunately. They’re all generated PDF documents (nothing scanned). The design varies between vendors (and there are many vendors), so I’m not sure we can rely on something preceding the information I’m looking for, nor on hard-coded labels. There has to be some kind of intelligence to this data extraction.

I’ll be looking into Automation Hero today, along with some other third-party solutions. Thanks for the suggestion.

Regards
