Extract unstructured table data from PDF

The invoice PDF contains a table in an unstructured format. Even when using Document Understanding, the bot extracts the value from the next row below instead of the expected one.

Could you please help me with a solution? Thanks in advance.

Are those scanned documents/PDFs where you need to use Document Understanding, or are they machine readable?

If you have to use Document Understanding, I think the only way to do this is to train your model on those kinds of tables.
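
A rough way to answer the scanned-vs-machine-readable question is to check whether the pages carry a usable text layer. The sketch below is only an illustration: it assumes you have already pulled the text of each page with a PDF library of your choice, and the function name `looks_machine_readable` and the `min_chars` threshold are made up for this example. Note that a scanned PDF that already has an OCR text layer would still pass this check.

```python
def looks_machine_readable(page_texts, min_chars=20):
    """Return True if most pages carry a usable text layer.

    page_texts: one string per page, as produced by whatever PDF text
    extractor you use (assumed input; no OCR is performed here).
    min_chars: assumed threshold below which a page counts as empty.
    """
    if not page_texts:
        return False
    pages_with_text = sum(
        1 for text in page_texts if len((text or "").strip()) >= min_chars
    )
    # Majority vote: scanned pages usually yield empty or near-empty text.
    return pages_with_text * 2 > len(page_texts)


if __name__ == "__main__":
    digital = ["Invoice No. 1001  Date 2021-01-01  Total 250.00"]
    scanned = ["", " "]
    print(looks_machine_readable(digital))
    print(looks_machine_readable(scanned))
```

If the check comes back false for your invoices, they most likely need OCR and the Document Understanding route described above.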

@T0Bi, thanks for the reply. That gave me a new idea for completing the process.

Hi there, I invite you to try Nanonets Cognitive OCR (our UiPath connector: NanoNets OCR - RPA Component | UiPath Marketplace). We use AI specifically to handle various kinds of unstructured table data. You can process invoices directly without writing templates or rules.

Hi Tobi,

I am using the Enterprise trial for Document Understanding. Do you have any demo video on training documents for the ML skill?

I don’t have videos, but I think the Data Manager is fairly straightforward.
You can use the Data Manager to label your data and use it for training.

Here’s the documentation: Document Manager

If you go through the steps (Access DM, Configure DM, etc.) one by one, it shouldn’t be hard.

If you have any further questions, hit me up.


Hi Tobi,

Thanks for the reply.
I went through Data Manager using a video from YouTube, but I ran into an issue: I am unable to find the option to select Document Understanding with Data Manager from the drop-down.
I learned from the link below, and the issue is shown in the attachment.

Complete Tutorial On Creating Data Labels in UiPath AI Center - YouTube