Extract the data from PDF

Is there a way to extract invoice information such as Invoice No, PO No, Address, Amount, etc. from multiple vendors with multiple invoice layouts? The data is not standardized, and the field positions vary from one invoice to another.

What is the best way to solve this problem?

  1. Regex
  2. AI & Machine Learning

If anyone has dealt with this kind of real-world scenario, please advise.


Since the field positions are not stable, I would suggest going with both of them, that is:
- AI/ML
- Regex
as a combined approach, which is possible with Document Understanding.

So please go ahead with Document Understanding.
For more details

Cheers @himanshur

Hi @himanshur,

Answer to your question:
If the number of invoice types is small enough to count, say 10 different layouts, then Regex is better suited, with some if conditions to decide which Regex expression to use.
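To illustrate the "Regex with if conditions" idea, here is a minimal sketch in Python. It assumes the PDF text has already been extracted to a string. The vendor names, marker phrases, field labels and sample invoices are all made up for illustration, so you would replace them with patterns that match your real invoices.

```python
import re

# Hypothetical templates: each vendor has a "marker" pattern that identifies
# its layout, plus one regex per field. All names and labels are assumptions.
VENDOR_TEMPLATES = {
    "acme": {
        "marker": re.compile(r"ACME\s+Corp", re.IGNORECASE),
        "fields": {
            "invoice_no": re.compile(r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE),
            "po_no": re.compile(r"PO\s*(?:No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE),
            "amount": re.compile(r"Total\s*(?:Due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
        },
    },
    "globex": {
        "marker": re.compile(r"Globex\s+Ltd", re.IGNORECASE),
        "fields": {
            "invoice_no": re.compile(r"Bill\s+Number\s*[:\-]?\s*([\w/\-]+)", re.IGNORECASE),
            "po_no": re.compile(r"Order\s+Ref\s*[:\-]?\s*([\w/\-]+)", re.IGNORECASE),
            "amount": re.compile(r"Amount\s+Payable\s*[:\-]?\s*(?:[A-Z]{3})?\s*([\d,]+\.\d{2})", re.IGNORECASE),
        },
    },
}


def extract_invoice_fields(text):
    """Pick the template whose marker matches, then apply its field regexes.

    Returns a dict of field values (None for fields that did not match),
    or None when no known vendor layout is recognized.
    """
    for vendor, template in VENDOR_TEMPLATES.items():
        if template["marker"].search(text):
            result = {"vendor": vendor}
            for name, pattern in template["fields"].items():
                match = pattern.search(text)
                result[name] = match.group(1) if match else None
            return result
    return None  # unknown layout: candidate for manual review or an ML-based tool


# Made-up sample texts standing in for text extracted from two PDFs.
acme_text = "ACME Corp\nInvoice No: INV-1001\nPO No: PO-77\nTotal Due: $1,250.00\n"
globex_text = "Globex Ltd\nBill Number 2024/0042\nOrder Ref 5512\nAmount Payable EUR 980.50\n"

print(extract_invoice_fields(acme_text))
print(extract_invoice_fields(globex_text))
```

Each new vendor layout means one more template entry, which is why this approach only stays manageable while the number of layouts is small.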

If there are many invoice types and the volume of invoices is also large (more than 100), then I would advise going with a tool that extracts values based on machine vision.

My suggestion:
Firstly, I suggest you try out Rossum (https://rossum.ai/). They are the leaders in this space, and they are the ones every PDF extractor currently wants to beat.
Second, I would try AI Builder from Microsoft as well.

UiPath was quite late in implementing Document Understanding and is still catching up in this space, but the way it integrates with RPA robots makes it interesting. There is a course on UiPath Academy and plenty of YouTube videos on how to get started with UiPath’s Document Understanding. I am new to this UiPath offering myself.

Nonetheless, every intelligent document parser today has very good API documentation, which you can use to build custom integrations in your UiPath robot.

Hope this helps you brainstorm more.