Hi. I have a scenario where I need to read different invoices. The invoices are scanned and available in a folder. The structure differs from invoice to invoice, but I need to capture the same data from each one: invoice number, date, amount, etc. How can I achieve this? A step-by-step explanation would be appreciated.
You can use the Read PDF With OCR activity and check whether the output is readable. Because the invoices are scanned, the extracted text may not be clean.
If the text is clean, you can use regular expressions to extract the data.
If the text is not clean, use the Computer Vision activities.
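To illustrate the regular-expression route, here is a minimal Python sketch that pulls the invoice number, date, and amount out of OCR text. The patterns, the `extract_fields` helper, and the sample text are all made up for illustration; real invoices will need patterns tuned to their own labels and layouts.

```python
import re

# Hypothetical patterns: they assume the labels "Invoice No/Number/#",
# "Date" and "Total (Amount)" appear in the OCR text. Adjust per layout.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(
        r"Date\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})", re.IGNORECASE
    ),
    "amount": re.compile(
        r"Total\s*(?:Amount)?\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return the first match for each field, or None when a field is missing."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up OCR output standing in for one scanned invoice.
sample = "ACME Ltd\nInvoice No: INV-2041\nDate: 12/03/2019\nTotal Amount: $1,250.00\n"

print(extract_fields(sample))
# {'invoice_number': 'INV-2041', 'date': '12/03/2019', 'amount': '1,250.00'}
```

A field that comes back as None is a sign that the OCR text was not clean enough or that the layout uses a different label, which is the point where the Computer Vision route becomes the better option.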
Hi @anil5,
As you said, the data is not proper, and I have no clue how to use Computer Vision. Could you give me a short description of how to use it?
Go through the provided link and install the activities.
After installation, use CV Screen Scope first and indicate on screen the area from which you want to extract the values.
Inside the screen scope, use CV Get Text to extract the values. The data extracted with the CV activities is accurate.
To learn more about the CV activities, go through the activities guide for Computer Vision.
Ok. Thanks mate.
I tried the CV automation and it seems to work well. The issue I am facing is how to read data from scanned invoices whose formats are not the same. Also, is there any way to read the PDF file without opening it?
The accuracy is not 100% when reading PDFs with Microsoft OCR. How can I make it more accurate?
If you’re dealing with images, try converting the scanned PDF documents to image(s) and use an OCR service through API integration.
You can try using Microsoft Azure Computer Vision or Google Vision.
For Microsoft, use Microsoft Vision Activities > Handwritten Text (with Mode set to Printed).
Microsoft Vision requires the following:
- Service URL: “https://[region].api.cognitive.microsoft.com”
- Subscription Key
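For anyone calling the service directly instead of through the activities, here is a rough Python sketch of how the two settings above fit into a request. It only builds the request and does not send it. The `/vision/v3.2/read/analyze` path is my assumption about which API version you would target (the activity may use a different one), and `build_read_request` is a hypothetical helper name.

```python
import urllib.request


def build_read_request(region, subscription_key, image_bytes):
    """Build (but do not send) an OCR request for one invoice image."""
    # Service URL in the format listed above, with the region filled in.
    service_url = f"https://{region}.api.cognitive.microsoft.com"
    # Assumption: the v3.2 Read endpoint; other versions use other paths.
    url = f"{service_url}/vision/v3.2/read/analyze"
    headers = {
        # The subscription key is passed in this header.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Raw image bytes are sent as the request body.
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


# Placeholder values; use your own region, key, and image file contents.
req = build_read_request("westeurope", "YOUR_KEY", b"fake-image-bytes")
print(req.full_url)
# https://westeurope.api.cognitive.microsoft.com/vision/v3.2/read/analyze
```

Sending it with `urllib.request.urlopen(req)` needs a valid key. As far as I know the Read operation is asynchronous, so the response points to a result URL in an `Operation-Location` header that you poll for the recognized text.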
Hope it helps
I am a learner. Can you please attach the workflow showing how to get the invoices from different images?