@Tom1989
I will give you another solution, since Read PDF with OCR won't work on your PDF. Before that, I have a few questions:
How many PDFs do you want to process per day?
Will this be the standard format of your PDFs, or will it change?
Do you want all the information from the PDF?
Is it possible to make a few changes to this PDF if it is the standard format?
Our client is a logistics company, and I believe they receive thousands of such invoices every day.
I am not sure whether or not it will be a standard format. I would request you to share your insights on approaching the problem from both perspectives.
Yes, I reckon. Following are the requirements of my client:
Hi @Tom1989
Please find the attached text file. This is the data I was able to fetch from your PDF using Python. If you are familiar with coding, I can explain more. Basically, I write Python scripts that run, fetch the data from the PDF, and output it in text and Excel formats. The only things I am unable to fetch are the total due and the table header, because their background color is gray. I need some more time, and I will fetch that data as well. Kindly have a look at the text file and let me know whether it works for you.
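For anyone following along, here is a minimal sketch of the "text in, spreadsheet out" step described above. It is not the actual script from this post. It assumes the PDF text has already been extracted by some other step, and that each field sits on its own `Label: value` line. The sample text and the helper names `parse_fields` and `to_csv` are hypothetical.

```python
import csv
import io

# Hypothetical text, standing in for whatever the PDF extraction step returns.
SAMPLE_TEXT = """Invoice No: INV-001
Invoice Date: 01/02/2020
Due Date: 15/02/2020
"""


def parse_fields(text):
    """Split 'Label: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip():
            fields[label.strip()] = value.strip()
    return fields


def to_csv(fields):
    """Render one header row and one value row as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(list(fields.keys()))
    writer.writerow(list(fields.values()))
    return buf.getvalue()


fields = parse_fields(SAMPLE_TEXT)
csv_text = to_csv(fields)
print(csv_text)
```

CSV is used here only because Excel can open it and it needs nothing beyond the standard library. A real invoice will rarely be this tidy, so the parsing rule would need adjusting to the actual layout.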
If you look at the output of Microsoft OCR for the entire file, you will notice that the robot is able to capture all the details, though it presents them in a different layout from the PDF.
And I receive the following output when I try to extract this specific parameter using the above expression:
From the first image it is noticeable that the OCR engine is working fine. I just need to make changes to my expression to capture the correct parameter.
Hi @Tom1989
One thing I want to add here: your output from Microsoft OCR is not coming out as required. For example, the last rate value comes out as EKS 488.90, and the due date, invoice date, and To address are all incorrect. Also, what will you do if the total due and the rate are equal, say HKD 93.50?
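One possible way around the equal-values problem is to anchor the expression on the field label rather than on the amount itself. The sketch below shows the idea in Python. It assumes the OCR output keeps each label on the same line as its value, which may not hold for this invoice. The sample text and the helper name `amount_after` are hypothetical.

```python
import re

# Hypothetical OCR output in which the rate and the total share one value.
OCR_TEXT = """Rate HKD 93.50
Total Due HKD 93.50
"""

# Three uppercase letters (currency code) followed by an amount with two decimals.
AMOUNT = r"([A-Z]{3})\s*([\d,]+\.\d{2})"


def amount_after(label, text):
    """Return (currency, amount) following `label` at the start of a line, or None."""
    pattern = rf"^{re.escape(label)}\s*:?\s*{AMOUNT}"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.groups() if match else None


rate = amount_after("Rate", OCR_TEXT)
total_due = amount_after("Total Due", OCR_TEXT)
print(rate, total_due)
```

Because each lookup is tied to its own label, the two fields stay distinct even when the amounts are identical. Note that this pattern alone would not catch a misread currency such as EKS; that would need an extra check against a list of expected currency codes.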