I am trying to build a robot that opens unstructured PDF invoices and extracts specific values, such as the invoice date. I use Google OCR to convert the image PDF to text, which works fine, but then I am stuck.
The value I want to extract always comes after the term “(12)”: the date is located to the right of this term. Sometimes there is also another word in between, as in this example:
How can I extract the date?
The position of the date varies from invoice to invoice, so I cannot use a fixed location to extract the value.
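Since the date is anchored to the term “(12)” rather than to a position on the page, one possible approach is a regular expression that looks for “(12)”, skips an optional word, and captures the first date after it. The following is only a minimal sketch under assumptions that are not confirmed by the invoices themselves: the OCR result is available as a plain string, the date is on the same line as “(12)”, the word in between contains no digits, and the date is written with `.`, `/` or `-` as separators (for example 05.03.2019 or 2019-03-05). The names `DATE`, `PATTERN` and `extract_date` are made up for illustration.

```python
import re

# Assumed date shapes: 05.03.2019, 5/3/19, 2019-03-05 (adjust to the real invoices).
DATE = r"\d{1,4}[./-]\d{1,2}[./-]\d{1,4}"

# Literal "(12)", then up to 40 non-digit characters on the same line
# (covers spaces and an optional word in between), then the date itself.
PATTERN = re.compile(r"\(12\)[^\d\n]{0,40}?(" + DATE + r")")


def extract_date(ocr_text):
    """Return the first date found to the right of "(12)", or None if there is none."""
    match = PATTERN.search(ocr_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical OCR output, not taken from a real invoice.
    sample = "(11) 4711\n(12) Date 05.03.2019\n(13) Total 100.00"
    print(extract_date(sample))
```

If the real dates use another format (for example a month written out as a word), only the `DATE` part would need to change; the “(12)” anchor stays the same.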
Many thanks in advance.