Need help reading multiple tables from PDF files using OCR

Hi,

I’ve seen lots of posts on this subject, but I haven’t found a solution yet.

Here is the process I’m trying to automate:

  1. A printed letter is received, containing the letter itself and some data tables (the tables start on page 2).
  2. Each of those pages has a bit of text followed by a table.
  3. The letters are scanned into a folder so the data tables can be extracted using OCR.
  4. Next, the tables need to be extracted and saved in an Excel file.
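For step 4, a minimal sketch of collecting the rows from every page into one file that Excel can open. It assumes each table has already been turned into a list of rows, and it writes CSV with the standard library rather than a native .xlsx file (a third-party library such as openpyxl would be needed for that). The function name `save_tables` and the sample rows are hypothetical, for illustration only.

```python
import csv
import os
import tempfile
from typing import List, Tuple


def save_tables(tables: List[Tuple[int, List[List[str]]]], path: str, delimiter: str = ",") -> None:
    """Write every extracted table into one CSV file, prefixing each row with its page number.

    Hypothetical helper: `tables` is a list of (page_number, rows) pairs.
    """
    # utf-8-sig adds a BOM so Excel detects the encoding; newline="" is required by the csv module.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f, delimiter=delimiter)
        for page, rows in tables:
            for row in rows:
                writer.writerow([page, *row])


# Made-up rows from pages 2 and 3 (page 1 is the letter itself, so it has no table).
out = os.path.join(tempfile.mkdtemp(), "tables.csv")
save_tables([(2, [["Widget A", "12.50"]]), (3, [["Bolt", "3.00"]])], out)

# Read the file back to check what was written.
with open(out, newline="", encoding="utf-8-sig") as f:
    rows_back = list(csv.reader(f))
```

Some regional Excel settings expect `;` as the list separator; passing `delimiter=";"` covers that case.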

I have successfully extracted data from an example table with no text above it. However, when there is text above the table, the columns get messed up. (Every page has some text above its table, so this is a problem.)
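One possible local approach, sketched here under several assumptions: run Tesseract on your own machine (for example through pytesseract's `image_to_data` with `output_type=pytesseract.Output.DICT`, which returns per-word `text`, `left`, `top` and `width` lists), then find the table's header row and ignore everything above it before splitting the remaining words into columns. The column headings, the helper names, the single-word-heading assumption and the 10-pixel line tolerance are all hypothetical; wrapped multi-line cells are not handled.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# One OCR'd word: (text, left, top, width), in pixels.
Item = Tuple[str, int, int, int]


def group_lines(items: List[Item], tol: int) -> List[List[Item]]:
    """Cluster words into visual lines by their 'top' coordinate."""
    lines: List[List[Item]] = []
    for item in sorted(items, key=lambda it: (it[2], it[1])):
        if lines and item[2] - lines[-1][0][2] <= tol:
            lines[-1].append(item)
        else:
            lines.append([item])
    for line in lines:
        line.sort(key=lambda it: it[1])  # left to right within a line
    return lines


def extract_table(words: Dict[str, list], headers: List[str], tol: int = 10) -> List[List[str]]:
    """Skip everything above the header row, then split the lines below it into columns.

    `words` is assumed to have the shape of
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    `headers` are the expected single-word column headings (hypothetical).
    """
    items = [
        (t.strip(), x, y, w)
        for t, x, y, w in zip(words["text"], words["left"], words["top"], words["width"])
        if t.strip()  # drop the empty entries Tesseract emits for blocks and lines
    ]
    lines = group_lines(items, tol)

    # The header row is the first line containing every expected heading,
    # so letter text above it is skipped even if it mentions a heading.
    start = next((i for i, ln in enumerate(lines) if set(headers) <= {it[0] for it in ln}), None)
    if start is None:
        raise ValueError("header row not found")

    # (left, right) extent of each heading, ordered left to right.
    spans = sorted({it[0]: (it[1], it[1] + it[3]) for it in lines[start] if it[0] in headers}.values())
    # Column boundary = midpoint of the gap between neighbouring headings.
    bounds = [(a[1] + b[0]) / 2 for a, b in zip(spans, spans[1:])]

    table = []
    for line in lines[start + 1:]:
        cells: List[List[str]] = [[] for _ in spans]
        for text, left, _, width in line:
            # Assign each word to a column by its horizontal centre.
            cells[bisect_left(bounds, left + width / 2)].append(text)
        table.append([" ".join(c) for c in cells])
    return table


# Made-up example: one line of letter text, then a header row and two data rows.
words = {
    "text": ["", "Please", "check", "the", "Date", "and", "Amount", "Item", "Date", "Amount",
             "Widget", "A", "2024-01-05", "12.50", "Bolt", "2024-02-01", "3.00"],
    "left": [0, 0, 60, 110, 140, 185, 220, 0, 150, 300, 0, 55, 150, 320, 0, 150, 325],
    "top": [0, 10, 10, 10, 10, 10, 10, 50, 50, 50, 80, 80, 80, 80, 110, 110, 110],
    "width": [0, 50, 40, 25, 35, 25, 55, 40, 40, 60, 50, 10, 80, 40, 35, 80, 35],
}
table = extract_table(words, ["Item", "Date", "Amount"])
```

Anchoring on the header row, rather than on a fixed crop, is what keeps a varying amount of letter text from shifting the columns. If the tables have ruled borders, detecting those lines (for example with OpenCV) is another common way to locate the table first.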

My question:
What is the most reliable way to read these scanned documents and extract only the tables? Do I need to use FlexiCapture (and purchase a license)? Are there ways to do this without sending data to a cloud service?

I can’t share an example because it’s filled with sensitive data, but I have added some anonymized screenshots.



All I’m trying to do is digitize the data tables.
Hopefully someone can point me in the right direction.

Thanks in advance!