How to extract a table from a non-native PDF?

I have a PDF file that has 100+ pages. I want to find where the table is in the PDF, extract that table, and save it to Excel.

Can someone give me an idea? How can I find which page the table is on?


Two ways

  1. Use Read PDF to get the text of the PDF, search it for the table headers, and then try a regex to get the data under them
  2. Use Document Understanding and train a model
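
To make option 1 more concrete, here is a rough Python sketch rather than a finished solution. It assumes the text of each page is already available as a list of strings (for a scanned PDF that text would have to come from OCR first). The header keywords, the row pattern, and the function names are all made up for illustration and would need to be adapted to the real tables.

```python
import csv
import io
import re

# Hypothetical header keywords; replace with the headers your tables actually use.
HEADER_KEYWORDS = ("item", "qty", "price")

# Assumed row layout: a description, an integer quantity, then a price with two decimals.
ROW_PATTERN = re.compile(r"^(?P<item>.+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")


def is_header(line):
    """True if the line contains every header keyword (case-insensitive)."""
    lowered = line.lower()
    return all(keyword in lowered for keyword in HEADER_KEYWORDS)


def find_table_pages(pages):
    """Return the 1-based numbers of pages that contain a header line."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if any(is_header(line) for line in text.splitlines())
    ]


def extract_rows(page_text):
    """Collect the lines after the header that match ROW_PATTERN."""
    rows = []
    header_seen = False
    for line in page_text.splitlines():
        stripped = line.strip()
        if not header_seen:
            header_seen = is_header(stripped)
            continue
        match = ROW_PATTERN.match(stripped)
        if match:
            # Keep the price as text so its formatting is preserved.
            rows.append((match["item"], int(match["qty"]), match["price"]))
        elif rows:
            break  # the first non-matching line after the rows ends the table
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Item", "Qty", "Price"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Sample page texts standing in for the output of the PDF read step.
    sample_pages = [
        "Introduction\nNo tables on this page.",
        "Order summary\nItem   Qty   Price\nWidget A   2   9.50\nBolt M4   10   0.25\nTotal due: 21.50",
    ]
    for page_number in find_table_pages(sample_pages):
        print(page_number)
        print(rows_to_csv(extract_rows(sample_pages[page_number - 1])))
```

This only works when the header words and the row layout are known in advance, which is the main limitation of the regex approach.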


The headers may differ in every PDF.


Then even DU will not help unless you train it on each type of PDF. If you know all the types of tables that can come, you would need to train a model for each of them in DU and use those.

Also check whether it is a tagged PDF. It is not a great approach, but in that case we can also try going through the front end to get the table.


Hi @RobotUi,

My PDF has the table as an image. I need to extract the table from that image.