REGEX FOR TABLE EXTRACTION FROM PDF

Hi Team,

I am having difficulty extracting a table from a PDF using only regex.

Since regex is the only dependency I am allowed to use for extracting the table, can anyone suggest a handy regex that would extract any table from a PDF?

Regards,
Ritesh

Hi @Ritesh_Burman

So that we can help you, we need more information. What data do you want to extract from the table? Could you share a screenshot, or explain more clearly what you would like to extract with regex?

There is no universal regex for this. You define the pattern according to your needs and to the string patterns in your document.
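As an illustration of a pattern written for one specific table, here is a minimal sketch in Python. It assumes the PDF text has already been extracted into a string, and the sample rows, column layout and group names (`code`, `date`, `description`, `qty`, `price`) are all hypothetical, not taken from Ritesh's document. Note that `(?P<name>...)` is Python's named-group syntax; engines such as .NET write it as `(?<name>...)`.

```python
import re

# Hypothetical text already extracted from a PDF (not from the original post).
PDF_TEXT = """Invoice 12345
Item     Date        Description   Qty  Price
A-001    2023-01-15  Widget A      3    19.99
A-002    2023-01-16  Widget B      10   4.50
Total                                   104.97
"""

# Pattern written for THIS row layout only: code, date, description, qty, price.
# [ \t]+ is used instead of \s+ so a match never spills over onto the next line.
ROW_PATTERN = re.compile(
    r"^(?P<code>[A-Z]-\d{3})[ \t]+"
    r"(?P<date>\d{4}-\d{2}-\d{2})[ \t]+"
    r"(?P<description>.+?)[ \t]+"
    r"(?P<qty>\d+)[ \t]+"
    r"(?P<price>\d+\.\d{2})[ \t]*$",
    re.MULTILINE,  # ^ and $ match at every line start/end
)


def extract_rows(text):
    """Return one dict per line that matches ROW_PATTERN; other lines are ignored."""
    return [m.groupdict() for m in ROW_PATTERN.finditer(text)]


if __name__ == "__main__":
    for row in extract_rows(PDF_TEXT):
        print(row)
```

The header and "Total" lines are skipped simply because they do not fit the row pattern. A table with a different layout would need its own pattern, which is exactly why no single regex works for every PDF.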

As a guideline, I can give you a couple of ideas. Assuming your extracted string includes line breaks, you can:

  • Build a regex according to the pattern of the rows to extract from the table
  • Search for the beginning and end of the table (if you have specific keywords that mark them) and then parse the lines in between one by one
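The second approach (find the start and end of the table, then parse what lies between) could look roughly like this Python sketch. It assumes the header line starts with "Item", the table ends at a line starting with "Total", and columns are separated by two or more spaces while single spaces occur inside cells. All of these are hypothetical choices for the sample text, not details from the original post.

```python
import re

# Hypothetical extracted PDF text; "Item" (header) and "Total" (footer) are assumed markers.
PDF_TEXT = """Invoice 12345
Item     Date        Description   Qty  Price
A-001    2023-01-15  Widget A      3    19.99
A-002    2023-01-16  Widget B      10   4.50
Total                                   104.97
Thank you for your business
"""

# Skip the header line, then lazily capture everything up to the line starting with "Total".
# DOTALL lets ".*?" cross line breaks; MULTILINE makes ^ match at each line start.
TABLE_BLOCK = re.compile(r"^Item[^\n]*\n(?P<body>.*?)^Total", re.DOTALL | re.MULTILINE)

# Assumption: two or more spaces separate columns; a single space stays inside a cell.
COLUMN_SPLIT = re.compile(r" {2,}")


def extract_table(text):
    """Return the rows between the start and end markers, each split into cells."""
    match = TABLE_BLOCK.search(text)
    if match is None:
        return []  # markers not found, so there is no table to parse
    rows = []
    for line in match.group("body").splitlines():
        line = line.strip()
        if line:  # ignore blank lines inside the table
            rows.append(COLUMN_SPLIT.split(line))
    return rows


if __name__ == "__main__":
    for row in extract_table(PDF_TEXT):
        print(row)
```

This is more forgiving than a full row pattern, because only the markers and the column separator have to be known, but it breaks if a cell itself contains a run of two or more spaces or if a row wraps onto a second line.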

I think these are the two approaches available when not using DU.

Hope it helps