Split PDF with respect to tables present in it

Hi All,

I have a PDF document which consists of multiple tables.
Each table can span more than one page of the document.

The tables are unstructured, and the goal is not to extract them.

I want to split the original document by table and create separate PDF documents, each containing only a single table.

Please suggest approaches to achieve this.

@Ayush_Raj

Are the table separators constant?

Can you share a sample here?

cheers

The end of each table would look somewhat like this:

image

The last row of each table would have a common cell. So, I think this could act as a constant table separator.

@Ayush_Raj

Then try to find the separator page number using a loop, split the document up to that page, and continue the loop the same way.
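
The loop could look roughly like the sketch below. It is written in Python only to show the logic, and it assumes the text of each page has already been read into a list of strings. The function name `table_page_ranges` and the separator text "TOTAL" are made up for illustration and are not from any package.

```python
def table_page_ranges(page_texts, separator):
    """Return 1-based inclusive (start, end) page ranges, one per table.

    page_texts: list of strings, one per page (assumed already extracted).
    separator:  text that appears in the last row of every table (assumption).
    """
    ranges = []
    start = 1
    for page_number, text in enumerate(page_texts, start=1):
        if separator in text:
            # The table that began on `start` ends on this page.
            ranges.append((start, page_number))
            start = page_number + 1
    # Pages after the last separator are kept as one final range.
    if start <= len(page_texts):
        ranges.append((start, len(page_texts)))
    return ranges


pages = ["row a", "row b TOTAL", "row c", "row d", "TOTAL"]
print(table_page_ranges(pages, "TOTAL"))  # [(1, 2), (3, 5)]
```

Note that this only works cleanly when every table ends at the bottom of a page.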

cheers

Try using the Intelligent Keyword Classifier activity.

Regards,
Dilip Wakdikar

I cannot use the Document Understanding framework.
I am working on approaches that do not rely on it.

This seems feasible.
Are there any helpful packages available for this?

Thanks

@Ayush_Raj

The PDF package has a read PDF activity with a page number option, so you can read each page separately in a loop.

Then there is an extract PDF range activity where you can specify a range of pages, e.g. 1-5 or 4-6.
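
To make the range part concrete, here is a small sketch of turning the page numbers where the separator was found into range strings of the "1-5" form. It is plain Python for illustration only. The name `range_strings` is hypothetical, and whether a one-page range should be written as "3-3" or just "3" depends on the tool, so treat that as an assumption.

```python
def range_strings(separator_pages):
    """Turn the pages on which a table ends into "start-end" range strings.

    separator_pages: ascending 1-based page numbers where the separator
    (the common last row) was found, e.g. collected while reading the
    document page by page in a loop.
    """
    ranges = []
    start = 1
    for end in separator_pages:
        ranges.append(f"{start}-{end}")
        start = end + 1  # the next table is assumed to start on the next page
    return ranges


print(range_strings([2, 5, 6]))  # ['1-2', '3-5', '6-6']
```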

cheers

I tried this approach, but it doesn't solve my problem.
It splits the PDF by page perfectly.

But I need the split to follow the tables present in the PDF document.
When I split by page, using the constant last field of each table as the identifier, the new PDF also contains the start of the next table, which sits below the current table on the same page.
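
The overlap described above can at least be detected up front. Below is a rough Python sketch that flags pages where some text follows the separator, meaning the next table starts on the same page. It assumes the page text comes out in reading order and that the separator is the very last text of a table. The function name and the sample strings are made up for illustration.

```python
def pages_shared_by_two_tables(page_texts, separator):
    """Return 1-based page numbers where text follows the separator.

    On such pages a split by page cannot isolate a single table, because
    the end of one table and the start of the next share the page.
    """
    shared = []
    for page_number, text in enumerate(page_texts, start=1):
        position = text.find(separator)
        if position == -1:
            continue  # no table ends on this page
        trailing = text[position + len(separator):]
        if trailing.strip():
            shared.append(page_number)
    return shared


pages = ["row a", "row b TOTAL\nnext table header", "row c TOTAL"]
print(pages_shared_by_two_tables(pages, "TOTAL"))  # [2]
```

This does not fix the split, it only shows which pages would need special handling.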

Is there any way I could split the PDF so that each file contains just a single table?

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.

@Ayush_Raj

If part of a table is on a new page and a new table starts on that same page, there is no direct way to do it. I don't think you can do it even manually, unless you convert the PDF to Word, split it there, and then convert it back to PDF.

Cheers

Yes, this is the conclusion I have reached as well.

Thank You
