Ayush_Raj
(Ayush Raj)
January 3, 2024, 2:50pm
1
Hi All,
I have a PDF document that contains multiple tables.
Each table can span more than one page of the document.
The tables are unstructured, and the goal is not to extract them.
I want to split the original document by table and create separate PDF files, each containing only a single table.
Please suggest approaches to achieve this.
Anil_G
(Anil Gorthi)
January 3, 2024, 3:02pm
2
@Ayush_Raj
Are the table separators constant?
Can you share a sample here?
cheers
Ayush_Raj
(Ayush Raj)
January 4, 2024, 3:00pm
3
The end of each table would look somewhat like this:
The last row of each table would have a common cell. So, I think this could act as a constant table separator.
Anil_G
(Anil Gorthi)
January 4, 2024, 3:09pm
4
@Ayush_Raj
Then try finding the page number of the separator in a loop, split the document up to that page, and continue the loop the same way for the remaining pages.
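A minimal sketch of that loop, written in Python only to illustrate the logic (it is not a UiPath workflow). It assumes the text of each page has already been read into a list, and that the common last-row cell can be matched as a plain string. The names `table_page_ranges`, `page_texts`, and `separator` are hypothetical, as is the "TOTAL" separator text.

```python
from typing import List


def table_page_ranges(page_texts: List[str], separator: str) -> List[str]:
    """Return one 1-based page range string (e.g. "1-5") per table.

    A table is assumed to end on the first page whose text contains
    the separator; the next table is assumed to start on the page after.
    """
    ranges = []
    start = 1
    for page_no, text in enumerate(page_texts, start=1):
        if separator in text:
            ranges.append(f"{start}-{page_no}")
            start = page_no + 1
    # Pages after the last separator are ignored: they hold no complete table.
    return ranges


# Hypothetical example: five pages, separator text "TOTAL" on pages 2 and 5.
pages = ["rows", "rows TOTAL", "rows", "rows", "rows TOTAL"]
print(table_page_ranges(pages, "TOTAL"))
```

Each returned range can then be passed to whatever page-range split step is available. Note that this only works cleanly when every table starts on a fresh page.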
cheers
Try using the Intelligent Keyword Classifier activity.
Regards,
Dilip Wakdikar
Ayush_Raj
(Ayush Raj)
January 4, 2024, 3:24pm
6
I cannot use the Document Understanding framework.
I am looking for approaches that do not rely on it.
Ayush_Raj
(Ayush Raj)
January 4, 2024, 3:27pm
7
This seems feasible.
Are there any helpful packages available for this?
Thanks
Anil_G
(Anil Gorthi)
January 4, 2024, 3:29pm
8
@Ayush_Raj
The PDF package has a Read PDF Text activity with a page range option, so you can read each page separately in a loop.
There is also an Extract PDF Page Range activity where you can specify a range of pages, e.g. 1-5 or 4-6.
cheers
Ayush_Raj
(Ayush Raj)
January 4, 2024, 3:31pm
9
I tried this approach, but it doesn’t work for my case.
It splits the PDF by page perfectly.
But I need the split to follow the tables present in the PDF document.
When I split by page, using a constant last field present in each table as the identifier, the new PDF also contains part of the next table, which starts below the current table on the same page.
Is there any way to split the PDF so that each file contains just a single table?
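The overlap described above can at least be detected before splitting. The following is a rough sketch in Python, assuming the page text is extracted in reading order and that the separator cell is the last text of its table; `shared_pages`, `page_texts`, and `separator` are hypothetical names, and "TOTAL" is a made-up separator. It reports pages where something follows the separator, meaning the page holds the end of one table and the start of the next.

```python
from typing import List


def shared_pages(page_texts: List[str], separator: str) -> List[int]:
    """Return 1-based numbers of pages that contain text after the separator.

    On such pages a page-level split cannot isolate a single table,
    because the next table begins below the separator on the same page.
    """
    shared = []
    for page_no, text in enumerate(page_texts, start=1):
        _before, found, after = text.partition(separator)
        if found and after.strip():
            shared.append(page_no)
    return shared


# Hypothetical example: page 2 ends one table and starts the next.
pages = ["rows", "rows TOTAL\nNext table header", "rows TOTAL"]
print(shared_pages(pages, "TOTAL"))
```

Any page in the result would end up in the output file of the table that ends on it, together with the beginning of the following table.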
system
(system)
Closed
January 10, 2024, 9:03am
10
This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
Anil_G
(Anil Gorthi)
January 25, 2024, 6:59pm
12
@Ayush_Raj
If part of one table sits on a new page and the next table starts on that same page, there is no direct way to do it. I don’t think you can do it even manually, unless you convert the PDF to Word, split it there, and then convert the pieces back to PDF.
Cheers
Ayush_Raj
(Ayush Raj)
January 27, 2024, 9:52am
13
Yes, that is the conclusion I reached as well.
Thank you.
system
(system)
Closed
January 30, 2024, 9:52am
14
This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.