How to Extract/Manipulate a PDF Table with no borders

So I have a PDF file that is similar to this:
TestPDF.pdf (55.6 KB)
What I need to do is get the table (Col1 to Col5) and save it to an Excel file.
I've tried reading the PDF as text, and the output looked like this:

Report Document
Doc No: 11084-02  ARPA-808 1
Subject No: 42 Date: 05/06/2023 Submit Date: 06/12/2023

Col1 Col2 Col3 Col4 Col5
A X   1Line1
1Line2
B  1,245.67  2Line1
C  46.67 567.80
D    4Line1
E X
Total 1,245.67 567.80

As you can see, “1Line2” ended up on the next row, but it belongs in the same cell as “1Line1”.
In my actual PDF, the number of space-separated pieces on each line doesn't match the number of columns, so splitting each line on spaces and adding the elements to a DataRow one by one won't work: the resulting arrays have different sizes, and I'd end up with an index-out-of-bounds error.
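A quick Python sketch (used here only for illustration) shows why splitting on spaces fails. Rows A, B and C each produce three pieces, but those pieces belong to different columns (A fills Col1, Col2, Col5; B fills Col1, Col3, Col5; C fills Col1, Col3, Col4), so an element's position in the array says nothing about which column it came from. The lines are copied from the extraction above.

```python
# Lines copied from the plain-text extraction above (Total row left out).
extracted = [
    "A X   1Line1",
    "1Line2",
    "B  1,245.67  2Line1",
    "C  46.67 567.80",
    "D    4Line1",
    "E X",
]

# Naive approach: split each line on any run of whitespace.
token_counts = [len(line.split()) for line in extracted]
for line in extracted:
    print(line.split())
# ['A', 'X', '1Line1']
# ['1Line2']
# ['B', '1,245.67', '2Line1']
# ['C', '46.67', '567.80']
# ['D', '4Line1']
# ['E', 'X']
```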

Is there any way to arrange this table properly? BTW, the Total row at the end can be disregarded :)
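Once each line has been cut into five cells (by whatever method), the wrapped “1Line2” can be folded back into the row above. A minimal Python sketch, assuming that a blank Col1 always marks a continuation line; `merge_continuation_rows` is a hypothetical helper. The Total line should be dropped before this step, because its Col1 is blank too and it would otherwise be merged into row E.

```python
def merge_continuation_rows(rows):
    """Fold any row with an empty first cell into the row above it."""
    merged = []
    for row in rows:
        if merged and not row[0].strip():
            previous = merged[-1]
            for i, cell in enumerate(row):
                if cell.strip():
                    # Append the wrapped text to the same column of the row above.
                    previous[i] = (previous[i] + " " + cell).strip()
        else:
            merged.append(list(row))  # copy so the input rows are not modified
    return merged

rows = [
    ["A", "X", "", "", "1Line1"],
    ["", "", "", "", "1Line2"],  # wrapped continuation of row A
    ["B", "", "1,245.67", "", "2Line1"],
]
for row in merge_continuation_rows(rows):
    print(row)
# ['A', 'X', '', '', '1Line1 1Line2']
# ['B', '', '1,245.67', '', '2Line1']
```

A space is used as the joiner here; a newline would keep the two pieces on separate lines inside one Excel cell.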

@Archie

Did you try reading the PDF with the preserve format option checked?

Generally, when you read it that way, a fixed-width approach can be used. But the layout differs from file to file, so you need to check the different variations and arrive at the column widths that work best for you.
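A minimal Python sketch of the fixed-width idea, assuming the preserved-format header line is a reliable guide: the offset where each header label (Col1 to Col5) starts is used as that column's left boundary. `column_starts`, `slice_fixed_width` and `pad` are hypothetical names, and the widths used to build the sample are invented rather than measured from the real PDF. It also assumes header labels contain no spaces and that no value spills past the next column's start.

```python
import re

def column_starts(header):
    """Offset where each header label starts (assumes labels contain no spaces)."""
    return [m.start() for m in re.finditer(r"\S+", header)]

def slice_fixed_width(line, starts):
    """Cut a line into cells, using each column start as a boundary."""
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [line[s:e].strip() for s, e in zip(starts, ends)]

# Hypothetical sample built with made-up widths, NOT measured from the real PDF.
WIDTHS = [10, 10, 12, 12, 10]

def pad(*cells):
    return "".join(c.ljust(w) for c, w in zip(cells, WIDTHS)).rstrip()

header = pad("Col1", "Col2", "Col3", "Col4", "Col5")
starts = column_starts(header)
print(starts)  # [0, 10, 20, 32, 44]
print(slice_fixed_width(pad("B", "", "1,245.67", "", "2Line1"), starts))
# ['B', '', '1,245.67', '', '2Line1']
```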

cheers

I tried, but it just ended up with more spaces in between. I'm not sure how I can divide this accurately.

                                                                        Report Document
Doc No: 11084-02                                                                                                         ARPA-808 1
Subject No: 42                                              Date: 05/06/2023                                             Submit Date: 06/12/2023

Col1                                Col2                                Col3                                 Col4                                Col5
A                                   X                                                                                                            1Line1
                                                                                                                                                 1Line2
B                                                                       1,245.67                                                                 2Line1
C                                                                       46.67                                567.80
D                                                                                                                                                4Line1
E                                   X
                                    Total                               1,245.67                             567.80

@Archie

This is expected. Save the data in a text file and check the character count for each line, both with and without data.

Then you will know the minimum and maximum number of characters for each column.
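A small Python sketch of that measuring step, assuming the preserved-format text has already been saved and split into lines: it reports each line's length and where each chunk of text starts and ends. Chunks are taken as runs of text separated by two or more spaces (an assumption, so that a value such as “Submit Date” stays in one piece). `describe_line` is a hypothetical helper and the sample spacing is made up.

```python
import re

def describe_line(line):
    """Return the line length and (start, end, text) for each text chunk."""
    chunks = [(m.start(), m.end(), m.group())
              for m in re.finditer(r"\S+(?: \S+)*", line)]
    return len(line), chunks

# Hypothetical preserved-format lines (made-up spacing, not the real PDF).
sample = [
    "Col1" + " " * 6 + "Col2" + " " * 6 + "Col3" + " " * 8 + "Col4" + " " * 8 + "Col5",
    "A" + " " * 9 + "X" + " " * 33 + "1Line1",
    "C" + " " * 19 + "46.67" + " " * 7 + "567.80",
]
for line in sample:
    print(describe_line(line))
# (48, [(0, 4, 'Col1'), (10, 14, 'Col2'), (20, 24, 'Col3'), (32, 36, 'Col4'), (44, 48, 'Col5')])
# (50, [(0, 1, 'A'), (10, 11, 'X'), (44, 50, '1Line1')])
# (38, [(0, 1, 'C'), (20, 25, '46.67'), (32, 38, '567.80')])
```

Comparing the start and end offsets across all lines gives each column's smallest start and largest end; the line lengths differ mainly because trailing columns are empty.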

Cheers

Correct me if I'm wrong.
I split the text using a newline as the delimiter and counted the character length of each line (row), but the lengths are not all the same. I'm not sure what to do with this.

@Archie

They won't be the same unless all the columns are filled.

So it goes like this.

Say you have one row with 3 columns filled, another with 4, and another with 3 again.

The rows with 3 and 4 filled columns will definitely differ in length; two rows with 3 filled columns may or may not, depending on how many characters each value has.

So the approach is this: each column has a minimum character count when it is empty (maybe 10 spaces) and a maximum character count when it is fully filled (maybe 30), plus some number of separator characters between columns. Count each of these, arrive at the numbers, and use them to tell whether a column has data or not.
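That counting can also be automated. A rough Python sketch, assuming columns never overlap horizontally: mark every character position that holds text on any line, and treat a run of at least `min_gap` positions that are blank on every line as the separator between two columns. `infer_columns`, its `min_gap` parameter and the sample widths are all hypothetical, and the Total row is assumed to have been removed already.

```python
def infer_columns(lines, min_gap=2):
    """Guess (start, end) spans of columns from positions blank on every line."""
    width = max(len(line) for line in lines)
    occupied = [False] * width
    for line in lines:
        for i, ch in enumerate(line):
            if not ch.isspace():
                occupied[i] = True

    spans, start, end, gap = [], None, 0, 0
    for i, used in enumerate(occupied):
        if used:
            if start is None:
                start = i  # a new column begins here
            end, gap = i + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # blank strip wide enough: close the column
                spans.append((start, end))
                start, gap = None, 0
    if start is not None:
        spans.append((start, end))
    return spans

# Hypothetical lines with made-up widths (Total row already removed).
WIDTHS = [10, 10, 12, 12, 10]

def pad(*cells):
    return "".join(c.ljust(w) for c, w in zip(cells, WIDTHS)).rstrip()

lines = [
    pad("Col1", "Col2", "Col3", "Col4", "Col5"),
    pad("A", "X", "", "", "1Line1"),
    pad("", "", "", "", "1Line2"),
    pad("B", "", "1,245.67", "", "2Line1"),
    pad("C", "", "46.67", "567.80"),
    pad("D", "", "", "", "4Line1"),
    pad("E", "X"),
]
spans = infer_columns(lines)
print(spans)  # [(0, 4), (10, 14), (20, 28), (32, 38), (44, 50)]
print([lines[4][s:e].strip() for s, e in spans])  # ['C', '', '46.67', '567.80', '']
```

Slicing with the inferred spans then works like a fixed-width split; `min_gap` may need raising if a single value can contain a wide internal gap.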

Sometimes, if you are lucky, there will be a specific number of spaces between the data in adjacent columns, which can be counted to separate and identify them.
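When values don't sit exactly under their header (for example, right-aligned amounts that start a little to the left), a looser option is to split each line on runs of two or more spaces and put each chunk in the column whose header starts nearest to it. A Python sketch; `assign_by_position`, the two-space separator rule and the column offsets are all assumptions to be tuned against the real file.

```python
import re

def assign_by_position(line, col_starts):
    """Put each text chunk into the column whose start offset is closest."""
    cells = [""] * len(col_starts)
    # Chunks are separated by 2+ spaces; single spaces stay inside a chunk.
    for m in re.finditer(r"\S+(?: \S+)*", line):
        nearest = min(range(len(col_starts)),
                      key=lambda i: abs(col_starts[i] - m.start()))
        cells[nearest] = (cells[nearest] + " " + m.group()).strip()
    return cells

col_starts = [0, 10, 20, 32, 44]  # hypothetical header offsets
# "1,245.67" starts at offset 18, slightly left of Col3's offset of 20.
line = "B" + " " * 17 + "1,245.67" + " " * 18 + "2Line1"
print(assign_by_position(line, col_starts))
# ['B', '', '1,245.67', '', '2Line1']
```

If two chunks land in the same column they are joined with a space, which is usually a sign that the offsets need adjusting.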

Then you can use Generate Data Table with the fixed-width option.
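As a rough Python analogue (not the Generate Data Table activity itself), a fixed-width split followed by saving the table for Excel might look like this. It assumes the column widths have already been worked out as described above; the widths shown are invented, `parse_fixed_width` and `to_csv` are hypothetical helpers, and CSV (which Excel opens directly) stands in for a native .xlsx file.

```python
import csv
import io
from itertools import accumulate

def parse_fixed_width(lines, widths):
    """Split every line into cells of the given character widths."""
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    ends[-1] = None  # let the last column run to the end of the line
    return [[line[s:e].strip() for s, e in zip(starts, ends)] for line in lines]

def to_csv(rows):
    """Serialize rows as CSV text; values containing commas get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()

widths = [10, 10, 12, 12, 10]  # hypothetical; measure these per file in practice
lines = [
    "Col1".ljust(10) + "Col2".ljust(10) + "Col3".ljust(12) + "Col4".ljust(12) + "Col5",
    "B".ljust(20) + "1,245.67".ljust(24) + "2Line1",
]
table = parse_fixed_width(lines, widths)
print(to_csv(table))
# Col1,Col2,Col3,Col4,Col5
# B,,"1,245.67",,2Line1
```

To write a file instead, open it with `open("table.csv", "w", newline="")` and pass that to `csv.writer`; producing a real .xlsx would need a third-party library such as openpyxl.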

cheers

Thanks, I'll try this. I'll first get the max character length for each column.
