Multipage PDF Data Extraction


I am facing an issue while extracting data from PDFs: some table rows are split across lines, so the values in my output end up shifted into the wrong columns. Please find the example below.

Please see the value in the fifth column; this is where the table row is split.

The output data values:

2026.8 0 should be in the quantity column, and the following values should shift accordingly.

If anybody has any guidelines, please tell me. We have more than 500 PDFs with different patterns, but the table columns are the same.

Note: The length of the data table in each PDF varies; it could be 5 pages or 10 pages, it is not fixed. A bank transaction statement is a good example.

Strategy: I am converting the PDFs to Word document format and searching for the table there. Then, using Data Scraping, I get the data in a structured tabular format.
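One possible way to repair split rows after extraction is sketched below in Python. It is only a sketch under assumptions that are not confirmed by the PDFs in this thread: that each extracted row is a list of cell strings, and that the continuation part of a split row has an empty value in a key column (here the first column, for example the date in a bank statement). The function name merge_split_rows and the key_col parameter are made up for illustration.

```python
from typing import List, Optional


def merge_split_rows(rows: List[List[Optional[str]]], key_col: int = 0) -> List[List[str]]:
    """Merge continuation rows into the row above them.

    Assumption: a row whose key column is empty is the second half of a
    row that was split by a line break in the PDF.
    """
    merged: List[List[str]] = []
    for row in rows:
        # Normalise cells: treat None as empty and trim whitespace.
        cells = [(c or "").strip() for c in row]
        if not any(cells):
            continue  # skip rows that are completely empty
        key_missing = key_col >= len(cells) or cells[key_col] == ""
        if merged and key_missing:
            prev = merged[-1]
            # Pad the previous row if the continuation row is longer.
            prev.extend([""] * (len(cells) - len(prev)))
            # Append each non-empty cell to the same column of the previous row.
            for i, cell in enumerate(cells):
                if cell:
                    prev[i] = (prev[i] + " " + cell).strip()
        else:
            merged.append(cells)
    return merged


# Made-up sample data (not taken from any real statement).
sample = [
    ["2021-01-05", "Payment to supplier", "120.50"],
    ["", "invoice 4471", ""],
    ["2021-01-06", "Cash deposit", "300.00"],
]
result = merge_split_rows(sample)
```

Since the page count is not fixed, the rows from all pages can be appended into one list in page order before calling the function, so that a row split across a page boundary is merged as well. If the key column can legitimately be empty in your files, a different rule for detecting continuation rows would be needed.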


If I could get some PDF files, I can try this scenario. Do you have any sample files (not company files)?



Thanks for the support, but this is company-sensitive data, so I am unable to share the PDFs.
Here the line is broken, so it is being considered as one row.


I completely understand that.

Hello Anand,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel, and I also have samples with multiple pages:

45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
1:17:10 File 19 PDF with multiple pages and columns with multiple lines


Cristian Negulescu