kalv
December 28, 2020, 10:54am
Hi All,
I need to extract a table from an image-based PDF. The table starts on the second page and continues over several pages, and the number of rows is not fixed.
Hello Kalv,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel, including samples that span multiple pages:
45:50 File 10: PDF with multiple columns whose cells span several lines, over multiple pages
1:17:10 File 19: PDF with multiple pages and columns whose cells span several lines
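For columns whose cells wrap onto several lines, one common approach is to treat a line that starts with a row number as the beginning of a new row and append every other line to the row before it. The sketch below shows that idea in VB.NET, assuming the PDF text has already been extracted (for an image-based PDF, that means OCR first). `MergeRows` is a hypothetical helper, and "a new row starts with a digit" is an assumption about the layout, not something every PDF guarantees.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module MultiLineRows
    ' Hypothetical helper: groups extracted text lines into table rows.
    ' Assumption: a new row starts with a digit (the row number); any other
    ' non-empty line is a wrapped continuation of the previous row.
    ' Lines that appear before the first row (e.g. headers) become their own row.
    Function MergeRows(text As String) As List(Of String)
        Dim rows As New List(Of String)
        ' Splitting on both CR and LF handles Windows and Unix line endings
        Dim lines As String() = text.Split(New Char() {ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
        For Each rawLine As String In lines
            Dim line As String = rawLine.Trim()
            If line.Length = 0 Then Continue For
            If Char.IsDigit(line(0)) OrElse rows.Count = 0 Then
                rows.Add(line)                          ' start a new row
            Else
                rows(rows.Count - 1) &= " " & line      ' continuation of the last row
            End If
        Next
        Return rows
    End Function

    Sub Main()
        Dim sample As String = "1  Apples  Fresh red" & vbLf & "fruit" & vbLf & "2  Pears  Green"
        For Each r As String In MergeRows(sample)
            Console.WriteLine(r)
        Next
    End Sub
End Module
```

Because the row count comes from the text itself, the same loop works whether the table has 5 rows or 500.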
Code:
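'FILE1
' strin, strout and strpar are not declared here; they are presumably
' Invoke Code arguments (strin = the text extracted from the PDF)
Dim strtmp As String
' Take the text from the "Number" header up to (not including) "Subtotal"
strtmp = strin.Substring(strin.IndexOf("Number"), strin.IndexOf("Subtotal") - strin.IndexOf("Number")).Trim
' Turn spaces into "|" column separators
strout = strtmp.Replace(" ", "|")
' Skip past the 8-character word "Subtotal"
strtmp = strin.Substring(strin.IndexOf("Subtotal") + 8)
' The subtotal value is the rest of that line
strpar = strtmp.Substring(0, strtmp.IndexOf(Environment.NewLine)).Trim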
'FILE2
Dim strtmp As String
Dim strout As String
strout = "Col1|Col2|Col3|Col4"
strtmp = strin.Substring(strin.IndexOf("Vacancies") + 11).Trim
For Each line As String In strtmp.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
    If (line.Length > 3) Then
        If (IsNumeric(line(0))) And (line(1) = " ") And (line(2) = " ") Then
            strout = strout + Environment.NewLine + line.Replace(" ", "").Replace(" ", "|").Trim
        ElseIf (line(0) = "") And (line(1) = " ") And (line(2) = " ") Then
            strout = strout + line.Replace(" ", "$").Trim()
(This file has been truncated.)
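For a table that continues over several pages, the FILE1 idea (cut the text between a header word and an end word such as "Subtotal") can be extended by also dropping the lines that each new page repeats. The VB.NET sketch below is hypothetical and rests on three assumptions that are not from the original post: the header line is repeated at the top of every page, page footers start with a known prefix (e.g. "Page "), and columns are separated by two or more spaces. Adjust these to match your OCR output.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MultiPageTable
    ' Hypothetical helper: returns pipe-delimited rows of a table that starts at
    ' headerStart and ends before endMarker, even if it spans several pages.
    ' Assumptions: the header repeats on each page, footers start with
    ' footerPrefix, and columns are separated by two or more spaces.
    Function ExtractRows(text As String, headerStart As String, endMarker As String, footerPrefix As String) As List(Of String)
        Dim rows As New List(Of String)
        Dim startIdx As Integer = text.IndexOf(headerStart, StringComparison.Ordinal)
        If startIdx < 0 Then Return rows
        Dim endIdx As Integer = text.IndexOf(endMarker, startIdx, StringComparison.Ordinal)
        If endIdx < 0 Then endIdx = text.Length        ' no end marker: read to the end
        Dim body As String = text.Substring(startIdx, endIdx - startIdx)

        Dim header As String = Nothing
        For Each rawLine As String In body.Split(New Char() {ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
            Dim line As String = rawLine.Trim()
            ' Skip blank lines and page footers
            If line.Length = 0 OrElse line.StartsWith(footerPrefix, StringComparison.Ordinal) Then Continue For
            ' Two or more spaces mark a column boundary
            Dim row As String = String.Join("|", Regex.Split(line, "\s{2,}"))
            If header Is Nothing Then
                header = row
                rows.Add(row)                  ' keep the first header once
            ElseIf row <> header Then
                rows.Add(row)                  ' skip header copies on later pages
            End If
        Next
        Return rows
    End Function

    Sub Main()
        Dim sample As String = "Invoice" & vbLf &
            "Number  Item  Amount" & vbLf & "1  Apples  3.00" & vbLf & "Page 1 of 2" & vbLf &
            "Number  Item  Amount" & vbLf & "2  Pears  4.50" & vbLf & "Subtotal 7.50"
        For Each r As String In ExtractRows(sample, "Number", "Subtotal", "Page ")
            Console.WriteLine(r)
        Next
    End Sub
End Module
```

Splitting on two or more spaces, rather than replacing every single space as FILE1 does, keeps multi-word cells such as "Fresh red fruit" in one column; the trade-off is that it relies on the OCR output preserving wider gaps between columns.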
Thanks,
Cristian Negulescu