Need help on extraction of tabular data from pdf

Hi,

I want to extract tabular data from a PDF. Please help with the best solution.

@Bhaskar_Varma - Are you trying this in Studio or StudioX?

Studio

Im a beginner and using the community version.

@Bhaskar_Varma - Here are the options…

  1. Read PDF to Text and Then use Regex to extract the data
  2. CV Scope → CV Extract Table
  3. Document Understanding

Hello Bhaskar,
In this video, I have 17 use-cases for extracting tables from PDF and write data in Excel:

2:00 GitHub free code for all the files
2:20 Logic of general workflow
4:40 File 1 simple PDF
9:50 File 2 PDF with a column with multiple lines
20:10 File 3 PDF with a column with multiple words ON the LAST column
27:00 File 5 PDF with a column with multiple words ON inside column (2 columns)
31:40 File 6 PDF with a column with multiple lines
39:10 File 8 simple PDF
42:15 File 9 PDF with multiple spaces on that need to be correct
45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
55:50 File 11 simple PDF with protection empty Cells
58:35 File 12 Big PDF with an empty line and Empty columns and partial total
1:02:25 File 13 PDF with multiple columns that have multiple words and hard to define a rule
1:10:15 File 15 PDF with multiple columns that have multiple lines
1:12:50 File 17 simple PDF remove spaces from headers also remove space from Data
1:16:05 File 18 simple PDF
1:17:10 File 19 PDF with multiple pages and columns with multiple lines
1:22:10 File 20 PDF with multiple columns that have multiple lines
1:25:00 File 21 PDF with empty columns and subtotal

Code:

Thanks,
Cristian Negulescu

1 Like