Hi Team,
I am trying to extract readable data from a PDF in which the data is laid out as a table, but after extraction the data does not come out in a table format.
I am sure the Table Extraction activity will not work here. Can you please suggest how I can achieve this?
I would suggest using Document Understanding for this. It would be more reliable and accurate.
Thanks,
Ashok
Hi @devasaiprasad_K,
It is definitely suggested to go with the Document Understanding (DU) approach, as the document type you are trying to extract is semi-structured. If it is a form with predefined fields whose positions do not change, you can use regex, but I believe the table will be dynamic.
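To illustrate the regex option for a fixed-layout form, here is a minimal sketch in Python. The sample text, the field labels, and the `extract_fields` helper are all made up for illustration; real labels depend on your document.

```python
import re

# Hypothetical text as it might come out of a PDF text-extraction step.
sample_text = """Invoice No: INV-1042
Invoice Date: 2024-03-15
Total Amount: 1,250.00
"""

# One pattern per fixed field; these labels are assumptions, not from a real form.
FIELD_PATTERNS = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "invoice_date": r"Invoice Date:\s*(\d{4}-\d{2}-\d{2})",
    "total_amount": r"Total Amount:\s*([\d,]+\.\d{2})",
}


def extract_fields(text):
    """Return a dict of field name -> captured value (None if not found)."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(sample_text)
print(fields)
```

This only holds up while the labels stay the same on every document. For a table with a varying number of rows, it gets fragile quickly.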
Cheers
Hi @devasaiprasad_K, for that you can use Document Understanding together with the Form Extractor. With Document Understanding you can extract your data in a table format, and it is an easy and fast way to extract table-format data.
Cheers, Happy Automation!
I have done this using Python in the past.
# Requires tabula-py (pip install tabula-py) and a Java runtime
import os

from tabula import read_pdf

# Path to your PDF file
pdf_path = 'INPUT'

# Define your static output folder
output_folder = 'OUTPUTFOLDER'

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Use read_pdf with pages="all" to extract tables from all pages;
# with multiple_tables=True it returns a list of pandas DataFrames
tables = read_pdf(pdf_path, pages="all", multiple_tables=True)

# Iterate over each table (DataFrame) and save it
for i, table in enumerate(tables):
    # Define the path for each output CSV file within the output folder
    output_path = os.path.join(output_folder, f"FILENAME_{i+1}.csv")
    # Save each table to a CSV file in the specified output folder
    table.to_csv(output_path, index=False)
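If you already have the extracted text and the columns are still separated by runs of spaces, a rough fallback is to split each line on two or more spaces and write the result as CSV. This is only a sketch: the sample text and the `text_to_rows` helper are hypothetical, and it assumes no cell contains a double space and no cell is empty.

```python
import csv
import io
import re

# Hypothetical extracted text where columns are separated by 2+ spaces.
raw_text = """Item        Qty    Price
Widget A    2      10.50
Gadget B    10     3.25
"""


def text_to_rows(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append(re.split(r"\s{2,}", line))
    return rows


rows = text_to_rows(raw_text)

# Write the rows to an in-memory CSV; swap io.StringIO for a real file as needed.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Single spaces inside a cell (like "Widget A") are kept because only runs of two or more spaces count as a column break.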