I want to extract a table from a scanned PDF and write it into an Excel file. I need Python or Java code for extracting the table.
Hello @Melbin_Antu!
It seems that you are having trouble getting an answer to your question in the first 24 hours.
Let us give you a few hints and helpful links.
First, make sure you have browsed through our Forum FAQ Beginner's Guide. It will teach you what should be included in your topic.
You can check out some of our resources directly, see below:
- Always search first. It is the best way to quickly find your answer. Check out the search icon for that. Clicking the options button will let you set more specific topic search filters, e.g. only the ones with a solution.
- A topic that contains the most common solutions with example project files can be found here.
- Read our official documentation, where you can find a lot of information and instructions about each of our products.
- Watch the videos on our official YouTube channel for more visual tutorials.
Hopefully this will let you easily find the solution/information you need. Once you have it, we would be happy if you could share your findings here and mark it as a solution. This will help other users find it in the future.
Thank you for helping us build our UiPath Community!
Cheers from your friendly
Forum_Staff
@Melbin_Antu,
You can use the "tabula" package in Python to read tables from a PDF. However, tabula only works on native (text-based) PDFs. If you have a scanned PDF, first process it with OCR in Python and then pass the result to tabula; it will give you the entire table. The condition is that the table must be structured.
Thanks. If you need more assistance, feel free to contact me.
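A minimal sketch of the OCR-then-tabula route described in the answer above. It assumes the `ocrmypdf` and `tabula-py` packages; OCRmyPDF is just one possible OCR step, since the answer does not name a specific tool. OCRmyPDF needs Tesseract and Ghostscript installed, tabula-py needs a Java runtime, and writing .xlsx needs an Excel engine such as openpyxl. The helper names and the intermediate file name `ocr_output.pdf` are hypothetical. The third-party imports are placed inside the functions, so each dependency is only needed when its step actually runs.

```python
"""Sketch: OCR a scanned PDF with OCRmyPDF, then read its tables with tabula-py."""


def ocr_pdf(scanned_pdf, searchable_pdf):
    """Write a copy of `scanned_pdf` that carries an invisible OCR text layer."""
    import ocrmypdf  # needs Tesseract and Ghostscript installed on the system

    # skip_text=True: pages that already contain text are copied without OCR
    ocrmypdf.ocr(scanned_pdf, searchable_pdf, skip_text=True)
    return searchable_pdf


def read_tables_with_tabula(pdf_path):
    """Return a list of pandas DataFrames, one per table tabula detects."""
    import tabula  # the tabula-py package; needs a Java runtime

    # stream=True infers columns from whitespace between words, which suits
    # OCR output: the scan has no vector ruling lines for lattice mode to use.
    return tabula.read_pdf(pdf_path, pages="all", stream=True)


def scanned_pdf_to_excel(scanned_pdf, excel_path, ocr_output="ocr_output.pdf"):
    """OCR the scan, extract its tables and write each one to its own sheet."""
    import pandas as pd  # writing .xlsx also needs an engine such as openpyxl

    frames = read_tables_with_tabula(ocr_pdf(scanned_pdf, ocr_output))
    with pd.ExcelWriter(excel_path) as writer:
        if not frames:
            # Make sure the workbook has at least one sheet; the openpyxl
            # engine cannot save a workbook without any sheets.
            pd.DataFrame({"status": ["no tables found"]}).to_excel(
                writer, sheet_name="Info", index=False
            )
        for i, df in enumerate(frames, start=1):
            df.to_excel(writer, sheet_name=f"Table_{i}", index=False)
    return len(frames)
```

Usage, with your own paths: `count = scanned_pdf_to_excel("scan.pdf", "tables.xlsx")` returns the number of tables written. How well stream mode splits the columns depends on scan quality and layout, so check the result and consider tabula's `area` or `columns` options if the columns come out merged.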
import pdfplumber
import pandas as pd

# Input and output file paths (raw strings, so backslashes such as "\U"
# are not treated as escape sequences)
input_pdf_path = r"C:\Users\hp\OneDrive\Desktop\testdata\BHARATHIYAR LORRY OFFICE-712-CFA.pdf"
output_excel_path = r"C:\Users\hp\OneDrive\Desktop\testdata\extracted_data.xlsx"


# Function to extract tables from PDF
def extract_tables_from_pdf(pdf_path):
    # Note: pdfplumber reads only the PDF's text layer and vector lines,
    # so a purely scanned (image-only) page yields no tables.
    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns every table on the page;
            # extract_table() would return only the largest one
            tables.extend(page.extract_tables())
    return tables


# Extract tables from PDF
tables = extract_tables_from_pdf(input_pdf_path)

# Write tables to Excel
with pd.ExcelWriter(output_excel_path) as writer:
    if not any(tables):
        # Write a placeholder so the workbook has at least one sheet
        pd.DataFrame().to_excel(writer, sheet_name="No_tables", index=False)
    # Write each table to its own sheet, using its first row as the header
    for i, table in enumerate(tables):
        if table:
            df = pd.DataFrame(table[1:], columns=table[0])
            df.to_excel(writer, sheet_name=f"Table_{i+1}", index=False)

print(f"{len(tables)} table(s) extracted and written to Excel.")
I used this code, but it can't extract the table.
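pdfplumber reads only what is stored in the PDF: its text layer and vector lines. A scanned page is just an image, so there is nothing for `extract_table()` to find. Below is a minimal sketch of one workaround, assuming the file has first been given an OCR text layer (for example with OCRmyPDF). Because an OCR layer contains words but no ruling lines, pdfplumber is told to infer columns and rows from text alignment (the `"text"` strategies) instead of drawn lines. The helper names `clean_table`, `has_text_layer` and `extract_ocr_tables` are hypothetical.

```python
"""Sketch: table extraction from an OCR'd PDF with pdfplumber's text strategy."""

# Column and row boundaries are inferred from word positions instead of
# ruling lines, because an OCR text layer contains words but no vector lines.
TEXT_TABLE_SETTINGS = {
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
}


def clean_table(raw_rows):
    """Turn pdfplumber rows into (header, body), tidying OCR noise.

    None cells become "", runs of whitespace (including line breaks inside
    a cell) collapse to single spaces, fully empty rows are dropped and
    blank header cells get a placeholder name.
    """
    rows = [[" ".join((cell or "").split()) for cell in row] for row in raw_rows]
    rows = [row for row in rows if any(row)]
    if not rows:
        return [], []
    header = [name or f"column_{i}" for i, name in enumerate(rows[0], start=1)]
    return header, rows[1:]


def has_text_layer(pdf_path):
    """True if at least one page has extractable text (i.e. not a pure scan)."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return any((page.extract_text() or "").strip() for page in pdf.pages)


def extract_ocr_tables(pdf_path):
    """Return a list of (header, body) tuples for every table found."""
    import pdfplumber

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for raw in page.extract_tables(table_settings=TEXT_TABLE_SETTINGS):
                header, body = clean_table(raw)
                if header:
                    results.append((header, body))
    return results
```

Check `has_text_layer(path)` first: if it returns False, the PDF still needs OCR. Each `(header, body)` pair can then take the place of `table[0]` and `table[1:]` in the Excel-writing loop of the code above, e.g. `pd.DataFrame(body, columns=header)`. Text-based detection is sensitive to scan quality, so pdfplumber table settings such as `snap_tolerance` may need tuning.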