Extracting Tables in UiPath Studio with UiAutomation (Extract Tables) Gives Inconsistent Data

Hi Team,

I am trying to extract data from tables in PDF files using Extract Tables and selectors in UiAutomation. However, while Test Selection shows the correct data, the final result has missing values, misspelled words, and words that appear in the wrong position.

Overall, the extraction is inconsistent when we process multiple files in a folder.

The tables are somewhat complex, and their structure varies from file to file.

Could you help me resolve this?

Thanks,
Sunita

Hi @Ananta_Sunita

Try using the Computer Vision (CV) activities for table extraction.

You may get better results.

Hope this helps :slight_smile:

Hi @Ananta_Sunita

Try this Python script to extract the table values from the PDF. I hope it works; let me know.

import re

import pandas as pd
from pdfminer.high_level import extract_text


def extract_table_from_pdf(pdf_path):
    # Extract the raw text from the PDF
    raw_text = extract_text(pdf_path)

    # Use a regex to find the table blocks (customize the pattern to your table structure).
    # If the pattern contains capturing groups, findall() returns tuples instead of strings,
    # so use non-capturing groups (?:...) inside it.
    table_pattern = re.compile(r"your_table_regex_pattern")
    tables = table_pattern.findall(raw_text)

    # Split each matched block into rows, and each row into cells
    table_data = []
    for table in tables:
        for row in table.split('\n'):
            cells = row.split()  # whitespace split; use a smarter splitter if cells contain spaces
            if cells:  # skip blank lines
                table_data.append(cells)

    # Convert to a DataFrame; shorter rows are padded with missing values
    df = pd.DataFrame(table_data)

    # Clean up: drop columns that are entirely empty, then mark the remaining gaps
    df = df.dropna(how='all', axis=1)
    df = df.fillna('N/A')

    return df

# Example usage
pdf_path = 'path/to/your/pdf_file.pdf'
df = extract_table_from_pdf(pdf_path)
print(df)
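The regex-and-split part of the script can be tried on plain text before pointing it at a PDF. The sketch below is a stdlib-only approximation of those steps, not the script itself: `parse_table_text`, the sample text and the pattern are all hypothetical. It uses `re.finditer` with `match.group(0)` so capturing groups in the pattern cannot turn matches into tuples, and it pads short rows with 'N/A', roughly like `fillna('N/A')` does after `pd.DataFrame(...)`.

```python
import re


def parse_table_text(raw_text, pattern, fill="N/A"):
    """Split regex-matched table blocks into rows of whitespace-separated cells.

    Hypothetical helper approximating the findall/split steps of
    extract_table_from_pdf without needing pandas or a PDF.
    """
    rows = []
    # group(0) is always the whole match, even if the pattern has capturing groups
    for match in re.finditer(pattern, raw_text):
        for line in match.group(0).splitlines():
            cells = line.split()
            if cells:  # skip blank lines inside the matched block
                rows.append(cells)
    # Pad ragged rows so every row has the same number of cells
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]


# Hypothetical sample: the table starts at "Item" and ends at the first blank line
sample = "Invoice 123\n\nItem Qty Price\nPen 2 1.50\nPad 1\n\nFooter"
table = parse_table_text(sample, r"Item[^\n]*(?:\n[^\n]+)*")
for row in table:
    print(row)
```

A row that comes back shorter than the header (like the "Pad" row here) means a value was missing or merged in the PDF's text layer. Checking for that per file is one way to see which files in a folder produce inconsistent results.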

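  • Use the Invoke Python Method activity (from the UiPath.Python.Activities package, placed inside a Python Scope after Load Python Script) to call extract_table_from_pdf.
  • Pass the PDF path as an argument to the method.
  • Retrieve the result and process it further in your workflow. A pandas DataFrame is not converted into a DataTable automatically, so it is easier to return plain data, for example the text from df.to_csv(index=False), and turn that into a DataTable in UiPath.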
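For that last step, one option is to have the Python side return text rather than a DataFrame object. Below is a minimal stdlib sketch, assuming the rows are already a list of lists (for example from `df.values.tolist()`); `rows_to_csv_text` is a hypothetical helper name, and with pandas, `df.to_csv(index=False, header=False)` produces comparable CSV text.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize a list of row lists into CSV text (hypothetical helper).

    A plain string is simpler to pass back from Invoke Python Method
    than a pandas DataFrame object.
    """
    buffer = io.StringIO()
    # Use "\n" line endings; csv.writer defaults to "\r\n"
    writer = csv.writer(buffer, lineterminator="\n")
    # Values containing commas or quotes are quoted automatically
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Item", "Qty", "Price"], ["Pen", "2", "1,50"], ["Pad", "1", "N/A"]]
print(rows_to_csv_text(rows))
```

On the UiPath side, the returned string could then be turned into a DataTable, for example with Generate Data Table, or written to a file and loaded with Read CSV.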

Thanks