Extraction of Tables in UiPath Studio using UiAutomation (Extract Tables) shows Variation in Data

Ananta_Sunita · May 29, 2024, 6:56am

Hi Team,

I am trying to extract data from tables in PDF files using Extract Tables and selectors in UiAutomation. Test Selection shows the correct data, but in the final result there are missing values, misspelled words, and words that have shifted position.

Overall, the Extraction is inconsistent when we process multiple files in a folder.

The tables are somewhat complex with variations.

Help me resolve this.

Thanks,
Sunita

AJ_Ask · May 31, 2024, 3:52am

Hi @Ananta_Sunita

Try using the Computer Vision (CV) activities for table extraction.

You may get better results.

Hope this helps

sarvesh.b · June 11, 2024, 11:01am

Hi @Ananta_Sunita

Try this Python script to extract the PDF values. I hope it works; let me know.

import re

import pandas as pd
from pdfminer.high_level import extract_text


def extract_table_from_pdf(pdf_path):
    # Extract raw text from the PDF
    raw_text = extract_text(pdf_path)

    # Use regex to find table patterns (customize based on your table structure)
    table_pattern = re.compile(r"your_table_regex_pattern")
    tables = table_pattern.findall(raw_text)

    # Parse and clean table data
    table_data = []
    for table in tables:
        rows = table.split('\n')
        for row in rows:
            cells = row.split()  # or use a more sophisticated splitter
            if cells:  # skip blank lines so they do not become empty rows
                table_data.append(cells)

    # Convert to DataFrame for easier manipulation
    df = pd.DataFrame(table_data)

    # Perform data cleaning and validation
    # Example: drop empty columns, handle missing values, etc.
    df = df.dropna(how='all', axis=1)
    df = df.fillna('N/A')

    return df


# Example usage (only runs when the script is executed directly)
if __name__ == "__main__":
    pdf_path = 'path/to/your/pdf_file.pdf'
    df = extract_table_from_pdf(pdf_path)
    print(df)
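
On the "more sophisticated splitter" comment in the script: splitting on every space breaks multi-word cells apart, which may be one reason words appear to change position. A minimal sketch of an alternative, assuming the extracted text separates columns with two or more whitespace characters (this depends on the PDF layout and may not hold for every file); split_row and min_gap are hypothetical names, not part of pdfminer or pandas:

```python
import re


def split_row(row, min_gap=2):
    """Split one text row into cells on runs of at least min_gap whitespace characters."""
    # Single spaces stay inside a cell; only wider gaps count as column breaks.
    parts = re.split(r"\s{%d,}" % min_gap, row.strip())
    # Drop empty strings so a blank row yields an empty list.
    return [part for part in parts if part]


print(split_row("Item name   12   3.50"))
```

If this fits your files, it could replace row.split() in the loop. Check the output on a few PDFs first, since a narrow column gap would merge two cells into one.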

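Use the Invoke Python Method activity (from the UiPath.Python.Activities package, inside a Python Scope) to call extract_table_from_pdf from your Python script.
Pass the PDF path as an argument to the function.
A pandas DataFrame generally cannot be handed back to the workflow as-is, so have the function return a string (for example JSON) and convert that to a DataTable for further processing in UiPath.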
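
One way to pass the extracted rows back to the workflow is as a JSON string. This is a rough sketch only: rows_to_json is a hypothetical helper, and it assumes the workflow parses the returned string afterwards (for example with a Deserialize JSON activity). It takes the list of cell lists built by the script and pads short rows so every row has the same number of columns:

```python
import json


def rows_to_json(rows, fill="N/A"):
    """Pad ragged rows to equal width and serialize them as a JSON array of arrays."""
    # The widest row decides the column count; an empty input gives width 0.
    width = max((len(row) for row in rows), default=0)
    padded = [list(row) + [fill] * (width - len(row)) for row in rows]
    return json.dumps(padded)


print(rows_to_json([["Item", "Qty"], ["Bolt"]]))
```

Padding at the end of a row is a simplification: if a value is missing from the middle of a row, the remaining cells still shift left, so the result should be validated against the source table.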

Thanks
