Hi All,
Data scraping does not read the full table when the data spans multiple pages of a PDF. How can I use a starting name and an ending name as reference points to extract the table details from the PDF? Please suggest.
Regards
Niranjan
Try this workflow:
Read PDF Text activity (Range: "All", output: pdfText) // all pages are returned as one string
Assign startKeyword = "Start of Table"
Assign endKeyword = "End of Table"
Assign startIndex = pdfText.IndexOf(startKeyword)
Assign endIndex = If(startIndex >= 0, pdfText.IndexOf(endKeyword, startIndex), -1) // search for the end keyword only after the start keyword
If startIndex >= 0 AndAlso endIndex > startIndex // continue only when both keywords were found
Assign tableData = pdfText.Substring(startIndex, endIndex - startIndex)
Build DataTable activity (output: extractedTable)
Add Data Column activities for each column in the table
Assign rows = tableData.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
For Each row In rows
Assign columns = row.Split(","c) // Assuming the data is comma-separated
Add Data Row activity (ArrayRow: columns) to extractedTable
Cheers…!
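For anyone who wants to check the keyword logic outside UiPath first, here is a minimal Python sketch of the same idea on a plain string. The function names `slice_between` and `to_rows` and the sample text are made up for illustration, and the comma delimiter is the same assumption as in the steps above. Unlike those steps, this sketch drops the start keyword itself from the result.

```python
def slice_between(text, start_keyword, end_keyword):
    """Return the text between two keywords, or "" if the start keyword is missing."""
    start = text.find(start_keyword)
    if start == -1:
        return ""
    start += len(start_keyword)          # skip past the start keyword itself
    end = text.find(end_keyword, start)  # look for the end keyword only after the start
    if end == -1:
        end = len(text)                  # no end keyword: take everything that remains
    return text[start:end]


def to_rows(block, delimiter=","):
    """Split a text block into rows of trimmed cells, skipping blank lines."""
    rows = []
    for line in block.splitlines():
        line = line.strip()
        if line:
            rows.append([cell.strip() for cell in line.split(delimiter)])
    return rows


# Hypothetical sample text standing in for the output of Read PDF Text
sample = "Intro\nStart of Table\nA,1\nB,2\nEnd of Table\nFooter"
rows = to_rows(slice_between(sample, "Start of Table", "End of Table"))
print(rows)
```

Because the whole document is searched as one string, it does not matter on which page the end keyword appears.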
@Dilli_Reddy could you please share the workflow? I'm not sure where to apply this logic. I just want to know which tools we need to use to build it.
UiPath.PDF.Activities package
Create a Python Script:
import pdfplumber

def extract_table_details(pdf_path, start_name, end_name):
    # Join the text of every page first, so a table that spans pages is searched as one string.
    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for a page without a text layer
        full_text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    start_index = full_text.find(start_name)
    if start_index == -1:
        return ""
    # Look for the end name only after the start name
    end_index = full_text.find(end_name, start_index)
    if end_index == -1:
        return ""
    return full_text[start_index:end_index]

# Example Usage
pdf_path = "path/to/your/pdf/file.pdf"
start_name = "Table Start"
end_name = "Table End"
result = extract_table_details(pdf_path, start_name, end_name)
print(result)
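If the PDF has a real table structure, another option is to collect the table from each page and merge them, instead of slicing raw text. The sketch below shows only the merge step in plain Python. The helper name `merge_page_tables` and the sample rows are hypothetical, and it assumes the header row is repeated at the top of each page. With pdfplumber, the input could come from `page.extract_tables()`, as noted in the comment.

```python
def merge_page_tables(page_tables):
    """Merge tables taken from consecutive pages into one list of rows.

    The first row seen is treated as the header; identical header rows
    repeated on later pages are dropped.
    """
    merged = []
    header = None
    for table in page_tables:
        for row in table:
            if header is None:
                header = row
                merged.append(row)
            elif row == header:
                continue  # repeated header on a later page
            else:
                merged.append(row)
    return merged


# With pdfplumber installed, page_tables could be built roughly like this:
#   with pdfplumber.open(pdf_path) as pdf:
#       page_tables = [t for page in pdf.pages for t in page.extract_tables()]
# Hypothetical sample data standing in for two pages of one table:
page_tables = [
    [["Date", "Amount"], ["01/01/2024", "10"]],
    [["Date", "Amount"], ["02/01/2024", "20"]],
]
merged = merge_page_tables(page_tables)
print(merged)
```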
@ This logic is not working for me. I get an Excel Application Scope error every time.
Can you please share the sample data?
@copy_writes Sorry, I do not have upload access, because I'm creating this request on a personal system. The PDF contains a table that spans multiple pages, and I want to extract the full table data based on a start date and an end date.
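Since no sample file is available, here is only a rough Python sketch of the date-based part, under some loud assumptions: the rows have already been extracted from all pages, the date sits in the first column, and it uses the format dd/MM/yyyy. The function name `rows_in_date_range` and the sample rows are made up for illustration.

```python
from datetime import datetime


def rows_in_date_range(rows, start, end, date_col=0, fmt="%d/%m/%Y"):
    """Keep rows whose date cell falls between start and end (inclusive).

    Rows whose date cell is missing or does not parse (headers, totals,
    blank cells) are skipped.
    """
    start_date = datetime.strptime(start, fmt)
    end_date = datetime.strptime(end, fmt)
    kept = []
    for row in rows:
        try:
            row_date = datetime.strptime(row[date_col].strip(), fmt)
        except (ValueError, IndexError, AttributeError):
            continue  # not a date row
        if start_date <= row_date <= end_date:
            kept.append(row)
    return kept


# Hypothetical rows standing in for a table merged from several pages
all_rows = [
    ["Date", "Amount"],
    ["01/01/2024", "10"],
    ["15/01/2024", "20"],
    ["01/02/2024", "30"],
]
selected = rows_in_date_range(all_rows, "10/01/2024", "31/01/2024")
print(selected)
```

If the start date and end date are instead meant as text markers in the PDF, the keyword approach from the earlier replies applies unchanged, with the two dates used as the start and end keywords.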