Hi All,
I have created a workflow to open a PDF file and read data from it, but the workflow fails to open the file. Please help me open the PDF file.
Thanks in advance,
Niranjan
@lrtetala the file does not open from the given path. The data is read only when the file is already open.
Hi Niranjan, you can use the “Read PDF Text” activity to read the text from a PDF file. However, if you’re facing issues opening the PDF file itself, you might want to use the “Start Process” activity to open the PDF reader application (e.g., Adobe Acrobat Reader) and then use the “Read PDF Text” activity to extract the text.
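Here’s a simple example of a UiPath workflow to open and read a PDF file:
Use the “Start Process” activity with the path to the PDF reader executable (e.g., "C:\Program Files\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe" for Adobe Acrobat Reader).
Use the “Read PDF Text” activity and store its output in a variable (e.g., pdfText).
Use the pdfText variable in subsequent activities to perform tasks with the extracted text.
@rikulsilva I can’t find this path on my system.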
@Shekar_Ch thanks for the detailed info. Could you please provide a sample workflow?
Why are you trying to open it? You don’t have to open PDF files to read them. Have you installed the PDF package and tried the Read PDF Text activity? If it’s not text, but a scanned document, then you use OCR/Document Understanding.
@postwick Actually, I have a few tables on different pages. I want to read 2 tables out of 50 pages. Please suggest the best way to implement this.
Can you please share the sample file so we can help you? Table extraction can be done using DU (Document Understanding). Alternatively, you can send the extracted data to an AI model (OpenAI ChatGPT or a generic AI) through a call with arguments, and the AI will return the table output. You can search YouTube for how to integrate with AI; @RAKESH_KUMAR_BEHERA or @nisargkadam23 explain it in detail.
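For the "2 tables out of 50 pages" case, a small Python script is another option to try. This is only a minimal sketch, assuming the PDF has a real text layer (not a scan) and that the third-party pdfplumber package is installed (pip install pdfplumber). The function names, the file path, and the page numbers are hypothetical and would need to match your document.

```python
def clean_table(table):
    """Replace missing (None) cells with '' and strip whitespace from the rest."""
    return [[(cell or "").strip() for cell in row] for row in table]


def extract_tables_from_pages(pdf_path, page_numbers):
    """Return {page_number: [table, ...]} for the given 1-based page numbers.

    Each table is a list of rows, and each row is a list of cell strings.
    """
    # Imported here so the helper above can be used without pdfplumber installed.
    import pdfplumber  # third-party package, assumed to be installed

    tables_by_page = {}
    with pdfplumber.open(pdf_path) as pdf:
        for number in page_numbers:
            page = pdf.pages[number - 1]  # pdf.pages is a 0-based list
            tables_by_page[number] = [clean_table(t) for t in page.extract_tables()]
    return tables_by_page


# Hypothetical usage: read only pages 12 and 37 of a 50-page file.
# tables = extract_tables_from_pages(r"C:\Data\report.pdf", [12, 37])
# first_table_on_page_12 = tables[12][0]
```

Reading only the pages you need keeps the run short, and the file does not have to be open in a PDF viewer. How well the tables are detected depends on how the PDF was produced, so check the output against the source pages.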
@Niranjan_k you can use the approach below.
Install Libraries:
pip install beautifulsoup4 requests
Create Python Script:
> import requests
> from bs4 import BeautifulSoup
>
> def extract_table_data(url):
>     """Fetch an HTML page and return the cell text of its first two tables."""
>     response = requests.get(url, timeout=30)
>     response.raise_for_status()  # fail early on HTTP errors
>     soup = BeautifulSoup(response.text, 'html.parser')
>
>     # Find all tables once; this assumes the page has at least two
>     tables = soup.find_all('table')
>
>     # Extract data from the first table
>     data1 = [[td.text.strip() for td in row.find_all('td')] for row in tables[0].find_all('tr')]
>
>     # Extract data from the second table
>     data2 = [[td.text.strip() for td in row.find_all('td')] for row in tables[1].find_all('tr')]
>
>     return data1, data2
>
> # Example usage
> url = 'https://example.com/page1'
> table1_data, table2_data = extract_table_data(url)
> print(table1_data)
> print(table2_data)
Replace 'https://example.com/page1' with the actual URL of the page containing the tables.
For Each pageUrl in ListOfUrls
    Invoke Python Method activity
        Input: pageUrl
        Output: table1Data, table2Data
    # Use the table data as needed
End For
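To hand both tables back to the workflow in one value, it may be easier to return a single JSON string than a pair of nested lists. A minimal sketch, assuming the tables are lists of rows as produced by the script above; the helper name tables_to_json and the keys "table1" and "table2" are hypothetical. Rows without any td cells (for example header rows built from th) come back as empty lists in that script, so they are dropped here.

```python
import json


def tables_to_json(table1, table2):
    """Pack two tables (lists of rows) into one JSON string, skipping empty rows."""
    def drop_empty(rows):
        # Keep only rows that contain at least one cell.
        return [row for row in rows if row]

    return json.dumps({
        "table1": drop_empty(table1),
        "table2": drop_empty(table2),
    })


# Example with made-up data: the empty first row is removed.
packed = tables_to_json([[], ["Name", "Qty"], ["Bolt", "4"]], [["Total", "4"]])
print(packed)
```

On the workflow side the string can then be parsed with whatever JSON handling you already use.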