Find and search data in PDF file

Niranjan_k · October 30, 2023, 9:55am

Hi All,

I have a PDF file with multiple pages of details, and I would like to search it and find specific details. How can I achieve this?

Thanks in advance
Niranjan

supermanPunch · October 30, 2023, 10:04am

Hi @Niranjan_k ,

The question is broad and there are many possible suggestions. For suggestions specific to your case, please provide more details on your requirements.

Generally, for digital PDF documents, a first check would be with the methods below:

String Manipulation
Regex
Generate Data Table From Text (for tables)

For scanned or mixed sets of documents:

Only OCR with Regex/String Manipulation
Document Understanding
Other Intelligent OCR applications/services
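
To make the difference between the first two methods concrete, here is a minimal sketch in Python rather than in UiPath's own expression language. It assumes the PDF text has already been extracted into a string; the sample text and both function names are hypothetical.

```python
import re

# Hypothetical text, standing in for the output of a PDF text extraction step.
text = "Order Date: 30/10/2023\nInvoice Number: INV-1042\nTotal: 250.00"


def value_by_string_ops(text, label):
    """String manipulation: take whatever follows 'label:' on the same line."""
    start = text.find(label + ":")
    if start == -1:
        return None  # label not present
    start += len(label) + 1
    end = text.find("\n", start)
    if end == -1:
        end = len(text)  # label is on the last line
    return text[start:end].strip()


def value_by_regex(text, label):
    """Regex: the same idea expressed as a pattern ('.' stops at a line break)."""
    match = re.search(re.escape(label) + r":\s*(.+)", text)
    return match.group(1).strip() if match else None


print(value_by_string_ops(text, "Invoice Number"))
print(value_by_regex(text, "Invoice Number"))
```

String operations are usually enough when the label is fixed; a regex tends to be easier to adapt when the value has a recognisable shape, such as a date or an ID format.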

Niranjan_k · October 30, 2023, 10:53am

@supermanPunch I’m new to UiPath and not sure how to use the above functionality. It would be great if you could provide some examples. Thanks.

Thanks, Niranjan

Dilli_Reddy · October 30, 2023, 10:56am

@Niranjan_k

Use the “Read PDF Text” activity in UiPath to extract the text content from the PDF file.
You can specify whether you want to extract text from all pages or from a specific range of pages.
Use string manipulation or regular expressions to search for specific details within the extracted text. You can search for keywords, patterns, or specific data elements within the text variable.
After searching for details, you can process the results in various ways. For example, you can store the results in variables, create a data table, or perform specific actions based on the extracted details.
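
The steps above can be sketched roughly as follows. This is a Python illustration of the logic only, not a UiPath workflow. It assumes the text has been read one page at a time (for example by using the page range mentioned above) into a list with one string per page; `find_keywords` and the sample pages are hypothetical.

```python
def find_keywords(pages, keywords):
    """Return {keyword: [1-based page numbers where it occurs]}, ignoring case."""
    hits = {keyword: [] for keyword in keywords}
    for page_number, page_text in enumerate(pages, start=1):
        lowered = page_text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits[keyword].append(page_number)
    return hits


# Hypothetical sample: one string per PDF page.
pages = [
    "Cover page\nCustomer: ACME Ltd",
    "Invoice Number: INV-1042\nTotal: 250.00",
    "Terms and conditions\nTotal due within 30 days",
]

print(find_keywords(pages, ["Invoice Number", "total", "Refund"]))
```

Keeping the page number next to each hit means a later step can go straight to the page where a keyword was found instead of processing the whole document again.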

Dilli_Reddy · October 30, 2023, 11:00am

Workflow:

Read PDF Text: FileName = "path", Range = page range, Text (output) = extractedTextVariable

Assign: pdfText = extractedTextVariable
For Each line In pdfText.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)

If line.Contains("Invoice Number")

Extract and process the invoice number from the line
Assign: invoiceNumber = line.Replace("Invoice Number", "").Trim(" :".ToCharArray())
Do something with the extracted invoice number
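
For anyone who wants to try this logic outside Studio first, the same loop can be sketched in Python. This is an illustration under assumptions, not the workflow itself: `extract_invoice_numbers` and the sample text are hypothetical, and the label is assumed to be followed by an optional colon or `#`.

```python
def extract_invoice_numbers(pdf_text, label="Invoice Number"):
    """Collect the text that follows `label` on every line that contains it."""
    results = []
    for line in pdf_text.splitlines():  # handles both \n and \r\n line endings
        if label in line:
            # Keep what comes after the label, dropping separators such as ':'.
            value = line.split(label, 1)[1].lstrip(" :#").strip()
            results.append(value)
    return results


# Hypothetical extracted text with Windows-style line endings.
sample = "ACME Ltd\r\nInvoice Number: INV-1042\r\n\r\nInvoice Number INV-1043\r\n"
print(extract_invoice_numbers(sample))
```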

Niranjan_k · October 30, 2023, 12:18pm

@Dilli_Reddy @supermanPunch if you have any recorded video on YouTube, please share it so that I can follow the steps. Thanks in advance.

supermanPunch · October 30, 2023, 12:21pm

@Niranjan_k ,

We could provide more suggestions for the broad case, but do you want to solve this for knowledge/learning purposes, or do you have a deadline for your requirement? If there is a deadline, then as already mentioned we would need more specific details on the requirements so that we can direct you to specific suggestions/solutions.

Dilli_Reddy · October 30, 2023, 1:19pm

Niranjan_k · October 30, 2023, 1:32pm

@supermanPunch The requirement is that the team has some keywords, and based on them I have to get the data from the PDF file. For some keywords I have to take the table details from the PDF file. The data in the PDF file changes daily.

Niranjan_k · October 30, 2023, 1:36pm

@Dilli_Reddy thanks for the video, but I don’t see keyword search in it. Please help me with how to get the data from the PDF file based on a keyword search. Sometimes I need to export table details from the PDF file if a keyword matches. The data in the PDF file changes daily.

Dilli_Reddy · October 30, 2023, 1:40pm

Use the “Read PDF Text” activity in UiPath to extract the text content from the PDF file. Make sure to specify the PDF file’s path.
Use string manipulation or regular expressions to search for specific keywords within the extracted text. For a simple presence check, you can use the String.Contains() method.
Based on the presence of the keyword, you can conditionally extract data.
For extracting table details, you might consider using UiPath’s “Data Scraping” activity or a custom solution based on string manipulation and regular expressions.
Store the extracted data in a DataTable or another data structure for further processing.
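
As a rough illustration of the "custom solution" route for tables, here is a Python sketch of the idea. It rests on assumptions that will not hold for every PDF: the table starts on the line after the keyword, columns are separated by two or more spaces, and the table ends at the first blank line. `table_after_keyword` and the sample text are hypothetical.

```python
import re


def table_after_keyword(text, keyword):
    """Return rows (lists of cells) that follow the first line containing keyword.

    Returns None when the keyword is not found.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if keyword in line:
            rows = []
            for row_line in lines[index + 1:]:
                if not row_line.strip():
                    break  # a blank line is assumed to end the table
                rows.append(re.split(r"\s{2,}", row_line.strip()))
            return rows
    return None


# Hypothetical extracted text with a small table under a "Line Items" heading.
text = (
    "Invoice Number: INV-1042\n"
    "Line Items\n"
    "Item        Qty   Price\n"
    "USB cable   2     9.50\n"
    "HDMI adapter   1   14.00\n"
    "\n"
    "Thank you"
)

for row in table_after_keyword(text, "Line Items"):
    print(row)
```

Splitting on two or more spaces keeps single-spaced cell values such as "USB cable" together, which is why this only works when the extracted text preserves the column spacing.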

copy_writes · October 30, 2023, 3:29pm

Hi @Niranjan_k, once you are done with the extraction, use regex. If you don’t know how to extract the value using regex, please share the extracted output so we can help you extract the value.

We have many ways to extract the value from the string.

Niranjan_k · October 31, 2023, 10:48am

@Dilli_Reddy @copy_writes @supermanPunch I have tried to extract data from the PDF, but it’s not going to the exact page where the data is in the PDF file. How can I dynamically read the page where a match is found?
