How To Extract Data From PDF Using 'Read PDF Text' And RegEx?

system · August 8, 2023, 8:17am

How to extract data from a PDF using 'Read PDF Text' and RegEx?

In some scenarios, OCR, ML, or other PDF extraction activities do not work as expected. If you need to extract a specific value from a PDF, an alternative is to combine the 'Read PDF Text' activity from the UiPath.PDF.Activities package with a regular expression (RegEx).

Perform the following steps:

First, use the 'Read PDF Text' activity. It extracts all the text from the specified PDF and saves it in an output variable.

After storing all the text in the output variable, extract the required value by defining a pattern with a regular expression.

For example, to extract the value that follows the word "Total":

Add the 'Find Matching Patterns' activity and use the text variable from the previous step as the text to search.

In the properties panel, set the following string value for the Pattern parameter:

"Total\s*([\d.,]+)"

The pattern matches the text 'Total' followed by zero or more whitespace characters (\s*), then captures one or more characters that are digits, periods, or commas ([\d.,]+). The parentheses create a capture group, whose content is available in each match returned by the 'Find Matching Patterns' activity.
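The matching behavior can be checked outside UiPath. The sketch below uses Python's re module, which treats \s, \d, and this character class the same way as the .NET regex engine used by the activity for this pattern; the function name find_total and the sample strings are hypothetical. It returns the first captured value, or None when nothing matches.

```python
import re
from typing import Optional

# Same pattern as the Pattern property; a raw string keeps "\s" and "\d" as regex escapes.
TOTAL_PATTERN = re.compile(r"Total\s*([\d.,]+)")


def find_total(text: str) -> Optional[str]:
    """Return the first value captured after 'Total', or None if nothing matches."""
    match = TOTAL_PATTERN.search(text)
    # group(0) is the whole match ("Total 1,250.50"); group(1) is the captured value.
    return match.group(1) if match else None


# Hypothetical text standing in for the output of 'Read PDF Text'.
print(find_total("Invoice 1042\nSubtotal 1,150.00\nTotal 1,250.50\n"))  # 1,250.50
print(find_total("Total\n99.90"))  # 99.90
print(find_total("Total: 5"))      # None
```

Because \s also matches line breaks, the pattern still works when the extracted text places the value on the line after "Total". The match is case-sensitive, so the lowercase "total" inside "Subtotal" is ignored, and "Total: 5" does not match because the colon is not whitespace; a variant such as "Total:?\s*([\d.,]+)" would allow an optional colon. The class [\d.,] also accepts a trailing period, so "Total 12." captures "12.".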

For the result, create a new variable to hold the matches, for example matchResult.
Then add a Message Box activity to verify that matchResult contains the expected value.
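The activity returns every match in the text, so if the word "Total" appears several times (for example, once per page), it helps to decide which match the Message Box and later steps should use. A minimal Python sketch of that check, assuming a hypothetical helper all_totals that collects the captured group of every match in document order:

```python
import re
from typing import List

TOTAL_PATTERN = re.compile(r"Total\s*([\d.,]+)")


def all_totals(text: str) -> List[str]:
    """Collect the captured value (group 1) of every match, in document order."""
    return [m.group(1) for m in TOTAL_PATTERN.finditer(text)]


# Hypothetical multi-page text; "Grand Total" also contains "Total".
text = "Page 1 Total 100.00\nPage 2 Total 250.00\nGrand Total 350.00"
values = all_totals(text)

if values:
    print("Matches found:", len(values))  # 3
    print("First:", values[0])            # 100.00
    print("Last:", values[-1])            # 350.00
else:
    # Equivalent to an empty matchResult: nothing followed the word "Total".
    print("No value found after 'Total'")
```

Checking for an empty result before reading the first match is a reasonable guard for PDFs that do not contain the word "Total" at all.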

Related topics:
How to read the specific data in pdf · Activities · June 2, 2021
PDF particular data · Activities · May 8, 2023
Read PDF Uipath Activities 2 · Help · September 23, 2020
Get text using Regex · Activities · June 12, 2022
Extract data fromPDF · Help · October 2, 2019
