Extract string from PDF


I have a PDF file formatted as below:


I want to extract only the strings that come after "B:".
The problem is that the number of lines starting with B differs from one PDF file to another.

How can I do this?

Good evening @teo_choudu

It sounds like you’re most likely going to have to extract the full text using the “Get Full Text” activity and then possibly use regular expressions to remove the portion of the string you don’t want.

In order to provide a more thorough solution, we would need to see (scrubbed) examples of the possible PDF contents and the data you would like extracted.

What is a regular expression? How do I use one?

Thank you ^^

I want to extract every substring that follows "Position: "

Regular expressions are a neat (though sometimes complicated) way of extracting values from strings using patterns.

Here’s an example that grabs all the text after the label "Position : ".

You can use the “Matches” activity to enter the pattern from the link above and start extracting values from the PDF.
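
Since the linked pattern is not reproduced in this thread, here is a minimal sketch of the idea, written in Python rather than as a UiPath activity. It assumes the label is written exactly as "Position: " and that each value sits on a single line; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical text, standing in for the output of "Get Full Text".
pdf_text = """Name: Alice Tan
Position: Director
Name: Bob Lee
Position: Manager"""

# (?<=Position: ) is a lookbehind: it matches only what follows the label,
# without including the label itself. ".+" then runs to the end of the line.
pattern = r"(?<=Position: ).+"

# findall returns one entry per matching line, however many there are.
positions = re.findall(pattern, pdf_text)
print(positions)
```

Because the pattern returns one match per labelled line, it does not matter that the number of such lines changes from one PDF to the next.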

How do I add this to my UiPath workflow or sequence?

Check this workflow,
Test.xaml (6.2 KB)

Use the Matches activity in UiPath with the condition "BUSINESS NATURE :" matched exactly … Then use a For Each loop to get the results. Happy automation :slight_smile: … Cheers

This works. But what if I want to extract the text after both "POSITION: " and "TYPE OF INDUSTRY: "?

Change the regex in the above workflow to:
System.Text.RegularExpressions.Regex.Matches(MyString,"(?<=POSITION:).+|(?<=TYPE OF INDUSTRY:).+")
Then, inside the For Each, use just item in the Message Box.
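
To see what that pattern returns, here is a small sketch in Python (not UiPath) that uses the same expression. The sample text is invented, and the stripping step is an addition of mine: because the labels in the pattern end at the colon, each raw match starts with the space that follows it.

```python
import re

# Hypothetical text, standing in for the string read from the PDF.
text = """POSITION: DIRECTOR
TYPE OF INDUSTRY: RETAIL
POSITION: MANAGING DIRECTOR"""

# Same pattern as in the reply above: two lookbehinds joined with "|",
# so a match is whatever follows either label up to the end of the line.
pattern = r"(?<=POSITION:).+|(?<=TYPE OF INDUSTRY:).+"

# Each raw match keeps the space after the colon, so strip it off.
values = [m.group().strip() for m in re.finditer(pattern, text)]
print(values)
```

In the workflow itself, trimming the value inside the For Each (for example item.Value.Trim()) should have the same effect, assuming item is typed as a Match.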


It works. Thank you.

How can I check whether “DIRECTOR” is contained in every item in the match collection?
Do I need to loop over the match collection with a For Each?
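
One way to think about this check, sketched in Python rather than UiPath: looping over the matches works, and the loop can be collapsed into a single "all items satisfy the condition" test. The sample text and variable names below are assumptions for illustration only.

```python
import re

# Hypothetical text, standing in for the string read from the PDF.
text = """POSITION: DIRECTOR
POSITION: MANAGING DIRECTOR
POSITION: SECRETARY"""

matches = re.findall(r"(?<=POSITION:).+", text)

# True only if there is at least one match and every match contains the word.
# The bool(matches) guard matters because all() of an empty list is True.
all_directors = bool(matches) and all("DIRECTOR" in m for m in matches)
print(all_directors)
```

Here the result is False because of the SECRETARY line. In a workflow, a For Each over the collection with a Boolean flag that is set to False on the first item lacking "DIRECTOR" would express the same logic.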

What should I write if I want to extract the string after "industry"?
Do I need to change the pattern and add * in front?

You can check your regular expressions here,

Just use this expression,
