Extract string from pdf

teo_choudu · February 13, 2020, 3:37am

Hi.

I have a PDF file formatted as below:

A:…
A:…
B:…
B:…
C:…
B:…

I want to extract only the strings that come after B:.
The problem is that the number of lines starting with B: differs from one PDF file to another.

How can I do this?

chenderson · February 13, 2020, 3:46am

Good evening @teo_choudu

It sounds like you’re most likely going to have to extract the full text using the “Get Full Text” activity and then possibly use regular expressions to remove the portion of the string you don’t want.

In order to provide a more thorough solution, we would need to see (scrubbed) examples of the possible PDF contents and the data you would like extracted.
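
As a rough illustration of the "extract everything, then filter with a regex" idea, here is a minimal sketch in Python rather than UiPath. The `full_text` value is a made-up stand-in for whatever the PDF text-extraction step returns, and the A:/B:/C: contents are placeholders, not real data.

```python
import re

# Assumed sample: stands in for the full text extracted from the PDF.
full_text = """A: first
A: second
B: third
B: fourth
C: fifth
B: sixth"""

# re.MULTILINE makes ^ and $ match at every line boundary, so each line
# that starts with "B:" is captured, however many such lines there are.
b_values = [m.strip() for m in re.findall(r"^B:(.*)$", full_text, flags=re.MULTILINE)]

print(b_values)
```

Because the pattern is applied to every line, it does not matter how many B: lines a given PDF contains.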

teo_choudu · February 13, 2020, 4:08am

What is a regular expression? How do I use one?

Thank you ^^

teo_choudu · February 13, 2020, 4:11am

I want to extract all the substring after "Position: "

chenderson · February 13, 2020, 4:15am

Regular expressions are a neat (but sometimes complicated) way of extracting values from strings using patterns.

Here’s an example that grabs all the text after the word "Position : ".

You can use the “Matches” activity to enter the pattern from the link above and start extracting values from the PDF.
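
The kind of pattern meant here can be sketched as a lookbehind. The snippet below is Python, not the Matches activity itself, and both the pattern and the sample text are assumptions for illustration, not the exact pattern from the link.

```python
import re

# Assumed sample text; the names and values are invented.
sample = "Name : Jane Doe\nPosition : DIRECTOR\nAge : 40"

# (?<=Position : ) is a lookbehind: it requires "Position : " to sit
# immediately before the match without including it in the result.
# .+ then takes the rest of that line (. does not cross a newline).
matches = re.findall(r"(?<=Position : ).+", sample)

print(matches)
```

The same pattern text should be usable in the Matches activity, since .NET regular expressions also support lookbehind.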

teo_choudu · February 13, 2020, 4:21am

How do I add this to my UiPath workflow or sequence?

Manish540 · February 13, 2020, 4:33am

Check this workflow,
Test.xaml (6.2 KB)

Vijay_Upadhya · February 13, 2020, 4:36am

Use the Matches activity in UiPath with a condition like BUSINESS NATURE : exactly … Then use a For Each loop to get the results. Happy automation … Cheers

teo_choudu · February 13, 2020, 4:45am

This works. But what if I want to extract the text after both "POSITION: " and "TYPE OF INDUSTRY: "?

Manish540 · February 13, 2020, 5:01am

Change the regex in the above workflow to:
System.Text.RegularExpressions.Regex.Matches(MyString,"(?<=POSITION:).+|(?<=TYPE OF INDUSTRY:).+")
Then, inside the For Each, use just item in the Message Box.
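
To see what the combined pattern returns, it can be tried outside UiPath. This is a small Python sketch with invented sample text; `my_string` mirrors the `MyString` variable above, and the pattern is the same one.

```python
import re

# Assumed sample text standing in for the extracted PDF contents.
my_string = "NAME: Jane Doe\nPOSITION: DIRECTOR\nTYPE OF INDUSTRY: RETAIL\n"

# Two lookbehind alternatives joined with |, so a match is produced
# after either label.
pattern = r"(?<=POSITION:).+|(?<=TYPE OF INDUSTRY:).+"

# The labels end at the colon, so each raw match starts with the space
# that follows it; strip() removes that leading space.
items = [m.group(0).strip() for m in re.finditer(pattern, my_string)]

print(items)
```

In the workflow, the equivalent clean-up would be trimming each item, or adding the trailing space to each label inside the lookbehind.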

teo_choudu · February 13, 2020, 5:46am

It works. Thank you.

teo_choudu · February 13, 2020, 6:57am

How can I check whether "DIRECTOR" is contained in every item of the match collection?
Do I need to loop over it with a For Each?
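
A loop is one option; the check can also be written as a single "every item contains" test. A minimal sketch in Python, with invented sample text, showing the difference between "all items" and "at least one item":

```python
import re

# Assumed sample text with several POSITION lines.
text = "POSITION: DIRECTOR\nPOSITION: MANAGING DIRECTOR\nPOSITION: SECRETARY\n"

items = [m.strip() for m in re.findall(r"(?<=POSITION:).+", text)]

# True only if every matched item contains the word.
all_directors = all("DIRECTOR" in item for item in items)

# True if at least one matched item contains the word.
any_director = any("DIRECTOR" in item for item in items)

print(all_directors, any_director)
```

In a workflow, a For Each over the matches with a flag that is set to False on the first item lacking "DIRECTOR" would express the same idea.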

teo_choudu · February 13, 2020, 9:51am

What should I write if I want to extract the string after "INDUSTRY"?
Do I need to change the pattern and add * in front?

Manish540 · February 13, 2020, 9:58am

You can check your regex expressions here,

Manish540 · February 13, 2020, 9:59am

Just use this expression,
(?<=INDUSTRY:).+
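
No * is needed in front, because a lookbehind only looks at the characters immediately before the match. A short Python sketch with invented sample text shows that the pattern also picks up labels that merely end in INDUSTRY:, such as TYPE OF INDUSTRY:.

```python
import re

# Assumed sample text; the labels and values are invented.
text = "POSITION: DIRECTOR\nTYPE OF INDUSTRY: RETAIL\nINDUSTRY: FOOD\n"

# The lookbehind checks only the nine characters "INDUSTRY:" right before
# the match, so whatever comes earlier on the line is irrelevant.
values = [m.strip() for m in re.findall(r"(?<=INDUSTRY:).+", text)]

print(values)
```

If only lines that start with INDUSTRY: are wanted, the pattern would need to be anchored to the start of the line instead.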

system · February 16, 2020, 9:59am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
