Extract PDF text using Regex

Jessica_Moseley · January 25, 2022, 4:06pm

Hi Community!

I am new to Regex.
I read a PDF and output its content to a string variable ‘text’.
I am trying to capture what comes after FROM:, which is 3 lines followed by SUBJECT:. Example below.
FROM: James Bond
Associate Director, C Team
Office of Towers
SUBJECT: XYZ

I’d like to capture the name from the first line, ‘FROM: James Bond’,
and assign James Bond to a string variable ABC.

My Regex is:
ABC = System.Text.RegularExpressions.Regex.Match(text, "(?<=FROM: ).*(?=SUBJECT:)").Value

The output is blank. Please help. Thank you!

sarathi125 · January 25, 2022, 4:13pm

@Jessica_Moseley ,

Check this: regex101 (build, test, and debug regex)

ppr · January 25, 2022, 4:37pm

The . matches every character except a line break, so .* cannot run from the FROM: line through to SUBJECT: on a later line.
[image]

To get text spanning multiple lines we can do:
[image]

For James Bond only we can do (optionally trim the value afterwards):
[image]
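
The screenshots in this reply are not preserved, so the patterns below are an assumption about one way to do what the reply describes, not necessarily what the images showed. This is a minimal sketch in Python, chosen only because it is easy to run; the pattern strings use constructs (lookbehind, lookahead, the inline (?s) flag) that should behave the same way in .NET's Regex.Match. The sample text is modelled on the example in the question and assumes plain "\n" line breaks.

```python
import re

# Sample text modelled on the question (assumed "\n" line breaks).
text = "FROM: James Bond\nAssociate Director, C Team\nOffice of Towers\nSUBJECT: XYZ"

# Original attempt: "." stops at the line break, so ".*" never reaches
# SUBJECT: and there is no match at all.
original = re.search(r"(?<=FROM: ).*(?=SUBJECT:)", text)

# (?s) lets "." match line breaks as well; the lazy ".*?" stops at the
# first SUBJECT: instead of the last one.
block = re.search(r"(?s)(?<=FROM: ).*?(?=SUBJECT:)", text)

# Without (?s), ".*" ends at the end of the FROM: line, giving the name only.
first_line = re.search(r"(?<=FROM: ).*", text)

print(original)                     # None
print(block.group(0).strip())       # the three lines between FROM: and SUBJECT:
print(first_line.group(0).strip())  # James Bond
```

In .NET a failed Regex.Match returns an empty match rather than nothing, which is why the original expression produced a blank value instead of an error.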

Jessica_Moseley · January 25, 2022, 4:42pm

Thanks, this worked!

System.Text.RegularExpressions.Regex.Match(text, "(?<=FROM:).*").Value
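
A small sketch of why trimming the result may still be worthwhile with this pattern. It is written in Python for easy testing, and the sample text with Windows-style "\r\n" line endings is a hypothetical assumption about what the PDF read might return. Because the lookbehind is (?<=FROM:) with no trailing space, the match starts with the space after the colon, and since "." excludes only "\n", a "\r" at the end of the line is captured too.

```python
import re

# Hypothetical sample with "\r\n" line endings (an assumption, not from the thread).
text = "FROM: James Bond\r\nAssociate Director, C Team\r\nSUBJECT: XYZ"

# Same pattern as the accepted expression above.
raw = re.search(r"(?<=FROM:).*", text).group(0)

# Strip the leading space and any trailing carriage return.
name = raw.strip()

print(repr(raw))   # ' James Bond\r'
print(repr(name))  # 'James Bond'
```

In the VB.NET expression from the thread, appending .Trim() after .Value would do the same cleanup.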

