Reading PDF, Regex/Split string

Hello,
I am using “Read PDF” to extract a certain paragraph between 2 lines of text, which I have done. The output is a large string that looks like this:

a) information a

b) information b

c) information c

d) information d

e) information e

I’d like to use regex or split/substring to turn this into an array that looks like this:

Array(0) = a) information a
Array(1) = b) information b
Array(2) = c) information c
Array(3) = d) information d
Array(4) = e) information e

Is there a way to do this cleanly? Thank you all!

@NDC

Please try this

Str.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

Cheers
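
One caveat with the split above: Environment.NewLine is "\r\n" on Windows, while text extracted from a PDF may contain bare "\n" line breaks, in which case nothing gets split. Here is a minimal sketch that accepts both kinds of line break. The module and function names (NewlineSplitSketch, SplitLines) are hypothetical and only there for illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NewlineSplitSketch
    ' Hypothetical helper: splits on Windows (CR LF) and Unix (LF) line breaks
    ' and drops empty entries, so every non-blank line becomes one array item.
    Function SplitLines(source As String) As String()
        Return source.Split({vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries)
    End Function
End Module
```

This only fits the case where each item sits on a single line, since every non-blank line ends up as its own element.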

@Anil_G,

Thanks for replying!

I did a poor job of explaining the output.

It actually looks like this:

a) information a
more information
more information
more information

b) information b
more information

c) information c
more information
more information

d) information d
more information
more information

e) information e
more information
more information

The number of lines in each item varies, but it looks something like this. Thanks!

Hi @NDC,

Maybe you could try the expression below:

Regex.Split(variable2,"\n{2,}")
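
Note that "\n{2,}" only matches consecutive "\n" characters. If the text uses Windows line breaks, a blank line shows up as "\r\n\r\n" and the pattern never matches. Here is a minimal sketch of a variant that tolerates an optional "\r" and blank lines containing only spaces or tabs. The names BlankLineSplitSketch and SplitOnBlankLines are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BlankLineSplitSketch
    ' Hypothetical helper: splits the text wherever two or more line breaks
    ' follow each other (i.e. at blank lines). Each line break may be LF or
    ' CR LF, and the blank line may contain spaces or tabs.
    Function SplitOnBlankLines(source As String) As String()
        ' Trim first so leading/trailing blank lines do not produce empty items.
        Return Regex.Split(source.Trim(), "(?:\r?\n[ \t]*){2,}")
    End Function
End Module
```

Each resulting item keeps its own internal line breaks, so "a) information a" stays together with the "more information" lines that follow it. This relies on the items being separated by a blank line, as in the sample output.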

@NDC

Please try this

Regex.Matches(str,"[a-zA-Z]\).*",RegexOptions.MultiLine).Select(function(x) x.Value).ToArray()

Cheers

Anil,

Perhaps I am doing something incorrectly but there is an error stating:

“Select is not a member of System.Text.RegularExpressions.MatchCollection”

I do have the namespace imported for Regex.

You can use the following steps to achieve the desired result:

  1. Use the “Read PDF Text” activity to extract the text from the PDF and store it in a string variable, let’s call it pdfText.
  2. Use the “Matches” activity to extract the items using a regular expression pattern. In UiPath, you can also call the System.Text.RegularExpressions.Regex.Matches method with the following pattern: (?<=\n)\w+\) .+. This pattern matches a line that starts with one or more word characters followed by a closing parenthesis and a space, and captures the rest of that line. Because . does not match a line break, the continuation lines of an item are not included, and because of the (?<=\n) lookbehind, an item at the very start of the text is not matched.
  3. Loop through the “Matches” result using a “For Each” activity, and add each match to an array or list variable.
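
Steps 2 and 3 can be sketched roughly as follows, assuming the PDF text is already in a string (pdfText). The names MatchesLoopSketch and CollectItemHeaders are hypothetical. The sketch uses a slightly different pattern, anchoring with ^ under RegexOptions.Multiline instead of the lookbehind, so that an item at the very start of the text is also found. Like the pattern in step 2, it collects only the first line of each item.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchesLoopSketch
    ' Hypothetical helper mirroring steps 2 and 3: run Regex.Matches,
    ' then copy the value of each match into a list.
    Function CollectItemHeaders(pdfText As String) As List(Of String)
        Dim items As New List(Of String)
        ' ^ with RegexOptions.Multiline anchors at the start of every line.
        ' [^\r\n]+ captures the rest of that line without the line break.
        For Each m As Match In Regex.Matches(pdfText, "^\w+\) [^\r\n]+", RegexOptions.Multiline)
            items.Add(m.Value)
        Next
        Return items
    End Function
End Module
```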

@NDC

Please use this instead. I had left out the cast:

Regex.Matches(str, "[a-zA-Z]\).*", RegexOptions.Multiline).Cast(Of Match).Select(Function(x) x.Value).ToArray()

Hope this helps

Cheers
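
A note on the pattern "[a-zA-Z]\).*": since . does not match a line break, each match stops at the end of the labelled line, so the "more information" lines are not part of the result. If those lines should stay with their item, one option is to split at the line breaks that come right before a label rather than matching the labels themselves. This is a minimal sketch under that assumption. The names LabeledBlockSketch and SplitLabeledItems are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module LabeledBlockSketch
    ' Hypothetical helper: returns one array element per labelled item,
    ' including the continuation lines that follow the label line.
    Function SplitLabeledItems(source As String) As String()
        ' Split at every run of line breaks that is directly followed by
        ' "<letter>)". The lookahead (?=...) does not consume the label,
        ' so the label stays at the start of the next item.
        Return Regex.Split(source.Trim(), "(?:\r?\n)+(?=[a-zA-Z]\))") _
            .Select(Function(item) item.Trim()) _
            .Where(Function(item) item.Length > 0) _
            .ToArray()
    End Function
End Module
```

This works whether or not the items are separated by blank lines. It would also split at any continuation line that happens to begin with a single letter followed by ")", so check that against the real PDF text.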