Impossible to use Regex with PDF Activity output

Hello everybody.
I need to extract some text from a PDF.
I use Read PDF Text because my PDF is an original (digital) PDF, not a scanned one.

The result of the PDF extraction is:
[screenshot of the extracted text]

If I use this regex

System.Text.RegularExpressions.Regex.Match(str_test, "(?<=Utilizzatore carta:)(\n.*)").Value

I get no match (the result is empty)!

If I manually create a new String variable containing the text from the PDF, the regex works!

I don't understand why! The text and the variable type are the same!

Does anyone know why?

Thanks,

Loris
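A minimal sketch of one plausible explanation, assuming the PDF text uses Windows CRLF (`\r\n`) line endings while a string typed by hand contains only LF (`\n`). It uses Python's `re` module instead of a UiPath expression so it can run on its own; for the constructs involved (lookbehind, and `.` matching any character except `\n`) .NET regular expressions behave the same way. The sample texts and the name MARIO ROSSI are hypothetical.

```python
import re

# Pattern from the question: "\n" must come immediately after the colon.
PATTERN = r"(?<=Utilizzatore carta:)(\n.*)"

# Hypothetical sample texts: same content, different line endings.
typed_text = "Utilizzatore carta:\nMARIO ROSSI"   # LF only, like a string typed by hand
pdf_text = "Utilizzatore carta:\r\nMARIO ROSSI"   # CRLF, as a Windows extraction may return

def first_match(text: str) -> str:
    """Return the matched text, or "" when there is no match (like Match.Value in .NET)."""
    m = re.search(PATTERN, text)
    return m.group(0) if m else ""

print(repr(first_match(typed_text)))  # '\nMARIO ROSSI'
print(repr(first_match(pdf_text)))    # ''
```

With the CRLF text, the character right after the colon is `\r`, so the lookbehind is satisfied but `\n` is not, and the whole match fails even though the visible text is identical.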

@l.sambinelli

Can you check whether the text “Utilizzatore carta:” is actually present in the variable's value (for example in the Locals panel while debugging)?

Thanks
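One way to perform that check, sketched in Python for illustration: look up the label and print the characters that follow it with `repr()`, so invisible characters such as `\r` or a non-breaking space (U+00A0) become visible. The helper name `show_after_label` and the sample text are hypothetical, and a non-breaking space between the words is only an assumed possibility worth ruling out, not something confirmed for this PDF.

```python
from typing import Optional

def show_after_label(text: str, label: str, width: int = 6) -> Optional[str]:
    """Hypothetical helper: repr() of the `width` characters after `label`, or None if absent."""
    idx = text.find(label)
    if idx == -1:
        return None  # label not found, e.g. because of a non-breaking space
    start = idx + len(label)
    return repr(text[start:start + width])

label = "Utilizzatore carta:"
# Hypothetical extraction result: non-breaking space between the words, CRLF line ending.
pdf_text = "Utilizzatore\u00a0carta:\r\nMARIO ROSSI"

print(show_after_label(pdf_text, label))      # None
normalized = pdf_text.replace("\u00a0", " ")  # turn non-breaking spaces into normal spaces
print(show_after_label(normalized, label))    # '\r\nMARI'
```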

Hi,

It might be a line-break issue. Can you try the following?

 System.Text.RegularExpressions.Regex.Match(str_test, "(?<=Utilizzatore carta:)(\r?\n.*)").Value

Regards,
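A minimal Python sketch of the `\r?\n` pattern (.NET behaves the same for these constructs), run against hypothetical LF and CRLF versions of the same text. Because the match starts with the line break, and in the CRLF case `.` also swallows the trailing `\r` of the next line, the value is stripped before use.

```python
import re

# "\r" is optional, so both CRLF and LF line endings match.
PATTERN = r"(?<=Utilizzatore carta:)(\r?\n.*)"

# Hypothetical samples: identical content, different line endings.
samples = {
    "lf": "Utilizzatore carta:\nMARIO ROSSI\nScadenza: 12/26",
    "crlf": "Utilizzatore carta:\r\nMARIO ROSSI\r\nScadenza: 12/26",
}

def extract_holder(text: str) -> str:
    """Return the line after the label without surrounding whitespace ("" if no match)."""
    m = re.search(PATTERN, text)
    # m.group(0) is "\nMARIO ROSSI" (LF) or "\r\nMARIO ROSSI\r" (CRLF), hence strip().
    return m.group(0).strip() if m else ""

for name, text in samples.items():
    print(name, repr(extract_holder(text)))  # both print 'MARIO ROSSI'
```

In the UiPath expression, the equivalent clean-up would be appending `.Trim` after `.Value`.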


Hi

I think the issue is in the extracted text itself.

Read PDF Text can add extra blank lines, and those may break your regex.

So whenever you apply a regex to PDF-extracted text, remove the extra blank lines first and then try again, like this:

Say Strinput is the output of the PDF activity; then use an Assign activity like this:

stroutput = String.Join(Environment.NewLine, Strinput.Split({vbCrLf, vbLf}, System.StringSplitOptions.RemoveEmptyEntries))

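Now try applying the regex to stroutput. Since Environment.NewLine is \r\n on Windows, use a line-break pattern that accepts the carriage return, such as \r?\n.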

Cheers @l.sambinelli
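A minimal Python sketch of the clean-up idea, assuming the extracted text has an empty line between the label and the value (hypothetical sample below). It splits on CRLF or LF, drops empty lines, re-joins, and then tries both the question's pattern and the `\r?\n` variant. Without the clean-up, `.*` lands on the empty line and captures nothing useful; after re-joining with CRLF (what Environment.NewLine is on Windows), the `\n`-only pattern still fails, so the two fixes work best together.

```python
import re
from typing import Optional

ORIGINAL = r"(?<=Utilizzatore carta:)(\n.*)"     # pattern from the question
TOLERANT = r"(?<=Utilizzatore carta:)(\r?\n.*)"  # pattern that also accepts CRLF

def remove_blank_lines(text: str, sep: str = "\n") -> str:
    """Split on CRLF or LF, drop empty lines, and re-join with `sep`."""
    return sep.join(line for line in re.split(r"\r?\n", text) if line)

def extract(text: str, pattern: str) -> Optional[str]:
    """Stripped match value, or None when the pattern does not match at all."""
    m = re.search(pattern, text)
    return m.group(0).strip() if m else None

# Hypothetical extraction result with an empty line after the label.
pdf_text = "Utilizzatore carta:\r\n\r\nMARIO ROSSI\r\n"

cleaned_lf = remove_blank_lines(pdf_text)            # lines joined with "\n"
cleaned_crlf = remove_blank_lines(pdf_text, "\r\n")  # joined like Environment.NewLine on Windows

print(repr(extract(pdf_text, TOLERANT)))      # '' (only the empty line is captured)
print(repr(extract(cleaned_lf, ORIGINAL)))    # 'MARIO ROSSI'
print(extract(cleaned_crlf, ORIGINAL))        # None
print(repr(extract(cleaned_crlf, TOLERANT)))  # 'MARIO ROSSI'
```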

