I need to extract some text from a PDF.
I use the Read PDF Text activity because my PDF is a native (text-based) PDF, not a scanned one.
The result of the PDF extraction is
If I use this regex
System.Text.RegularExpressions.Regex.Match(str_test, "(?<=Utilizzatore carta:)(\n.*)").Value
the result is 0!
If I manually create a new string variable containing the text of the PDF, the regex works!
I don't understand why! The text and the variable type are the same!
Does anyone know why?
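Can you check whether the text “Utilizzatore carta:” is actually present in the variable's local value?
It might be a line-break issue: the extracted text may use \r\n (carriage return plus line feed) where your pattern expects only \n. Can you try the following?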
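One way to run that check is sketched below. It is only an illustration: the sample string is hypothetical (the real extracted text is not shown in this thread), and `ShowLineBreaks` is a helper name invented here. It tests whether the label is present and prints the text with its line-break characters made visible.

```vb
Imports System
Imports Microsoft.VisualBasic

Module InspectText
    ' Hypothetical helper: makes CR and LF visible so hidden differences show up.
    Public Function ShowLineBreaks(text As String) As String
        Return text.Replace(vbCr, "\r").Replace(vbLf, "\n")
    End Function

    Sub Main()
        ' Assumed sample text; replace it with the output of the PDF activity.
        Dim str_test As String = "Utilizzatore carta:" & vbCrLf & "MARIO ROSSI"

        ' Is the label present at all?
        Console.WriteLine(str_test.Contains("Utilizzatore carta:"))   ' True

        ' Which line-break characters follow it?
        Console.WriteLine(ShowLineBreaks(str_test))   ' Utilizzatore carta:\r\nMARIO ROSSI
    End Sub
End Module
```

In a workflow, the same two expressions could go into a Log Message or Write Line activity. If the output shows \r\n after the label, a pattern that expects only \n right after the colon will not match.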
System.Text.RegularExpressions.Regex.Match(str_test, "(?<=Utilizzatore carta:)(\r?\n.*)").Value
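A minimal sketch of why this can matter, assuming the extracted text uses Windows line endings. The sample string and the `ExtractAfterLabel` helper are hypothetical and not taken from the actual PDF.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineBreakDemo
    ' Assumed sample: label, CRLF, then the value on the next line.
    Public ReadOnly SampleText As String =
        "Utilizzatore carta:" & vbCrLf & "MARIO ROSSI" & vbCrLf

    ' Hypothetical helper: returns the trimmed match, or "" when nothing matches.
    Public Function ExtractAfterLabel(text As String, pattern As String) As String
        Return Regex.Match(text, pattern).Value.Trim()
    End Function

    Sub Main()
        ' The character right after the colon is \r, so \n cannot match there.
        Console.WriteLine("[" & ExtractAfterLabel(SampleText, "(?<=Utilizzatore carta:)(\n.*)") & "]")     ' []

        ' \r?\n accepts both CRLF and LF line endings.
        Console.WriteLine("[" & ExtractAfterLabel(SampleText, "(?<=Utilizzatore carta:)(\r?\n.*)") & "]")  ' [MARIO ROSSI]
    End Sub
End Module
```

The `Trim()` call is there because `.` in .NET regex matches every character except \n, so the raw match keeps the leading line break and a trailing \r.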
I believe the problem is in the extracted text itself.
There could be an extra blank line when the text is read with the PDF activity, and that might affect the Regex.
So whenever you apply a Regex to text extracted from a PDF, remove the extra blank lines first and then try again, like this.
Say you have Strinput as the output of the PDF activity; then use an Assign activity like this:
stroutput = String.Join(Environment.NewLine, Strinput.ToString.Split(Environment.NewLine.ToCharArray, System.StringSplitOptions.RemoveEmptyEntries))
Now try applying the Regex to stroutput.
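The clean-up step can be sketched end to end as follows. This is an illustration under assumptions: the sample text is made up, `RemoveBlankLines` is a hypothetical helper name, and it uses the .NET `String.Split` method with a character array so that both \r and \n act as separators.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module BlankLineCleanup
    ' Hypothetical helper: drops empty lines and rejoins with the platform newline.
    Public Function RemoveBlankLines(text As String) As String
        Dim lines As String() = text.Split(New Char() {ControlChars.Cr, ControlChars.Lf},
                                           StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(Environment.NewLine, lines)
    End Function

    Sub Main()
        ' Assumed sample with an extra blank line between label and value.
        Dim Strinput As String =
            "Utilizzatore carta:" & vbCrLf & vbCrLf & "MARIO ROSSI" & vbCrLf

        Dim stroutput As String = RemoveBlankLines(Strinput)

        ' \r?\n keeps the pattern independent of the newline style used in the join.
        Dim value As String =
            Regex.Match(stroutput, "(?<=Utilizzatore carta:)(\r?\n.*)").Value.Trim()

        Console.WriteLine(value)   ' MARIO ROSSI
    End Sub
End Module
```

Note that this only removes lines that are completely empty. A line containing spaces would survive and would need a `Trim` per line before filtering.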