Hello everybody,
I need to extract some text from a PDF.
I use Read PDF Text because my PDF is an original (digital) PDF, not a scanned one.
The result of pdf extract is

If I use this regex
System.Text.RegularExpressions.Regex.Match(str_test, "(?<=Utilizzatore carta:)(\n.*)").Value
I get 0 results!
If I manually create a new string variable containing the text of the PDF, the regex works!
I don't understand why! The text and the variable type are the same!
Does someone know why?
thanks
loris
Srini84
(Srinivas)
#2
@l.sambinelli
Can you check whether this “Utilizzatore carta:” is present in the local value?
Thanks
Yoichi
(Yoichi)
#3
Hi,
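It might be a line-break issue: the text read from the PDF may end its lines with \r\n rather than a bare \n, so the \n right after the lookbehind never matches. Can you try the following?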
System.Text.RegularExpressions.Regex.Match(str_test, "(?<=Utilizzatore carta:)(\r?\n.*)").Value
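To see why the optional \r can matter, here is a minimal sketch in Python rather than VB.NET; the two patterns are copied from this thread and, as far as I know, behave the same way in both regex engines. The sample strings and the name "Mario Rossi" are made up, and the assumption that the PDF text uses \r\n while the hand-typed text uses \n is only a guess about the original data.

```python
import re

# Pattern from the original question: expects a bare "\n" right after the label.
strict = r"(?<=Utilizzatore carta:)(\n.*)"
# Pattern suggested in this reply: tolerates an optional "\r" before the "\n".
tolerant = r"(?<=Utilizzatore carta:)(\r?\n.*)"

# Hypothetical sample data (not taken from the real PDF).
typed_by_hand = "Utilizzatore carta:\nMario Rossi\nNext line"
from_pdf = "Utilizzatore carta:\r\nMario Rossi\r\nNext line"


def first_match(pattern, text):
    """Return the matched text, or an empty string when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""


# The strict pattern works on the hand-typed text but not on the CRLF text.
print(repr(first_match(strict, typed_by_hand)))
print(repr(first_match(strict, from_pdf)))
# The tolerant pattern matches the CRLF text; strip() drops the leftover "\r".
print(repr(first_match(tolerant, from_pdf).strip()))
```

Note that "." also matches "\r", so the raw match on CRLF text still carries a trailing "\r"; trimming the result (Trim in VB.NET) removes it.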
Regards,
Hi
I think it is all about the extracted text.
There could be an extra blank line when the text is read with the PDF activity, and that might break the regex.
So whenever you apply a regex to text extracted from a PDF, remove the extra blank lines first and then give it a try, like this.
Say you have Strinput as the output of the PDF activity; then use an Assign activity like this:
stroutput = String.Join(Environment.NewLine, Strinput.ToString.Split({Environment.NewLine}, System.StringSplitOptions.RemoveEmptyEntries))
Now try applying the regex to stroutput.
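As a rough illustration of the same idea, here is a Python sketch (not the UiPath expression itself). The helper drop_blank_lines and the sample text are hypothetical. Unlike the Assign above, it also drops whitespace-only lines and rejoins with a plain \n, which are choices of this sketch.

```python
import re


def drop_blank_lines(text):
    """Remove empty and whitespace-only lines, rejoining the rest with "\n"."""
    # splitlines() splits on "\r\n", "\n" and "\r" (among other line boundaries).
    return "\n".join(line for line in text.splitlines() if line.strip())


# Hypothetical PDF output with CRLF line endings and an extra blank line.
raw = "Utilizzatore carta:\r\n\r\nMario Rossi\r\nNext line"

cleaned = drop_blank_lines(raw)

# After cleaning, the original pattern from the question matches.
match = re.search(r"(?<=Utilizzatore carta:)(\n.*)", cleaned)
value = match.group(1).strip() if match else ""
print(value)
```

Because the cleaned text uses a single, known line ending, the regex no longer has to guess between \n and \r\n.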
Cheers @l.sambinelli