Get only the first occurrence in Intelligent OCR

Intelligent OCR,

In the Data Extraction Scope with the Regex Based Extractor,
it picks up every occurrence of the field… I need only the first occurrence.

What is your output (variable) name?

hi @Sweety_Girl

In regex you can specify the start and end of the text format using the ^ and $ symbols.

You also need to specify whether the string you're extracting from is multi-line or single-line.

These settings may help you find the exact match and occurrence of the pattern.
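
As a rough illustration of how the multi-line setting changes what ^ and $ match, here is a minimal sketch in plain Python (not a UiPath activity). The sample text and the `DUE DATE:` label are made up for the example; in .NET the corresponding switch is `RegexOptions.Multiline`.

```python
import re

# Hypothetical OCR text; the layout and field label are assumptions.
text = "Invoice 1001\nDUE DATE: 01/02/2020\nTotal: 50"

# Without MULTILINE, ^ only matches at the very start of the whole string,
# so a field on the second line is not found.
single = re.findall(r"^DUE DATE: (.*)$", text)

# With MULTILINE, ^ and $ match at the start and end of every line.
multi = re.findall(r"^DUE DATE: (.*)$", text, flags=re.MULTILINE)

print(single)  # []
print(multi)   # ['01/02/2020']
```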


Yup… but in my PDF the same pages are repeated.

Which activity are you using to do it?


Let's say the output from your regex expression is "iEnumResult" (variable type: IEnumerable).

Go to Assign.

count = iEnumResult.Count

Do While (i <= count - 1)

      MessageBox: iEnumResult(i).ToString()

      i = i + 1
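
The loop above can be sketched outside UiPath as follows. This is only an illustration in plain Python under assumed sample text; `matches` plays the role of "iEnumResult", and the pattern is just an example.

```python
import re

# Hypothetical text in which the same field appears on repeated pages.
text = "DUE DATE 01/02/2020\npage 2\nDUE DATE 01/02/2020"

# Collect every match, in order of appearance (like the IEnumerable result).
matches = [m.group(0).strip() for m in re.finditer(r"(?<=DUE DATE).*", text)]

# Walk the results by index, as the Do While loop does.
for i, value in enumerate(matches):
    print(i, value)
```

Printing each item this way makes it easy to see how many times the field was captured and which index holds the first one.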

The output is not an IEnumerable.

Help with this @Palaniyappan

Sure buddy.
Kindly elaborate on the process, please, maybe with an example.
Cheers @Sweety_Girl

I am using the same format as in this zip file.

I am trying to extract a particular field which is repeated in the PDF…

I need only the first occurrence of the field.

Can I have that field value? I hope you have stored it as a string.
And from that string, which value do you need the first occurrence of?
Sorry for all the questions, but this will surely lead us to the solution.
Cheers @Sweety_Girl

In my PDF, Due Date is repeated; I need only the first occurrence.

We can use this expression
(?<=DUE DATE).*
in the Matches activity and get the output in a variable of type System.Collections.Generic.IEnumerable(Of System.Text.RegularExpressions.Match).

Then we can use an Assign activity like this:
str_output = out_matches(0).ToString
where out_matches is the output variable from the Matches activity.

Cheers @Sweety_Girl

In my format, I am not getting any IEnumerable variable.

Use this expression for DUE DATE:
str_output = System.Text.RegularExpressions.Regex.Matches(str_input,"(?<=DUE DATE).*")(0).ToString

where str_input is the input string we pass in
and str_output is the variable that stores the first occurrence of the due date value.
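
One thing to watch for: indexing with (0) fails when the pattern is not found at all. Below is a minimal sketch of a guarded "first occurrence only" lookup, written in plain Python for illustration; the function name `first_due_date` and the sample strings are hypothetical. In .NET the same idea would be `Regex.Match(...)` together with a check of its `Success` property.

```python
import re
from typing import Optional


def first_due_date(text: str) -> Optional[str]:
    """Return the text after the first 'DUE DATE' label, or None if absent."""
    # re.search stops at the first match, so later repeats are ignored.
    m = re.search(r"(?<=DUE DATE).*", text)
    return m.group(0).strip() if m else None


# Hypothetical input where the field is repeated with different values.
print(first_due_date("DUE DATE 01/02/2020\nDUE DATE 05/06/2021"))  # 01/02/2020
print(first_due_date("no date here"))  # None
```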

Cheers @Sweety_Girl

You mean I should use this step only for the due date?

You can also add a regex for email, but in a separate Matches expression, so that this one still returns only the first occurrence of the due date.
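
A small sketch of the "one pattern per field" idea, again in plain Python rather than UiPath. The field names, the simplified email pattern, and the sample text are all assumptions made for the example.

```python
import re

# Hypothetical document text with both fields repeated.
text = (
    "DUE DATE 01/02/2020\nContact: a.b@example.com\n"
    "DUE DATE 01/02/2020\nContact: other@example.com"
)

# One separate pattern per field (the email pattern is deliberately simple).
patterns = {
    "due_date": r"(?<=DUE DATE).*",
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
}

# Keep only the first match of each pattern; None when a field is missing.
first_values = {}
for field, pattern in patterns.items():
    m = re.search(pattern, text)
    first_values[field] = m.group(0).strip() if m else None

print(first_values)
```

Keeping the patterns separate means the repetition of one field never affects which value is picked for another.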

Cheers @Sweety_Girl

The solution you gave is correct…

But I am using the Data Extraction Scope with the Regex Based Extractor…
Can any change be made there to get the same result?