How to get only the text that starts with "000"

Hello, I used Read PDF With OCR with Tesseract OCR to scan a PDF file. Now I want to get only the text that starts with “000” from the result. I think I need to split the text and filter it, but I don’t know how to do that.
Please help.

Temuulen

Hi @Temuulen_Buyangerel

Welcome to the Community!

Can you please provide some sample text so we can see what it looks like?

Regards,

Hey @Temuulen_Buyangerel

You can try splitting the text in a For Each loop and adding an If condition that checks whether the current element starts with “000”:

For Each line In extractedText.Split({vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries)
    ' Keep only the lines that begin with "000"
    If line.Trim().StartsWith("000") Then
        filteredText = If(String.IsNullOrEmpty(filteredText), line.Trim(), filteredText & Environment.NewLine & line.Trim())
    End If
Next

So basically I’m splitting the text using the line break as the delimiter (you can use any other delimiter instead), checking whether each line starts with “000”, and, if it does, appending it to filteredText. Using StartsWith rather than Contains avoids picking up lines where “000” appears in the middle, such as “1000500”.
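For reference, here is a minimal, self-contained VB.NET console sketch of the same split-and-filter idea. It assumes the OCR result is already in a String; the helper name FilterLinesStartingWith and the sample lines are hypothetical (two of them are taken from the text posted later in this thread). In UiPath, the function body corresponds to the For Each and If activities.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module SplitFilterDemo

    ' Hypothetical helper: returns every line of ocrText that begins with prefix.
    Public Function FilterLinesStartingWith(ocrText As String, prefix As String) As List(Of String)
        Dim result As New List(Of String)()
        ' Split on CRLF first, then bare LF, because OCR output may use either line ending.
        Dim lines As String() = ocrText.Split(New String() {vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries)
        For Each line As String In lines
            ' Trim stray spaces the OCR engine may add, then compare ordinally.
            Dim cleaned As String = line.Trim()
            If cleaned.StartsWith(prefix, StringComparison.Ordinal) Then
                result.Add(cleaned)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim ocrText As String = "Header line" & vbCrLf & "000139117" & vbCrLf & "(44727724" & vbLf & "  000555" & vbCrLf & "1000"
        Dim matches As List(Of String) = FilterLinesStartingWith(ocrText, "000")
        ' Join with line breaks, like the filteredText variable in the loop.
        Console.WriteLine(String.Join(Environment.NewLine, matches))
    End Sub

End Module
```

With the sample text this keeps "000139117" and "000555" and skips "1000", which a Contains check would have kept.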

Or you can use a regex to extract the text that starts with 000:

(?m)^000.*
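To collect every line that starts with 000 (Regex.Match only returns the first one), a sketch using Regex.Matches with RegexOptions.Multiline could look like this. The helper name AllLinesStartingWith000 is hypothetical. The pattern uses [^\r\n]* instead of .* because in .NET the dot does not match \n but does match \r, so with Windows line endings .* would capture a trailing \r.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexAllMatchesDemo

    ' Hypothetical helper: collects every line that starts with "000".
    Public Function AllLinesStartingWith000(text As String) As List(Of String)
        Dim result As New List(Of String)()
        ' Multiline makes ^ match at the start of every line, not only the start of the string.
        ' [^\r\n]* stops before the line break, so no stray \r is captured with CRLF text.
        For Each m As Match In Regex.Matches(text, "^000[^\r\n]*", RegexOptions.Multiline)
            result.Add(m.Value)
        Next
        Return result
    End Function

    Sub Main()
        Dim text As String = "000139117" & vbCrLf & "(44727724" & vbCrLf & "000222" & vbCrLf & "x000333"
        For Each value As String In AllLinesStartingWith000(text)
            Console.WriteLine(value)
        Next
    End Sub

End Module
```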

Hope it helps you!

Hi @lrtetala
Sorry, I can’t upload a file for some reason.
“2 зээлд /УулАХ ХЭЛТСИЙН ЗАХИРАЛ
Мэдэгдэлтэй танилцаж хүлээн авсан: 9 24 Зэ? феууйг” /
Мэдэгдэл хүлээн авсан огноо: 22Е2 оны 2 сарын22“өдөр
000139117
(44727724”
Here is the text

Hi,

How about either of the following?

System.Text.RegularExpressions.Regex.Match(yourString,"(?<=^|\n)000.*").Value

OR

System.Text.RegularExpressions.Regex.Match(yourString,"\b000.*").Value
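The two patterns above behave differently: the lookbehind version only matches 000 at the very start of the text or right after a line feed, while \b000 also matches 000 in the middle of a line when it follows a space or punctuation. The small comparison below uses made-up sample text to show this; it is an illustration, not part of the original reply. Keep in mind that Regex.Match returns only the first match, and with CRLF line endings .* can include a trailing \r.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PatternComparisonDemo

    Sub Main()
        ' Made-up text: "000" appears mid-line in the first line and at the start of the second.
        Dim text As String = "Ref No. 000555" & vbLf & "000139117"

        ' Lookbehind: 000 must follow the start of the string or a line feed.
        Dim lineStart As String = Regex.Match(text, "(?<=^|\n)000.*").Value
        ' Word boundary: 000 may follow any non-word character, such as a space.
        Dim wordStart As String = Regex.Match(text, "\b000.*").Value

        Console.WriteLine("Lookbehind:    " & lineStart)   ' 000139117
        Console.WriteLine("Word boundary: " & wordStart)   ' 000555
    End Sub

End Module
```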

Regards,

Hi @Temuulen_Buyangerel

Try this

.*(000)\d+

System.Text.RegularExpressions.Regex.Match(yourstringinput.ToString, "(.*(000)\d+)").Value

I hope it helps!!
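One thing to watch with .*(000)\d+: it is not anchored to the start of a line, so it also matches lines where 000 appears inside a longer number, and the match includes everything before the 000. The comparison below uses made-up input; the anchored pattern (?m)^000\d+ is a suggested alternative, not the one from the reply above.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module AnchorComparisonDemo

    Sub Main()
        ' Made-up OCR lines: only the second one actually starts with 000.
        Dim text As String = "Amount 1000500" & vbLf & "000139117"

        ' Unanchored: finds 000 anywhere on a line and returns the whole line up to the last digit.
        Dim unanchored As String = Regex.Match(text, ".*(000)\d+").Value
        ' Anchored with the inline Multiline option: 000 must be the first characters of a line.
        Dim anchored As String = Regex.Match(text, "(?m)^000\d+").Value

        Console.WriteLine("Unanchored: " & unanchored)  ' Amount 1000500
        Console.WriteLine("Anchored:   " & anchored)    ' 000139117
    End Sub

End Module
```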

@Temuulen_Buyangerel

I used a regex and it works fine:
System.Text.RegularExpressions.Regex.Match(inputText,".*(000).*").ToString

Refer to the XAML below:
testt.zip (2.6 KB)

Hi @Vikas_M
Thank you. It works with this: System.Text.RegularExpressions.Regex.Match(test, ".*(000)\d+").ToString

Regards,

Hi @lrtetala
It works fine. Thank you!
What if the 4th character is not “0”?

Regards,

I got it.
Thank you all!

Regards,

@Temuulen_Buyangerel

If the 4th character is “0”, it does not extract anything.
In this case the 4th character is not “0”, so it extracts the value.
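The thread does not show which pattern this reply used. If the goal is lines that start with exactly three zeros, where the 4th character is not "0", one way to express it (an assumption, not necessarily the pattern used here) is a negative lookahead (?!0) after 000. The helper name StartsWithExactly000 is hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module FourthCharDemo

    ' Hypothetical helper: lines that start with "000" where the 4th character is not "0".
    Public Function StartsWithExactly000(text As String) As List(Of String)
        Dim result As New List(Of String)()
        ' (?!0) rejects a 4th character of "0"; [^\r\n]* takes the rest of the line.
        ' A line that is exactly "000" also matches, because it has no 4th character at all.
        For Each m As Match In Regex.Matches(text, "^000(?!0)[^\r\n]*", RegexOptions.Multiline)
            result.Add(m.Value)
        Next
        Return result
    End Function

    Sub Main()
        Dim text As String = "000139117" & vbCrLf & "0000123" & vbCrLf & "000"
        For Each value As String In StartsWithExactly000(text)
            Console.WriteLine(value)
        Next
    End Sub

End Module
```

If a bare "000" line should be excluded, requiring an actual 4th character, for example ^000[^0\r\n][^\r\n]*, is one option.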
