Hello, I used Read PDF With OCR with Tesseract OCR to scan a PDF file. Now I want to get the text that starts with “000” from the extracted text. I think I need to split the text and filter it, but I don’t know how to do that.
Please help.
Temuulen
Welcome to Community!!
Can you please provide some sample text that shows what it looks like?
Regards,
You can try splitting the text in a For Each loop and adding an If condition to check whether the current element contains “000”:
For Each line In extractedText.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
    If line.Contains("000") Then
        Assign filteredText = If(String.IsNullOrEmpty(filteredText), line, filteredText + Environment.NewLine + line)
    End If
Next
So basically I am splitting the text with the newline as the delimiter (you can use any other delimiter instead). Each line is checked for “000”, and if it matches, it is appended to filteredText. If you only want lines that begin with “000”, use line.StartsWith("000") instead of Contains.
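The split-and-filter idea above can be sketched outside UiPath as follows. This is a minimal illustration in Python, not the workflow itself; the function name `lines_starting_with` and the sample string are hypothetical, and the sketch assumes the wanted value sits on its own line, as in OCR output.

```python
def lines_starting_with(text, prefix="000"):
    """Return the lines of `text` that begin with `prefix`."""
    matches = []
    # splitlines() handles both "\n" and "\r\n" line endings.
    for line in text.splitlines():
        stripped = line.strip()  # OCR output often carries stray spaces
        if stripped.startswith(prefix):
            matches.append(stripped)
    return matches


# Hypothetical sample resembling OCR output.
sample = "Notice received on: 22\n000139117\n(44727724\n"
print(lines_starting_with(sample))
```

Using a prefix check rather than a substring check avoids picking up lines where “000” only appears in the middle, such as “1000”.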
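Or you can use a regex to extract the text which starts with 000 (multiline mode is needed so that ^ matches at the start of each line, not only at the start of the whole text):
(?m)^000.*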
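As a rough illustration of the regex approach, here is a small Python sketch. The helper names `first_code` and `all_codes` are hypothetical, and the pattern assumes the value is a run of digits at the start of a line. The same pattern should carry over to .NET's Regex.Match with RegexOptions.Multiline, though that is not tested here.

```python
import re

# MULTILINE makes ^ match at the start of every line, not just the text.
# \d* takes the remaining digits, so the 4th character may be any digit.
PATTERN = re.compile(r"^000\d*", re.MULTILINE)


def first_code(text):
    """Return the first line-initial value starting with 000, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


def all_codes(text):
    """Return every line-initial value starting with 000."""
    return PATTERN.findall(text)


# Hypothetical sample resembling OCR output.
print(first_code("Notice received on: 22\n000139117\n(44727724"))
```

Anchoring with ^ is what separates "starts with 000" from "contains 000"; without it, a value such as 1000 would also match.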
Hope it helps you!
Hi @lrtetala
Sorry, I can’t upload the file for some reason.
“2 зээлд /УулАХ ХЭЛТСИЙН ЗАХИРАЛ
Мэдэгдэлтэй танилцаж хүлээн авсан: 9 24 Зэ? феууйг” /
Мэдэгдэл хүлээн авсан огноо: 22Е2 оны 2 сарын22“өдөр
000139117
(44727724”
Here is the text
Hi,
How about either of the following?
System.Text.RegularExpressions.Regex.Match(yourString,"(?<=^|\n)000.*").Value
OR
System.Text.RegularExpressions.Regex.Match(yourString,"\b000.*").Value
Regards,
Try this
.*(000)\d+
System.Text.RegularExpressions.Regex.Match(yourstringinput.ToString,"(.*(000)\d+)").Value
I hope it helps!!
I have used regex and it works fine
System.Text.RegularExpressions.Regex.Match(inputText,".*(000).*").ToString
Refer to the XAML below:
testt.zip (2.6 KB)
hi @Vikas_M
Thank you. It works with this: System.Text.RegularExpressions.Regex.Match(test,".*(000)\d+").ToString
Regards,
Hi @lrtetala
It works fine. Thank you!
What if the 4th character is not “0”?
Regards,
I got it.
Thank you all!
Regards,
If the 4th character is “0”, it does not extract.
In this case the 4th character is not “0”, so it extracts.