Finding text


Hello guys,

Can you please help me with OCR and finding text?

Let's say I scan a PDF and want to find some information such as a date, an ID and so on. What is the best practice?

So far I use Substring. I convert the OCR output to a String, find the position of the keyword with IndexOf, and take a substring from there, for example ID = OCRoutput.Substring(IndexOfID+2, 10), since I know the ID's length is 10. What if we don't know the exact length of the number? Is there a way to use Substring so that the value ends at, for example, the first space instead of after a fixed length?

Thanks a lot.


If you don't want to use indexing and Substring, you could use relative scraping for each field (date, ID, and so on).


But every PDF is different, which could be a problem. So is there no way to make the string end at the first space?


Yep, if the layout is dynamic then relative scraping won't work.
You can use string splitting instead.
str.Split(' ');
string newString = myString.Substring(myString.IndexOf(' ') + 1);
For reference.
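
To end the value at the first space instead of using a fixed length, one option is to look for the space starting from where the value begins and compute the length from that. Below is a minimal sketch of that idea. The names OcrText and ValueAfter are made up for this example, and it assumes the value directly follows the keyword, optionally separated by a colon or spaces, which may not hold for every scanned PDF.

```csharp
using System;

public static class OcrText
{
    // Hypothetical helper: returns the token that follows `marker`,
    // ending at the next whitespace character or at the end of the text.
    // Returns null when the marker is not found.
    public static string ValueAfter(string text, string marker)
    {
        int markerPos = text.IndexOf(marker, StringComparison.Ordinal);
        if (markerPos < 0) return null;

        // Move past the marker and any ':' or whitespace separating it from the value.
        int start = markerPos + marker.Length;
        while (start < text.Length && (char.IsWhiteSpace(text[start]) || text[start] == ':'))
        {
            start++;
        }

        // Search for the first whitespace at or after `start`; if there is none,
        // the value runs to the end of the text.
        int end = text.IndexOfAny(new[] { ' ', '\t', '\r', '\n' }, start);
        if (end < 0) end = text.Length;

        return text.Substring(start, end - start);
    }
}
```

For example, OcrText.ValueAfter("Invoice ID: 12345678 Date: 01.02.2020", "ID") gives "12345678", whatever the length of the number. OCR output is often noisy, so it is probably worth validating the result (for example, checking that it contains only digits) before using it.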


Thanks a lot. It partially helped.