Finding text

mario · August 3, 2017, 10:38am

Hello guys,

can you please help me with OCR and finding text?

Let's say I scan a PDF and want to find some information such as a date, an ID and so on. What is the best practice?

So far I use Substring: I convert the OCR output to a String, find the position of a keyword with IndexOf, and take a substring from there. For example, ID = OCRoutput.Substring(IndexOfID+2, 10), since I know the ID's length is 10. What if we don't know the exact length of the number? Is there a way to use Substring so that the second argument is not a fixed length but, for example, the position of the first space?

Thanks a lot.

ddpadil · August 3, 2017, 10:50am

Hi,
If you don't want to use indexing and substring, you could use relative scraping for each field (date, ID, and so on).

mario · August 3, 2017, 11:28am

But every PDF is different, which can be a problem. So is there no way to make the extracted string end at the first space?

ddpadil · August 3, 2017, 11:40am

Yep, if it's dynamic then relative scraping won't work.
You can use Split on the string instead:
str.Split(' ');
or, to take everything after the first space:
string newString = myString.Substring(myString.IndexOf(' ') + 1);
For reference.
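
To get a value of unknown length that ends at the first space, one option is to combine IndexOf on the label with a scan to the next whitespace character. The following is only a minimal sketch under assumptions: the class OcrText and the method ValueAfter are hypothetical names made up for illustration, and it assumes the value directly follows the label, optionally separated by a colon and spaces.

```csharp
using System;

static class OcrText
{
    // Hypothetical helper: returns the run of non-whitespace characters that
    // follows the first occurrence of `label` in `text`, or null when the
    // label is not found.
    public static string ValueAfter(string text, string label)
    {
        int labelPos = text.IndexOf(label, StringComparison.Ordinal);
        if (labelPos < 0) return null;

        // Assumption: only ':' and whitespace separate the label and the value.
        int start = labelPos + label.Length;
        while (start < text.Length &&
               (char.IsWhiteSpace(text[start]) || text[start] == ':'))
        {
            start++;
        }

        // The value ends at the first whitespace character or at end of text.
        int end = start;
        while (end < text.Length && !char.IsWhiteSpace(text[end]))
        {
            end++;
        }

        return text.Substring(start, end - start);
    }
}
```

With the sample text "Invoice ID: 1234567 Date: 03.08.2017", calling OcrText.ValueAfter(text, "ID") returns "1234567" and OcrText.ValueAfter(text, "Date") returns "03.08.2017". Note that the label is matched at its first occurrence anywhere in the text, so a word that happens to contain the label earlier in the OCR output would be picked up instead. Real OCR output may need a stricter match.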

mario · August 3, 2017, 4:47pm

Thanks a lot. It partially helped.
