Disabling the Tesseract engine's data dictionary

I am using the Google OCR to scrape a GIF image. The fields that I am interested in contain alphanumeric codes (a mix of letters and digits). At times, the engine misreads 0 (zero) as O (the letter O). Because the field is an ID, a single misread character defeats the whole purpose of the automation.

While looking into this, I came across a Tesseract thread that talks about turning off the auto-correction / data dictionary in the engine. That should make the OCR read a zero as a zero instead of trying to work out a “logical” word from the adjoining characters.
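
For reference, this is roughly what I understand that suggestion to mean in plain Tesseract, outside UiPath. It is only a sketch: it assumes the pytesseract wrapper and Pillow are installed, the helper names build_tesseract_config and read_code are made up for illustration, and the page segmentation mode (7, a single text line) is my guess at what suits an ID field. The load_system_dawg and load_freq_dawg variables are the Tesseract settings that control loading of the word lists. I have not confirmed how much they change the result with the newer LSTM engine.

```python
def build_tesseract_config(disable_dictionaries=True, psm=7):
    """Build a Tesseract config string (hypothetical helper).

    psm=7 asks Tesseract to treat the image as a single text line,
    which is an assumption about how the ID field is cropped.
    """
    parts = [f"--psm {psm}"]
    if disable_dictionaries:
        # Skip the main word list and the frequent-word list so the
        # engine is not nudged toward dictionary words.
        parts.append("-c load_system_dawg=0")
        parts.append("-c load_freq_dawg=0")
    return " ".join(parts)


def read_code(image_path, config=None):
    """Run OCR on one image and return the stripped text (hypothetical helper)."""
    # Imported here so the config helper works without the OCR packages.
    from PIL import Image
    import pytesseract

    if config is None:
        config = build_tesseract_config()
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, config=config).strip()
```

Calling read_code on a cropped image of the ID field would then run Tesseract with both word lists switched off.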

How can I do this in UiPath?


@dowlagar Hi Nitesh, I am also working on OCR activities and saw your thread. I am stuck on a similar problem. Could you please tell me whether you have solved this?

Srinivas K