This solution should work: I'm using it for scraping tables row by row, and once the coordinates are dialed in it's pretty consistent over RDP. Scanned documents will be harder, since there can be one or two pixel differences (e.g. the paper got slightly bent) that throw it off, especially when locating the label image, whose exact coordinates can vary from scan to scan.
Also make sure that the image is perfectly horizontally aligned, or you will catch the neighbouring boxes, especially on the right side/farther part of the row.
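To tolerate that one or two pixel drift, you can search for the label in a small window around its expected position instead of at a fixed coordinate. A minimal sketch (pure Python, images as 2D lists of pixel values; `find_label`, the window size and the toy data are my own illustration, not from any particular library):

```python
def find_label(image, template, expected_xy, window=3):
    """Brute-force search for a label template near its expected
    position, tolerating a few pixels of drift (e.g. a bent scan).
    Returns (mismatch_count, x, y) of the best-matching offset."""
    ex, ey = expected_xy
    th, tw = len(template), len(template[0])
    best = None
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            x, y = ex + dx, ey + dy
            if y < 0 or x < 0 or y + th > len(image) or x + tw > len(image[0]):
                continue
            # count mismatching pixels at this candidate offset
            diff = sum(
                image[y + r][x + c] != template[r][c]
                for r in range(th) for c in range(tw)
            )
            if best is None or diff < best[0]:
                best = (diff, x, y)
    return best

# toy example: a 1-bit "label" that sits 2 px right of where we expect it
img = [[0] * 10 for _ in range(10)]
for r in range(2):
    for c in range(3):
        img[4 + r][5 + c] = 1
tmpl = [[1, 1, 1], [1, 1, 1]]
print(find_label(img, tmpl, (3, 4)))  # → (0, 5, 4)
```

In practice you would do this on a thresholded crop of the scan; the idea is just that a small search window absorbs the scan-to-scan jitter that a fixed coordinate cannot.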
Quality will probably be awful though; handwriting should go through ICR, as most regular OCR engines produce garbage output for it.
For empty fields it will either return junk or throw an error ("OCR returned empty text" is an actual exception, at least with GoogleOCR).
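Worth guarding against explicitly, so a blank box doesn't kill the whole run. A sketch of that guard, assuming the engine is just some callable returning text (the exception wording below is what GoogleOCR produces; other engines may phrase it differently):

```python
def safe_ocr(ocr_engine, region):
    """Call an OCR engine but treat the 'empty text' failure mode
    as a blank field instead of a crash. `ocr_engine` is any
    callable that returns recognized text for a region."""
    try:
        return ocr_engine(region).strip()
    except Exception as exc:
        if "empty text" in str(exc).lower():
            return ""   # blank field: carry on with the next box
        raise           # anything else is a genuine failure

# hypothetical engine stub that behaves like GoogleOCR on a blank box
def blank_box(region):
    raise RuntimeError("OCR returned empty text")

print(repr(safe_ocr(blank_box, None)))  # → ''
```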
On printed forms, the usual approach is to print the boxes in a dropout colour (typically green or red) that is filtered out during scanning. That way the OCR/ICR engine can read a whole area instead of singular characters, and dictionary correction can actually be applied (for names, addresses and other known inputs).
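Dictionary correction itself is cheap once you have a field-level reading. A minimal sketch using the standard library's fuzzy matcher (the city list and the 0.6 cutoff are illustrative assumptions, not recommended values):

```python
import difflib

# hypothetical vocabulary of known valid inputs for one form field
KNOWN_CITIES = ["Amsterdam", "Rotterdam", "Utrecht", "Eindhoven"]

def dictionary_correct(raw, vocabulary, cutoff=0.6):
    """Snap a noisy OCR/ICR reading to the closest known value,
    or return the raw text when nothing is close enough."""
    match = difflib.get_close_matches(raw, vocabulary, n=1, cutoff=cutoff)
    return match[0] if match else raw

print(dictionary_correct("Rotterdan", KNOWN_CITIES))  # → Rotterdam
print(dictionary_correct("xyz", KNOWN_CITIES))        # → xyz
```

This is exactly why area reads beat character reads: a per-character engine has no context to correct against, while a field read can be snapped to a finite list of valid values.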
With single-character readings it is nigh impossible to catch a recognition error, and 99% of the time human validation will be required. No matter what OCR/ICR vendors claim, in my experience reliably high accuracy for handwriting requires a human validation step.
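If the engine exposes per-character confidence, you can at least narrow down what the human has to look at. A sketch of that routing, assuming `(char, confidence)` pairs from the engine; the 0.90 threshold is a tuning assumption of mine, not a vendor figure:

```python
def route_characters(readings, threshold=0.90):
    """Split single-character readings into auto-accepted vs
    needs-human-review, based on engine confidence. `readings`
    is a list of (char, confidence) pairs."""
    accepted, review = [], []
    for idx, (ch, conf) in enumerate(readings):
        (accepted if conf >= threshold else review).append((idx, ch, conf))
    return accepted, review

# "8" vs "B" is a classic single-character confusion
boxes = [("A", 0.98), ("8", 0.61), ("C", 0.95)]
auto, human = route_characters(boxes)
print(human)  # → [(1, '8', 0.61)]
```

It doesn't remove the human from the loop, but it can shrink the validation queue from every character to just the doubtful ones.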