How do I read a checkbox from a PDF using the Document Understanding framework?

Can someone please suggest how to read checkbox values as "Yes" for “☒” and "No" for “☐”? (The documents I'm trying to read use these ballot box checkboxes.)

I’m using the Document Understanding Intelligent Form Extractor and have also configured synonyms, but I always get the value "Yes ".

I have gone through the thread “How do I read a checkbox?”, but it didn't help me get the expected output.
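To make the expected mapping concrete, here is a minimal Python sketch. It assumes the checkbox glyphs actually reach you as Unicode characters in the extracted text, which is often not the case for scanned documents. The function names are hypothetical and not part of Document Understanding; the sketch only shows the ☒ → Yes, ☐ → No rule and how to pick the label that follows a checked box.

```python
from typing import List, Optional

# Unicode ballot-box glyphs as they may appear in a PDF text layer or OCR output.
CHECKED = {"\u2612", "\u2611"}  # ☒ BALLOT BOX WITH X, ☑ BALLOT BOX WITH CHECK
UNCHECKED = {"\u2610"}          # ☐ BALLOT BOX


def checkbox_to_yes_no(glyph: str) -> Optional[str]:
    """Map a single checkbox glyph to "Yes"/"No"; return None if it is not a known glyph."""
    g = glyph.strip()
    if g in CHECKED:
        return "Yes"
    if g in UNCHECKED:
        return "No"
    return None


def selected_options(line: str) -> List[str]:
    """Return the labels that directly follow a checked box, e.g. '☒ Yes ☐ No' -> ['Yes']."""
    tokens = line.split()
    return [
        tokens[i + 1]
        for i, tok in enumerate(tokens)
        if tok in CHECKED and i + 1 < len(tokens)
    ]


print(checkbox_to_yes_no("☒"), checkbox_to_yes_no("☐"))  # Yes No
print(selected_options("☐ Yes ☒ No"))                      # ['No']
```

`selected_options` assumes the glyph and its label are separated by whitespace; a glyph glued to its label (e.g. "☒Yes") would need extra splitting.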

Hi @VC365

You need to define the taxonomy based on your needs; it doesn’t depend on the package you use. Once you have defined the taxonomy, you can map the ML-defined fields to the fields you defined in your taxonomy.
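Conceptually, this mapping is a lookup from the field names an extractor produces to the field names in your taxonomy. In Document Understanding it is configured through the activities in Studio rather than written in code, so the Python below is only an illustration: the taxonomy dictionary, field names and types are hypothetical and are not the real taxonomy file format.

```python
from typing import Dict

# Hypothetical taxonomy (field name -> type); NOT the real UiPath taxonomy file format.
TAXONOMY: Dict[str, str] = {"ApplicantName": "Text", "IsMarried": "Boolean"}

# Hypothetical mapping from extractor field names to taxonomy field names.
FIELD_MAP: Dict[str, str] = {"name": "ApplicantName", "married_checkbox": "IsMarried"}


def map_to_taxonomy(extracted: Dict[str, str]) -> Dict[str, str]:
    """Rename extractor fields to taxonomy fields; fields without a mapping are dropped."""
    result: Dict[str, str] = {}
    for source_field, value in extracted.items():
        target = FIELD_MAP.get(source_field)
        if target is not None and target in TAXONOMY:
            result[target] = value
    return result


print(map_to_taxonomy({"name": "Jane", "married_checkbox": "Yes", "page": "1"}))
# {'ApplicantName': 'Jane', 'IsMarried': 'Yes'}
```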


Hi @Gokul001, I haven’t used any ML skills in my process; I am only using basic Document Understanding in the Community Edition. If this is possible without ML-defined fields, could you please help me with it? I have defined those checkbox fields as Boolean in the taxonomy. Is this only possible using AI Fabric and ML skills? Please assist.

Can you please share your workflow here, @VC365?

Hi, I have created a video; I hope it helps: How to read a checkbox from a PDF using Document understanding framework (YouTube)

Hi @RAKESH_KUMAR_BEHERA, yes, I have gone through that, but my documents are scanned PDF copies, and the checkboxes are not detected even though I have provided them in the signature. As an alternative, I tried replacing the checkboxes with Y and N using Python code, but even that does not detect them reliably across all the different files. Could you please suggest a way to read them effectively?
It would be great if I could get a solution.

Thanks in advance

Hi, hope you are doing fine. You can use the Synonyms feature, which is available when you define the template (in the form extractors). Synonyms are meant for capturing checkbox values, tick marks, yes/no answers, etc.

Hope this helps… cheers.
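If you post-process values yourself, the same synonym idea can be sketched in Python. This is an illustration only, not how the Form Extractor implements synonyms, and the synonym lists are hypothetical. Note that it trims whitespace first: the value "Yes " reported in the question has a trailing space, and an exact comparison against "Yes" would fail.

```python
from typing import Iterable, Optional

# Hypothetical synonym lists; adjust them to the marks your documents really contain.
CHECKED_SYNONYMS = ["☒", "☑", "✓", "✔", "x", "yes", "y", "true"]
UNCHECKED_SYNONYMS = ["☐", "no", "n", "false"]


def _normalize(text: str) -> str:
    # Trim whitespace and ignore case, so "Yes " and "YES" compare equal to "yes".
    return text.strip().casefold()


def match_synonym(value: str,
                  checked: Iterable[str] = CHECKED_SYNONYMS,
                  unchecked: Iterable[str] = UNCHECKED_SYNONYMS) -> Optional[str]:
    """Return "Yes"/"No" if `value` matches a synonym, else None (unknown / needs review)."""
    v = _normalize(value)
    if v in {_normalize(s) for s in checked}:
        return "Yes"
    if v in {_normalize(s) for s in unchecked}:
        return "No"
    return None


print(match_synonym("Yes "), match_synonym("☐"), match_synonym("maybe"))  # Yes No None
```

Returning None for unmatched values, rather than defaulting to "Yes" or "No", leaves unclear boxes for human review.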

You can try using the Form Extractor only for the checkboxes and the Intelligent Form Extractor for the other attributes.

I tried this first, but it did not work.

I will try this, thank you.

That is right: you get the Synonyms option when you define templates using the Intelligent Form Extractor (IFE) in the extraction process. When you select them, your checkbox values are captured from the document automatically. To hold those values, the field has to be defined as a Boolean in the taxonomy, so if the checkbox is checked ("Yes"), the field should be the Boolean true. Hope that makes sense; let me know if it works. Cheers.
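When a "Yes"/"No" string ends up in a Boolean field, the conversion has to be explicit. The Python sketch below is not a claim about what the Intelligent Form Extractor does internally; it only shows a pitfall worth checking in any post-processing of your own: a plain `bool()` cast treats every non-empty string as true, so "No" would silently become true. The function name is hypothetical.

```python
def yes_no_to_bool(value: str) -> bool:
    """Strictly convert a "Yes"/"No" string into a Boolean.

    A plain bool(value) cast is NOT enough: every non-empty string is truthy,
    so bool("No") is True.
    """
    v = value.strip().casefold()
    if v in ("yes", "true"):
        return True
    if v in ("no", "false"):
        return False
    raise ValueError(f"Not a Yes/No value: {value!r}")


print(bool("No"), yes_no_to_bool("No"), yes_no_to_bool("Yes "))  # True False True
```

Raising an error on anything unexpected, instead of guessing, makes a misread checkbox visible rather than turning it into a wrong true or false.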

@Pradeep.Robot, thanks for the support. I have followed everything you mentioned, but unfortunately it didn’t work. By default it returns "Yes" for unchecked boxes as well in Validation Station, and if I extract again and save, it still saves "Yes".


You mean that after following these steps you are still getting “Yes” for unchecked boxes? Can you please provide screenshots of the form checkboxes?

Hi @VC365 ,

If you could share a sample PDF, it would help us work out some other logic.


Hi @VC365,

Some samples would indeed be useful for understanding the problem you are facing. There can be multiple reasons why your checkboxes are not recognized properly, so let’s look at a document first and try to understand the root cause.



I have a similar issue. I tried using anchors when creating the template. It performs a little better, but it is still only about 30% accurate at most. In the attachment you can see that lots of checkboxes are either not being recognized, or being recognized