How do I read a checkbox from a PDF using the Document Understanding framework?

Can someone please suggest how to read checkbox values as Yes and No for “☒” and “☐” respectively? (The documents I’m trying to read use these ballot box checkboxes.)

I’m using the Document Understanding Intelligent Form Extractor and have also defined synonyms, but I always get the value "Yes ".

I have gone through the link “How do I read a checkbox?”, but it didn’t help me get the expected output.

Hi @VC365

You need to define the taxonomy based on your needs. This does not depend on the package you use. Once you define the taxonomy, you can map the ML-defined fields to the fields you have defined in your taxonomy.


Hi @Gokul001, I haven’t used any ML skills in my process; I was just using basic Document Understanding in Community Edition. If this is possible without ML-defined fields, could you please help me with it? I have defined those checkbox fields as Boolean in the taxonomy. Is this only possible using AI Fabric and ML skills?

Can you please share your workflow here, @VC365?

Hi, I have created a video that I hope helps you: How to read a checkbox from a PDF using Document understanding framework - YouTube

Hi @RAKESH_KUMAR_BEHERA, yes, I have gone through that, but my documents are scanned PDF copies, and those checkboxes are not detected even though I have provided them in the signature. As an alternative, I tried replacing the checkboxes with Y and N using Python code, but even that does not detect them reliably across all the different files. Could you please suggest a way to read them effectively?
It would be great to get a solution.

Thanks in advance
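
For the Python workaround mentioned above, a minimal sketch could look like the following. It assumes the text layer or OCR engine actually emits ballot box characters; on scanned copies the OCR may return something else entirely, which would explain the inconsistent results. The function name `replace_checkboxes` and the two glyph sets are hypothetical, not part of UiPath or any library.

```python
# Hypothetical helper, not part of UiPath or any library.
# Assumption: the extracted text contains Unicode checkbox glyphs.
CHECKED = {"\u2612", "\u2611", "\u2714", "\u2713"}  # ☒ ☑ ✔ ✓
UNCHECKED = {"\u2610", "\u25A1"}                    # ☐ □


def replace_checkboxes(text: str) -> str:
    """Return text with checked glyphs replaced by 'Y' and unchecked by 'N'."""
    table = {ord(ch): "Y" for ch in CHECKED}
    table.update({ord(ch): "N" for ch in UNCHECKED})
    return text.translate(table)


if __name__ == "__main__":
    sample = "Smoker: \u2612  Diabetic: \u2610"
    print(replace_checkboxes(sample))
```

If a file still fails, it is worth printing the raw OCR output first to see which characters really come back for the boxes, then adding those to the sets.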

Hi, hope you are doing fine. You can use the Synonyms feature, which is available when you define the template (in the form extractors). Synonyms are defined to capture checkbox values, tick marks, yes or no, etc.

Hope this helps… cheers.

You can try using the Form Extractor only for the checkboxes and the Intelligent Form Extractor for the other attributes.

I tried this first, but it did not work.

I will try this, thank you.

That is right, you get the Synonyms option while defining templates with the Intelligent Form Extractor (IFE) in the extraction process. When you select those, your checkbox values are captured automatically from the document. To hold those values, the field has to be defined as a Boolean in the taxonomy, so if the checkbox is checked (yes), the value should be Boolean true. Hope that makes sense. Let me know if it works. Cheers.
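
If you post-process the extracted value yourself, the yes-to-true mapping described above can be sketched in Python roughly as follows. This is only an illustration: `to_bool` and the two synonym sets are assumed names and values, not anything defined by the extractor, so adjust them to what your documents actually return.

```python
from typing import Optional

# Assumed synonym lists; replace with the values your extractor returns.
TRUE_VALUES = {"yes", "y", "true", "checked", "\u2612", "\u2611"}  # incl. ☒ ☑
FALSE_VALUES = {"no", "n", "false", "unchecked", "\u2610"}         # incl. ☐


def to_bool(raw: Optional[str]) -> Optional[bool]:
    """Map an extracted checkbox string to True/False, or None if unrecognised."""
    if raw is None:
        return None
    value = raw.strip().lower()  # tolerates values such as "Yes " with a trailing space
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return None  # leave unknown values for manual validation
```

Returning None for anything unrecognised, rather than defaulting to false, keeps doubtful values visible so they can be checked by hand.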