Data Manager - How to create Boolean Fields for Annotation?

Hi Developers,

I’m trying to create a dataset of N form documents in Data Manager. It will be used to train a new custom ML model in AI Fabric.

I’m facing an issue while creating fields: some of them need to be Boolean (e.g. an X or a V in a checkbox).


How can I create fields for this kind of input? Or is there another way to handle this?

@loginerror @Lahiru.Fernando @Alexandru-Luca @Palaniyappan

Hi @Kesavaraj_K!

The Document Understanding ML model doesn’t yet have checkbox/radio-button support. That’s why there’s no boolean option in Data Manager either.

The good news is that this functionality is coming very soon to community preview.


Hi @Alexandru-Luca,

Thanks for the update!

Is there any alternative way to get the results without using Boolean fields in this scenario?

I tried a hybrid of Form and ML extraction, with all the Boolean fields assigned to Form Extraction. It didn’t capture the fields very well :sweat_smile:

Thanks in Advance!


Actually, there is a way! The product team was very busy during the holidays, so you can now extract the data using ML.

The way to go (at the moment) is to create a string field in Data Manager for each checkbox and tag the label of the checkbox (not the box itself). Tag the label when the box is checked and leave it untagged otherwise. It is very important to use a well-balanced dataset, where each checkbox is represented roughly the same number of times. Otherwise the results won’t be great.
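
A quick way to check the balance mentioned above is to count how often each checkbox label was tagged across the labelled documents. This is only a minimal sketch: it assumes the labels have been collected into plain Python dictionaries (one per document, field name mapped to tagged text), which is a hypothetical format and not something Data Manager exports as-is. The field names `option_a` and `option_b` are placeholders.

```python
from collections import Counter

def checkbox_balance(labelled_docs, checkbox_fields):
    """Return, per checkbox field, the number of documents where it is tagged."""
    counts = Counter()
    for doc in labelled_docs:
        for field in checkbox_fields:
            # A non-empty tagged value means the box was checked in that document.
            if (doc.get(field) or "").strip():
                counts[field] += 1
    # Include fields that were never tagged so gaps are visible.
    return {field: counts[field] for field in checkbox_fields}

# Hypothetical labels for three documents and two checkbox fields.
docs = [
    {"option_a": "Option A", "option_b": ""},
    {"option_a": "", "option_b": "Option B"},
    {"option_a": "Option A"},
]
print(checkbox_balance(docs, ["option_a", "option_b"]))
# {'option_a': 2, 'option_b': 1}
```

If one field shows up far less often than the others, adding more documents where that box is checked is probably worthwhile before training.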


Thanks @Alexandru-Luca

I’ll try this method out and update!


@Kesavaraj_K Any update on results?

@Alexandru-Luca Can you share any further details on release dates for this (both Community and Enterprise)? I’m working with a client who needs this functionality, and it would be great if we could tell them when it will be supported.
Is there somewhere I can find what’s in the pipeline for AI Fabric/Data Manager/Document Understanding, with upcoming features and dates?


Best regards,

Hi @sebastian.andre, for now there is no Boolean field feature on our roadmap, because the same result can already be obtained in the way described by Alexandru Luca above in this thread. Just label the options which are selected and do not label the options which are not selected. The model will then learn to extract only the options which are selected. In this case, for example: “1 unit” and “PUD”.

In the case of this image, you would need 8 String type fields. For each field the model will either return something or return nothing. For instance, you may have a field called 1-unit. When this field is returned with the value “1 unit”, the option was selected. When the field is empty, the option was not selected.
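
Downstream, the rule above (value returned means selected, empty means not selected) can be turned into booleans. The following is a rough sketch only: it assumes the extraction result has already been converted into a plain dictionary of field name to extracted text, which is a hypothetical shape and not the actual Document Understanding result object. `other-option` is a made-up field standing in for the remaining options.

```python
def fields_to_booleans(extracted, checkbox_fields):
    """Map each checkbox field to True if the model returned a non-empty value."""
    return {
        field: bool((extracted.get(field) or "").strip())
        for field in checkbox_fields
    }

# Hypothetical extraction result: two options selected, one not.
result = {"1-unit": "1 unit", "PUD": "PUD", "other-option": ""}
print(fields_to_booleans(result, ["1-unit", "PUD", "other-option"]))
# {'1-unit': True, 'PUD': True, 'other-option': False}
```

A field that is missing from the result entirely is treated the same as an empty one here.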

The improvement we are planning for around the end of April is multi-valued fields: instead of having 8 String type fields here, you would have a single multi-valued String type field, which would return a list of values. In this case the list would have 2 values: [“1 unit”, “PUD”]. The Intelligent OCR framework allows for this kind of multi-valued field.
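
With a multi-valued field, the per-option booleans could be derived from the returned list. This sketch assumes the list arrives as plain strings, as in the [“1 unit”, “PUD”] example; the function name and the `Other` option are hypothetical, and the real output format may differ.

```python
def multivalue_to_booleans(values, options):
    """Mark each known option as selected if it appears in the returned list."""
    # Compare case-insensitively and ignore surrounding whitespace.
    selected = {value.strip().lower() for value in values}
    return {option: option.strip().lower() in selected for option in options}

print(multivalue_to_booleans(["1 unit", "PUD"], ["1 unit", "PUD", "Other"]))
# {'1 unit': True, 'PUD': True, 'Other': False}
```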


Thank you Alexandru for the input.
So if I understand correctly, you never have to mark the checkbox itself, only the text corresponding to the checkbox (when the checkbox is selected)?
I will try out this method.

Is it possible to do it the other way around, i.e. only mark the corresponding text when the checkbox is not selected? In that scenario, if the field returns something the checkbox is not selected, and if it returns nothing the checkbox is selected.

Best regards,

@sebastian.andre you are right, you can also do it that way.


@sebastian.andre were you able to extract only the checked text? I am trying to apply the same logic @alexcabuz mentioned, but I am getting all the fields, checked and unchecked.


I have the same issue, but in my case it is not checkboxes, it is signature fields.

I’m only tagging the labels of the fields that are signed.

After training the bot with more than 100 documents, I got this result:

The bot is still identifying the labels, even with no signatures.

BUT, the confidence in fields with NO signatures is almost always low. So, my problem is kind of solved if I use the confidence level to verify if there is a signature or not.
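
The confidence check described above could look roughly like this. It is a sketch under assumptions: the extracted value and its confidence (0 to 1) are taken as plain arguments, and the 0.6 threshold is a made-up starting point that would need tuning against your own documents.

```python
def signature_present(value, confidence, threshold=0.6):
    """Treat a field as signed only if it has a value and enough confidence."""
    has_value = bool((value or "").strip())
    return has_value and confidence >= threshold

print(signature_present("John Doe", 0.92))   # True
print(signature_present("Signature", 0.15))  # False: label found, low confidence
print(signature_present("", 0.99))           # False: nothing extracted
```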

Am I doing something wrong? Does anyone have better results than me?

There have been revisions to the ML model, so you will have to label a signature field as you would label any other field.

Signature detection

Starting with the last LTS Enterprise release, signatures can be detected using the UiPath Document OCR, so Machine Learning Models can detect signatures directly.

Label a signature the same way you label any other field in your document. Once it is detected by the UiPath Document OCR, the Machine Learning Model learns to recognize the field as a signature.

More details can be found here:

Thank you for your response.

But to do this, the OCR has to recognize each signature. In my test cases, more than 1/3 of the signatures are NOT recognized. Is this happening only for me?

Best regards!

It is a common scenario. What’s the OCR you’re using?

UiPath Document OCR

I’ll try using other engines and see if it returns better results. Thank you.

Sure. Google Cloud Vision API is a good OCR for handwriting and signature extraction.

You can also try the Form Extractor for the signature field only and check whether it gets extracted, but changing the OCR is a great start.

