Cannot extract data from Italian ID Card

Hi everybody,
I’m trying to use the form extractor to extract a few fields from an old Italian ID card (the paper one, not the plastic one). The scanned document is a 2-page grayscale 300 dpi PDF. I’m using this file (a non-redacted version, of course): ID-card-redacted.pdf (1.5 MB), both to create the template and as the document to extract the data from, so in theory the template matches the document 100%, since it’s the same file.
I’ve used anchors to define the two fields I want to extract. Judging by the tutorials and docs, this should be easy.
At run time it recognizes the two pages correctly, but it does not extract the “Nome” field and extracts the wrong value for “Cognome”. I don’t understand why. I’ve tried changing the anchors and also adding more than one anchor per field: in the latter case even the Cognome is not extracted…
Here are two screenshots from the template manager:

Using those two anchors, here is the result from the validation station (the extracted Cognome field value is surreal…):

Any idea?

Hi @gromeo,
just one question:

What was the confidence level for the extraction of the Name field?

The Name field (Nome) was not extracted at all. The Surname field (Cognome) was extracted with a 69% confidence level, but the value was horribly wrong, as shown in the image above.

Hey @gromeo

  1. I would suggest using the Intelligent Form Extractor, which should give you better and more accurate results.
  2. Try the Machine Learning Extractor, which already has a prebuilt model for ID cards.
    Documentation link: Public Endpoints

These two solutions should help you solve your problem. Let me know if you run into any issues.


Thanks for your response. I’ll try those solutions, but I don’t understand why the standard form extractor behaves this way: I’m using the same file for the template and for the document to process, and it doesn’t work… That’s not expected. I’d like to understand whether there is a bug in the form extractor or whether I’m doing something wrong.

Giovanni Romeo

I replaced the form extractor with the intelligent form extractor, using the very same template (exported from the form extractor template manager and imported into the intelligent form extractor template manager), and the results are exactly the same: no name extracted, and a horribly wrong surname extracted with 69% confidence. I think there’s something wrong with the template.

I also tried creating a template without anchors, using just plain custom areas (I think that’s what the areas defining the values to extract are called), but in this case no values are extracted at all (both name and surname are missing). I also tried a different PDF file (for both the template and the input document) with the front and back of the ID card on the same (first) page, and again I get no extracted data.
Honestly, I don’t know what to try next. The OCR step itself works correctly (almost all of the text is recognized)…
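
Since the OCR text looks right but the areas return nothing, one thing worth checking is whether the recognized words actually fall inside the value areas once both are expressed in the same coordinate system. Below is a minimal Python sketch of that check. It does not use any UiPath API: the `Box` and `Word` types, the `words_in_area` helper, and all coordinates and names are made up for illustration, and it assumes the word boxes and the area box have been exported by hand in the same units.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned rectangle; units are assumed to match between words and areas."""
    left: float
    top: float
    width: float
    height: float

    def contains_point(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical structure)."""
    text: str
    box: Box


def words_in_area(words: List[Word], area: Box) -> List[Word]:
    """Return the words whose box center lies inside the area, in rough reading order."""
    hits = []
    for word in words:
        center_x = word.box.left + word.box.width / 2
        center_y = word.box.top + word.box.height / 2
        if area.contains_point(center_x, center_y):
            hits.append(word)
    # Naive ordering: top first, then left. Good enough for a single short line.
    hits.sort(key=lambda w: (w.box.top, w.box.left))
    return hits


# Made-up sample data: two labels and two placeholder values.
sample_words = [
    Word("Cognome", Box(100, 200, 120, 30)),
    Word("ROSSI", Box(260, 200, 110, 30)),
    Word("Nome", Box(100, 260, 80, 30)),
    Word("MARIO", Box(260, 260, 115, 30)),
]
surname_area = Box(250, 190, 200, 50)  # hypothetical value area for the surname

print([w.text for w in words_in_area(sample_words, surname_area)])
```

If a check like this comes back empty for an area that visibly covers the value, the area and the OCR output are probably not aligned (for example a different page, scale, or rotation), which would point at the template rather than at the OCR.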

Anyone else? I’m stuck…
Giovanni Romeo