I need to extract 4 fields from an image and copy them into an Excel table. The fields are:
- Name – (Nombre)
- Surnames – (Apellidos)
- Leaving date – (Fecha de la BAJA)
- Discharge date – (Fecha de la ALTA)
The main problems that I find are:
- All of the documents are images (some scanned into a .pdf, others saved as .jpg files).
- All of them are laid out in sections (rectangles).
- The quality of the images varies a lot (some have shadows, others were taken at an angle, have misaligned margins, etc.).
- The structure of the document differs depending on the region of my country it comes from, so the 4 fields are located in different positions. (But that is a problem to deal with later. At the moment I’m sticking to a single type of document.)
I’m aware that the first step is to get the documents at the highest possible quality (I’m working on that). But from there, I’ve tried everything I know about image and PDF OCR within UiPath, and the quality of the output .txt files is very poor. Identifying the fields to extract, and an anchor for each one, is also complicated: sometimes the OCR recognizes “name”, other times “mame”, “n@me”, “nane”… you get the idea. Maybe it will require further text processing through programming. I honestly don’t know.
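To illustrate what I mean by further text processing, here is a minimal sketch of fuzzy anchor matching in Python, using only the standard library `difflib` module. The anchor list, the function name `match_anchor` and the 0.7 cutoff are my own assumptions for illustration. None of it comes from UiPath, and the real documents would probably need the Spanish labels instead.

```python
import difflib

# Hypothetical anchor labels (assumption: the real documents may use
# the Spanish labels such as "nombre" or "apellidos" instead).
ANCHORS = ["name", "surnames", "leaving date", "discharge date"]


def match_anchor(ocr_text, anchors=ANCHORS, cutoff=0.7):
    """Return the anchor closest to a noisy OCR string, or None.

    cutoff is a similarity ratio between 0 and 1; 0.7 is only a
    starting guess and would need tuning on real OCR output.
    """
    cleaned = ocr_text.strip().lower()
    hits = difflib.get_close_matches(cleaned, anchors, n=1, cutoff=cutoff)
    return hits[0] if hits else None


if __name__ == "__main__":
    for noisy in ["mame", "n@me", "nane", "surnarnes", "address"]:
        print(repr(noisy), "->", match_anchor(noisy))
```

A lower cutoff tolerates worse OCR but increases the risk of matching the wrong label, so the value is a trade-off rather than a fixed rule.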
Considering all the above, does anyone know of another powerful OCR program that can be integrated with UiPath? (OneNote seems to work pretty well, but maybe there is a more advanced one, so we do not have to improve the image manually in PowerPoint or Photoshop before the OCR.) Also, how can I identify the different sections? (Maybe by zooming into them?)
I have attached an example of an actual discharge date document to illustrate the problems described above.
Thank you very much in advance.