I'm using OCR to scrape a scanned bill, but because of the poor scan quality the decimal point is being read as 1. I'm trying to clean up the extracted data, but I'm having trouble.

This is a sample extracted value: 5,000100, which should be 5,000.00. I want to check whether the third-to-last character is 1 and, if it is, replace it with a dot. I do not want to use Contains because the data is dynamic.
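The check described here can be sketched in a few lines. This is a minimal Python illustration, not UiPath code. The function name `fix_misread_decimal` is hypothetical, and the sketch assumes every amount is supposed to end in exactly two decimal digits.

```python
def fix_misread_decimal(value: str) -> str:
    """Replace the third-from-last character with '.' only if it is '1'.

    Assumption: every amount should end in two decimal digits, so a '1'
    in that position is a misread decimal point.
    """
    # A third-from-last position only exists for strings of length >= 3.
    if len(value) >= 3 and value[-3] == "1":
        # Keep everything before that character, then add '.' and the last two digits.
        return value[:-3] + "." + value[-2:]
    # Anything else (including already-correct values) is returned unchanged.
    return value


print(fix_misread_decimal("5,000100"))  # 5,000.00
print(fix_misread_decimal("5,000.00"))  # 5,000.00 (unchanged)
```

Because it looks at one position instead of searching the whole string, the 1s elsewhere in the amount are left alone. It would still corrupt a value that has no decimal part at all, such as 5,100, so it is only safe if the two-decimal assumption holds.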

This can be done, but I don't think it is the right approach. It may work for this file, but how will you handle it when the next file has a different misread?

Use different OCR engines and change the scale to get better results.

I tried Google OCR and Microsoft OCR, and that was the best result either could extract. I only have limited OCR options at the moment, so I am forced to clean up the data.


For scanned documents, ABBYY OCR is the best engine. You will get better results with that.


Did you try the Tesseract OCR engine and set the scale?

Tesseract OCR is Google OCR, right? That is what I am using. Also, I cannot work with ABBYY right now. I need to work with free OCR engines.

Hi @wonderingnoname,

Please refer to the workflow below.

For example, it converts 5,000100 to 5,000.00. It does not only handle 1: any digit, symbol, or special character in that position is converted to a dot.

This is because I replace whatever is at the third position from the end with a dot.
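The idea of always overwriting the third character from the end can be sketched as follows. This is a Python illustration of the description above, not the contents of the attached workflow. The function name `force_decimal_point` is hypothetical, and the sketch again assumes every amount has exactly two decimal digits.

```python
def force_decimal_point(value: str) -> str:
    """Overwrite the third-from-last character with '.', whatever it is.

    Assumption: every amount ends in two decimal digits, so the third
    position from the end is always where the decimal point belongs.
    """
    # Too short to have a third-from-last character: leave it alone.
    if len(value) < 3:
        return value
    # Drop the last three characters, then re-attach '.' and the last two.
    return value[:-3] + "." + value[-2:]


print(force_decimal_point("5,000100"))  # 5,000.00
print(force_decimal_point("5,000,00"))  # 5,000.00
```

The unconditional version also covers misreads other than 1, such as a comma or a stray symbol. The cost is that it silently rewrites any value that does not follow the two-decimal format.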

Main (17).xaml (11.0 KB)

Thank you @anil5

