Extract Text From Image

ocr
scraping
studio

#1

I would like to extract two pieces of information (the fare and the date) from a photo of a receipt in a single process. However, I keep running into two problems.


  1. How do I record the “screen scraping” so that both values are captured in the same sequence? Every time I click the Screen Scraping wizard, it creates another sequence in the main flow, but I want everything in the same sequence. I also tried manually dragging and dropping the “Get OCR Text” activity with Microsoft OCR, but I can’t preview the result.

  2. What is the correct way to extract the text? The extracted value is not accurate. The photo of the receipt is not always the same size, so what is a better way to make sure the OCR extracts the correct value? For instance, the fare may be $XXXX.XX or $XX.XX, so the length of the text is not always the same.

Can anyone provide an example or steps to guide me, so I can experience the power of the tool?


Extraction of text from an image using OCR
#2

The easiest solution is to extract all the text from the image and then use string manipulation to pull out the values you want. It is also the slowest solution, because OCR time grows with image size.
For example, you’ll get a string like “TAXI NO. … END 25/06/2017 16:30 … TOTAL FARE HK$8379.60” as OCR output. You can then use the String.IndexOf and String.Substring methods to extract the values.
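
To make the string-manipulation idea concrete, here is a minimal sketch in Python. It uses regular expressions rather than String.IndexOf and String.Substring, because a pattern copes more easily with a fare of varying length. The function name, the dd/mm/yyyy date format and the exact “TOTAL FARE” label are assumptions based on the sample string above, not part of any UiPath API.

```python
import re
from datetime import datetime
from decimal import Decimal

# Assumed date layout: dd/mm/yyyy, as in the sample OCR string.
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})")
# Amount after the "TOTAL FARE" label. The "HK" and "$" prefixes are
# optional, and the integer part may have any number of digits.
FARE_RE = re.compile(r"TOTAL\s+FARE\s*(?:HK)?\$?\s*(\d+(?:\.\d{1,2})?)")


def extract_date_and_fare(ocr_text):
    """Return (date, fare) found in the OCR text; None for a missing value."""
    date = None
    date_match = DATE_RE.search(ocr_text)
    if date_match:
        try:
            date = datetime.strptime(date_match.group(1), "%d/%m/%Y").date()
        except ValueError:
            # OCR produced digits that do not form a real date.
            date = None

    fare_match = FARE_RE.search(ocr_text)
    fare = Decimal(fare_match.group(1)) if fare_match else None
    return date, fare


sample = "TAXI NO. ... END 25/06/2017 16:30 ... TOTAL FARE HK$8379.60"
print(extract_date_and_fare(sample))
```

Because the pattern anchors on the label rather than on a fixed character position, the same code handles both $XX.XX and $XXXX.XX. It still depends on the OCR engine reading the label correctly.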

Another solution is to use relative scraping: Citrix Recorder - Screen Scraping - Scrape Relative. This way you can scrape an area relative to an image.


#3

Would you mind giving me an example using those 2 images? Also, how do you open the image:
with Start Process, Open Application, or Internet Explorer?


#4

See the attached workflow. You need to put the two images in the same folder, or change the image paths in the workflow. You will also need to create an account (trial available) on Google Cloud Platform, because I used Google Cloud OCR; Microsoft OCR and Google OCR were unable to extract reliable data from those images.
Main.xaml (10.8 KB)

You can use the Load Image activity to load the image into memory and then pass that image to any OCR engine to extract the text.
It would help you a lot to use a scanner or an app like https://www.google.com/photos/scan/ to capture better photos.

This is the output from Microsoft OCR engine:
TAXI NO.
START TOTAL
END 25/06/2017
TOTAL KM
PAID KM
PAID MIN
SURCHARGE
TOTAL FARE 601.82
ø.øø
HK$Ø.ØØ

TAXI NO,
2.5/12/2009 15:06 ENO
TOTAL KM
SURCHARGE
TOTAL F ARE
THANK :Eii.3

And with Google Cloud OCR:
TAXI NO.
E START TOTAL REPU
T END 25/06/2017 16:38
AE TOTAL KM 601.82
E PAID KM
179.88
0.00
PAID MIN
itt SURCHARGE H0.
TOTAL FARE H 8379.68

TAXI NO
START 25/12/2009 15:06
,T END 25/12/2009 15:06
9.00
TOTAL KM
PAID KM
PAID MIN
9.0
0.10
SURCHARGE HK$0.0
TOTAL FARE K4.59
THANK YOU


How can i extract the content in a highlighted image
#5

Hi, @Silviu @ddpadil @aksh1yadav h1
I want to extract only the URL from different images. The images are always different and the URL pattern is not always the same, e.g.

Here I want to extract URL “Extract Text From Image

Thanks


#6

You can run OCR over the entire image and then search for a word starting with http:// or https://.
Ideally, you know the region where the URL can appear and run OCR on only that region.
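
The search step can be sketched as follows in Python, assuming the OCR text is already available as a plain string. The function name and the sample text are hypothetical, and the trailing-punctuation cleanup is only a guess at typical OCR noise.

```python
import re

# A "word" starting with http:// or https://, up to the next whitespace.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def find_urls(ocr_text):
    """Return every URL-like word in the OCR text, in order of appearance."""
    # Strip punctuation that often trails a URL in running text.
    return [url.rstrip(".,;:)\"'") for url in URL_RE.findall(ocr_text)]


# Hypothetical OCR output, used only for illustration.
sample = "Visit https://example.com/page, or call us."
print(find_urls(sample))
```

This only works if the OCR engine reads the scheme prefix correctly; a misread such as “htlp://” would be missed.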


#7

Thank you @Silviu


#8

@Silviu
I am trying to extract text from this image using the screen scraping method and also Google Cloud OCR, but the text is not extracted accurately. How do I extract the data laid out in tabular format? Can you help me with it? I’ve uploaded the image I’m working with. Let me know how to proceed. Thanks in advance.


#9

Hi @Silviu @aksh1yadav
Can you please tell me how to read only part of an image (I want to extract only the URL from the whole image) using Google OCR?

Here I am supplying the path of the URL through Excel. I also have many images, so it takes a long time. Is there any way to reduce the time?
Thanks in advance.


#10

Hi Shehal1,

If you only need a part of the image, it may be better to crop the image before running OCR, as OCR is faster on smaller images.
But I see that you have 3 Delay activities in your workflow. Why?
Also, you can try Microsoft OCR, as it is faster than Google OCR in most cases.
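
To illustrate cropping before OCR, here is a minimal Python sketch. To stay self-contained it treats the image as a plain list of pixel rows; in a real workflow an image library or a crop activity would do this step. The function name and the fractional-box convention are assumptions. Fractions are used instead of pixel coordinates so that the same region can be selected from photos of different sizes.

```python
def crop_fraction(pixels, left, top, right, bottom):
    """Return the sub-image inside a box given as fractions (0.0 to 1.0).

    `pixels` is a list of rows, each row a list of pixel values.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Convert the fractional box to pixel indices for this image size.
    x0, x1 = round(left * width), round(right * width)
    y0, y1 = round(top * height), round(bottom * height)
    return [row[x0:x1] for row in pixels[y0:y1]]


# Hypothetical 10x10 grayscale image; each value encodes its position.
image = [[x + 10 * y for x in range(10)] for y in range(10)]

# Keep only the bottom 20% of the image before passing it to OCR.
bottom_strip = crop_fraction(image, 0.0, 0.8, 1.0, 1.0)
print(len(bottom_strip), len(bottom_strip[0]))
```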


#11

Hi silviu,

I have a passport image that I downloaded from Google. How can I extract the information from such images? I read your code above for loading images and it is an awesome technique. However, passport images are a little different: I can find the index of a label such as the passport number, but the data is not stored beside it; it is stored at the bottom. I tried scraping with Citrix, but it didn’t work.
Please help me. I have been trying to find a solution for a long time, and today I read your post and found it an awesome technique for extracting data from images.
I am attaching two sample passports. If you can help me extract the passport number from both images, I will extract the rest of the data using your code.


#12

Hi @Silviu, I have a scenario where I need to extract text from an image on a web page. The position of the image is always the same and the field name is also given. My only problem is getting the text and placing it into another field.
Any help will be highly appreciated.


#13

Hi Aditya,

Usually you can’t extract text from that type of image using OCR, as the purpose of that security check is to prevent automated processing and OCR extraction.

Regards
Silviu


#14

Hi all,

The best way to read text from an image is to use ABBYY OCR, where we can define the relative position of the label and extract the text using search and click element. That is the easiest and most accurate way to get text from images. With ABBYY OCR we can even read handwritten text using ICR.