Can we OCR from TIFF?

ocr
scraping
studio

#1

Curious because I haven’t seen it in the user guides or demos. Is it possible to OCR/scrape from TIFF files?

I’ve seen the demo videos screen scraping a Citrix window, which I guess is an image. Would that same process work on a TIFF? If the anchor/data value I seek isn’t present on the first page, I presume I could do some kind of loop to scroll down the image and check again?

Our business receives multi-page printed documents that have been scanned as TIFF, and I need to read data values from them (could be on any page within the TIFF) and compare against other systems.

If the tool doesn’t do it out of the box… are there any integration options that might work?


#2

Hi @octechnologist

Since a TIFF is just an image (usually of better quality than its JPEG counterpart), it can be scraped with UiPath like any other image.

I don't have many .tif images to test with, but I tried the one in my repo and the output looks decent.

I have enclosed the result for your reference.

Regards

Rajat


#3

Thanks. What was the process you used to accomplish this?


#4

Hi

I used Microsoft OCR to scrape the data.

Please reply if you need any more details.

Regards

Rajat


#5

Hi,
I can confirm it is possible to process TIFF files using the Load Image activity followed by an OCR activity.

The problem is that a TIFF file can have multiple pages. The Load Image activity only processes the first page, and I can't see any property to change this.
Do you have any idea how to split a TIFF file into single pages using UiPath activities, without any coding?


#6

You can use UI automation to extract text from each page in your TIFF file.
Open the TIFF file in Windows Photo Viewer, or any other photo viewer that works, and use the Screen Scraping wizard to get the text using OCR. Then go to the next page using a Click activity and repeat the Screen Scraping.
Of course, you can put this in a loop activity so it runs for each page in your TIFF file.

Or you can use an external application to split the Tiff file, this one for example - http://tiffsplitter.codeplex.com/.
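
Whichever way the pages end up separated, the per-page check described above (look for the anchor on page 1, move on if it isn't there) could be sketched roughly like this. This is only an illustration in Python, not a UiPath workflow: it assumes the OCR text of each page has already been collected into a list, and that the value sits right after the anchor label, optionally separated by `:` or `#`. The name `find_value` is made up for the example.

```python
import re
from typing import Optional, Sequence, Tuple


def find_value(page_texts: Sequence[str], anchor: str) -> Optional[Tuple[int, str]]:
    """Return (1-based page number, value) for the first page containing the anchor.

    Assumption: the value is the first run of non-space characters after the
    anchor label, optionally preceded by ':' or '#'.
    """
    pattern = re.compile(re.escape(anchor) + r"\s*[:#]?\s*(\S+)", re.IGNORECASE)
    for page_number, text in enumerate(page_texts, start=1):
        match = pattern.search(text)
        if match:
            return page_number, match.group(1)
    # Anchor not found on any page.
    return None


if __name__ == "__main__":
    # Hypothetical OCR output for a two-page scan.
    pages = ["Cover sheet", "Invoice No: 12345\nTotal 9.00"]
    print(find_value(pages, "Invoice No"))
```

In a real run the list would be filled inside the loop, one OCR result per page, and real OCR output will likely need a more forgiving pattern than this one.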


#7

Thanks Silvu,
I managed to read the individual frames (pages) of the image using the Invoke Method activity. The issue is solved.

See solution attached.
OCR_Tiff.xaml (10.9 KB)
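
Some background on why reading frames works: a multi-page TIFF stores each page as its own image file directory (IFD), and the directories are chained together by offsets. A loader that stops at the first directory only ever sees page 1. In .NET, `System.Drawing.Image` exposes `GetFrameCount` and `SelectActiveFrame`, which is the kind of call Invoke Method can reach; whether the attached workflow uses exactly those calls is an assumption on my part. As a rough illustration of the frame structure itself, here is a small Python sketch that counts the pages by walking the IFD chain. It only handles classic TIFF (not BigTIFF), and the function name is made up for the example.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count the image file directories (pages/frames) in classic TIFF bytes."""
    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    # Bytes 2-3 hold the magic number 42, bytes 4-7 the offset of the first IFD.
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")

    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen:
            raise ValueError("IFD chain loops back on itself")
        seen.add(offset)
        # An IFD starts with a 2-byte entry count, followed by 12-byte entries,
        # followed by the 4-byte offset of the next IFD (0 means last page).
        (entries,) = struct.unpack_from(order + "H", data, offset)
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * entries)
        pages += 1
    return pages


def count_tiff_pages_in_file(path: str) -> int:
    """Convenience wrapper that reads the file from disk first."""
    with open(path, "rb") as handle:
        return count_tiff_pages(handle.read())
```

Once the page count is known, the workflow can loop from frame 0 to count - 1, select each frame in turn, and pass it to the OCR activity.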