Can we OCR from TIFF?

Curious because I haven’t seen it in the user guides or demos. Is it possible to OCR/scrape from TIFF files?

I’ve seen the demo videos screen scraping a Citrix window, which I guess is an image. Would that same process work on a TIFF? If the anchor/data value I seek isn’t present on the first page, I presume I could do some kind of loop to scroll down the image and check again?

Our business receives multi-page printed documents that have been scanned as TIFF, and I need to read data values from them (could be on any page within the TIFF) and compare against other systems.

If the tool doesn’t do it out of the box… are there any integration options that might work?

Hi @octechnologist

Since TIFF files are just images (usually of better quality than their JPEG counterparts), they can be scraped using UiPath. In the end, it's an image.

I don't have many .tif images to test with, but I tried the one in my repo and it seems to give decent output.

I have enclosed the result for your reference.

Regards

Rajat

Thanks. What was the process you used to accomplish this?

Hi

I used Microsoft OCR to scrape the data.

Please reply if you need any more details.

Regards

Rajat

Hi,
I can confirm it is possible to process TIFF files using the Load Image activity followed by an OCR activity.

The problem is that a TIFF file can have multiple pages. The Load Image activity only processes the first page, and I can't see any property for choosing a page.
Do you have any idea how to split a TIFF file into single pages using UiPath activities only, without any coding?
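
For background: a multi-page TIFF stores each page as its own image file directory (IFD), and the IFDs are chained together by offsets, so a loader that reads only the first directory sees just page one. Below is a minimal Python sketch that counts the pages by walking that chain. It assumes a classic (non-BigTIFF) file, the name `count_tiff_pages` is made up for illustration, and it only counts pages; it does not split or decode them.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count the chained IFDs (pages) in classic TIFF data (hypothetical helper)."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")

    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file (bad byte-order mark)")

    # Bytes 2-3 hold the magic number 42, bytes 4-7 the offset of the first IFD.
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")

    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        # Each IFD: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
        (entry_count,) = struct.unpack_from(endian + "H", data, offset)
        next_pointer = offset + 2 + entry_count * 12
        if next_pointer + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack_from(endian + "I", data, next_pointer)
        pages += 1
    return pages
```

Usage would be something like `count_tiff_pages(open("scan.tif", "rb").read())`, where `scan.tif` is a placeholder file name. A result greater than 1 tells you the file needs per-page handling before OCR.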

You can use UI automation to extract text from each page in your TIFF file.
Open the TIFF file in Windows Photo Viewer (or any other photo viewer that works) and use the Screen Scraping wizard to get the text using OCR. Then go to the next page using a Click activity and repeat the screen scraping.
Of course, you can use a loop activity to do this for each page in your TIFF file.

Or you can use an external application to split the TIFF file, for example this one: http://tiffsplitter.codeplex.com/

Thanks Silvu,
I managed to read the different frames of the image using the Invoke Method activity. The issue is solved.

See solution attached.
OCR_Tiff.xaml (10.9 KB)
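
The attached XAML isn't reproduced in this thread, so the following is not the poster's workflow. It is only a rough Python sketch of the general idea of going through the frames one at a time and stopping at the first page whose OCR text contains the value you are after. The names `find_anchor` and `ocr_page` are hypothetical; `ocr_page` stands in for whatever OCR step you actually use.

```python
from typing import Callable, Optional, Tuple


def find_anchor(
    page_count: int,
    ocr_page: Callable[[int], str],
    anchor: str,
) -> Optional[Tuple[int, str]]:
    """Return (page index, page text) for the first page containing the anchor.

    ocr_page is a placeholder: it takes a zero-based page index and returns
    the OCR text of that page. Returns None if no page contains the anchor.
    """
    needle = anchor.casefold()  # case-insensitive match, since OCR casing varies
    for index in range(page_count):
        text = ocr_page(index)
        if needle in text.casefold():
            return index, text
    return None


# Example with canned text standing in for real OCR output.
fake_pages = ["Cover sheet", "Invoice No: 123", "Terms and conditions"]
result = find_anchor(len(fake_pages), lambda i: fake_pages[i], "invoice no")
```

Stopping at the first hit keeps the number of OCR calls down, which matters because OCR is usually the slow step.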

Hi Rajat,
Thanks a lot for sharing your experience.
I have the same question, and your answer really helped me.
Can you please explain in detail how to process TIFF file data?
If possible, could you also share the XAML file?
That would really help me.
Thanks in advance

Hi @nanilkumar

Welcome to the community :partying_face:

Good to hear that the solution helped you. For the XAML file, please refer to the answer just above your post.

Let me know if you need any further details, and pass your thanks on to @Ada_CZ, as he is the one who posted the XAML.

Regards
Rajat Pandey

Hi @octechnologist ,
You can use ImageMagick to split TIFF files. It's very easy and can be run from the command line. You can also store the output images in any format you want.

convert input.tif output-%d.tif

After splitting, use any OCR engine, such as Microsoft OCR.
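
If you want to call that command from a script rather than typing it, a minimal Python sketch could look like the one below. It assumes ImageMagick is installed and on PATH. The helper names `build_split_command` and `split_tiff` are made up for illustration. Note that on Windows the name `convert` can resolve to a system utility instead of ImageMagick, and ImageMagick 7 provides a `magick` executable, so the executable name is left as a parameter.

```python
import re
import shutil
import subprocess
from pathlib import Path


def build_split_command(tiff_path, out_dir, executable="convert"):
    """Build the argument list for: convert input.tif <out_dir>/input-%d.tif"""
    tiff_path = Path(tiff_path)
    # %d is filled in by ImageMagick with the page number of each output file.
    pattern = Path(out_dir) / f"{tiff_path.stem}-%d.tif"
    return [executable, str(tiff_path), str(pattern)]


def split_tiff(tiff_path, out_dir, executable="convert"):
    """Run the split and return the single-page files in page order."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_split_command(tiff_path, out_dir, executable), check=True)

    # Collect the outputs and sort numerically so page 10 comes after page 9.
    name = re.compile(re.escape(Path(tiff_path).stem) + r"-(\d+)\.tif")
    pages = []
    for path in out_dir.iterdir():
        match = name.fullmatch(path.name)
        if match:
            pages.append((int(match.group(1)), path))
    return [path for _, path in sorted(pages)]
```

Each path returned by `split_tiff("input.tif", "pages")` is then a single-page image that can be passed to the OCR step one at a time.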

Thanks a lot, Rajat Pandey…
I will try it and come back to you… :laughing:
