Can't get word positions using Microsoft OCR and Read PDF with OCR

pdf
ocr
word
studio
microsoft

#1

Hello.

I’m trying to do the following. I have a scanned document in PDF format, and I read it with the “Read PDF with OCR” activity using Microsoft OCR as the engine. The Microsoft OCR activity can output KeyValuePair items with the word positions. I store that output in a variable and then generate the data table by putting the variable into “Input” -> “Positions”. When I use “Output Data Table” to see the result, it contains only one column with the extracted PDF text. The positions of the individual words are missing. Is it possible to fix this?
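
To make the goal concrete, here is a minimal sketch in Python (not UiPath code) of the result I am hoping for: one row per word together with its rectangle. It assumes the OCR output can be treated as a sequence of (rectangle, word) pairs; the `Rect` type, the `positions_to_rows` helper, and the sample values are hypothetical and are not taken from the attached PDF.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical stand-in for the rectangle key of each OCR pair."""
    x: int
    y: int
    width: int
    height: int


def positions_to_rows(pairs):
    """Flatten (Rect, word) pairs into one dictionary (table row) per word."""
    return [
        {"word": word, "x": r.x, "y": r.y, "width": r.width, "height": r.height}
        for r, word in pairs
    ]


# Made-up sample data, only to show the shape of the desired table.
sample = [
    (Rect(10, 20, 45, 12), "Hello"),
    (Rect(60, 20, 50, 12), "world"),
]

for row in positions_to_rows(sample):
    print(row)
```

Each row keeps the word next to its coordinates, which is the extra information missing from my single-column result.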


#2

Can you share your XAML and the PDF?


#3

Yeah, sure, I’ll do it in 5-6 hours. I just don’t have access to my computer right now.


#4

@arathi Here it is.
Main.xaml (11.3 KB)
109970.pdf (502.7 KB)


#5

Hi, this is what I am getting after reading the PDF. I have tried using the languages “rus” and “russian”. Let me know what the result is for you.

Good day.


#6

Hey :-) Thank you for the attempt. My results are a bit better if I use “Russian” with a scale in the range 0.7-1, but I still can’t get the word positions.