Hi friends,
I am using Get OCR Text on a scanned PDF to extract some text into an Excel file. The extracted text is not always the right text, because the position of the text shifts slightly up or down in every scan. How can I make the extraction "dynamic", so that it still pulls the right text even when it moves? This is a scanned PDF, by the way. Any help or direction is welcome.
So in that case there are three options:
- First, if you want to get it from the PDF with UI interaction only, use a SEND HOT KEY activity with the key ctrl+f and type the nearby anchor word of your text
Then use another SEND HOT KEY with the key enter
This will take you to the word you want, and you can then read it with GET OCR TEXT
- The second method is to use READ PDF WITH OCR and get the text as a string
Use OmniPage OCR as the engine
For that, go to Design tab → Manage Packages → All Packages → search for OmniPage and install that activities package
This will give you the full text, and you can then pick out your specific text with string manipulation such as Split or Regex
- Finally, we can use the ANCHOR BASE activity if you want to do it through UI interaction
Inside it, use a FIND IMAGE activity to capture the anchor region near your text, and on the right side use GET OCR TEXT
We can also do the same with COMPUTER VISION activities
Give that a try as well
Since the text is image based, it takes a lot of trial and error to find the best approach
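To illustrate the Regex idea from the second method: instead of relying on where the text sits on the page, you search the OCR string for a fixed label and take whatever follows it, so a small shift up or down in the scan no longer matters. Below is a minimal sketch in Python, only to show the pattern. The label "Invoice No", the sample strings and the function name are made up for the example, and inside UiPath you would do the equivalent in an Assign with a regular expression rather than run Python.

```python
import re

def extract_after_anchor(ocr_text, anchor):
    """Return the text that follows `anchor` on the same line, or None if not found."""
    # Allow optional spaces/tabs and an optional colon between the label and the value.
    pattern = re.compile(re.escape(anchor) + r"[ \t]*:?[ \t]*(.+)", re.IGNORECASE)
    match = pattern.search(ocr_text)
    return match.group(1).strip() if match else None

# Hypothetical OCR output from two scans of the same form.
# The second scan has extra blank lines and spacing, as if the page had shifted.
scan_a = "ACME Ltd\nInvoice No: 12345\nTotal: 99.50\n"
scan_b = "\n\nACME Ltd\n\nInvoice No:   67890\nTotal: 12.00\n"

invoice_a = extract_after_anchor(scan_a, "Invoice No")
invoice_b = extract_after_anchor(scan_b, "Invoice No")
```

Both scans return the value next to the label even though the line sits in a different place in each one. If the OCR engine sometimes misreads the label itself, the pattern would need to be loosened, which is part of the trial and error.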
Cheers @Shazid_Rahman
How can I do this for multiple PDF files?
Also, for some reason I can't download OmniPage.
The first method can't work because it's a scanned PDF.