How does "Get Text" technically work?

Hello everyone,

I would like to know how Get Text, or even screen scraping, technically works.
I looked on the net but didn’t find anything about it.

Does it run through the nodes of the web page, or does it “scan” the image using an algorithm with greyscaling and other things?
I think it is the second option because an image doesn't have nodes, but maybe I am wrong.

thanks

Hi buddy @grish

Fine, that's a good question

–Get Text
This works only when the element that holds the text is accessible as an individual element, that is, when it can be selected on its own. That means it won't work for images, only for selectable elements.
When an element is selected with the GET TEXT activity, it fetches the XML nodes of that element, with their attributes and values, and returns the value of either the NAME or the TEXT attribute as a String variable. That is how it works.
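
To make the idea concrete, here is a minimal Python sketch. It is not UiPath's actual implementation: the XML layout, the attribute names `id`, `name` and `text`, and the helper `get_text` are all made up for illustration. It only shows the general idea of finding a node in an element tree and reading a text attribute from it, and why an image node gives nothing back.

```python
import xml.etree.ElementTree as ET

# Hypothetical UI tree, invented for this example.
UI_TREE = """
<window title="Invoice">
  <panel id="header">
    <label id="customer" name="Customer" text="ACME Ltd" />
    <image id="logo" src="logo.png" />
  </panel>
</window>
""".strip()


def get_text(root, element_id):
    """Find the node with the given id and return its text or name attribute."""
    for node in root.iter():
        if node.get("id") == element_id:
            # Prefer 'text', fall back to 'name'; the image node has neither.
            value = node.get("text") or node.get("name")
            if value is None:
                raise ValueError(f"element {element_id!r} exposes no text")
            return value
    raise LookupError(f"no element with id {element_id!r}")


root = ET.fromstring(UI_TREE)
print(get_text(root, "customer"))
```

Calling `get_text(root, "logo")` raises `ValueError` in this sketch, which mirrors the point above: an image is a node without readable text, so a node-based lookup has nothing to return.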

THEN

–Screen Scraping
As the name implies, it scrapes the element and gets the text in it. For selectable elements it works much like Get Text. So you may ask why we need it as a separate activity when we already have Get Text. Here is the difference: Get Text cannot be used on images, while Screen Scraping can be used on all types of elements, through its three methods: Full Text, Native Text and OCR Text.
OCR Text is the one meant for image extraction: it scrapes the image we indicate and tries to read the text in it with OCR technology.
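
A rough Python sketch of that difference, under stated assumptions: the dictionary layout of an element, the method names `full_text`, `native` and `ocr`, and the `fake_ocr` function are hypothetical stand-ins, not UiPath's API and not a real OCR engine. The sketch only illustrates that the text-based methods read what the element exposes, while OCR works from pixels.

```python
def fake_ocr(pixels):
    """Stand-in for a real OCR engine, which would recognise characters from pixels."""
    return "<text recognised from image>"


def scrape(element, method, ocr=fake_ocr):
    """Return the text of an element using the chosen (hypothetical) method."""
    if method in ("full_text", "native"):
        # Text-based methods read what the element itself exposes.
        text = element.get("text")
        if text is None:
            raise ValueError("element exposes no text; OCR would be needed")
        return text
    if method == "ocr":
        # OCR ignores the element's attributes and works on its pixels.
        return ocr(element["pixels"])
    raise ValueError(f"unknown method {method!r}")


button = {"text": "Submit", "pixels": [[0, 255], [255, 0]]}
picture = {"text": None, "pixels": [[12, 200], [180, 30]]}

print(scrape(button, "full_text"))
print(scrape(picture, "ocr"))
```

In this sketch, `scrape(picture, "full_text")` raises `ValueError`, while the `ocr` method still returns something because it never looks at the element's text attribute.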

For more details on screen scraping with these three methods,

Hope this helps.
Kindly get back to me with any queries or clarification.
Cheers @grish


Thanks man!
I understand this much better :+1:

Fantastic
Cheers @grish
