Screen Scraping Method Using HTML

I am screen scraping an e-paper ( https://epaper.dawn.com/ ). The issue I am facing is that each page has more than one container, and when I select all of them, only one container is shown at extraction time. How can I use the HTML of these e-paper containers?

Please help me extract the data from this e-paper.

Thanks in advance.

@arsalanrasheed143

Can you be a little clearer and elaborate on what you want to extract and what exact issue you are facing?

Are you having trouble identifying the elements? Or is it that you are not able to indicate the elements themselves?

Cheers

Hi Anil G,
Actually, I am trying to extract the first page of this e-paper. I cannot extract it because the page has multiple containers. So in this scenario I use a Click activity to open one container at a time, and then I extract each container. But I need all containers at the same time.
Please open the e-paper link I mentioned. You will see that this e-paper has multiple images. Let me know how I can extract the page in this scenario.

Please respond.
Thank you.

Hi,

The target page uses a clickable map, so we can get the URL of each container using the FindChildren activity. Does this help you?

Sample20230211-3L.zip (3.5 KB)

Regards,

Hi,
Sorry, I don't understand what you mean by the target page using a clickable map. Where do I need to click when I am using the Find activity and then Get Attribute?

Please explain or provide the proper process so I can understand each step.

I need each child element, and then, using these URLs, I need the complete extract of each URL pasted into a text file.

Thank you in advance.

Hi,

A clickable map, or image map, is an HTML feature. We can get the URL from the href attribute of the Area element.

Did you try the sample attached above?
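
To illustrate the idea outside UiPath, here is a minimal Python sketch that reads the href attribute of every Area element in an image map. The markup in `SAMPLE_HTML` is hypothetical and not copied from the real e-paper page, and the names `AreaHrefParser` and `extract_area_hrefs` are my own.

```python
from html.parser import HTMLParser


class AreaHrefParser(HTMLParser):
    """Collect the href of every <area> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase.
        if tag == "area":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Hypothetical image map: one page image with two clickable regions.
SAMPLE_HTML = """
<img src="page1.jpg" usemap="#page1">
<map name="page1">
  <area shape="rect" coords="0,0,100,50" href="/article/1">
  <area shape="rect" coords="0,60,100,120" href="/article/2">
</map>
"""


def extract_area_hrefs(html):
    """Return the href values of all <area> elements, in document order."""
    parser = AreaHrefParser()
    parser.feed(html)
    return parser.hrefs


print(extract_area_hrefs(SAMPLE_HTML))
```

Each clickable region on the page image is one Area element, so the resulting list has one URL per container.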

Regards,

Can you please share how you got the list of URLs in the sample you attached above?

Hi,

Please leave the Selector property blank if you use a UiElement.

In the above sample, listURL contains all the URL strings. Please check it using a breakpoint; you can then see it in the Locals panel.

Regards,

Hi,

Can you share your current project as a zip file?

Regards,

Sample20230211-3Lv2.zip (4.2 KB)

For now, I modified the selector and filter of FindChildren.

I also added Get OCR Text because the content is not text but an image. Some tuning will be necessary for your requirements.

Regards,

Hi,

The selection I used returns the iframe.

We also need to write the filter selector string ourselves.
In this case, we need to investigate the structure of the page from the HTML source and/or UiExplorer. Then we can write a filter string to extract the Area tags, because this page is a clickable map (image map) that uses Map and Area elements, and the URL is included in the Area tag.
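
The same two-step structure (find the iframe first, then filter for Area tags inside it) can be sketched in plain Python. This is only an illustration under assumptions: `OUTER_HTML`, `FRAME_HTML`, and `BASE_URL` are hypothetical stand-ins, not the real page, and `collect_attribute` is a helper name I made up. It also resolves relative links against the document they came from.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TagAttributeCollector(HTMLParser):
    """Collect one attribute from every occurrence of one tag."""

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            value = dict(attrs).get(self.attribute)
            if value:
                self.values.append(value)


def collect_attribute(html, tag, attribute, base_url):
    """Return the attribute values as absolute URLs, in document order."""
    collector = TagAttributeCollector(tag, attribute)
    collector.feed(html)
    return [urljoin(base_url, value) for value in collector.values]


# Hypothetical outer page: the content lives inside an iframe.
BASE_URL = "https://example.com/"
OUTER_HTML = '<html><body><iframe src="/pages/front.html"></iframe></body></html>'

# Hypothetical document loaded inside that iframe: an image map.
FRAME_HTML = """
<img src="front.jpg" usemap="#front">
<map name="front">
  <area shape="rect" coords="0,0,100,50" href="article1.html">
  <area shape="rect" coords="0,60,100,120" href="/news/2">
</map>
"""

# Step 1: locate the iframe. Step 2: filter for Area tags inside its document.
frame_urls = collect_attribute(OUTER_HTML, "iframe", "src", BASE_URL)
article_urls = collect_attribute(FRAME_HTML, "area", "href", frame_urls[0])
print(article_urls)
```

In a real run the iframe document would have to be fetched from `frame_urls[0]` before step 2; that part is left out here.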

Regards,
