How to scrape all data from a scrollable web container

datascraping
string

#1

Hi all,
I want to read the data contained in the inner window into a string variable and parse the string for the Avg column values. I tried the Get Text activity, which read the entire text but did not preserve the newline formatting. As a result, I was not able to split the string on newlines. Is it possible to get the data using other scraping methods?

If I manually select the text, copy it into a file, and then use that data, the newline is recognised and I’m able to split the string into lines, but the Get Text activity returns a string that I’m not able to parse.

It would also be nice if I could get the text by simulating a drag-and-select with the mouse followed by Ctrl+C. But since there is a scrollbar, I’m not able to record the drag and select.

The data is dynamic and can have any number of lines.

Thanks for the help!


#2

@venkat97 Hello! Try using the screen scraping method with Full Text or Native as the scraping type and generate the table.


#3

@Niket_Ghai I tried that. The text is extracted. I tried printing it and it works fine. But I’m not able to split the string using newline. I even checked whether it contains vbCrLf, and the search returned -1. But when I print it to the output, it does print the newline.
Look at the image below.

Here I split the string using space (Remove empty entries). It does show a newline between (14.140.109.241 and HOST : ) as it should. But when I try to search for the newline or split using newline, it doesn’t work. It’s pretty weird.
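When a search for vbCrLf finds nothing but the output still shows a line break, it usually helps to look at the raw characters first. Below is a minimal sketch in Python (not UiPath/VB.NET) of that inspection step. The helper name `show_hidden_chars` is hypothetical, and the sample assumes the hidden separator is a lone line feed, which is only a guess about the scraped text.

```python
def show_hidden_chars(text: str) -> list[str]:
    """List every control or non-printable character found in text, in order."""
    names = {"\r": "CR", "\n": "LF", "\t": "TAB"}
    found = []
    for ch in text:
        if ch in names:
            found.append(names[ch])
        elif not ch.isprintable():
            # Anything else unusual is reported by its Unicode code point.
            found.append(f"U+{ord(ch):04X}")
    return found


# Hypothetical fragment; the real separator in the scraped text is unknown.
sample = "14.140.109.241\nHOST : "

print(repr(sample))               # repr() makes the separator visible
print(show_hidden_chars(sample))
print("\r\n" in sample)           # a CR+LF search fails when only LF is present
```

If the inspection reports only LF (or only CR), that would explain why a search for the two-character CR+LF sequence comes back empty.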


#4

Please provide a link to the website :slight_smile:


#5

Actually, the website can’t be accessed from outside. I have provided the Notepad file into which I wrote the scraped data (as a string). As you will see, Notepad also does not display the new line. But you can see that there is some hidden character between (14.140.109.241 and HOST : ). I just want to be able to somehow recognize that newline character so that I can split the string.
Newline.zip (320.8 KB)

Also, the data is dynamically generated and might be different from what you saw in the initial image, but the format is the same.


#6

Hi Venkat,
I have tried to split the text on the basis of newline characters, and I am getting the output shown in the attached image.


If you need the output in this format, you can refer to this workflow.
Split.xaml (5.5 KB)
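For readers who can't open the attachment, here is a rough Python sketch of the general idea of splitting on newline characters whatever their type. It is not the contents of Split.xaml, which may well do it differently. The function name `split_lines` is hypothetical.

```python
import re


def split_lines(text: str) -> list[str]:
    """Split on CR+LF, lone CR, or lone LF and drop empty entries."""
    # CR+LF is listed first so it is consumed as one separator, not two.
    parts = re.split(r"\r\n|\r|\n", text)
    return [p for p in parts if p.strip()]


# Mixed line endings, as might come back from a scraping step.
print(split_lines("line one\r\nline two\nline three\rline four\n\n"))
```

Listing all three separators avoids having to know in advance which one the scraped page uses.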


#7

@Bharat Thank you, it works! What I initially tried was text.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries), which didn’t work.

But your method works. Can you also suggest how to split by an empty line (between two paragraphs)? I tried using two newlines.
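One possible reason two newlines don't match is that the text uses a different line ending than the one being searched for, so the doubled separator never appears literally. A hedged sketch in Python (not VB.NET) of one way around that: normalize the line endings first, then split on blank lines. The name `split_paragraphs` is hypothetical, and treating whitespace-only lines as blank is an assumption.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text into blocks separated by one or more blank lines."""
    # Normalize CR+LF and lone CR to LF so one pattern covers every case.
    normalized = re.sub(r"\r\n|\r", "\n", text)
    # A blank line is a newline followed by optional spaces/tabs and a newline.
    blocks = re.split(r"\n(?:[ \t]*\n)+", normalized)
    return [b.strip("\n") for b in blocks if b.strip()]


text = "first line\r\nsecond line\r\n\r\nthird line\nfourth line"
for block in split_paragraphs(text):
    print(repr(block))
```

Each returned block still contains its own single newlines, so it can be split into lines afterwards if needed.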