How to scrape data, numbers only? ($5 for a working answer)


#1

Hello, fellow users. I'm having a big problem (for me) and I need fast replies. I'm scraping data from this site and I want my data in the format 1234567-8, so I want to get rid of every line of text!
Here is the result I get:
https://aijaa.com/AvVHuu

Here is the data preview:
https://aijaa.com/rzVkKh

Should I edit my data definition, or what?
Here it is:

<extract>
	<column exact="1" name="Column1" attr="text">
		<webctrl tag="div" class="SearchResultList SearchResultList--advertisement SearchResultList--done" idx="1"/>
		<webctrl tag="div"/>
		<webctrl tag="div" class="SearchResult__MainCol" idx="1"/>
		<webctrl tag="div" class="text-muted"/>
	</column>
</extract>

For a working answer, I will pay $5 via PayPal!


#2

Hi.

If you have a string that you scraped…
To remove alphas and specials from a string, there are a few ways:

String.Concat(text.Where(AddressOf Char.IsDigit))
or
System.Text.RegularExpressions.Regex.Replace(text, "[^0-9]", "")

To keep . and - as well:
System.Text.RegularExpressions.Regex.Replace(text, "[^0-9.-]", "")
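
For comparison, here is a minimal sketch of the three expressions side by side as a small console program. The sample string is made up (the real scraped text is not shown in this thread), and the LINQ variant uses a lambda instead of `AddressOf`, which should behave the same way.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module DigitsOnlyDemo
    Sub Main()
        ' Hypothetical scraped text; the real page content is not shown in the thread.
        Dim text As String = "Business ID 1234567-8"

        ' Keep digits only (LINQ variant).
        Dim digitsLinq As String = String.Concat(text.Where(Function(c As Char) Char.IsDigit(c)))

        ' Keep digits only (regex variant).
        Dim digitsRegex As String = Regex.Replace(text, "[^0-9]", "")

        ' Keep digits, dots and dashes.
        Dim digitsDash As String = Regex.Replace(text, "[^0-9.-]", "")

        Console.WriteLine(digitsLinq)   ' 12345678
        Console.WriteLine(digitsRegex)  ' 12345678
        Console.WriteLine(digitsDash)   ' 1234567-8
    End Sub
End Module
```

In UiPath, only the right-hand expressions would go into an Assign activity; the module wrapper is just there so the sketch can be run on its own.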

If you would like to pull out your number in a different way, for example by matching a pattern, we would need an example of the text you are scraping.

(fyi, your image is broken and won’t load for me)

I hope this helps in some way.

Also, if this answers it, you can keep the $5 :stuck_out_tongue:

Regards aletzi


#3

To clarify, you want:

A datatable which only contains data in “#######-#” format

Correct? If so, don’t use that data scraping XML to achieve this. Instead:

  • For each row in the datatable
  • Assign var = datatable(x)(y).ToString
  • Check if var matches the #######-# format (let me know if you need an example)
    • if true: do nothing, or add to the final list/array/datatable
    • if false: remove the datarow, or don’t add it to the final list/array/datatable
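
The steps above could be sketched roughly as follows. This is only an illustration: the table contents, the column name `Column1` (taken from the extract definition in the first post), and the anchored pattern `^[0-9]{7}-[0-9]$` are assumptions about what the #######-# format means here.

```vbnet
Imports System
Imports System.Data
Imports System.Text.RegularExpressions

Module FilterRowsDemo
    Sub Main()
        ' Hypothetical stand-in for the scraped datatable.
        Dim scraped As New DataTable()
        scraped.Columns.Add("Column1", GetType(String))
        scraped.Rows.Add("1234567-8")
        scraped.Rows.Add("Some company name")
        scraped.Rows.Add("7654321-0")

        ' Empty table with the same columns, used as the "final" datatable.
        Dim result As DataTable = scraped.Clone()

        For Each row As DataRow In scraped.Rows
            Dim value As String = row("Column1").ToString().Trim()
            ' Keep the row only if the whole cell is 7 digits, a dash, and 1 digit.
            If Regex.IsMatch(value, "^[0-9]{7}-[0-9]$") Then
                result.ImportRow(row)
            End If
        Next

        ' Prints 1234567-8 and 7654321-0.
        For Each row As DataRow In result.Rows
            Console.WriteLine(row("Column1"))
        Next
    End Sub
End Module
```

Copying matching rows into a second table avoids removing rows from the table while looping over it, which would throw an exception.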

#4

I have generated a datatable, which scrapes the data shown in the picture https://aijaa.com/AvVHuu
The goal is to return only the values which are in the format 1234567-8, cut out all the text, and write the result to an Excel file. After that, the goal is to search with these results on another website using a For Each Row. The search on the other site currently works with the name of the company, but we need to change this name search to 1234567-8 for specific reasons.

How can I use this: System.Text.RegularExpressions.Regex.Replace(text, "[^0-9.-]", "") in my workflow?


#6

You can pretty much use this anywhere, for example in an Assign activity, a condition, or a message box.

I put a random string in there, which can be a variable (like from the web or data table).
The Regex Replace will remove everything except the numbers and dash. Depending on what you want to extract though, you may need an adjustment to the pattern.

I tested the above example in the message box and it displayed only “1234567-8”
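
As a rough illustration of why the pattern may need adjusting, the sketch below runs the same Replace on two made-up strings (neither comes from the actual site). The first contains no other dots or dashes and comes out clean; the second shows stray characters surviving.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    Sub Main()
        ' Hypothetical inputs; the real scraped text is not shown in the thread.
        Dim clean As String = "Yritys Oy 1234567-8"
        Dim noisy As String = "Y-tunnus: 1234567-8, est. 1999"

        ' Everything except digits, dots and dashes is removed.
        Console.WriteLine(Regex.Replace(clean, "[^0-9.-]", ""))  ' 1234567-8
        Console.WriteLine(Regex.Replace(noisy, "[^0-9.-]", ""))  ' -1234567-8.1999
    End Sub
End Module
```

Replace keeps every digit, dot, and dash in the text, so it only gives the bare number when the surrounding text has none of those characters.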

(also, the image still won’t load for me; it might be because I’m on a company proxy that blocks it)

Regards.


#7

Dear Clayton, we are very close now, thank you for that!
We are writing our result (1234567-8) to an Excel sheet before doing another search on another site. Is that possible? Here is the result we get now. I think I was asking the wrong questions…

Your code, System.Text.RegularExpressions.Regex.Replace(text, "[^0-9.-]", ""), worked almost like I wanted! :slight_smile:

BR Aletzi


#8

Yes, it is possible.

What you will want to do is assign the value to the item in each row. So, if you are using a For Each Row activity, then in an Assign activity it would look like this:
Assign row.Item("Yritys") = "1234567-8"
(or row.Item(0) if you don’t want to use the column name)

The next thing I will mention is that I think you will want a different Regex pattern: one that only looks for a run of 4 to 8 digits followed by a - and one more digit.
The pattern would be "[0-9]{4,8}\-[0-9]"

Then, let’s change it from .Replace to .Match().Value

With those changes you will have it like this:

If you change the string to a variable or row.Item("Yritys").ToString, it will be like this:
image

The variable can be the text from the web or the item in the data table (row.Item("Yritys").ToString)
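
Since the screenshots are not visible here, the following is a minimal sketch of what the combined change might look like outside UiPath. The table contents are invented; only the column name "Yritys" and the pattern come from this thread.

```vbnet
Imports System
Imports System.Data
Imports System.Text.RegularExpressions

Module ExtractIdDemo
    Sub Main()
        ' Hypothetical datatable standing in for the scraped results.
        Dim dt As New DataTable()
        dt.Columns.Add("Yritys", GetType(String))
        dt.Rows.Add("Esimerkki Oy, Y-tunnus 1234567-8, est. 1999")
        dt.Rows.Add("No identifier here")

        For Each row As DataRow In dt.Rows
            Dim source As String = row.Item("Yritys").ToString()
            ' Match returns the first hit; .Value is an empty string when nothing matches.
            row.Item("Yritys") = Regex.Match(source, "[0-9]{4,8}\-[0-9]").Value
        Next

        ' Prints [1234567-8] and then [].
        For Each row As DataRow In dt.Rows
            Console.WriteLine("[" & row.Item("Yritys").ToString() & "]")
        Next
    End Sub
End Module
```

In a workflow, the body of the first loop corresponds to a single Assign inside For Each Row, with row.Item("Yritys") on the left and the Regex.Match(...).Value expression on the right. Rows without a match end up empty, so they may need to be filtered out before writing to Excel.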

I hope I am clear.

Regards.