Data Scraping and Write to Excel Worksheet

dwalker · April 3, 2020, 2:04pm

I am developing an RPA tool that scrapes data from a site, manipulates the scraped text using regex and then writes back to an excel worksheet.

The above text is an example of the text I will be scraping. I used a Get Full Text activity on the div that contains this text, but the result includes a lot of whitespace and special characters.

The regex I have tried does not return the result I am after.

Using the Replace activity I tried the following:
- Other.*

I want to remove all text after ‘- Other’

^.*?(\d{4})

I want to remove all text up to the first 4-digit number (i.e. 2008 in the above example)
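The two Replace steps above can be sketched in Python to show two things that may explain the unexpected result. First, `.` does not match a line break by default (in .NET regex this needs the Singleline option), so `- Other.*` stops at the end of the line. Second, replacing `^.*?(\d{4})` with an empty string also removes the 4-digit number, unless the captured group is put back (`$1` in .NET, `\1` in Python). The sample text and the function name `trim_scraped` are hypothetical, since the scraped page is not shown in the thread.

```python
import re

# Hypothetical scraped text; the real page content is not shown in the thread.
raw = "Heading\n  Section 12 \n 2008 - Widgets\n - - Large\n - Other\n footer text"


def trim_scraped(text: str) -> str:
    # Drop "- Other" and everything after it.
    # DOTALL lets "." match line breaks, so the rest of the text is removed too.
    text = re.sub(r"-\s*Other.*", "", text, flags=re.DOTALL)
    # Drop everything before the first 4-digit number,
    # but put the number itself back via the captured group.
    text = re.sub(r"^.*?(\d{4})", r"\1", text, count=1, flags=re.DOTALL)
    return text.strip()


trimmed = trim_scraped(raw)
print(trimmed)
```

In a UiPath Replace activity the rough equivalent would be enabling the Singleline regex option and using `$1` as the replacement for the second pattern.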

I then used a Matches activity which has the following regex:
(-\s+)[^-]

I used this to split the text into separate lines at each dash character (-).

I then have an Excel Application Scope in which I iterate through each item of the previous output and write it back to the Excel worksheet. However, the resulting text is not what I expected the regex to return.
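Note that `(-\s+)[^-]` matches only a dash, the whitespace after it, and one following character, so each match is a short fragment rather than a full line. A minimal alternative sketch in Python, assuming the scraped text still contains line breaks: split on the line breaks, collapse the extra whitespace, and record how many leading dashes each line has. The function name `split_description` and the sample text are hypothetical.

```python
import re


def split_description(text: str) -> list[tuple[int, str]]:
    rows = []
    for line in text.splitlines():
        # Collapse runs of whitespace (including non-breaking spaces) to one space.
        line = " ".join(line.split())
        if not line:
            continue  # skip blank lines
        # Group 1: the leading "- - - " prefix, group 2: the rest of the line.
        match = re.match(r"((?:- )*)(.*)", line)
        depth = match.group(1).count("-")
        rows.append((depth, match.group(2)))
    return rows


# Hypothetical sample, not taken from the real page.
sample = "2008 - Widgets\n   - -   Large\n\n - - - Small \n"
rows = split_description(sample)
print(rows)
```

Each item is a `(dash count, text)` pair, which can then be written to the worksheet one row at a time.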

loginerror · April 8, 2020, 8:56am

Hi @dwalker

Is this the HTS codes you are after?

If so, there are multiple places where you can export the codes directly to an Excel file.

Check this out:
https://circabc.europa.eu/w/browse/23cc0022-41e9-4ec4-b16f-158f645eca46

It might make it easier for you.

dwalker · April 8, 2020, 9:13am

This is extremely helpful and gives me another option. For example, say I search for Goods Code 68101190.

The description may contain more than one line with the same number of ‘-’ characters. In all cases, I only need to copy the last line for each number of dashes. For example, in the below screenshot, there are two rows with - - -, and I only want to include the final one. What would be a way for me to check for this and ensure that I do not get two rows like this?
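One way to sketch this check in Python, assuming each description row is available as a string that starts with its dashes: count the leading dashes and keep a dictionary keyed by that count, so a later row with the same number of dashes overwrites the earlier one. The function names and sample rows are hypothetical.

```python
def dash_depth(line: str) -> int:
    # Count the "-" tokens at the start of the line.
    depth = 0
    for token in line.split():
        if token != "-":
            break
        depth += 1
    return depth


def keep_last_per_depth(lines: list[str]) -> list[str]:
    latest = {}
    for line in lines:
        if line.strip():
            # A later row with the same dash count replaces the earlier one.
            latest[dash_depth(line)] = line.strip()
    # Return the surviving rows ordered by dash count.
    return [latest[depth] for depth in sorted(latest)]


# Hypothetical rows; two of them have three dashes.
sample = ["- Tiles", "- - Blocks", "- - - Lightweight", "- - - Other"]
result = keep_last_per_depth(sample)
print(result)
```

This keeps exactly one row per dash count, so two `- - -` rows can never both end up in the output.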

Thank you

loginerror · April 9, 2020, 10:27am

Well, I suppose if your goal is to always extract the line before the ‘Other’ row (that is, always the previous row), you could find the index of the row that contains ‘Other’ and subtract 1 from it.
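A minimal Python sketch of that idea, assuming the rows are already in a list of strings; the function name and sample rows are hypothetical. It returns nothing if no ‘Other’ row is found or if ‘Other’ is the very first row.

```python
def row_before_other(rows: list[str]):
    for index, row in enumerate(rows):
        # Ignore the leading dashes and spaces when looking for "Other".
        if row.lstrip("- ").startswith("Other"):
            # The wanted row is the one just before it, if there is one.
            return rows[index - 1] if index > 0 else None
    return None


# Hypothetical rows, not taken from the real nomenclature.
sample = ["- Tiles", "- - Blocks", "- - Other"]
previous = row_before_other(sample)
print(previous)
```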

dwalker · April 9, 2020, 11:16am

Thanks @loginerror.

I have tried 10+ different methods to do the following: I need to search the Nomenclature Excel file for a value stored in another Excel file and then copy all rows back to the original Excel file.

It is explained in the link below. If you have the time to look, I would appreciate it.
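As a rough sketch of the lookup logic only (not of the Excel activities), assume both sheets have already been read into memory, and that "copy all rows" means all rows whose code matches a searched value. The column name `Goods code`, the function name, and the sample rows are all hypothetical.

```python
def copy_matching_rows(search_values, nomenclature_rows, key="Goods code"):
    # Compare as trimmed strings so 68101190 (number) and "68101190" (text) match.
    wanted = {str(value).strip() for value in search_values}
    return [
        row for row in nomenclature_rows
        if str(row.get(key, "")).strip() in wanted
    ]


# Hypothetical nomenclature rows.
nomenclature = [
    {"Goods code": "68101110", "Description": "hypothetical row A"},
    {"Goods code": "68101190", "Description": "hypothetical row B"},
    {"Goods code": 68101190, "Description": "hypothetical row C"},
]
matches = copy_matching_rows(["68101190"], nomenclature)
print(matches)
```

The matching rows can then be written back to the original workbook in one step.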

Thanks!
