Replacing strings and characters within strings in a datatable

DaveF · August 16, 2020, 11:09pm

I need a little help please. The data I have scraped from various sources looks like the following sample, although I have many more columns (mostly strings, some integers and financial data). The following paints a picture of the battles I face to clean up the low duality data I’m scraping from.
As a CSV the data I have collected looks like:

Resource,      Office, Funding,    Field,   ContactHours, Status,  LunchFund
Barney Rubble, G6,     "$152,000", STEM,    25,           Current, $55.00
Ted Fanail,    D12,    32800,      Art,     11,           Current, -14.20
Nancy Drew,    G15,    "$27,500",  English, NA,           NA,      -

Anywhere in a string I need to remove the “$”, and the “,” so all the dollar values can be treated as numbers in a downstream external process.

If the entire data for the field is “NA” or “-” then I need to remove it. In both cases I’d prefer to have null values. I need to do this without impacting the fields where those characters are part of the field (“Nancy” must not become “ncy”, negative numbers must not become positive numbers).

The outcome of the data cleanse must be:

Resource,      Office, Funding,   Field,   ContactHours, Status,  LunchFund
Barney Rubble, G6,     152000,    STEM,    25,           Current, 55.00
Ted Fanail,    D12,    32800,     Art,     11,           Current, -14.20
Nancy Drew,    G15,    27500,     English, <null>,       <null>,  <null>

I am not an experienced developer, so please treat me like a beginner with your responses. Thank you

Yoichi · August 17, 2020, 12:43am

Hi,

There are some ways to achieve it. I’ll attach a sample using string manipulation as the following.
This sample handle whole the text, but you can also do it each item in the table with theses expressions.

Sample20200817.zip (19.4 KB)

Hope this helps you.

Regards,

DaveF · August 17, 2020, 1:09am

That works beautifully on the sample data. Thank you @Yoichi.
How do you think that will go if the data table is 10000 rows by 40 columns of the same sort of data (although most of it is result data, which looks a lot like financial data, e.g. ####.##) ?

Yoichi · August 17, 2020, 1:36am

Hi,

Are you concerned about amount of data?
Although too much data causes problems, I think it will work If the data is about 40 columns by 10000 rows.

Regards,

DaveF · August 17, 2020, 2:38am

A related question, as I’m adding to the datatable as I go, then writing it out to a CSV every loop, is there a way to just cleanse the current row before writing the CSV rather than the full/growing datatable each time?

Yoichi · August 17, 2020, 2:58am

Hi,

Can you share specific sample/data?

Regards,

DaveF · August 17, 2020, 4:00am

Same data to the sample above. I’m just wondering if it would be more resource efficient to cleanse the current single row of data rather than the entire dt because the dt will eventually grow to thousands of rows. Is there a way to do that without having to cleanse each individual piece of data/variable before adding it to the row though?
It’s not mandatory for the solution you provided to work for me, I’m just wondering.

Yoichi · August 17, 2020, 10:15am

Hi,

Thank you for your reply.
I got it.

For example, we can choose one of the following, depending on the conditions.

If we want to handle it as string and output CSV file, we can cleanse additional data and then use Append to CSV activity.
If we want to handle it as datatable and output datatable, we can cleanse additional data and then use Merge DataTable activity (in case of multiple rows) or Add Data Row activity (in case of single row).

Regards,

system · August 20, 2020, 10:15am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Is there a simple way to remove a string from every column in a row in a DT Help datatable , studio , data_scraping , question	7	2811	August 16, 2020
Removing Strings from DataTable and putting Numeric all in one column Studio	3	662	May 19, 2022
How to smartly remove some special characters and some strings (if those exist) from column data in a datatable? Help	3	3416	July 18, 2019
I want to remove columns with blanks and certain characters in my data table Academic Alliance datatable , studio , question	3	1410	March 3, 2022
I want to remove columns that contain blanks in the data table and values less than 100 StudioX datatable , activities , studio , studiox , question	3	985	March 1, 2022

Most Active Users - Yesterday
Anil_G
ashokkarale
Ajay_Mishra
Gautham_Pattabiraman
BHUSHAN_NAGAONKAR1
vrdabberu
ABHIMANYU_THITE1
lrtetala
samantha_shah
shyamala_shyamu
More details...

Replacing strings and characters within strings in a datatable

Related Topics