Replacing strings and characters within strings in a datatable

I need a little help please. The data I have scraped from various sources looks like the following sample, although I have many more columns (mostly strings, some integers and financial data). The following paints a picture of the battles I face to clean up the low duality data I’m scraping from.
As a CSV the data I have collected looks like:

Resource,      Office, Funding,    Field,   ContactHours, Status,  LunchFund
Barney Rubble, G6,     "$152,000", STEM,    25,           Current, $55.00
Ted Fanail,    D12,    32800,      Art,     11,           Current, -14.20
Nancy Drew,    G15,    "$27,500",  English, NA,           NA,      -

Anywhere in a string I need to remove the “$”, and the “,” so all the dollar values can be treated as numbers in a downstream external process.

If the entire data for the field is “NA” or “-” then I need to remove it. In both cases I’d prefer to have null values. I need to do this without impacting the fields where those characters are part of the field (“Nancy” must not become “ncy”, negative numbers must not become positive numbers).

The outcome of the data cleanse must be:

Resource,      Office, Funding,   Field,   ContactHours, Status,  LunchFund
Barney Rubble, G6,     152000,    STEM,    25,           Current, 55.00
Ted Fanail,    D12,    32800,     Art,     11,           Current, -14.20
Nancy Drew,    G15,    27500,     English, <null>,       <null>,  <null>

I am not an experienced developer, so please treat me like a beginner with your responses. Thank you

1 Like

Hi,

There are some ways to achieve it. I’ll attach a sample using string manipulation as the following.
This sample handle whole the text, but you can also do it each item in the table with theses expressions.

Sample20200817.zip (19.4 KB)

Hope this helps you.

Regards,

1 Like

That works beautifully on the sample data. Thank you @Yoichi.
How do you think that will go if the data table is 10000 rows by 40 columns of the same sort of data (although most of it is result data, which looks a lot like financial data, e.g. ####.##) ?

1 Like

Hi,

Are you concerned about amount of data?
Although too much data causes problems, I think it will work If the data is about 40 columns by 10000 rows.

Regards,

A related question, as I’m adding to the datatable as I go, then writing it out to a CSV every loop, is there a way to just cleanse the current row before writing the CSV rather than the full/growing datatable each time?

Hi,

Can you share specific sample/data?

Regards,

Same data to the sample above. I’m just wondering if it would be more resource efficient to cleanse the current single row of data rather than the entire dt because the dt will eventually grow to thousands of rows. Is there a way to do that without having to cleanse each individual piece of data/variable before adding it to the row though?
It’s not mandatory for the solution you provided to work for me, I’m just wondering.

Hi,

Thank you for your reply.
I got it.

For example, we can choose one of the following, depending on the conditions.

  • If we want to handle it as string and output CSV file, we can cleanse additional data and then use Append to CSV activity.
  • If we want to handle it as datatable and output datatable, we can cleanse additional data and then use Merge DataTable activity (in case of multiple rows) or Add Data Row activity (in case of single row).

Regards,

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.