I need a little help please. The data I have scraped from various sources looks like the following sample, although I have many more columns (mostly strings, some integers and financial data). The following paints a picture of the battles I face to clean up the low duality data I’m scraping from.
As a CSV the data I have collected looks like:
Resource, Office, Funding, Field, ContactHours, Status, LunchFund Barney Rubble, G6, "$152,000", STEM, 25, Current, $55.00 Ted Fanail, D12, 32800, Art, 11, Current, -14.20 Nancy Drew, G15, "$27,500", English, NA, NA, -
Anywhere in a string I need to remove the “$”, and the “,” so all the dollar values can be treated as numbers in a downstream external process.
If the entire data for the field is “NA” or “-” then I need to remove it. In both cases I’d prefer to have null values. I need to do this without impacting the fields where those characters are part of the field (“Nancy” must not become “ncy”, negative numbers must not become positive numbers).
The outcome of the data cleanse must be:
Resource, Office, Funding, Field, ContactHours, Status, LunchFund Barney Rubble, G6, 152000, STEM, 25, Current, 55.00 Ted Fanail, D12, 32800, Art, 11, Current, -14.20 Nancy Drew, G15, 27500, English, <null>, <null>, <null>
I am not an experienced developer, so please treat me like a beginner with your responses. Thank you