How to cleanse data in a dt row before writing to CSV

While scraping data I am getting some bad data that needs cleansing. When it happens, there is a lot of it.

My script builds up a dt row over the course of several web scrapes, then at the end of the loop writes to a .CSV file before starting the scrape process again. I do it this way so progress is not lost on crash/recovery.

The file it writes to is very, very large, so for resource efficiency sake I don’t want to try to cleanse the whole data table or the CSV every loop. What I’d prefer to do is to cleanse just the latest dt row between the last update to the dt row and the write of the dt to CSV.

// Last dt row update. Mark as Done
dtASlist.Rows(dtASlist.Rows.IndexOf(row))(“GotAllData”) = “Done”

// Write updated DT to the CSV
Write dtASlist (include headers) to strFilePath_AScsv

Between those two lines, what can I add to clean the dt row of bad data without impacting the good data.

dt row with bad data example:
83.06%, $14.32, ###, ---, -, $11.30, -9.57, -3.81%, -$0.05

Need to end up with:
83.06, 14.32, , , , 11.30, -9.57, -3.81, -0.05

I need to remove “-” and “–” only if it is the entire content of the column (i.e. The cleanse must not change a negative integer into a positive integer), I also need to remove dollar signs and percent signs where ever they appear

How can this dt row data cleanse be done? Please provide code/screenshots as I’m inexperienced with UIpath. Thanks

Hi @DaveF

Here is the workflow.
DataTableCleanup.xaml (13.4 KB)

It’s a simple approach:

  1. Remove all the (known) special characters with Replace method.
  2. Check the column value against a regular expression that starts with - but also has a digit in it. This would only match any negative values, but not the columns where only “-” character exists. Then remove the unnecessary “-” columns and you get your output.

Tweak it as necessary for more special characters or different regular expression combos.

Any questions are welcome. Happy automation!

2 Likes

That looks great. Thank you.
Checking the regex on https://regexr.com/ and I can’t see what it does in respect to the data I have. Are you able to explain that section please?

@DaveF

I prefer regex101.com to test my regular expressions, because it has this Explanation section which spells out what the regex is doing.
I am a noob myself when it comes to regex, so this website is quite helpful.
image

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.