Checking special characters in column values in data table


#1

Hi,
I have an excel sheet that contains a column called semployerName. I want to check whether each row under this column contains a * or a ’ character such that
if the character is found i can change the name otherwise leave the name as such. How can i do this ?
i know for each is an option but the excel contains records > 50k so i do not think this will be feasible .

Thanks in advance


#2

Hi,

As far as I know, you can’t manipulate dataTables with LINQ or other efficient methods, so ForEach is only way (if you use it as a Data Table). However, I know you can do this if you convert the table to a string because then you can use the Select function to piece the string together per row. It’s probably not an ideal solution, but does work.

Here is an example of what I mean…

1. Output Data Table to string (let’s call “strTable”)

2. Use vb.net expression to query array.
strTable = String.Join(System.Environment.Newline,strTable.Split(System.Environment.Newline(0)).Select(Function(row) If(row.ToString.Split(",“c)(0).Contains(”*") or row.ToString.Split(",“c)(0).Contains(”’"),row.ToString.Split({row.ToString.Split(",“c)(0)},System.StringSplitOptions.None)(0)+newname+row.ToString.Split({row.ToString.Split(”,"c)(0)},System.StringSplitOptions.None)(1),row.ToString)).ToArray)

3. Either output string to Text File with .CSV extension and Read Range back to DataTable, or use Generate Data Table activity.

Notes: I used row.ToString.Split(",“c)(0) to signify that the first column is used with (0); if you would like the semployerName column then either hardcode the column Index or use I think
dtvariable.Columns.IndexOf(“semployerName”)

.Select function I used says “go through each line in the strTable, and if value contains “*” or “’” then split string up, replacing value with your newname, and concatenating it back together… if it doesn’t contain those values then it just keeps the line as is.”

Sorry for any syntax or typo errors.

Regards!


#4

Hello,

These are a few ways to go about it:

  1. You could use LINQ on the DataTable that results from reading the excel sheet to extract only those rows that contain * or '. Use the following in an Assign activity:

dt.AsEnumerable().Where(Function(y) y.Item("semployerName").ToString.Contains("'") or y.Item("semployerName").ToString.Contains("*"))

The result will be an EnumerableRowCollection<DataRow> which you can then iterate and use String methods like Contains, IndexOf, Split, Join on each row to modify the values in the “semployerName” column as needed. Then, you can write the DataTable back into excel. This way, you only have to iterate through the rows resulted from the LINQ query and not the whole DataTable.

  1. You could also call the Select method on the DataTable, and use the “Like” operator to only get the rows that contain * or '. Inside the expression * should be enclosed in square brackets and ' should be doubled (so ''). Use the following in an Assign activity:

Capture

dt.Select("[semployerName] like '*[*]*' or [semployerName] like '*''*' ")

The result will be an array of DataRows which you can also then iterate, modify the rows and write the DataTable back into excel.

  1. Another way would be to get the excel content as a String and then manipulate it, as described in @ClaytonM’s answer.

Regarding performance, the LINQ method is faster than the Select method but not as fast as the String manipulation method. But before you discard the For Each method and start looking for other ways you should actually try it, maybe the performance is good enough for your requirements. Sometimes the most straightforward solution is also the most appropriate.


#5

thanks i got the solution