Replace Blank\Null values on datatable

Hi,
I have a large Excel file (65 columns) which i would like to upload to a Sharepoint list using ‘Add list items’.
I’m reading the Excel into a datatable and then using ‘add list items’ - It works when there are no blank values on the dataset.

when the original excel has blank cells - adding to the sharepoint list fails.

I tried using Linq to replace all blank values with [something] but my problem is that the blank values can be on a String column, an int one, or a date one.

this means i need to change the below query to say:
If the blank cell is on a string col >> Replace with “NA”
If the blank cell is on an int col >> Replace with 0
If the blank cell is on a date col >> Replace with a date ‘1/1/1900’

This is the code I’m currently using:
(From d In MyListToUpload2.AsEnumerable()
Select ia=d.ItemArray.ToList
Let ra = ia.ConvertAll(Function (x) If(isNothing(x) OrElse String.IsNullOrEmpty(x.toString), “na”, x)).toArray()
Select rc = dt_Corrected.Rows.Add(ra)).CopyToDataTable()

hi @Adi_Lupashko
use the below query to replace null with zero
syntax:

(From r In dtData.AsEnumerable
Let ra = r.ItemArray.Select(Function (e) If(isNothing(e) OrElse String.IsNullOrEmpty(e.toString), 0, e)).toArray
Select dtCorrected.Rows.Add(ra)).CopyToDataTable()

thank you! is there a way to add an “If statement” to the function?
If type is data replace with ‘1/1/1900’ otherwise replace with zero?

@Adi_Lupashko
You can convert the datetime column into string and replace it with

syntax : (From r In dtData.AsEnumerable
Let ra = r.ItemArray.Select(Function (e) If(isNothing(e) OrElse String.IsNullOrEmpty(e.toString), “1/1/1990”, e)).toArray
Select dtCorrected.Rows.Add(ra)).CopyToDataTable()

dtcorrected = dtDate.clone

@Adi_Lupashko

First thing …if it is datetime column them you wont have blank at all…may be the the column is taken as string only…you first either need to convert the column type to datatime…or later convert eqch item separately using logic which is tedeous for 64 columns

Cheers

why i won’t have blank in DateTime? this is an Excel file pulled from another system and this is how it pulls it…

I think what I’m failing to explain is that i have many columns of all types and i need a solution for all types…
Dataset Example:

EmployeeID Name Hire Date Termination Date Salary ZipCode Organization
1133 Adi 1/1/2020 10,000 656565 ABC
4422 Omer 5/3/2021 3/3/2022 15,000

The end result should be:
EmployeeID Name Hire Date Termination Date Salary ZipCode Organization
1133 Adi 1/1/2020 1/1/1900 10,000 656565 ABC
4422 Omer 5/3/2021 3/3/2022 15,000 0 NA

You can convert any row into string and replace the values you need…

@Adi_Lupashko

If whole of column is formatted as datetime then it would read the empty value as the least date which is what you want already

After reading please check the datatypes of the columns mostly it wuld be object and not datetime

Cheers

In general we can implement a cleansing
we can check also to expoit the defaultvalue of a data column

As you mentioned:

here we have to keep in mind that read range can interpret the column types also in some scenarios different (e.g String instead of DataType).

So a cleansing, adressing all expected sceanrios can look also more complex including datacolumn datatype change

Yes, it is indeed Object. is there a way to change it?

@Adi_Lupashko

There is no direct way to change or cast the datatype o fa column which has data already

Ypu need to separately may be mention which columns are what type and accordingly write the blank values with the respective data

You can maintainthe column names as arrays and loop through the array and then use the below formula to replace as needed

For example all the columns with integer type

Arr1 - {"col1,"col2",..}

Now use for loop with arr1 asin argument

And inside loop use invoke code with dt as in/out argument

Dt.AsEnumerable.ToList.ForEach(sub(r) If(String.IsNullOrEmpty(r(currentItem).ToString),"0",r(currentItem).ToString))

Similarly for other two types as well

Cheers

Cheers

Hi miss @Adi_Lupashko , We encountered a same situation on our previous project and this is the approach that we did.

Check this xaml file for better reference:
Put Value on Blank or Empty.xaml (12.9 KB)

This is the LINQ that was used.

Assign dtOutput = (
From a In dtExcel
	Let EmployeeIDColumn = If(String.IsNullOrEmpty(a("EmployeeID").ToString) Or String.IsNullOrWhiteSpace(a("EmployeeID").ToString), "N/A", a("EmployeeID").ToString)
	Let NameColumn = If(String.IsNullOrEmpty(a("Name").ToString) Or String.IsNullOrWhiteSpace(a("Name").ToString), "N/A", a("Name").ToString)
	Let HireDateColumn = If(String.IsNullOrEmpty(a("Hire Date").ToString) Or String.IsNullOrWhiteSpace(a("Hire Date").ToString), "1/1/1900", a("Hire Date").ToString)
	Let TerminationDateColumn = If(String.IsNullOrEmpty(a("Termination Date").ToString) Or String.IsNullOrWhiteSpace(a("Termination Date").ToString), "1/1/1900", a("Termination Date").ToString)
	Let SalaryColumn = If(String.IsNullOrEmpty(a("Salary").ToString) Or String.IsNullOrWhiteSpace(a("Salary").ToString), "0", a("Salary").ToString)
	Let ZipCodeColumn = If(String.IsNullOrEmpty(a("ZipCode").ToString) Or String.IsNullOrWhiteSpace(a("ZipCode").ToString), "0", a("ZipCode").ToString)
	Let OrganizationColumn = If(String.IsNullOrEmpty(a("Organization").ToString) Or String.IsNullOrWhiteSpace(a("Organization").ToString), "N/A", a("Organization").ToString)
Select dtOutput.Rows.Add(EmployeeIDColumn, NameColumn, HireDateColumn, TerminationDateColumn, SalaryColumn, ZipCodeColumn, OrganizationColumn)	
).CopyToDataTable

As you can see, you might do this with the 63 columns you mentioned, it will be time consuming but I think it will do the requirement you’re looking for.

Sample output from the table example you presented.
image

Thanks! this looks like what i need but dont think it will be efficient to do it for 65 columns…

Do you think i can do this only for one\two date columns and then keep all other columns? instead of selecting one by one? If i take your code for the date columns, and then do something else to replace all the numbers and strings - it might be more efficient.

(
From a In dtExcel
Let TermDateCol = If(String.IsNullOrEmpty(a(“Column48”).ToString) Or String.IsNullOrWhiteSpace(a(“Column48”).ToString), “1/1/1900”, a(“Column48”).ToString)
Select dtOutput.Rows.Add(.TermDateCol) [The "" here is obviously not working…]
).CopyToDataTable

you can dynamize it e.g. by a lookup dictionary and selecting a default value based on the data column datatype.

But keep in mind as also mentione above. There are more disturbing factors to expect and we would suggest to setup a dynamic approach along with a driving data schema

Try this reference:

I applied it too on the example you provided:
Put Value on Blank or Empty.xaml (18.4 KB)

But I wouldn’t necessarily advise a solution that uses For Each or Loop especially when you are processing hundreds of thousand of transaction rows and as you mentioned 65 columns in a Data Table. It might cause a longer processing time and worse case a ‘job stopped’ as we experienced from our previous projects. But if you are only dealing with less than a 100,000 transaction rows, I think this approach is still safe.

As also mention subsequent, the LINQ is invalid. The Where Operators requires a Lambda with a Boolean outcome

to illustrate it more within a prototype:

we got input:
grafik
yellow marks are empty values
we do have a structure of:

Flow:
grafik

Preparing the default LookUp Dictionary

New Dictionary(Of Type, Object) From {
{GetType(String),"NA"},
{GetType(int32), 0},
{GetType(DateTime), New DateTime(1900,1,1)}
}

cloning the dtData:

dtResult = dtData.Clone

Filling up:
dtResult =

(From d  In dtData.AsEnumerable()
Let ra = d.ItemArray.Select(Function (x,i) If(isNothing(x) OrElse String.IsNullOrEmpty(x.toString().Trim()), dictLKDefaults(dtData.Columns(i).DataType), x)).toArray()
Select r = dtResult.Rows.Add(ra)).CopyToDataTable

Result:
grafik

This prototype can also be advanced to:

  • by forcing schema definitions and conversions
  • decomposition to essential activities

We would not deny to use also for each / Parallel For each activities, as the general processing costs are to rate and not the usage of a for each activity

1 Like

Hi,

dtOutput is a Datatable? or anything else?
I’m getting an error that it cannot be converted to a datatable…

image

I think you’re missing a “(” before “From”

Yes, dtOutput is a Data Table

I also noticed that you are processing same Data Table on the LINQ. You have to use different Data Table for the Output, if you don’t I think you will get an error.

You might want to follow this format where dtOutput is the Output Data Table and dtExcel is the Input Data Table

You should initialize the dtOutput by Cloning the dtExcel or your data table input (put it after the reading of the excel file you are processing)
image