Count the no. of unique value present in a column in an excel

please share the complete requirements always on begin of request with us

give a try on following doing it all:

dtResult is empty has the 4 cols: ColName, FilledRows, FilledPercentage, DistinctCount

(

(From c In varInputSapDataTable.Columns.Cast(Of DataColumn)
Let cn = c.ColumnName
Let fr = varInputSapDataTable.AsEnumerable.Where(Function (x) Not (isNothing(x(cn)) OrElse String.IsNullorEmpty(x(cn).toString.Trim))).Count
Let p = (fr/varInputSapDataTable.Rows.Count*100).toString(“F2”)
Let cnt= varInputSapDataTable.AsEnumerable.Select(Function (r) r(cn).toString.Trim).Distinct().Count
Order By CDbl(p) Descending
Let ra = New Object(){cn, fr,p, cnt}
Select row=dtResult.Rows.Add(ra)).CopyToDataTable

for an updating the Unique Count col afterwards we can think about:

  • create a dictionary with key: ColName and value distinct count
  • loop over the result table and fill the Unique Count col with the help of the dictionary using the col named from looped row

sample_MARA.xlsx (10.6 KB)

this is the sample input file

Final Report.xlsx (7.9 KB)
this is the sample output corresponding to the input shown above

in this, in the first table, you can see I have a column called unique count
in that particular count, I need the unique count data

the steps performed till now on that table are

  1. build the table with the columns

used the build data table activity

  1. Read the input excel using read range activity and store the value in the variable:
    varInputSapDataTable

  2. get the headers of the table and calculate the filled %

used a query
assign activity
LHS: in_varDataTableForColumnHeadersAndFilledRate
RHS: (From c In varInputSapDataTable.Columns.Cast(Of DataColumn)
Let fr = varInputSapDataTable.AsEnumerable.Where(Function (x) Not (isNothing(x(c)) OrElse String.IsNullorEmpty(x(c).toString.Trim))).Count
Let p = (fr/varInputSapDataTable.Rows.Count*100).toString(“F2”)
Let ra = New Object(){c.ColumnName, fr,p}
Select r=in_varDataTableForColumnHeadersAndFilledRate.Rows.Add(ra)).CopyToDataTable

  1. sort the data table so obtained in descending order based on filled %

used a query
assign activity
LHS: in_varDataTableForColumnHeadersAndFilledRate
RHS: in_varDataTableForColumnHeadersAndFilledRate.AsEnumerable.OrderByDescending(function(r) Double.Parse(r(“Filled %”).ToString)).CopyToDataTable()

  1. that I need to do now is add the unique count

so now i am on the 5th step

why not doing all within 1 LINQ as shared above?

It is giving error of end of expression required

its fixed now above and also validating:
grafik

the unique count is not coming correct on huge data
its a few values up or down for some

EXPECTED OUTPUT:

IFLOT Decription Filled Rows Filled % Unique Count
TPLNR Functional Location 130128 100 99352
TPLKZ Functional Location Structure Indicator 130128 100 11
FLTYP Functional Location Category 130128 100 1
ERDAT Date on which object was created 130128 100 2252
IWERK Maintenance Planning Plant 130128 100 6
ILOAN Location and account assignment for technical object 130128 100 99352
OBJNR Object number 130128 100 99352
STRNO Functional Location Label 130128 100 130128
TPLMA Superior Functional Location 129178 99.27 4913
RBNR Catalog Profile 88803 68.24 140
HERST Manufacturer of Asset 30816 23.68 1866
TYPBZ Manufacturer Model No 26616 20.45 6482
SERGE Manufacturer Serial No 8955 6.88 5858
SUBMT Construction Type material of the object 1149 0.88 583
MAPAR Manufacturer Part No 511 0.39 241
EMATN Material Number Corresponding to Manufacturer Part Number 0 0 0

CURRENT OUTPUT:

IFLOT Filled Rows Filled % Unique Count
TPLNR 130128 100.00 99352
TPLKZ 130128 100.00 11
FLTYP 130128 100.00 1
ERDAT 130128 100.00 2252
IWERK 130128 100.00 6
ILOAN 130128 100.00 99352
OBJNR 130128 100.00 99352
STRNO 130128 100.00 130128
TPLMA 129178 99.27 4914
RBNR 88803 68.24 141
HERST 30816 23.68 1919
TYPBZ 26616 20.45 6499
SERGE 8955 6.88 5859
SUBMT 1149 0.88 584
MAPAR 511 0.39 241
EMATN 0 0.00 1

keep in mind we had trimmed the values and not filtered out empty values from the count
how did you cross checked?

i had this report as a sample from my seniors, and the input data was same

if we filter out the empty values will that make it right?

you would know it. Have a look on this LINQ:

(From c In varInputSapDataTable.Columns.Cast(Of DataColumn)
Let cn = c.ColumnName
Let drFilled = varInputSapDataTable.AsEnumerable.Where(Function (x) Not (isNothing(x(cn)) OrElse String.IsNullorEmpty(x(cn).toString.Trim)))
Let cntFilled = drFilled.Count
Let p = (cntFilled/varInputSapDataTable.Rows.Count*100).toString(“F2”)
Let cntDistinct= drFilled.Select(Function (r) r(cn).toString.Trim).Distinct().Count
Order By CDbl(p) Descending
Let ra = New Object(){cn, cntFilled, p, cntDistinct}
Select row=dtResult.Rows.Add(ra)).CopyToDataTable

Still not correct i checked manually this time
the unique values of one column are 1862
but it is showing as 1918

you can analyse it within immediate panel ad hoc.
you can check the LINQ part with a reduced set of data

we cannot rate if the numbers has to be matched or not

just take your time

actually I am not able to understand the linq part

can we write a linq
where we have a temp data table
and then we remove duplicates for a particular column and then count the remaining rows and that would be the unique values

can you please send a sample xaml for this as I am unable to understand it @lakshman
it is giving the following error

how can i do it @kalyanDev

@Aishwarya_Bhargava

Sorry. Pass ArrayRow as below.

           {item.ColumnName, InputDT.DefaultView.ToTable(True,item.ColumnName).Rows.Count.ToString}

its not giving the correct output

As you reported to different approaches that it is not giving an expected output we would suggest to analyze it within an isolated / standalone RnD xaml.

  • readin excel
  • explore different approaches on this for getting the count you are looking for.

in case of you cannot ensure the validity of the given counts, then you can also check your implementation approaches by runnong testcases with a samller / dedicated test set.

In case of you can share a reduced test set with us we can also help you on the rnd/prototypes