Count the no. of unique value present in a column in an excel

Hi team

I want to calculate the no. of unique value each header has, so how can I do it
for example
input table:

header1 header2 header3 header4
1 q q w
2 w q e
3 e q d
4 r q w
5 t q s
6 y q e
7 q q d
8 w q r
9 e q f
10 r q t

output table:

table name unique count
header1 10
header2 5
header3 1
header4 7

so how can I do it?

use dt.defaultview(True,) method

prepare with build datatable the report datastructure (Columns: tableName, uniqueCount) - dtReport

Assign acitvity:
Left: dt Report
Right

(From x in dtData.Columns.Cast(Of DataColumn).Select(Function (c) c.ColumnName)
Let cnt = dtData.AsEnumerable.Select(Function (r) r(x).toString.Trim).Distinct().Count
Let ra = new Object(){x, cnt}
Select r=dtReport.Rows.Add(ra)).CopyToDataTable

@Aishwarya_Bhargava

  1. Use Read Range activity to read the input data from excel file and it will give output as DataTable. Let’s say ‘InputDT’.

  2. And then use Build DataTable activity and create two columns - table name and unique count. Let’s say ‘OutputDT’

  3. And then use For Each loop activity and pass ‘InputDT’ to it.

        For Each item in InputDT.Columns
               Use Add DataRow activity and pass DataTable as OutputDT and ArrayRow value as below.
    
         {item, InputDT.DefaultView.ToTable(True,item).Rows.Count}
    
  4. Finally use Write Range activity and pass OutputDT to write into Excel file.

i need to add the unique value count in this already existing table

IFLOTX Filled Rows Filled % Unique Count
TPLNR 198303 100.00
SPRAS 198303 100.00
PLTXT 197876 99.78

this table is generated using 2 queries

(From c In varInputSapDataTable.Columns.Cast(Of DataColumn)
Let fr = varInputSapDataTable.AsEnumerable.Where(Function (x) Not (isNothing(x(c)) OrElse String.IsNullorEmpty(x(c).toString.Trim))).Count
Let p = (fr/varInputSapDataTable.Rows.Count*100).toString(“F2”)
Let ra = New Object(){c.ColumnName, fr,p}
Select r=in_varDataTableForColumnHeadersAndFilledRate.Rows.Add(ra)).CopyToDataTable

and

in_varDataTableForColumnHeadersAndFilledRate.AsEnumerable.OrderByDescending(function(r) Double.Parse(r(“Filled %”).ToString)).CopyToDataTable()

and when I added your query it made the filled % vanish and unique count also didn’t come

please share the complete requirements always on begin of request with us

give a try on following doing it all:

dtResult is empty has the 4 cols: ColName, FilledRows, FilledPercentage, DistinctCount

(

(From c In varInputSapDataTable.Columns.Cast(Of DataColumn)
Let cn = c.ColumnName
Let fr = varInputSapDataTable.AsEnumerable.Where(Function (x) Not (isNothing(x(cn)) OrElse String.IsNullorEmpty(x(cn).toString.Trim))).Count
Let p = (fr/varInputSapDataTable.Rows.Count*100).toString(“F2”)
Let cnt= varInputSapDataTable.AsEnumerable.Select(Function (r) r(cn).toString.Trim).Distinct().Count
Order By CDbl(p) Descending
Let ra = New Object(){cn, fr,p, cnt}
Select row=dtResult.Rows.Add(ra)).CopyToDataTable

for an updating the Unique Count col afterwards we can think about:

  • create a dictionary with key: ColName and value distinct count
  • loop over the result table and fill the Unique Count col with the help of the dictionary using the col named from looped row

sample_MARA.xlsx (10.6 KB)

this is the sample input file

Final Report.xlsx (7.9 KB)
this is the sample output corresponding to the input shown above

in this, in the first table, you can see I have a column called unique count
in that particular count, I need the unique count data

the steps performed till now on that table are

  1. build the table with the columns

used the build data table activity

  1. Read the input excel using read range activity and store the value in the variable:
    varInputSapDataTable

  2. get the headers of the table and calculate the filled %

used a query
assign activity
LHS: in_varDataTableForColumnHeadersAndFilledRate
RHS: (From c In varInputSapDataTable.Columns.Cast(Of DataColumn)
Let fr = varInputSapDataTable.AsEnumerable.Where(Function (x) Not (isNothing(x(c)) OrElse String.IsNullorEmpty(x(c).toString.Trim))).Count
Let p = (fr/varInputSapDataTable.Rows.Count*100).toString(“F2”)
Let ra = New Object(){c.ColumnName, fr,p}
Select r=in_varDataTableForColumnHeadersAndFilledRate.Rows.Add(ra)).CopyToDataTable

  1. sort the data table so obtained in descending order based on filled %

used a query
assign activity
LHS: in_varDataTableForColumnHeadersAndFilledRate
RHS: in_varDataTableForColumnHeadersAndFilledRate.AsEnumerable.OrderByDescending(function(r) Double.Parse(r(“Filled %”).ToString)).CopyToDataTable()

  1. that I need to do now is add the unique count

so now i am on the 5th step

why not doing all within 1 LINQ as shared above?

It is giving error of end of expression required

its fixed now above and also validating:
grafik

the unique count is not coming correct on huge data
its a few values up or down for some

EXPECTED OUTPUT:

IFLOT Decription Filled Rows Filled % Unique Count
TPLNR Functional Location 130128 100 99352
TPLKZ Functional Location Structure Indicator 130128 100 11
FLTYP Functional Location Category 130128 100 1
ERDAT Date on which object was created 130128 100 2252
IWERK Maintenance Planning Plant 130128 100 6
ILOAN Location and account assignment for technical object 130128 100 99352
OBJNR Object number 130128 100 99352
STRNO Functional Location Label 130128 100 130128
TPLMA Superior Functional Location 129178 99.27 4913
RBNR Catalog Profile 88803 68.24 140
HERST Manufacturer of Asset 30816 23.68 1866
TYPBZ Manufacturer Model No 26616 20.45 6482
SERGE Manufacturer Serial No 8955 6.88 5858
SUBMT Construction Type material of the object 1149 0.88 583
MAPAR Manufacturer Part No 511 0.39 241
EMATN Material Number Corresponding to Manufacturer Part Number 0 0 0

CURRENT OUTPUT:

IFLOT Filled Rows Filled % Unique Count
TPLNR 130128 100.00 99352
TPLKZ 130128 100.00 11
FLTYP 130128 100.00 1
ERDAT 130128 100.00 2252
IWERK 130128 100.00 6
ILOAN 130128 100.00 99352
OBJNR 130128 100.00 99352
STRNO 130128 100.00 130128
TPLMA 129178 99.27 4914
RBNR 88803 68.24 141
HERST 30816 23.68 1919
TYPBZ 26616 20.45 6499
SERGE 8955 6.88 5859
SUBMT 1149 0.88 584
MAPAR 511 0.39 241
EMATN 0 0.00 1

keep in mind we had trimmed the values and not filtered out empty values from the count
how did you cross checked?

i had this report as a sample from my seniors, and the input data was same

if we filter out the empty values will that make it right?

you would know it. Have a look on this LINQ:

(From c In varInputSapDataTable.Columns.Cast(Of DataColumn)
Let cn = c.ColumnName
Let drFilled = varInputSapDataTable.AsEnumerable.Where(Function (x) Not (isNothing(x(cn)) OrElse String.IsNullorEmpty(x(cn).toString.Trim)))
Let cntFilled = drFilled.Count
Let p = (cntFilled/varInputSapDataTable.Rows.Count*100).toString(“F2”)
Let cntDistinct= drFilled.Select(Function (r) r(cn).toString.Trim).Distinct().Count
Order By CDbl(p) Descending
Let ra = New Object(){cn, cntFilled, p, cntDistinct}
Select row=dtResult.Rows.Add(ra)).CopyToDataTable

Still not correct i checked manually this time
the unique values of one column are 1862
but it is showing as 1918

you can analyse it within immediate panel ad hoc.
you can check the LINQ part with a reduced set of data

we cannot rate if the numbers has to be matched or not

just take your time

actually I am not able to understand the linq part

can we write a linq
where we have a temp data table
and then we remove duplicates for a particular column and then count the remaining rows and that would be the unique values