Here is the case in my sheet:
Column P (“Staff No”)
Column J (“Amount”)
I want to check whether there are duplicate “Staff No” and if there is, I want to add the “Amount” together and write the Sum of “Amount” to a new Column.
Afterwards, I would like to count the number of repeated Staff No. If the number exists in the columns for 3 times, then perhaps count it as “3” in another new Column. Eventually, deleting the other 2 rows with duplicate “Staff No”.
I have seen many similar cases in the forum but most of them are working under For Each. Is there a way other than For Each to conduct the process faster as my database is kinda large?
We can check if grouping the data will serve. Have a look here:
when grouping on Staff No we can identify the duplicates, when checking the group count. (You will also find a lot of examples for this on the forum via search)
Just a question. In this case, will the columns that are blank always have the same values? E.g.: for “Staff No” A12345, we will have the same value in the blank columns for both rows, right?
In this case, I think the blank row items are not necessarily identical even they have the same Staff No. For example, they might be responsible by different persons with the same findings/reasons. Is that affect your coding to a large extent?
Because in this case, when we group with the sum and count, we’re going to have AT least the same line number or less than the original. In the case of grouping, what would we do with the information in the other columns when we merge the rows? I don’t know if it was clear. In this case it would not be possible to add two columns to this Input DataTable because the rows would not be the same. Let me know if it’s not clear.
Does it mean that we could not only rely on 1 identical row item like A12345 in the Column Staff No while other Column row items do not remain constant, in the case of grouping?
For the Amount Column, I think the numbers should be in 2 decimal places instead of integers.
While for the Expected Output, I am still consolidating my output design. May I return to you later after taking a sleep since it is 4 am at my time now?
And would you mind sharing your current file in advance?
I created a variable of type DT “dt_result” to store the final datatable. I made a clone of the table “dt_input” so that we get the same column schema.
I then performed the query of Group By. I couldn’t do it in a very elegant way, maybe there is a way that @ppr knows, if he can help us. But the criterion I used to populate the other columns we mentioned was to always get the first value.
we can reduce redundancies / repetitions by row array construction options
col1 is the same value as Group.First()(15)
so we can work with take() and concat()
(From d In dt_input.AsEnumerable()
Group d By k=p("Staff No").ToString Into grp = Group
Let ra1 = grp.First().ItemArray.Take(22)
Let s = Sum(Function(x) Convert.ToDecimal(x("Amount").ToString.Replace(",","."))
Let ra = ra1.Concat({s, grp.Count}).toArray
Select r = dt_result.Rows.Add(ra)).CopyToDatatable
(untested this code lines, but practiced very often)
sometimes we do need to harmonize it like
Let ra = ra1.Concat({s, grp.Count}).Cast(Of Object).toArray
We would not recommend fixing number formats on string base with replaces. A more reliable approach is using the locals details custom formats for the parsing
After Join Datatable, I want to delete the duplicate rows now if both Column “Staff No” and “From” . I have tried using “Remove Duplicate Rows” but it does not work.
Would anyone of you suggest an efficient method to do it? Thank You.
Or the only way is to repeat the upper process again?
not sure about the case, but looks like a new topic. Then We would prefer to have it as a new topic here in the forum as well.
When it can be done quickly, give a try on remove all duplicates identified by 2 cols:
dtDecisionDeDups =
(From d in dtDecision.AsEnumerable
Group d by k1=d(“Staff No”).toString.Trim, k2=d(“From” ).toString.Trim into grp=Group
Where grp.Count = 1
Select g=grp.First()).CopyToDataTable
give a try on following, taking the first member from duplcated group
(From d in dtDecision.AsEnumerable
Group d by k1=d("Staff No").toString.Trim, k2=d("From" ).toString.Trim into grp=Group
Select g=grp.First()).CopyToDataTable