Check, Count and Delete duplicate items with summation

Hi Everyone it’s me again!!!

Here is the case in my sheet:
Column P (“Staff No”)
Column J (“Amount”)

I want to check whether there are duplicate “Staff No” and if there is, I want to add the “Amount” together and write the Sum of “Amount” to a new Column.

Afterwards, I would like to count the number of repeated Staff No. If the number exists in the columns for 3 times, then perhaps count it as “3” in another new Column. Eventually, deleting the other 2 rows with duplicate “Staff No”.

I have seen many similar cases in the forum but most of them are working under For Each. Is there a way other than For Each to conduct the process faster as my database is kinda large?

Many thanks if anyone can help, please.

We can check if grouping the data will serve. Have a look here:

when grouping on Staff No we can identify the duplicates, when checking the group count. (You will also find a lot of examples for this on the forum via search)

the aggregations e.g. Summing up we can do with the LINQ Sum Operator:
[HowTo] LINQ (VB.Net) Learning Catalogue - Help / Something Else - UiPath Community Forum

feel free to share some sample data with us and we will help you based on this

Hi @ppr

Book1.xlsx (9.1 KB)

Attached is the sample data with 3 row items.

Would you help me based on this file?

Just feel free to add the 2 new Columns “Sum of Amount” and “No of Duplicated Line Items” at the back.

Really Thanks.

Hey @kit2815!! Good to see you again!! Let’s build a solution for this case too!
Doing it right now!

Just a question. In this case, will the columns that are blank always have the same values? E.g.: for “Staff No” A12345, we will have the same value in the blank columns for both rows, right?

image

Hi @gabrielribas4

Feels good to see you again Gabriel.

In this case, I think the blank row items are not necessarily identical even they have the same Staff No. For example, they might be responsible by different persons with the same findings/reasons. Is that affect your coding to a large extent?

1 Like

Because in this case, when we group with the sum and count, we’re going to have AT least the same line number or less than the original. In the case of grouping, what would we do with the information in the other columns when we merge the rows? I don’t know if it was clear. In this case it would not be possible to add two columns to this Input DataTable because the rows would not be the same. Let me know if it’s not clear.

Sorry, perhaps I am a bit confused.

Does it mean that we could not only rely on 1 identical row item like A12345 in the Column Staff No while other Column row items do not remain constant, in the case of grouping?

We can! But we will loose the other informations! Look at this example:

Output: We lost the “Type”=3 information. I’m not really sure about the Output you expect, can you share an example in the style I did above?

Thanks!!

Thanks, Gabriel.

For the Amount Column, I think the numbers should be in 2 decimal places instead of integers.

While for the Expected Output, I am still consolidating my output design. May I return to you later after taking a sleep since it is 4 am at my time now?

And would you mind sharing your current file in advance?

Many many thanks indeed Gabriel.

Hey @kit2815 ! Hope you have a good rest!! If you decide on a different Output design, we will change it!!

The solution I drew doesn’t have much mystery, just in the GROUP BY query. Let’s analyze:

image

  • I then performed the query of Group By. I couldn’t do it in a very elegant way, maybe there is a way that @ppr knows, if he can help us. But the criterion I used to populate the other columns we mentioned was to always get the first value.
(From p In dt_input.AsEnumerable()
Group By
col1=p("Staff No").ToString
Into Group
Select dt_result.Rows.Add({
	Group.First()(0),
	Group.First()(1),
	Group.First()(2),
	Group.First()(3),
	Group.First()(4),
	Group.First()(5),
	Group.First()(6),
	Group.First()(7),
	Group.First()(8),
	Group.First()(9),
	Group.First()(10),
	Group.First()(11),
	Group.First()(12),
	Group.First()(13),
	Group.First()(14),
	col1,
	Group.First()(16),
	Group.First()(17),
	Group.First()(18),
	Group.First()(19),
	Group.First()(20),
	Group.First()(21),
	Group.Sum(
	Function(r) Convert.ToDecimal(r("Amount").ToString.Replace(",","."))
	),
	Group.Count()
	})).CopyToDatatable

Hope it helps!!

The .xaml.
Main.xaml (11.4 KB)

The .xlsx
Book1.xlsx (10.2 KB)

1 Like

Hi @gabrielribas4

Many thanks for your detailed presentation.

I would get back to you today if there are any changes.

Let’s keep in touch. Thanks.

1 Like

we can reduce redundancies / repetitions by row array construction options

  • col1 is the same value as Group.First()(15)
    so we can work with take() and concat()
(From d In dt_input.AsEnumerable()
Group d  By k=p("Staff No").ToString Into grp = Group
Let ra1 = grp.First().ItemArray.Take(22)
Let s = Sum(Function(x) Convert.ToDecimal(x("Amount").ToString.Replace(",","."))
Let ra = ra1.Concat({s, grp.Count}).toArray
Select r = dt_result.Rows.Add(ra)).CopyToDatatable

(untested this code lines, but practiced very often)

sometimes we do need to harmonize it like
Let ra = ra1.Concat({s, grp.Count}).Cast(Of Object).toArray

We would not recommend fixing number formats on string base with replaces. A more reliable approach is using the locals details custom formats for the parsing

Thanks both @gabrielribas4 @ppr !!!

You guys solution does worked. Really appreciated.

But here comes to the last problem for my final output decision table. Attached is the some abstract Columns of the table.

Decision Table.xlsx (9.8 KB)

After Join Datatable, I want to delete the duplicate rows now if both Column “Staff No” and “From” . I have tried using “Remove Duplicate Rows” but it does not work.

Would anyone of you suggest an efficient method to do it? Thank You.

Or the only way is to repeat the upper process again?

not sure about the case, but looks like a new topic. Then We would prefer to have it as a new topic here in the forum as well.

When it can be done quickly, give a try on remove all duplicates identified by 2 cols:

dtDecisionDeDups =

(From d in dtDecision.AsEnumerable
Group d by k1=d(“Staff No”).toString.Trim, k2=d(“From” ).toString.Trim into grp=Group
Where grp.Count = 1
Select g=grp.First()).CopyToDataTable

Sorry as I think it is still related to delete duplicate items, so I did not open a topic. I would open a new one for next time.

While for the formula, it results deleting both duplicate rows.

How should the formula be changed to keep at least 1 row left? Thanks :man_bowing:

didn’t get you as it was mentioned:

the linq is doing this
keep red ones
filter out the yellow ones and also keep all others

Was your requirement misinterpretated?

Real sorry maybe I misinterpret my requirement.

What should I do now to keep at least 1 row left for yellow condition?

Like even there are more than 2 yellow rows, just keep 1 only. Thanks

give a try on following, taking the first member from duplcated group

(From d in dtDecision.AsEnumerable
Group d by k1=d("Staff No").toString.Trim, k2=d("From" ).toString.Trim into grp=Group
Select g=grp.First()).CopyToDataTable
1 Like

Many thanks indeed. That’s what I need.