Check, Count and Delete duplicate items with summation

kit2815 · February 15, 2022, 5:23pm

Hi Everyone it’s me again!!!

Here is the case in my sheet:
Column P (“Staff No”)
Column J (“Amount”)

I want to check whether there are duplicate “Staff No” and if there is, I want to add the “Amount” together and write the Sum of “Amount” to a new Column.

Afterwards, I would like to count the number of repeated Staff No. If the number exists in the columns for 3 times, then perhaps count it as “3” in another new Column. Eventually, deleting the other 2 rows with duplicate “Staff No”.

I have seen many similar cases in the forum but most of them are working under For Each. Is there a way other than For Each to conduct the process faster as my database is kinda large?

Many thanks if anyone can help, please.

ppr · February 15, 2022, 5:30pm

We can check if grouping the data will serve. Have a look here:

when grouping on Staff No we can identify the duplicates, when checking the group count. (You will also find a lot of examples for this on the forum via search)

the aggregations e.g. Summing up we can do with the LINQ Sum Operator:
[HowTo] LINQ (VB.Net) Learning Catalogue - Help / Something Else - UiPath Community Forum

feel free to share some sample data with us and we will help you based on this

kit2815 · February 15, 2022, 5:44pm

Hi @ppr

Book1.xlsx (9.1 KB)

Attached is the sample data with 3 row items.

Would you help me based on this file?

Just feel free to add the 2 new Columns “Sum of Amount” and “No of Duplicated Line Items” at the back.

Really Thanks.

gabrielribas4 · February 15, 2022, 7:20pm

Hey @kit2815!! Good to see you again!! Let’s build a solution for this case too!
Doing it right now!

gabrielribas4 · February 15, 2022, 7:22pm

Just a question. In this case, will the columns that are blank always have the same values? E.g.: for “Staff No” A12345, we will have the same value in the blank columns for both rows, right?

kit2815 · February 15, 2022, 7:27pm

Hi @gabrielribas4

Feels good to see you again Gabriel.

In this case, I think the blank row items are not necessarily identical even they have the same Staff No. For example, they might be responsible by different persons with the same findings/reasons. Is that affect your coding to a large extent?

gabrielribas4 · February 15, 2022, 7:31pm

Because in this case, when we group with the sum and count, we’re going to have AT least the same line number or less than the original. In the case of grouping, what would we do with the information in the other columns when we merge the rows? I don’t know if it was clear. In this case it would not be possible to add two columns to this Input DataTable because the rows would not be the same. Let me know if it’s not clear.

kit2815 · February 15, 2022, 7:38pm

Sorry, perhaps I am a bit confused.

Does it mean that we could not only rely on 1 identical row item like A12345 in the Column Staff No while other Column row items do not remain constant, in the case of grouping?

gabrielribas4 · February 15, 2022, 7:41pm

We can! But we will loose the other informations! Look at this example:

Output: We lost the “Type”=3 information. I’m not really sure about the Output you expect, can you share an example in the style I did above?

Thanks!!

kit2815 · February 15, 2022, 7:57pm

Thanks, Gabriel.

For the Amount Column, I think the numbers should be in 2 decimal places instead of integers.

While for the Expected Output, I am still consolidating my output design. May I return to you later after taking a sleep since it is 4 am at my time now?

And would you mind sharing your current file in advance?

Many many thanks indeed Gabriel.

gabrielribas4 · February 15, 2022, 11:36pm

Hey @kit2815 ! Hope you have a good rest!! If you decide on a different Output design, we will change it!!

The solution I drew doesn’t have much mystery, just in the GROUP BY query. Let’s analyze:

Input file sent:

image1920×1016 89.4 KB
Reading the data and storing in “dt_input”

image1920×735 61.3 KB
I created a variable of type DT “dt_result” to store the final datatable. I made a clone of the table “dt_input” so that we get the same column schema.

image1920×745 51.7 KB
I added in the DT “dt_result” two columns “Sum of Amount” and “No of Duplicated Line Items”.

image1920×926 79.7 KB

I then performed the query of Group By. I couldn’t do it in a very elegant way, maybe there is a way that @ppr knows, if he can help us. But the criterion I used to populate the other columns we mentioned was to always get the first value.

image1920×1021 84.9 KB

(From p In dt_input.AsEnumerable()
Group By
col1=p("Staff No").ToString
Into Group
Select dt_result.Rows.Add({
	Group.First()(0),
	Group.First()(1),
	Group.First()(2),
	Group.First()(3),
	Group.First()(4),
	Group.First()(5),
	Group.First()(6),
	Group.First()(7),
	Group.First()(8),
	Group.First()(9),
	Group.First()(10),
	Group.First()(11),
	Group.First()(12),
	Group.First()(13),
	Group.First()(14),
	col1,
	Group.First()(16),
	Group.First()(17),
	Group.First()(18),
	Group.First()(19),
	Group.First()(20),
	Group.First()(21),
	Group.Sum(
	Function(r) Convert.ToDecimal(r("Amount").ToString.Replace(",","."))
	),
	Group.Count()
	})).CopyToDatatable

I wrote the result for verification.

image1920×1017 67.7 KB

image1920×1022 89.3 KB

Hope it helps!!

The .xaml.
Main.xaml (11.4 KB)

The .xlsx
Book1.xlsx (10.2 KB)

kit2815 · February 16, 2022, 3:33am

Hi @gabrielribas4

Many thanks for your detailed presentation.

I would get back to you today if there are any changes.

Let’s keep in touch. Thanks.

ppr · February 16, 2022, 9:56am

we can reduce redundancies / repetitions by row array construction options

col1 is the same value as Group.First()(15)
so we can work with take() and concat()

(From d In dt_input.AsEnumerable()
Group d  By k=p("Staff No").ToString Into grp = Group
Let ra1 = grp.First().ItemArray.Take(22)
Let s = Sum(Function(x) Convert.ToDecimal(x("Amount").ToString.Replace(",","."))
Let ra = ra1.Concat({s, grp.Count}).toArray
Select r = dt_result.Rows.Add(ra)).CopyToDatatable

(untested this code lines, but practiced very often)

sometimes we do need to harmonize it like
Let ra = ra1.Concat({s, grp.Count}).Cast(Of Object).toArray

We would not recommend fixing number formats on string base with replaces. A more reliable approach is using the locals details custom formats for the parsing

kit2815 · February 16, 2022, 4:18pm

Thanks both @gabrielribas4 @ppr !!!

You guys solution does worked. Really appreciated.

But here comes to the last problem for my final output decision table. Attached is the some abstract Columns of the table.

Decision Table.xlsx (9.8 KB)

After Join Datatable, I want to delete the duplicate rows now if both Column “Staff No” and “From” . I have tried using “Remove Duplicate Rows” but it does not work.

Would anyone of you suggest an efficient method to do it? Thank You.

Or the only way is to repeat the upper process again?

ppr · February 16, 2022, 4:25pm

not sure about the case, but looks like a new topic. Then We would prefer to have it as a new topic here in the forum as well.

When it can be done quickly, give a try on remove all duplicates identified by 2 cols:

dtDecisionDeDups =

(From d in dtDecision.AsEnumerable
Group d by k1=d(“Staff No”).toString.Trim, k2=d(“From” ).toString.Trim into grp=Group
Where grp.Count = 1
Select g=grp.First()).CopyToDataTable

kit2815 · February 16, 2022, 4:43pm

Sorry as I think it is still related to delete duplicate items, so I did not open a topic. I would open a new one for next time.

While for the formula, it results deleting both duplicate rows.

How should the formula be changed to keep at least 1 row left? Thanks

ppr · February 16, 2022, 4:46pm

didn’t get you as it was mentioned:

the linq is doing this
keep red ones
filter out the yellow ones and also keep all others

Was your requirement misinterpretated?

kit2815 · February 16, 2022, 4:56pm

Real sorry maybe I misinterpret my requirement.

What should I do now to keep at least 1 row left for yellow condition?

Like even there are more than 2 yellow rows, just keep 1 only. Thanks

ppr · February 16, 2022, 4:58pm

give a try on following, taking the first member from duplcated group

(From d in dtDecision.AsEnumerable
Group d by k1=d("Staff No").toString.Trim, k2=d("From" ).toString.Trim into grp=Group
Select g=grp.First()).CopyToDataTable

kit2815 · February 16, 2022, 5:03pm

Many thanks indeed. That’s what I need.

Topic		Replies	Views
Need a Excel manipulation Solution Activities excel , question	3	328	October 25, 2023
Excel Sum and deletion of duplicates Help	11	3936	June 7, 2021
Find duplicate rows in excel and add particular column value Studio datatable , excel , uiautomation , studio , question , command_palette , activities_panel	19	3739	November 26, 2021
Remove Duplicate rows of having duplicate data in 2 Columns in which 3rd Column status is Completed and Non-Completed using Linq Activities excel , linq	7	321	October 26, 2023
Most efficient way to remove datatable duplicates while summing the dulpicates Activities excel , activities , question	3	1048	June 23, 2022

Most Active Users - Yesterday
ashokkarale
ppr
Anil_G
Ajay_Mishra
Yoichi
mhaniff
Shiva_Nikhil
Anonymouss
quick_123
vrdabberu
More details...

Check, Count and Delete duplicate items with summation

Related Topics