Summing duplicate rows with conditional statement

Sample_data.xlsx (15.6 KB)

Basically what I’m after is the equivalent of this:

=SUMIFS($C$2:$C$17,$C$2:$C$17,">0",$B$2:$B$17,"<>Text B",$A$2:$A$17,A2)

That is, sum all amounts belonging to the same ID, given that they are postive numbers and don’t have the “Text B” text in column B.

I can do this in the excel sheet I am importing from, but it’s very slow on large number of rows, so I’m looking for a better way of doing it. Can this be done via an assign operation and some sort of magic?

1 Like

Usually SUMIFs are time taking. That’s why it is slow for large number of rows.

Without formulas, you can directly do it in the UIPath using LINQ queries or data table operations.

You can apply filters on the columns to the data table and calculate the sum based on LINQ query which is faster than calculating formulas on Excel.

Regards,
Karthik Byggari

1 Like

Thanks. Any pointers on how to use LINQ for this? I’ve never done that before.

@mickeymack
Scenario has a Chance to do it with linq or in Combination with plain activities. I can have a Look on in later.

I’ve done some changes to the data.table so I’ve removed the text column (filter on it rather) and use this to sum duplicates:

(From p In db_in.Select
Where p(“ID”).ToString.Trim.Equals(ID) And Convert.ToDouble(p(“Amount”).ToString.Trim)>0
Select Convert.ToString(p(“Amount”))).Sum(Function(x) Convert.ToDouble(x)).ToString

This seems to be working at least.

I do feel that WE also need a group by. Give some little time

@mickeymack
I did a split on two assigns:

First: Filtering
(From p In dtDemo.AsEnumerable
Where Not p(1).ToString.Trim.Contains(“Text B”) And Convert.ToDouble(p(2).ToString.Trim)>0).CopyToDataTable

Second: Summing Group Sums
(From row In dtFiltered.AsEnumerable()
Group row By ID=row(0) Into Group
Select New Object() {ID, Group.Sum(Function(x) Convert.ToDouble(x(2).ToString.Trim))}).ToList

Returning a list of Object()

I had some issues while writing the script. Once I am back I from shopping I can have a check if both statements can be combined

1 Like

@mickeymack
Here is the combined Statement returning a list of Object()
(From p In dtDemo.AsEnumerable.Where(Function ® Not r(1).ToString.Trim.Contains(“Text B”) And Convert.ToDouble(r(2).ToString.Trim)>0)
Group p By ID=p(0) Into Group
Select New Object() { ID, Group.Sum(Function(x) Convert.ToDouble(x(2).ToString.Trim)) }).ToList

Just adopt the column references from index to name if you prefer

Thanks again - so it will work if I put that into the right hand side of an assign activity and then assign it back to a DT?

@mickeymack
PFA XAML: mickeymack.xaml (11.6 KB)
I just completed and integrated also the handling of a result datatable

That’s aweome @ppr, but I’m having trouble integrating it into my code. The data in SampelData is just part of a bigger table, so I’d rather not build a new table, but assign the results of the summing directly to the existing datatable. And I’ve found a solution to the filtering of the text, so I don’t need that in the code (data is filtered before it’s read into the robot).

@mickeymack
The result of the group by is Object() so you cannot use for a CopyToDataTable. But the question is

  • on how want to assign ist
  • where (which Column, which row) you want to assign
    • as you have e.g. many A rows but you only have a single A sum

However we will find a solution, Just give me more inputs on scenario eg with sample data / structure infos. Thanks

I’m sorry for not explaining it better. I don’t want them grouped. The reason is that I want to keep each row, but sum the amounts belonging to each ID. So we can add a fourth column, say TotalSum, and put the total sum there on each row. Sample_data.xlsx (16.1 KB)

I’ve added a new excelfile; you can see from Column C what I want to replicate inside the XAML.

As a first shot maybe pre calculating the sums for each Id, retrurning a list, iterating over the datatable and setting the TotalSum info could be an approach. But can you please name what is bigger table about. 1000 Rows, 10000, 100K, 10 Millions?

Ok i will think about it. I do feel I have an idea

You are a saint @ppr. The size of the table varies, it’s always 17 columns, but can vary from 250 to 30.000 rows.

30.000 should not be a Blocker. However I will play. Just to clear.
Task is about
Calculating the sum for an ID
Updating each row column TotalSum with the sum info
And best do in one go, right?

Spot on as always!

Ok first half round
Creating a lookup Dicitionary (Of String, Double)
(From id In dtDemo.AsEnumerable.Select(Function (row) row(“ID”).ToString.Trim).Distinct().ToList
Let sum = dtDemo.AsEnumerable.Where(Function (k) k(“ID”).ToString.Trim.Equals(id)).Sum(Function (s) Convert.ToDouble(s(“Amount”).ToString.Trim))
Select New KeyValuePair(Of String, Double) (id, sum)).ToDictionary(Of String, Double)(Function (k) k.Key, Function (v) v.Value)

this can be later used to set the TotalSum within a for each row or with the oneliner invoke code approach from my xaml

Now I do look If Ido find a more integrated way (But its hard, as Column updating has to respect some constraints within LINQ Functionality)

@mickeymack
I would suggest to go with 2 Split approach:
first: is creating Lookup Dict
second: is updating the datatable

PFA mickeymack2.xaml (8.1 KB)

Updating DataColumns within LINQ Statements is risking too much JUNK code as its stressing concept and syntax. So I decided to stop on it

In case of really critical perfomance issue due of mass data (masses for me are some brighter counts having much zeros in the bright) then lets see on how we can tune the update for each row.

Can you check at your end the XAML and let me know the feedback. Thanks