Could you help me insert a new column in an excel file, enhance it and eliminate duplicates from the table?Thanks

Hi everyone,
I need a suggestion.
I’m working with an excel file and I should make an insertion of a new column. The new column should enhance it with the concatenation of three other fields read by the rows of the excel file.

this is the initial table below
data code1 code 2
18/05/2020 11 A3
18/05/2020 81 b6
18/05/2020 55 b6
18/05/2020 81 b6

should become as below

data code 1 code 2 concatenation
18/05/2020 11 A3 18/05/202011A3
18/05/2020 81 b6 18/05/202081b6
18/05/2020 55 b6 18/05/202055b6
18/05/2020 81 b6 18/05/202081b6

Finally from the table I should delete all the rows that have duplicates, but not only the double rows but also the original ones.
To be clearer, I show you below how the table should become.

data code 1 code 2 concatenation
18/05/2020 11 A3 18/05/202011A3
18/05/2020 55 b6 18/05/202055b6

how can i delete both the duplicate line and the control line?
Is there a way to keep the right rows in one table and save duplicate rows in another table?
I hope you can help me and if you have any examples from which I can can take ideas please attach it
thanks

Hi.
Did you try using ‘join data tables’?

@supermanPunch Can u please help with the last part i.e eliminate all the duplicate records

1 Like

@vittoria Can you check if this is what you needed ? Check the Output Sheet in Excel File, Delete the Sheet and Run the Workflow. It Should generate the same sheet again. But this only works if you have only 3 Columns as Input not dynamic.
RemoveDuplicates.zip (8.4 KB)

2 Likes

Thanks for your answer, but in reality I thought of something more dynamic also because in the description I put three columns as an example but in reality in my excel file I have almost twenty

so you are looking on a solution that aggregates all columns, right?
And duplicates are defined with the uniquness off all columns as well. Does mean for uniqness check all columns are used and compaired?

No I don’t have to aggregate all the columns.
I have about twenty columns in the excel file.
I have to create a new column in the excel sheet and value it with the concatenation of three values taken from other columns.
after adding this new column I have to check the duplicates

sorry for my unclear words as concatenation was seen as a string aggregation

I repeat your requirements for clearification:

  • 20 columns
  • 3 columns are to concatenate (a dynamic configurable approach would be nice) and will form an additional column
  • Duplication detection based on result of concatenation
  • retrive:
    • a - the duplicates
    • b - the non duplicates

Correct understanding?

yes,thanks

I will have a look on it, just geive me some little time

2 Likes

@vittoria
I have for time reasons implemented with a smaller set of sample data, but it can be extended easy to more columns

configure in the default values the cols for concatenation and concat colname

data in:
grafik

data Concated with configured columnname:
grafik

Duplicates
grafik

Non Duplicates
grafik

with filter datatable and set to the remove column colname = ConCatCol you can easy remove the concat column

Find starter Help here:
DupNonDup_ByConfColList_ConcatWay.xaml (11.3 KB)

1 Like

Hi @ppr,
I took the example you sent me and I changed a few things, but this select doesn’t work below
(From d In dtConcated.AsEnumerable
Group d By k = d (concat_column) Into grp = Group
Where grp.Count> 1 Select grp.toList) .SelectMany (Function (x) x) .toArray

I am attaching what I did, could you help me understand why the select doesn’t work?
thankscheck_duplicates.zip (10.1 KB)

@vittoria
I will have a look on it later. But in the meanhile ensure that the definition of input expected output is avaialble for me or share it here. Confidential data can reflected with sample data e.A1,B1… The intention of what to achieve has to be clear. Thanks

@vittoria
You removed some important parts and caused a not working result

I merged some parts from your imp and adopted my end to your datastructure. So now its writing out into excel.

Major issues from your imp just for reference:

  • rest in the standard with the names of arguments in_FilePath indicating the direction
  • dont use a copytodatatable directly in cases you cannot guarantee that always rows are returned. No rows returned are causing an exception
  • have a research on what is the difference of datatable clon, copy and using = with copies by reference you would have the risk of side effects

Updated XAML here:
DupNonDup_ByConfColList_ConcatWay_V2.xaml (13.6 KB)

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.