Can you join datatable without duplication?

Hello all!

I’m trying to join two datatables which have the same exact columns, have matching rows, but have unique cell values.

Table 1:
ID | Name | Ticket
1 | John |
2 | Peter | 205
3 | Mary |

Table 2:
ID | Name | Ticket
1 | John | 301
2 | Peter |
3 | Mary | 564

When I attempted an inner join with matching ID, I got:
ID | Name | Ticket | ID_1 | Name_1 | Ticket_1
1 | John | 301 | 1 | John |
2 | Peter | | 2 | Peter | 205
3 | Mary | 564 | 3 | Mary |

When I attempted to merge the datatables, I just got additional rows.
Is there a way to actually join two data tables without duplicating columns, or duplicating rows?

The end result I’m looking for is:
ID | Name | Ticket
1 | John | 301
2 | Peter | 205
3 | Mary | 564

Hi @sidb ,

When the above Task using Join Datatables Activity satisfies the requirement, we use Filtering options/Filter Datatable Activity for removing the Columns that would generate as duplicate keys.

Another approach would be to use Linq Join, but in here we would be preparing the datatable by selecting which columns to be preserved for the Output. A Video Explanation is shown below :

If you do want to raise this as a Feature Request, you could also do this as a Feedback.

How do you use the Filter activity without removing data? It’s okay if I remove Column ID_1 but I can’t remove Ticket_1 without losing data. Plus, I want the ‘Ticket’ data to be on the Ticket column, not on ‘Ticket_1’

@sidb
With one of my work fellows @kumar.varun2 we developed an approach

  • we merged all dt into one
  • grouped on the ID
  • created from each column from the groupmembers a helper tuple
  • filtered out the empty value ones
  • driven by the data column schema we build the itemarray where missing ones were creared with empty vlaues
  • added the merged row to an result datatable

Varun will continue on this approach with details and samples

we were also able to extend this approach to merge multiple datatable and implemented some rules where to pick when more datatables has the value for a particular column.

The motivation about this was:

  • the case is not a merge only usage - as the rows are not merged within one row
  • it is not a side-by-side merge
  • the data joins were not appropriate enough and satisfying
  • we looked for an approach handling multi datatable merges
1 Like

Hi @sidb

Please find the solution with the approach mentioned by @ppr.

Reading the excel sheets and Merging the Data Tables

image

dtResult =

(
	From row In dt.AsEnumerable()
	Group row By k = row("ID").ToString.Trim
	Into grp=Group
	Let lt = grp.Select(Function(g) arrColSet.Select(Function(cn) Tuple.Create(cn, g(cn)))).SelectMany(Function(t) t)
	Let ltf = lt.Where(Function(t) Not (IsNothing(t.Item2) OrElse String.IsNullOrEmpty(t.Item2.ToString.Trim)))
	Let ra= arrColSet.Select(Function(c) ltf.Where(Function(t) c.Equals(t.Item1)).DefaultIfEmpty(New Tuple(Of String, Object)(c, Nothing)).First().Item2).ToArray
	Select r = dtResult.Rows.Add(ra)
).CopyToDataTable

Refer the project

Project.zip (11.2 KB)

1 Like

Thank you! I will merge and then use the filtering to remove duplicates. I wonder why the Join activity exists since it’s kind of pointless?

Thank you again for taking the time to explain!

Oh no, is not pointless it has a particular but reasonable logic