I am trying to remove duplicate rows from an input file being processed by my bot.
How can you remove duplicate rows based on specific columns being identical and not all?
For example, I have to filter out duplicate rows in a data file where only criteria such as NAME + POLICY NO + START DATE are identical, to only keep 1 row. Regardless of what’s in the other columns.
For example.
FIRST NAME | SURNAME | POLICY NO | START DATE | ADDRESS | NOTES John | Identical | 12345 | 01/01/2021 | 12 Test St | Dummy text here1 John | Identical | 12345 | 01/01/2021 | 12 Potato St | Dummy text here2 John | Identical | 12345 | 01/01/2021 | 12 Fly St | Dummy text here3 John | Identical | 12345 | 01/01/2021 | 12 Weird St | Dummy text here4
Remove all except 1 row… ???
FIRST NAME | SURNAME | POLICY NO | START DATE | ADDRESS | NOTES
John | Identical | 12345 | 01/01/2021 | 12 Test St | Dummy text here1
Instead of Build Data Table activity use the Read Range Activity to read the input excel file into the data table variable (inputDT)
The LINQ used is given below
(
From row In inputDT
Group row By
k1 = row("FIRST NAME").ToString.Trim,
k2 = row("SURNAME").ToString.Trim,
k3 = row("POLICY NO").ToString.Trim,
k4 = row("START DATE").ToString.Trim
Into grp = Group
Select grp(0)
).CopyToDataTable
Finally you can write the output in an excel file using the Write Range Activity.
Wow you’re a legend! This works flawlessly. Thankyou.
Not familiar with LINQ so this is something I might try to learn more about. I’m sure this is a very common function that others will appreciate your solution.