Compare multiple columns in excel

I am trying to compare two excel sheet that have thousands of data. I would like to know how can these be compared quickly without using for each row activity

Sheet 1 has two columns: Id and Source Value.
Sheet 2 has two columns: Id and Target value
I want an output in sheet 3 with the columns Id, Source value, Target value and status
For each Id, it should populate corresponding source value from sheet 1 and target value from sheet 2 and update status as ‘Pass’ if source and target value matches or update as ‘Fail’

Please help

@Palaniyappan
@kadiravan_kalidoss

@nabeelu Can you provide sample Input Files?

DataComparison.xlsx (10.3 KB)
Please find sample sheet

@supermanPunch
Were you able to have a look

@nabeelu Yes :sweat_smile:, Sorry for the Delay, I was a bit busy, Without using a For Each Loop, it is a bit difficult to implement, so working on that.

@supermanPunch
Hope this solution helps…

  1. Read both the excel files
  2. Use for each datarow on excel datatable 1 and inside that use another for each datarow on excel datatable 2
  3. In the inside for each use IF condition to check the ID column matches,
    if matches -
    { use another IF condition to check Source from Datatable1 = target from datatable 2,
    if matches - use add datarow to output datatable and add the values in array**
    else - Continue**
    }

else - Continue

@Mihir_Mapara
Thanks for your solution. But my excel data can have 50000 row or more. Using for each row will take lot of time to process. Apart from for each row, is there any other option probably using code

@nabeelu
we can give a chance to do it with LINQ. I will mark your request and will have a look on it during my lunch break

1 Like

@nabeelu
Was done within a Quick and dirty aproach:
Find starter Help here:
nabeelu.xaml (10.6 KB)

Also have a look to this activtiy with using left join

1 Like

Thanks @ppr, for your effort and time
I was looking for below kind of result

1,a,a,Pass
2,b,k,Not Present
3,c,c,Pass
4,d,m,Not Present
5,e,t,Not Present
6,f,Not Present
5,o,o,Pass
5,u,NotPresent
7,a,a,Pass

For id 5, in source I have values o,e and u
In target I have value o and t
So I need one row to have 5,o,o,Pass (as given in your result)
Other two rows I need value of t for source (5,e,t,NotPresent) for one row and blank (5,u,NotPresent) in other row

Yes this data is correct

For Id 5- Value in source is e, o, u
Value in target is o and t

As per your result for all source of 5 (e, o and u) target value matched is o
But my scenario is it should be one to one mapping
i.e Result 1: (5,o,o,PAss) value of target and source is o and o

Result 2: Second result it should not consider value o again for target. It should consider only target value t and blank
It should be
5, e,t,Not Present or 5, e,NotPresent (i.e for target it should consider only t or blank as o was already considered in previous result (Result 1))

Result 3: 3rd result of 5 should be - 5,u,NotPresent (if target value of t was considered in prev result (Result 2)) or 5,u,t,NotPResent (if target value blank was considered in prev result (Result 2))

@nabeelu
I didnt get you 100% but i found a small bug within the mapping dictionary. Now I do have an output more close to your excel output sample:

grafik

with fixed dict

In case the result differs please help on getting the 100% understanding and I will help you

1 Like

Issue is only with id 5

Input 1
Id Source Value
5 e
5 o
5 u
Input 2
Id Target value
5 o
5 t

I am expecting below output for id 5

Id Source Value Target Value Status
5 o o Pass
5 e t Fail
5 u Fail

Difference is that in your result target values for all sets of id 5 are o. But here it is not
Please let me know if you need more info

can you send the excel with that you and me are working with the same data. I will have a look on it after my job work. But please let the mapping dict excact same as the corrected one so we can identify the issue faster

1 Like

DataComparison Use case.xlsx (11.2 KB)

Please find the excel sheet attached. For easy reference, I have given both input data and output data in same excel sheet.

I have given two expected output. Either of them will be fine

@nabeelu
can you please crosscheck this result
grafik

About the labels Pass, Faill, Not Present we can change later as by your needs. Currently I am more focused on a bug free implementation. Also do not focus on teh changed order (e.g all 5’s together).
Thanks

Perfect, this is the kind of result I am looking for

1 Like

@nabeelu
I will Crosscheck my Implementation and then pass IT to you. I need some little more time thanks

1 Like

@nabeelu

About the Scenario and Approach:
Sheet 1 = D1
Sheet 2 = D2
Definition of a Group: D1.ID
Definition of a Match: In a Group D1.Value = D2.Value

  • Group D1 Data
  • Process every Group one by one - D1Stack
  • Retrieve from D2 all items having the same Group Key - D2 Stack
  • Find the matches between D1Stack And D2 Stack - Matches
  • Remove Matches D1Matches from D1Stack, Similar doing it with D2
  • take from D2Stack items of the count of remaining elements in D1Stack or less if D2Stack.Count < D1Stack Count
  • Build the not matching pairs: d1(0)d1(1)d2(1)“Fail” - Unmatches
  • Remove from D1Stack the d1 Unmatches
  • Build the not having any d2 items d1(0),d1(1),“”,“Not Present” - Unbounds

About the Implementation:
50000 rows are a hint on Performance and more important on reliable and testable implementation. Doing it all in a Single LINQ will not adress these quality factors. So it was build by a bottom to up Approach. Does mean: within the different stages we have the chance to test the part results.

Example Log:
grafik

Implementation has the risks of fails when:

  • Duplicates will occur
  • Is not handling the case D2Stack.Count > D1Stack (is doing now a warning log)

Find starter help here
nabeelu_V2.xaml (19.7 KB)

1 Like