Compare 2 excel columns and find missing records/dupicates in one and not the other

I have a simple but not so simple task to:

Read 2 Excel files, then compare one col in Excel file A to another col in excel file B. ( One Column has more records compared to the other, hence the need to find the variance)

The task is actually to find the duplicate records in one sheet that makes the count of that col higher than the other, then write those duplicates to another Excel file.

I know there are duplicates in one of the files ( see dups.txt) in the project but automation cannot find them.

I have used two approaches ( Main.xaml uses “join Data table” ) while the ("AsenumarbleApproach.xaml) uses a different logic.

Attached is my workflow, I need help figuring out what I was doing wrong because the two approaches are not identifying the duplicate entries.

Attached is my workflow.
missingRecswhen2ExclColsAreCompared.zip (34.1 KB)

Hi,

Can you share expected output?
It seems there is no missing record in your sample.

For now, can you check/run Sequence.xaml in the following project?

missingRecswhen2ExclColsAreCompared_v2.zip (41.6 KB)

Regards,

@Yoichi
Thank you, below is the expected output.

image

They are the duplicates that caused the record count between the two columns to differ.

The image is a screenshot of an Excel spreadsheet showing a list of category numbers in column A, with some duplicates, and a red text comment in cell D2 indicating a desire to eliminate duplicate entries. (Captioned by AI)

HI,

How about the following?

Sequence.xaml (11.8 KB)

Regards,

@Yoichi
The last solution sequence.xaml needs a few tweak as shown below. for each duplicate, we only want to write out one entry as shown in the screenshot below.
thanks

The image shows an Excel spreadsheet with a column of duplicate numbers and an adjacent column with the expected outcome, where only unique numbers are listed once. (Captioned by AI)

@Yomi_Oluwadara

image

1 Like

Hi,

Can you try to add Distinct() as the following?

image

arr_variance = dict_catNo.ToDictionary(Function(kv) kv.Key,Function(kv) kv.Value-if(dict_catActNo.ContainsKey(kv.key),dict_catActNo(kv.Key),0)).Where(Function(kv) kv.Value>0).Select(Function(kv) kv.Key).Distinct().ToArray()

Regards,

@Yoichi
Thanks, I have a question around the variables arr_variance and dict_catNo

Question1: What are their types?
Question2: Where will we put the assign activity? - Should it replace the arrDr variable?

@sudster
Thanks, lots of unknows -what is the “Build Data Table” used for ?
I’m also assuming the read range is reading the Excel file that has the dups

@Yomi_Oluwadara ,

You assumed that correctly :smiley:

1 Like

Hi,

Can you check the following?

Sequence.xaml (12.2 KB)

Regards,

1 Like

@sudster Thanks for the insight.

@Yoichi This worked, the introduction of the distinct function made the difference.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.