Compare two CSV or XLSX files to find leftover rows

Hello,

I have two csv or xlsx files, doesn’t matter which… I need to compare both datatables to each other, and when a match is found, remove that row from both datatables and do this until there are no more matches. I would then want a third datatable created that would contain the leftover rows. I am facing an issue where my Dispatcher is loading let’s say, 1600 items, but Performer is processing 1601 for some reason. I need to be able to identify which one is the “extra” ghost transaction on the Performer output sheet. So I am essentially comparing the Dispatcher source sheet with the output Performer sheet and outputting the row on the Performer sheet that doesn’t have any rows to compare to on the Dispatcher sheet.

Thank you!

hi @Josh_James

You can read both the Files, one using Read CSV one using Read Range.
Output them in Data Table 1 and DataTable 2, and then use the following:

OutputDT = DataTable1.AsEnumerable().Where(function(row) Not DataTable2.AsEnumerable().Select(function (r) r.Field(Of String)("ColumnName")).Any(function(x) x = row.Field(Of String)("ColumnName"))).CopyToDataTable()

Hi @adiijaiin , thank you for the reply!

What would I need to change if dt1 and dt2 have different column names? The column name for dt1 is called “customer_reference” and the column name from dt2 is “Account Number”. Both columns have the values I want to compare to eliminate the matches and return the one that doesn’t have anything to match with. Do the column names have to be the same for dt1 and dt2?

Hi @Josh_James

Assuming that atleast they have same values. Column Names can be different. You need to change column name respectively
For example: if DT1 has Customer_reference, and DT2 has Account Number
the query would be like:

OutputDT = DataTable1.AsEnumerable().Where(function(row) Not DataTable2.AsEnumerable().Select(function (r) r.Field(Of String)(“Account Number”)).Any(function(x) x = row.Field(Of String)(“Customer_Reference”))).CopyToDataTable()

When I do that, I get this error: Column ‘customer_reference’ does not belong to table DataTable.

It does belong to dt1 though…

image

If I swap the column names in the expression, I get this error instead: The source contains no DataRows.

If you swap then can you also use Dt2 first and dt1 later

Still getting this error: Assign dtResult: The source contains no DataRows.

Here is a screenshot of my expression. "dtEODSheet is the dt with “Account Number” as the column header and dtSourceCSV is the dt with “customer_reference” as the column header. Both of those columns contain the data I want to compare.

@Josh_James You don’t have to do lots of activity do below activities

1 Read the CSV file into a data table
2 Create the XLSX File and Write the data table

or then second option I am referring passing the CSV file into the database and performing logic in DB level which is faster than local excel automation.

Hi @mukesh.singh ,

Sorry, I don’t think I follow. What I’m trying to do is compare one column from each data table, remove the matches, and only return the row(s) that are left over.

Can you give some Dummy data in input files to us???

Give us sample file so we can check and provide you update

Hey @Josh_James

Try Below LinQ, You will get your expected Output!

dt_Output = dtEODSheet.AsEnumerable.ExceptBy(dtSourceCSV.AsEnumerable().Select(Function(x) x("customer_reference").ToString.Trim),Function(x) x("Account Number").ToString).CopyToDataTable

Note: dt_Output’s Datatype is Datatable. And also you can switch datatable variables and column name respectively as per your requirement.

Screenshot for your reference:

If No Rows found after above LinQ, it will throw error as: No Datarow found

For Exception Handling:
Surround Assign Activity with Try Catch, in Catch take assign and clone the datatable that you need as the output.
It will give output as Empty Datatable cause no Rows found!
In below example I have taken dtEODSheet,

image

Regards,
Ajay Mishra

@Ajay_Mishra @adiijaiin @mukesh.singh

Please see example data. dt1 has 6 rows, dt2 has 7 rows. The one that won’t have a match when compared will be from dt2’s column “Account Number”. “Reference02” should return as the leftover item, but dtResult doesn’t return anything. I appreciate the help!

FindLeftoverRow.zip (3.7 KB)

Hey @Josh_James

I checked your Input files, Above query will give you the rows which are not present in the table 1 but as per your input files all the values are present in the 2nd table that’s why No extra rows are found!

Just changes the values of 1st table you will get the output!

Regards,
Ajay Mishra

Hi @Ajay_Mishra - I cannot change the values of either data table. Those need to be left alone. I just need to get the leftover row, even if the data is present. So if “Reference02” appears twice in dt1, but 3 times in dt2, how do I get dtResult to say “Reference02”? The reason I need this is because the files I’ll be comparing have thousands of rows each, but sometimes Performer will process a “ghost” item out of nowhere, so I need to be able to find the extra item on the Performer report, which is dt2.

@Josh_James Got it!, Now I’m very much clear about your scenario.

Working for your solution!

Regards,
Ajay Mishra

1 Like

Hey @Josh_James

Here is the Solution!

Use Below mentioned LinQ:

dt_Result = dt2.AsEnumerable() _
    .GroupBy(Function(row) row.Field(Of String)("Account Number")) _
    .Where(Function(grp) grp.Count() > dt1.AsEnumerable().Count(Function(row) row.Field(Of String)("customer_reference") = grp.Key)) _
    .Select(Function(grp) grp.First()).copytodatatable

Note: Above LinQ will give you rows which are not there in Dt1 and repeated more times then Dt1,
Values(Rows) will be only One time not as many times as it is present in Dt2, you can use new datatable rows for filtering and build further logic.

Immediate Panel:

image

Attaching .zip for your reference:
(You can check Output.xlsx File)
FindLeftoverRow_Final.zip (9.7 KB)

Hope this helps you, If yes then kindly mark the post as Solution!

Thanks & Regards,
Ajay Mishra

1 Like

Hi @Ajay_Mishra - Thank you so much!! This is awesome, you are a life saver!

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.