Comparing 2 excel files and outputing to a 3rd one based on missing rows and/or data cells

CONTEXT:

I have 2 excel files - one with extracted data from PDF invoices (EXC1) and another one which is a CSV downloaded from a invoice manager (EXC2). Both of these have 2 corresponding columns to identify each row/invoice, so the key of each row is like a compound key made of both of these columns (“Comerciante” & “Nº Fatura”).

I think I have the logic of the sequence, I just don’t know what activities are best suited for this particular use case:

(1st IF) I want to read the data from EXC2 and check if the same row exists on EXC1:

  • if it doesn’t, I want to add that row to a new EXC3
  • if it does, I need to read each cell of that row in EXC1 and check if there are any empty/null cells (2nd IF)

(2nd IF) When iterating through the row cells:

  • if the cell is empty or null, I want to fill it with the data of the same column of EXC2
  • if it’s not empty, I need to check if the data of EXC1 cell is equal to EXC2 cell (3rd IF)

(3rd IF) Checking if cell data is equal:

  • if it is equal, do nothing
  • if it is different, update/replace with cell data of EXC2

After iterating all the cells in If (1), I want to add that EXC1 row to EXC3 aswell.

I may need to color code from which IF the rows were added, pointing out that for example row1 was added from EXC1 and row2 from EXC2, etc… Also may need to do the same to the updated cells on the 3rd IF.

EXC1 has a set of columns and EXC2 has a different set of columns but most of them match.

What I pretend is this type of JOIN as if Table A was EXC1 and Table B was EXC2, the intersection is basically the matching columns:
image

Here’s what I have so far:

And here’s some files with dummy data if needed:

  • EXC1.xlsx (9.7 KB)
  • EXC2.csv (10.0 KB) (Keep in mind this file is supposed to be a .csv, the format is not supported when uploading to this thread so I put it as a .xlsx)

Thanks in advance, still trying to learn Studio’s activites.

@lcvalves
Welcome to the Forum.

As a First step have a Look on the Join Datatable Activity. Fron the Join result the different Outputs can be sliced, derived.

For more Custom needs LINQ helps to setting Up a more individually solution

Thanks for the welcome!

Any specific LINQ tutorials that you can provide? Can’t find it in the UiPath’s official documentation. JOIN Datatable doesn’t work for this use case, but what I’m trying to recreate is certainly something alongside that idea.

Also tried Look-up datatable but it only works for 1 column only and I need to identify the row by 2 columns.

@lcvalves
find some example on inner join with LINQ just for introduction purpose:
Find_Common_NonCommon_By1Col.xaml (12.3 KB)
about Left join etc. we also will support you if it is needed in your case.

About your requirements:

  • 2nd if: if the cell is empty or null, I want to fill it with the data of the same column of EXC2
  • 3rd if: if it is different, update/replace with cell data of EXC2

maybe I did misunderstand but it looks after simplifying: if Exec1 row differs from Exec2 row overwrite Exec1 cell values with these fromm Exec2. Otherwise it is unclear why:

  • checking the blanks, checking the values and the final result will always be the Exec2 values

Can you help us on this requirement and elaborate more in detail.

Thanks

I eventually got it to work the way that I want to, here’s the xaml file if you want to check it out:

  • EXC3_Comparar_e_corrigir.xaml (34.6 KB) (It’s very likely that there’s an easier way to do this but that’s how could I figure it out, feel free to leave feedback if you want)

Now I just need to deal with some formatting issues, because some values of EXC1 are the same as EXC2 when dealing with same row record (same “Comerciante” & “Nº Fatura”) but just in different formats like dates or currency. For example, on EXC1 I have dates in yyyy-MM-dd format, but on EXC2 they’re in dd/MM/yyyy format. Same thing with currency, EXC1 is ‘xxx.xx’ and EXC2 is ‘xxx,xx €’.

The process that I have still counts those values as different and overwrites the data table row items with the EXC2 data while it shouldn’t, because they’re the same.

Should I deal with this when creating the EXC1 file, due to it being made by different types of invoices or just do a pre-process before comparing it to EXC2?

Thanks btw!

May we ask you to answer the question.

That’s because there are columns in EXC1 that are different from EXC2 that I need on the final EXC3 and I don’t want to overwrite/delete those values.

not 100% clear to me / understood.
when bringing the unmatching Exc1 into an Exc3 report and overwriting consequently unmatching Exc1 values with Exc2 values then targeted result should be achieved, am i right?

Basically the EXC2 contains the correct information about all the registered invoices online, the EXC1 is just extracted data from some invoices that are on a local folder but that are also registered online. The thing is that EXC1 has extra data that is not present on EXC2.

Technically I could just read through the EXC2 rows and complement any extra data that the EXC1 may have and create the EXC3 from that, but the problem is that this entire process is for a university project and I wasn’t the one who made the guidelines for it, so I had to find my way around to do it the teachers wanted me to.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.