Dispatcher: puts all the duplicate rows from excel to new queue items , even though i have reference numbers for them

Hi everyone ,

I have a dispatcher process which pickups up data from excel one by one and puts it on the orchestrator as different queueitems.

however, my excel file is very big more then 90,000 lines, so there are many different duplicate rows at different places.

i have reference set for each queue item being pushed, but I want the dispatcher to check whether the queueitem it is pushing has a duplicate entry in my excel file ??

if duplicate items are present
then only one queue item should be created for that duplicate rows, not 2 seperate queueitems !!

and if a duplicate is found for that queueitem, i want the dispatcher to write “duplicate” in front of that particular lines

can anyobody suggest how can i get rid of duplicate data before pushing them to queue ??

@samantha_shah

First thing…instead of adding each row…you can use a bulk add queue item and pass whole data at once and select process independently…if unique reference is selected then item with same reference will automatically fail…it gives an ouput datatable with successful and failed items which can be used as status

Cheers

1 Like

Hi @Anil_G Anil, the dispatcher is already made by another developer and now the business wants me to add "duplicate item " in the status column of the same excel where the data is being pulled and pushed to queueitems.

Business donot want us to decrease the lines number in Excel , nor they want me to delte duplicate rows

my code looks like this: Read range activivty for reading excel file
for each activity to iterate through the activity

I want your help in comparing each current row with my excel file and checking if there is any duplicate ?? then “duplicate item should be written in the Status Column” and Only one Queueitme should be created out of those queueitems.

I tried using Linq queries but i am not able to complete it , would you be able to help with some better guidance?

@mukesh.singh/ @Anil_G can you please help?

@samantha_shah pls check the below link

Option 1:

Option 2:
Case 1: you need to put a duplicate check on Excel by pushing this data to the database and you need to get the distinct record from it
Case 2: Create a history table and put all archive data on DB and you need to validate the same in this table

So first you need to clear the data from the current data and later on, you need to check from the historical or archive table to get a unique record from it,

Case 3: for reference you can store the Queue reference code in your database

@samantha_shah I hope it will solve your issue

hi @mukesh.singh thanks for the reply

however i think my question has been confusing , let me Rectifiy my question again

  1. my first job is to mark “is duplicate " in the status column for each duplicate rows a particular row currentRowItem has in rExcel file
    so now my excel should have to → modify the duplicate columns as " is duplicate” in the status column.

  2. second task: I have to push that current item to my queueitem where , unique reference id is already set by me
    So I just would need to modfy my unique refernce is on property in queue to avoid the duplicacy problem

My Approach: I have figured out a way to store duplicates items in a dtDuplicate variable using Linq Query ,
but however I have to write the status in another column named “Status” and not in the “id” column which i have used to find out my duplicates row.

so I have used a for each activity which will run inside the list of duplicate items ,
but I am not sure how to update the status column for those duplicate items (which writes it down to my original excel file ) ???

can you guide how i would i solve the scenario ??
@Anil_G /@mukesh.singh
thanks

Below steps you need to flow

Step 1. Find the Duplicate record from the data for this you need to get the duplicate record
Step 2. Find the row index of the duplicate record index by given output steps 1 for each need to be use [ int rowIndex = rExcel.Rows.IndexOf(duplicateRow);
]
Step 3. Write the cell is duplicate in your Excel Sheet

This will solve the issue of
"

  1. my first job is to mark “is duplicate " in the status column for each duplicate rows a particular row currentRowItem has in rExcel file so now my excel should have to → modify the duplicate columns as " is duplicate” in the status column.

" I hope this will solve the issue

Here’s a refined approach to address duplicate marking and queue item creation in UiPath, incorporating insights from the feedback and addressing potential issues:

1. Identify Duplicates Using LINQ:

  • Employ a DataTable.AsEnumerable() approach to avoid modifying the original rExcel directly within the LINQ query. This ensures data integrity.
  • Use the SequenceEqual method to compare rows based on relevant columns (excluding the Status column to avoid circular dependencies).
// Assuming rExcel is your original DataTable, idCol is the column for unique IDs, and otherCols are additional columns for comparison
var duplicateRows = (from row in rExcel.AsEnumerable()
                     where rExcel.AsEnumerable().Any(otherRow => row[idCol].Equals(otherRow[idCol]) && !row.SequenceEqual(otherRow, r => r != "Status"))
                     select row).ToList();

2. Update “Status” for Duplicates:

  • Iterate through the duplicateRows list outside the LINQ query.
  • Use a For Each Row activity to loop through each duplicate row.
  • Access the row index (rowIndex) within the loop.
  • Use Write Cell activity for each duplicate row, targeting the correct rowIndex and the “Status” column:
foreach (DataRow duplicateRow in duplicateRows)
{
    int rowIndex = rExcel.Rows.IndexOf(duplicateRow);
    Write Cell(excelFilePath, "Status", rowIndex + 1, "is duplicate"); // +1 for zero-based indexing
}

3. Create Unique Queue Items:

  • Outside the duplicate handling loop, iterate through the original rExcel (excluding duplicates) using another For Each Row activity.
  • Extract the unique reference ID from the current row using row[idCol].
  • Create a new queue item using Add Queue Item activity, populating the necessary properties (including the unique reference ID).

4. Error Handling (Optional):

  • Consider implementing a Try Catch block around the Write Cell activity to gracefully handle potential Excel manipulation errors.

Complete Workflow (Pseudocode):

// Read rExcel data as DataTable
DataTable rExcel = Read Range(excelFilePath)

// Identify duplicate rows using LINQ (excluding Status column)
List<DataRow> duplicateRows = (from row in rExcel.AsEnumerable()
                               where rExcel.AsEnumerable().Any(otherRow => row["idCol"].Equals(otherRow["idCol"]) && !row.SequenceEqual(otherRow, r => r != "Status"))
                               select row).ToList();

// Update "Status" for each duplicate row
foreach (DataRow duplicateRow in duplicateRows)
{
    int rowIndex = rExcel.Rows.IndexOf(duplicateRow);
    Try
    {
        Write Cell(excelFilePath, "Status", rowIndex + 1, "is duplicate");
    }
    Catch (Exception ex)
    {
        Log Message("Error updating status: " + ex.Message);
    }
}

// Create unique queue items (excluding duplicates)
foreach (DataRow row in rExcel.Rows)
{
    if (!duplicateRows.Contains(row)) // Check if not a duplicate
    {
        string uniqueID = row["idCol"].ToString();
        Add Queue Item(uniqueID, otherQueueItemProperties...);
    }
}

Remember to replace placeholder names like excelFilePath, idCol, otherCols, and queue item properties with your actual values. This refined approach ensures separation of concerns, avoids modifying the original data within the LINQ query, and provides optional error handling for robustness.

@samantha_shah

Please try this

  1. Add a column to the data you read to add the status
  2. Create a string variable for storing ids which are already added and initialize it with string.Empty
  3. Add an if condition inside loop with condition stringvariable.contains(currentrow("idcolumn").ToString)
  4. On the then side use assign to add ducplicate status currentrow("status") = "duplicate"
  5. On the else side use add queue item and use your normal activities whicha re already there…also use assign with stringvariable = stringvariable +","+currentrow("id").ToStringn
  6. After loop use write range to write the data back with status

Cheers

Hi @mukesh.singh thankyou for your reply and deep explanation

how ever during my exceution fo 2nd step;

I am unable to find for each row activivty and also i am not able to use the query because Rows is not a memebr of List of rows and also , where i have to declare duplicate row ?? and what value should be store in that variable ?? iorder to assign rowindex

is there any concept i am not clear or am i missing anything here ?? @Anil_G @mukesh.singh

@samantha_shah

Can you follow the steps mentiones above please

All the formula rowindex etcare not needed

Cheers

1 Like

Read the data from your Excel file into the manager and save it into some kind of data structure that allows you to quickly and efficiently check for duplicates. For example, you can use a dictionary or a set. Before adding each element to the queue, check if it is already present in your data structure. If the item already exists, mark it as “duplicate” and skip adding it to the queue.
If the element is not in the data structure, add it to it and submit it to Orchestrator.

1 Like

Hi @Nessundorma i tried using and storing the read data into dictionary but since there are duplicate rows , and the keys are unique it wont allow us to add duplicate keys inside the dictionary

Hi @Anil_G I tried to follow your steps however , when the comparision takes places , everything goesinto duplicate rows , the rows are not getting added from the else part

this is step 2: Where it stores the values of column id to isAdded List(String) type variable.

this is step 3: where the if condition checks the values of variable-> “isAdded” and currentRow(“id”)

But However the values are always matching to each other ?? , dont know whats wrong in the If condition here

can you guide me ?? @Anil_G

@samantha_shah

The mistake is your isadded variable having all the ids already because of the assign before for loop activity…remove that assign activity

And use append item to list to add the current item in the else condition …so that only the used items are added

Check the step 5 above…I used a string variable even a list can be used but you should not add all the items to list upfront you need to add eqch item only when the item is added to queue…

Also remember to initialize the variable outside the loop New list(of string)

Cheers

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.