Random Sample Dataset from Main Dataset (N random records per unique group)

I am developing a system health check tool to validate that 4 different systems are in sync. To do that, I need to create, every day, a sample dataset of N random records for every unique key combination in a main dataset. All 4 systems will be checked for the records in this sample dataset, and any differences will be highlighted using conditional formatting.

For example, I have a report with 700 rows. Each unique combination of the 6 fields [Client-Contractor-Distribution Center-Service Level-Alert Value-Status] has 100 records. This part will be dynamic: there could be any number of unique combinations and any number of records per combination.
Let’s say I want 5 random records for each of the 7 combinations. Essentially, I need a way to get 35 records that are randomly selected, 5 per unique combination.

In summary, the requirement is:

  1. Group datarows by the values of the columns
    [Client-Contractor-Distribution Center-Service Level-Alert Value-Status]
  2. Pick 5 random records from each group
  3. Return these records as a datatable
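
The three steps above can be sketched in a few lines. This is only an illustration in Python, not a UiPath workflow: the function name `sample_per_group`, the use of a list of dictionaries in place of a datatable, and the choice to return every record when a group has fewer than N are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Key columns taken from the question; rows are plain dicts in this sketch.
KEY_COLUMNS = ["Client", "Contractor", "Distribution Center",
               "Service Level", "Alert Value", "Status"]

def sample_per_group(rows, key_columns, n, rng=None):
    """Return up to n randomly chosen rows for each unique key combination."""
    rng = rng or random.Random()

    # Step 1: group rows by the tuple of their key column values.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_columns)].append(row)

    # Step 2: pick n random rows per group (all rows if the group is smaller).
    sampled = []
    for members in groups.values():
        sampled.extend(rng.sample(members, min(n, len(members))))

    # Step 3: return the picked rows as one flat table.
    return sampled
```

With the 700-row example (7 combinations of 100 records each) and `n=5`, this returns 35 rows, 5 per combination.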

Attached is the sample data with the input structure and the desired output.

SampleData.xlsx (36.2 KB)

Hi @Achal_Desai,

To me, this question looks more like an assignment or challenge than a forum question.

  • What have you tried so far?
  • What challenges did you face?
  • What approaches did you already try?

In my opinion, expecting the forum members to solve the entire problem is a little too much to ask.


Hi @jeevith, apologies.
I am new to UiPath, so I am not too familiar with the rules for writing topics.

I had got halfway to a solution, and today I found the missing piece as well. It is in LINQ, which I don’t understand at all, but the solution as a whole is working.

Below is the solution I have come up with.

As can be seen from the image above:

  1. I first create a GroupKey column in the input DT by concatenating the key columns.
  2. I then replicate the input DT and create a blank DT (same structure but no data) to hold the randomly selected records.
  3. After that, I create a new table containing all the unique group keys.
  4. I run a For Each Row activity over the unique group key DT to loop through every group key.
  5. The current row's group key value is used to filter the main input DT, giving all the records within that group.
  6. Here is the part I was missing: picking random records from this filtered DT. I found a solution in another post that works, although I don’t understand how, because it appears to be written in LINQ.
  7. After each loop, the picked records are added to a DT that is merged (appended) into the blank DT created at the beginning.
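
For readers who want to see the flow without the screenshot, the loop described above can be sketched roughly as follows. This is a Python illustration rather than the actual UiPath workflow or the LINQ expression from the other post; the function name `sample_by_looping` and the `|` separator used when building the group key are assumptions. A separator is worth considering in any case, because plain concatenation can make two different combinations produce the same key (for example "AB" + "C" and "A" + "BC").

```python
import random

def sample_by_looping(rows, key_columns, n, rng=None):
    """Loop over unique group keys, filter the table per key, pick n rows."""
    rng = rng or random.Random()

    # Add a GroupKey value to each row by joining the key columns.
    # The "|" separator is an assumption made for this sketch.
    keyed = [dict(row, GroupKey="|".join(str(row[c]) for c in key_columns))
             for row in rows]

    # Empty result table that the picked rows are appended to.
    result = []

    # Unique group keys, in first-seen order.
    unique_keys = list(dict.fromkeys(r["GroupKey"] for r in keyed))

    for key in unique_keys:
        # Filter the full table for this key (one full scan per key).
        group = [r for r in keyed if r["GroupKey"] == key]
        # Pick n random rows, or all of them if the group is smaller.
        picked = rng.sample(group, min(n, len(group)))
        # Append the picked rows to the result.
        result.extend(picked)

    return result
```

Note that the filter inside the loop scans every row once per group key, so the work grows with (number of groups) × (number of rows).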

The question I now have is: is there a better or more efficient way to do this?
The current solution works, but I have only tested it with a small dataset.
The actual dataset may have thousands of unique groups.
The process would then have to run thousands of loop iterations to produce the final result, which would take a lot of time.
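
One possible direction, offered as a sketch rather than a tested UiPath solution, is to avoid filtering the main table once per group and instead read it a single time, keeping a small "reservoir" of at most N rows per group (reservoir sampling, Algorithm R). The Python below only illustrates the idea; the function name `reservoir_sample_per_group` and the dictionary-based rows are assumptions, and in UiPath the same goal would more likely be reached by grouping once and sampling each group.

```python
import random

def reservoir_sample_per_group(rows, key_columns, n, rng=None):
    """Single pass over rows, keeping a uniform random sample of up to n per group."""
    rng = rng or random.Random()
    reservoirs = {}  # group key -> list of at most n kept rows
    seen = {}        # group key -> number of rows seen so far

    for row in rows:
        key = tuple(row[c] for c in key_columns)
        count = seen.get(key, 0) + 1
        seen[key] = count
        bucket = reservoirs.setdefault(key, [])

        if len(bucket) < n:
            # The first n rows of a group are always kept.
            bucket.append(row)
        else:
            # Later rows replace a kept row with probability n / count.
            j = rng.randrange(count)
            if j < n:
                bucket[j] = row

    # Flatten the per-group samples into one result table.
    return [row for bucket in reservoirs.values() for row in bucket]
```

Each row is visited once regardless of how many unique groups exist, and memory use is bounded by N rows per group.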

Hi @Achal_Desai,

Nothing better than you solving it yourself and learning loads while doing it. I am sure it was tough but fun.

Excellent to see you also share the solution in this thread. One last thing you can do is mark your last post as the solution as this will help other members having a similar issue.

I wish you good luck in the future.
