Hello,

Due to a high data volume, I need to split a data table into two data tables with the same structure (about the same proportions).

The first column of the data table named “Category”, as it says contains the category of the data.

In order to avoid having rows of the same category in the both data tables I need to split at the end of a category beyond the half of the initial data table (using LINQ preferably).

you can use

`fIRSTtABLE = dt.AsEnumerable.Take(Math.Round(dt.RowCount/2,0)).coPYtOdATATABLE`

`SecondTable = dt.AsEnumerable.Skip(Math.Round(dt.RowCount/2,0)).coPYtOdATATABLE`

Or you can use groupby as well

`dt.AsEnumerable.GroupBy(function(x) x("Category").ToString)`

- this gives groups of datatable can be used inf or each to get each table or get as dictionary as well

cheers

Hi @SONYC

Can you try like below

```
dt.AsEnumerable.GroupBy(Function (x) x("Category").ToString).Select(Function (x) x.CopyToDataTable).ToList
```

```
currentDataTable.Rows(0)("Category").ToString
```

Regards,

As I specified, I need to split the Data Table but I also want to avoid having the same category in the both Data Tables.

Example :

I have 9 rows so the first data table should contain 5 rows and the second should contain 4, but I don’t want to have the “BBB” category within the both data tables, I need to get all the rows until the category changes; so there should be 6 rows for the first data table and 3 rows for the secon one.

Thank you @Anil_G for your suggestion.

It works for a decent amount of data.

I’m talking here about a huge CSV file, that’s why I need to split the data, in order to store it into two or more Excel sheets.

Then just use group by and each group would be a separate datatable…paste each to separate sheet instead of only two

Cheers

Please assign the following code to two different tables as follow

DT1 = dt.AsEnumerable.Take(dt.AsEnumerable.toList.FindLastIndex(function(row) row(“Category”).ToString.Equals(“BBB”))+1).CopyToDataTable

DT2 = dt.AsEnumerable.Skip(dt.AsEnumerable.toList.FindLastIndex(function(row) row(“Category”).ToString.Equals(“BBB”))+1).CopyToDataTable

Thanks @Anjani_Kapoor

I’ll try to recap.

I’m talking about a huge CSV file containing about 1.1 M rows.

The data will be stored into DT0

I divide the total number of rows by 2, but I need to implement a condition for the second half, as I don’t know which will be the last category.

If : dataRowIndex < CInt(Math.Round(DT0.Rows.Count/2,0))

Then : copy dataRow in DT1

Else If : the value of the first column (of the second half) is the same as the last value of the first column (of the first half)

Then : copy dataRow in DT1

Else : copy dataRow in DT2

If the values under category column is in-order (sorted), then you may follow the following steps:

- Store the value of int_RowIndex = CInt(Math.Round(DT0.Rows.Count/2,0)) into a integer variable.
- find the category str_category = DT0(int_RowIndex+1)(“Category”)

Then use the logic to get DT1 and DT2:

DT1 = dt.AsEnumerable.Take(dt.AsEnumerable.toList.FindLastIndex(function(row) row(“Category”).ToString.Equals(str_category)).CopyToDataTable

DT2 = dt.AsEnumerable.Skip(dt.AsEnumerable.toList.FindLastIndex(function(row) row(“Category”).ToString.Equals(str_category)).CopyToDataTable

Whoever asked you to build an automation to import 1.1M rows into an Excel file must be drinking the kool-aid

You either need to do “Data Modelling” in Excel or even better, use Power BI. You build generic, consistent steps in the “Power Query” editor and use a folder/file combo as the source. This method can work hand in hand with whatever automation that produced the CSV file. Obviously I don’t know the final product use case, so my answer is limited to the info available in this thread.

Finally I managed to sort the DataTable and split the data within 2 CSV files using the logic that you suggested.

Thank you @Anjani_Kapoor

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.