Provide a way to append/concatenate multiple IDataViews

### System information

- ML.NET version: 1.2.0

### Issue

There should be a way to append or concatenate multiple IDataViews together.

Here's the scenario:
The [new ranking sample](https://github.com/dotnet/machinelearning-samples/pull/549) needs to train a model on two datasets that are loaded from separate text files and share the same schema: (1) a training dataset and (2) a validation dataset, which need to be combined. See step #3 in the steps outlined below, which the sample is based on.

Here are the steps shown in the sample. Generally, the pattern to train, validate, and test a model includes the following steps:
1. The model is trained on the **training** dataset.  The model's metrics are then evaluated using the **validation** dataset.
2. Step #1 is repeated by retraining and reevaluating the model until the desired metrics are achieved.  The outcome of this step is a pipeline that applies the necessary data transformations and trainer.
3. The pipeline is used to train on the combined **training** + **validation** datasets.  The model's metrics are then evaluated on the **testing** dataset (exactly once) -- this is the final set of metrics used to measure the model's quality.
4. The final step is to retrain the pipeline on **all** of the combined **training** + **validation** +  **testing** datasets.  This model is then ready to be deployed into production.

Today, to achieve this, the sample has to load each dataset from a text file and then create an enumerable from each one so that the datasets can be concatenated. This process would be greatly simplified if you could append/concatenate two IDataViews directly:

```
// Load training data (has a header).
IDataView trainData = mlContext.Data.LoadFromTextFile<SearchResultData>(TrainDatasetPath, separatorChar: '\t', hasHeader: true);

// Load validation data (no header).
IDataView validationData = mlContext.Data.LoadFromTextFile<SearchResultData>(ValidationDatasetPath, separatorChar: '\t', hasHeader: false);

// Combine the training and validation datasets by round-tripping through enumerables.
var validationDataEnum = mlContext.Data.CreateEnumerable<SearchResultData>(validationData, false);
var trainDataEnum = mlContext.Data.CreateEnumerable<SearchResultData>(trainData, false);
var trainValidationDataEnum = validationDataEnum.Concat<SearchResultData>(trainDataEnum);
IDataView trainValidationData = mlContext.Data.LoadFromEnumerable<SearchResultData>(trainValidationDataEnum);
```
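
To illustrate what a public append API might replace, the workaround above could be wrapped in a small helper that accepts any number of views. This is only a sketch: `DataViewConcat`, `AppendRows`, and `ExampleRow` are hypothetical names that are not part of ML.NET, and the helper still round-trips every row through a typed enumerable, which is the overhead this issue asks to avoid.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row type, used only to illustrate the helper below.
public sealed class ExampleRow
{
    public float Value { get; set; }
}

// Hypothetical helper; not part of ML.NET.
public static class DataViewConcat
{
    // Converts each IDataView to an enumerable of TRow, concatenates the
    // sequences in the order given, and wraps the result in a new IDataView.
    // All views are assumed to share the schema described by TRow.
    public static IDataView AppendRows<TRow>(MLContext mlContext, params IDataView[] views)
        where TRow : class, new()
    {
        IEnumerable<TRow> rows = views.SelectMany(
            view => mlContext.Data.CreateEnumerable<TRow>(view, reuseRowObject: false));

        return mlContext.Data.LoadFromEnumerable(rows);
    }
}
```

With this helper, step #3 would become `DataViewConcat.AppendRows<SearchResultData>(mlContext, trainData, validationData)`, and step #4 would pass the testing data as a third argument.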

NOTE: I also considered creating a text loader to load multiple text files (as described [here](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.data.textloader.load?view=ml-dotnet#Microsoft_ML_Data_TextLoader_Load_Microsoft_ML_Data_IMultiStreamSource_)); however, one of the data files included a header while the other didn't. It looks like a TextLoader that reads multiple files requires the header setting to be consistent across all of them.
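
For reference, the multi-file alternative mentioned in the note might look roughly like the sketch below. `TwoColumnRow` and `MultiFileLoad` are hypothetical stand-ins (the sample's real `SearchResultData` columns are not shown in this issue). The loader is created with a single `hasHeader` value that is used for every file passed to it, which is why this route did not fit files with mismatched headers.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical two-column row type; the sample's real schema is different.
public sealed class TwoColumnRow
{
    [LoadColumn(0)]
    public float Label;

    [LoadColumn(1)]
    public float Feature;
}

// Hypothetical helper; not part of ML.NET.
public static class MultiFileLoad
{
    // Builds one TextLoader and points it at several tab-separated files.
    // The same hasHeader setting is used for all of the files.
    public static IDataView LoadAll(MLContext mlContext, bool hasHeader, params string[] paths)
    {
        TextLoader loader = mlContext.Data.CreateTextLoader<TwoColumnRow>(
            separatorChar: '\t',
            hasHeader: hasHeader);

        return loader.Load(new MultiFileSource(paths));
    }
}
```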

### Source code / logs

Note that an internal class already exists today that can append rows. We should consider exposing it publicly:

https://github.com/dotnet/machinelearning/blob/70ef7ecd43b031b481a4047ea361da5e2f360336/src/Microsoft.ML.Data/DataView/AppendRowsDataView.cs#L23-L31




	/// <summary>
	/// This class provides the functionality to combine multiple IDataView objects which share the same schema
	/// All sources must contain the same number of columns and their column names, sizes, and item types must match.
	/// The row count of the resulting IDataView will be the sum over that of each individual.
	///
	/// An AppendRowsDataView instance is shuffleable iff all of its sources are shuffleable and their row counts are known.
	/// </summary>
	[BestFriend]
	internal sealed class AppendRowsDataView : IDataView
