dotnet · antoniovs1029 · Jan 27, 2020 · Jan 17, 2020
diff --git a/...icrosoft.ML.DataView/DataViewRowCursor.md → docs/code/DataViewRowCursor.md b/...icrosoft.ML.DataView/DataViewRowCursor.md → docs/code/DataViewRowCursor.md
diff --git a/docs/code/IDataViewDesignPrinciples.md b/docs/code/IDataViewDesignPrinciples.md
@@ -459,15 +459,15 @@ the IDataView system is similar to the LINQ eco-system. The comparisons below
 refer to the `IDataView` and `IEnumerable<T>` interfaces as the core
 interfaces of their respective worlds.
 
-In both worlds, there is a cursoring interface associated with the core
+In both worlds, there is a cursoring mechanism associated with the core
 interface. In the IEnumerable world, the cursoring interface is
-`IEnumerator<T>`. In the IDataView world, the cursoring interface is
-`IRowCursor`.
+`IEnumerator<T>`. In the IDataView world, the cursoring mechanism is provided by the
+`DataViewRowCursor` class.
 
-Both cursoring interfaces have `MoveNext()` methods for forward-only iteration
+Both cursoring mechanisms have `MoveNext()` methods for forward-only iteration
 through the elements.
 
-Both cursoring interfaces provide access to information about the current
+Both cursoring mechanisms provide access to information about the current
 item. For the IEnumerable world, the access is through the `Current` property
 of the enumerator. Note that when `T` is a class type, this suggests that each
 item served requires memory allocation. In the IDataView world, there is no
@@ -476,7 +476,7 @@ current row are directly accessible via methods on the cursor. This avoids
 memory allocation for each row.
 
 In both worlds, the item type information is carried by both the core
-interface and the cursoring interface. In the IEnumerable world, this type
+interface and the cursoring mechanism. In the IEnumerable world, this type
 information is part of the .Net type, while in the IDataView world, the type
 information is much richer and contained in the schema, rather than in the
 .Net type.

diff --git a/docs/code/IDataViewImplementation.md b/docs/code/IDataViewImplementation.md
@@ -144,12 +144,13 @@ to make it a loader or a transform. If not, it probably does not make sense.
 Let us address something fairly conspicuous. The question almost everyone
 asks, when they first start using `IDataView`: what is up with these getters?
 
-One does not fetch values directly from an `IRow` implementation (including
-`IRowCursor`). Rather, one retains a delegate that can be used to fetch
-objects, through the `GetGetter` method on `IRow`. This delegate is:
+One does not fetch values directly from a `DataViewRow` implementation (including
+`DataViewRowCursor`). Rather, one retains a delegate that can be used to fetch
+objects, through the `GetGetter` method on `DataViewRow`. This delegate is:
 
 ```csharp
 public delegate void ValueGetter<TValue>(ref TValue value);
+
 ```
 
 If you are unfamiliar with delegates, [read
@@ -159,7 +160,7 @@ method, and you use this delegate multiple times to fetch the actual column
 values as you `MoveNext` through the cursor.
 
 Some history to motivate this: In the first version of `IDataView` the
-`IRowCursor` implementation did not actually have these "getters" but rather
+`DataViewRowCursor` implementation (formerly known as `IRowCursor`) did not actually have these "getters" but rather
 had a method, `GetColumnValue<TValue>(int col, ref TValue val)`. However, this
 has the following problems:
 
@@ -191,7 +192,7 @@ values for the same columns, it will apparently be a "consistent" view. It is
-probably obvious what this mean, but specifically:
+probably obvious what this means, but specifically:
 
 The cursor as returned through `GetRowCursor` (with perhaps an identically
-constructed `IRandom` instance) in any iteration should return the same number
+constructed `System.Random` instance) in any iteration should return the same number
 of rows on all calls, and with the same values at each row.
 
 Why is this important? Many machine learning algorithms require multiple
@@ -203,7 +204,7 @@ are computed were not consistent? How could a dual algorithm like SDCA
 function with any accuracy, if the examples associated with any given dual
 variable were to change? Consider even a relatively simple transform, like a
 forward looking windowed averager, or anything relating to time series. The
-implementation of those `ICursor` interfaces often open *two* cursors on the
+`DataViewRowCursor` implementations of such transforms often open *two* cursors on the
 underlying `IDataView`, one "look ahead" cursor used to gather and calculate
 necessary statistics, and another cursor for any data: how could the column
-constructed out of that transform be meaningful of the look ahead cursor was
+constructed out of that transform be meaningful if the look ahead cursor was
@@ -249,7 +250,7 @@ data in a consistent way.
 Let us formalize this somewhat. We consider two data views to be functionally
 identical if there is absolutely no way to distinguish them: they return the
 same values, have the same types, same number of rows, they shuffle
-identically given identically constructed `IRandom` when row cursors are
+identically given identically constructed `System.Random` when row cursors are
 constructed, return the same ID for rows from the ID getter, etc. Obviously
 this concept is transitive. (Of course, `Batch` in a cursor might be different
 between the two, but that is the case even with two cursors constructed on the
@@ -348,7 +349,7 @@ feature names are, etc.) when all we have is the data model. (For example, the
 
 # Getters Must Fail for Invalid Types
 
-For a given `IRow`, we must expect that `GetGetter<TValue>(col)` will throw if
+For a given `DataViewRow`, we must expect that `GetGetter<TValue>(col)` will throw if
 either `IsColumnActive(col)` is `false`, or `typeof(TValue) !=
 Schema.GetColumnType(col).RawType`, as indicated in the code documentation.
 But why? It might seem reasonable to add seemingly "harmless" flexibility to
@@ -383,15 +384,15 @@ inconsistency, surprises and bugs for users and developers.
 
 # Thread Safety
 
-Any `IDataView` implementation, as well as the `ISchema`, *must* be thread
+Any `IDataView` implementation, as well as the `DataViewSchema`, *must* be thread
 safe. There is a lot of code that depends on this. For example, cross
 validation works by operating over the same dataset (just, of course, filtered
 to different subsets of the data). That amounts to multiple cursors being
 opened, simultaneously, over the same data.
 
-So: `IDataView` and `ISchema` must be thread safe. However, `IRowCursor`,
+So: `IDataView` and `DataViewSchema` must be thread safe. However, `DataViewRowCursor`,
 being a stateful object, we assume is accessed from exactly one thread at a
-time. The `IRowCursor`s returned through a `GetRowCursorSet`, however, which
+time. The `DataViewRowCursor`s returned through `GetRowCursorSet` are a special case: while
 each single one must be accessed by a single thread at a time, multiple
 threads can access this set of cursors simultaneously: that's why we have that
 method in the first place.
@@ -431,10 +432,10 @@ not have been obvious immediately.
 
 # `GetGetter` Returning the Same Delegate
 
-On a single instance of `IRowCursor`, since each `IRowCursor` instance has no
+On a single instance of `DataViewRowCursor`, since each `DataViewRowCursor` instance has no
 requirement to be thread safe, it is entirely legal for a call to `GetGetter`
 on a single column to just return the same getting delegate. It has come to
-pass that the majority of implementations of `IRowCursor` actually do that,
+pass that the majority of implementations of `DataViewRowCursor` actually do that,
 since it is in some ways easier to write the code that way.
 
 This practice has inadvertently enabled a fairly attractive tool for analysis
@@ -447,29 +448,12 @@ do not, but the vast majority do.
 # Class Structuring
 
 The essential attendant classes of an `IDataView` are its schema, as returned
-through the `Schema` property, as well as the `IRowCursor` implementation(s),
+through the `Schema` property, as well as the `DataViewRowCursor` implementation(s),
 as returned through the `GetRowCursor` and `GetRowCursorSet` methods. The
 implementations for those two interfaces are typically nested within the
 `IDataView` implementation itself. The cursor implementation is almost always
 at the bottom of the data view class.
 
-# `IRow` and `ICursor` vs. `IRowCursor`
-
-We have `IRowCursor` which descends from both `IRow` and `ICursor`. Why do
-these other interfaces exist?
-
-Firstly, there are implementations of `IRow` or `ICursor` that are not
-`IRowCursor`s. We have occasionally found it useful to have something
-resembling a key-value store, but that is strongly, dynamically typed in some
-fashion. Why not simply represent this using the same idioms of `IDataView`?
-So we put them in an `IRow`. Similarly: we have several things that behave
-*like* cursors, but that are in no way *row* cursors.
-
-However, more than that, there are a number of utility functions where we want
-to operate over something like an `IRowCursor`, but we want to have some
-indication that this function will not move the cursor (in which case `IRow`
-is helpful), or that will not access any values (in which case `ICursor` is
-helpful).
 
 # Schema
 
@@ -485,8 +469,8 @@ schema's `TryGetColumnIndex`.
 
 Regarding name hiding, the principles mention that when multiple columns have
 the same name, other columns are "hidden." The convention all implementations
-of `ISchema` obey is that the column with the *largest* index. Note however
-that this is merely convention, not part of the definition of `ISchema`.
+of `DataViewSchema` obey is that the column with the *largest* index is the visible one. Note however
+that this is merely convention, not part of the definition of `DataViewSchema`.
 
 Implementations of `TryGetColumnIndex` should be O(1), that is, practically,
 this mapping ought to be backed with a dictionary in most cases. (There are

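The consistency requirement in IDataViewImplementation.md (identically constructed `System.Random` instances must produce identical shuffles and row sets) can be illustrated with a deterministic shuffle. This is a minimal sketch, not ML.NET code: `ShuffledOrder` is a hypothetical helper that only models the order in which a shuffling cursor might visit rows, using a Fisher-Yates shuffle driven solely by the supplied `Random`.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of ML.NET): the order in which a shuffling
// cursor would visit rowCount rows. The result depends only on `rand`.
static int[] ShuffledOrder(int rowCount, Random rand)
{
    var order = Enumerable.Range(0, rowCount).ToArray();
    // Fisher-Yates: walk backwards, swapping each slot with a uniformly
    // chosen slot at or before it.
    for (int i = rowCount - 1; i > 0; i--)
    {
        int j = rand.Next(i + 1);
        (order[i], order[j]) = (order[j], order[i]);
    }
    return order;
}

// Two "cursors" built from identically seeded Random instances
// visit the rows in exactly the same order.
var first = ShuffledOrder(10, new Random(42));
var second = ShuffledOrder(10, new Random(42));
Console.WriteLine(first.SequenceEqual(second)); // prints "True"
Console.WriteLine(string.Join(", ", first));
```

Because the visiting order depends on nothing but the `Random` state, cursors built this way cannot drift apart; reading mutable external state (a clock, a shared counter) during shuffling would break the guarantee.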
diff --git a/docs/code/MlNetHighLevelConcepts.md b/docs/code/MlNetHighLevelConcepts.md
@@ -13,9 +13,9 @@ This document is going to cover the following ML.NET concepts:
   - In one sentence, a transformer is a component that takes data, does some work on it, and returns new 'transformed' data.
   - For example, you can think of a machine learning model as a transformer that takes features and returns predictions.
   - Another example, 'text tokenizer' would take a single text column and output a vector column with individual 'words' extracted out of the texts.
-- [*Data reader*](#data-reader), represented as an `IDataReader<T>` interface.
-  - The data reader is ML.NET component to 'create' data: it takes an instance of `T` and returns data out of it. 
-  - For example, a *TextLoader* is an `IDataReader<FileSource>`: it takes the file source and produces data. 
+- [*Data loader*](#data-loader), represented as an `IDataLoader<TSource>` interface.
+  - The data loader is the ML.NET component to 'create' data: it takes an instance of `TSource` and returns data out of it.
+  - For example, a *TextLoader* is an `IDataLoader<IMultiStreamSource>`: it takes the file source and produces data. 
 - [*Estimator*](#estimator), represented as an `IEstimator<T>` interface.
   - This is an object that learns from data. The result of the learning is a *transformer*.
   - You can think of a machine learning *algorithm* as an estimator that learns on data and produces a machine learning *model* (which is a transformer).
@@ -28,7 +28,7 @@ This document is going to cover the following ML.NET concepts:
 
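-In ML.NET, data is very similar to a SQL view: it's a lazily-evaluated, cursorable, heterogenous, schematized dataset.
+In ML.NET, data is very similar to a SQL view: it's a lazily-evaluated, cursorable, heterogeneous, schematized dataset.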
 
-- It has *Schema* (an instance of an `ISchema` interface), that contains the information about the data view's columns.
+- It has *Schema* (an instance of a `DataViewSchema` class), that contains the information about the data view's columns.
   - Each column has a *Name*, a *Type*, and an arbitrary set of *annotations* associated with it.
   - It is important to note that one of the types is the `vector<T, N>` type, which means that the column's values are *vectors of items of type T, with the size of N*. This is a recommended way to represent multi-dimensional data associated with every row, like pixels in an image, or tokens in a text.
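-  - The column's *annotations* contains information like 'slot names' of a vector column and suchlike. The annotations itself are actually represented as another one-row *data*, that is unique to each column.
+  - The column's *annotations* contain information like 'slot names' of a vector column and suchlike. The annotations themselves are actually represented as another one-row *data*, that is unique to each column.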
@@ -40,12 +40,12 @@ In ML.NET, data is very similar to a SQL view: it's a lazily-evaluated, cursorab
 
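-A transformer is a component that takes data, does some work on it, and return new 'transformed' data.
+A transformer is a component that takes data, does some work on it, and returns new 'transformed' data.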
 
-Here's the interface of `ITransformer`:
+Here's part of the `ITransformer` interface:
 ```c#
 public interface ITransformer
 {
     IDataView Transform(IDataView input);
-    ISchema GetOutputSchema(ISchema inputSchema);
+    DataViewSchema GetOutputSchema(DataViewSchema inputSchema);
 }
 ```
 
@@ -73,26 +73,26 @@ var fullTransformer = transformer1.Append(transformer2).Append(transformer3);
 
 We utilize this property a lot in ML.NET: typically, the trained ML.NET model is a 'chain of transformers', which is, for all intents and purposes, a *transformer*. 
 
-## Data reader
+## Data loader
 
-The data reader is ML.NET component to 'create' data: it takes an instance of `T` and returns data out of it. 
+The data loader is the ML.NET component to 'create' data: it takes an instance of `TSource` and returns data out of it.
 
-Here's the exact interface of `IDataReader<T>`:
+Here's the interface of `IDataLoader<TSource>`:
 ```c#
-public interface IDataReader<in TSource>
+public interface IDataLoader<in TSource>
 {
-    IDataView Read(TSource input);
-    ISchema GetOutputSchema();
+    IDataView Load(TSource input);
+    DataViewSchema GetOutputSchema();
 }
 ```
-As you can see, the reader is capable of reading data (potentially multiple times, and from different 'inputs'), but the resulting data will always have the same schema, denoted by `GetOutputSchema`.
+As you can see, the loader is capable of loading data (potentially multiple times, and from different 'inputs'), but the resulting data will always have the same schema, denoted by `GetOutputSchema`.
 
-An interesting property to note is that you can create a new data reader by 'attaching' a transformer to an existing data reader. This way you can have 'reader' with transformation behavior baked in:
+An interesting property to note is that you can create a new data loader by 'attaching' a transformer to an existing data loader. This way you can have a 'loader' with transformation behavior baked in:
 ```c#
-var newReader = reader.Append(transformer1).Append(transformer2)
+var newLoader = loader.Append(transformer1).Append(transformer2);
 ```
 
-Another similarity to transformers is that, since data is lazily evaluated, *readers are lazy*: no (or minimal) actual 'reading' happens when you call `dataReader.Read()`: only when a cursor is requested on the resulting data does the reader begin to work.
+Another similarity to transformers is that, since data is lazily evaluated, *loaders are lazy*: no (or minimal) actual 'loading' happens when you call `dataLoader.Load()`: only when a cursor is requested on the resulting data does the loader begin to work.
 
 ## Estimator
 

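The lazy-loading behavior described in the MlNetHighLevelConcepts.md hunk (no real loading happens on `Load`; work starts only when a cursor is requested) has the same shape as a C# iterator. This is a minimal sketch that uses `IEnumerable<T>` as a stand-in for `IDataView`; `LoadLines` is a hypothetical helper, not ML.NET's `IDataLoader<TSource>.Load`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a loader's Load(): because it is an iterator,
// calling it does no work. Each row is read only when a consumer pulls it.
static IEnumerable<string> LoadLines(string[] source, List<string> readLog)
{
    foreach (var line in source)
    {
        readLog.Add($"read: {line}");          // record when a row is actually read
        yield return line.ToUpperInvariant();  // trivial per-row transformation
    }
}

var log = new List<string>();
var data = LoadLines(new[] { "a", "b" }, log);
Console.WriteLine(log.Count); // prints "0": nothing has been read yet
var rows = data.ToList();     // enumerating plays the role of opening a cursor
Console.WriteLine(log.Count); // prints "2"
```

As with an ML.NET data view, the result can be enumerated again later, and each enumeration repeats the reads.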
diff --git a/docs/code/SchemaComprehension.md b/docs/code/SchemaComprehension.md
@@ -8,9 +8,9 @@ For a better understanding of `IDataView` principles and type system please refe
 
 ## Introduction
 
-Every dataset in ML.NET is represented as an `IDataView`, which is, for the purposes of this document, a collection of rows that share the same columns. The set of columns, their names, types and other annotations is known as the *schema* of the `IDataView`, and it's represented as an `ISchema` object.
+Every dataset in ML.NET is represented as an `IDataView`, which is, for the purposes of this document, a collection of rows that share the same columns. The set of columns, their names, types and other annotations is known as the *schema* of the `IDataView`, and it's represented as a `DataViewSchema` object.
 
-In this document, we will be using the terms *data view* and `IDataView` interchangeably, same for *schema* and `ISchema`.
+In this document, we will be using the terms *data view* and `IDataView` interchangeably, same for *schema* and `DataViewSchema`.
 
 Before any new data enters ML.NET, the user needs to somehow define how the schema of the data will look like.
 To do this, the following questions need to be answered:

diff --git a/docs/specs/mlnet-database-loader/mlnet-database-loader-specs.md b/docs/specs/mlnet-database-loader/mlnet-database-loader-specs.md
@@ -183,7 +183,7 @@ MLContext mlContext = new MLContext();
 IDataView trainingDataView = mlContext.Data.LoadFromDbSqlQuery<ModelInputData, SqlConnection>(connString: myConnString, sqlQuerySentence: "Select * from InputMLModelDataset where InputMLModelDataset.CompanyName = 'MSFT'"); 
 ```
 
-**2. (Foundational method) Data loading from a database with an IDataReader object:**
+**2. (Foundational method) Data loading from a database with a System.Data.IDataReader object:**
 
 This is the foundational or pillar method which will be used by the rest of the higher level or convenient methods:
 

diff --git a/src/Microsoft.ML.AutoML/Sweepers/ISweeper.cs b/src/Microsoft.ML.AutoML/Sweepers/ISweeper.cs
@@ -230,7 +230,7 @@ IComparable IRunResult.MetricValue
 
     /// <summary>
     /// The metric class, used by smart sweeping algorithms.
-    /// Ideally we would like to move towards the new IDataView/ISchematized, this is
+    /// Ideally we would like to move towards an IDataView; this is
     /// just a simple view instead, and it is decoupled from RunResult so we can move
     /// in that direction in the future.
     /// </summary>

diff --git a/src/Microsoft.ML.Core/Data/ISchemaBindableMapper.cs b/src/Microsoft.ML.Core/Data/ISchemaBindableMapper.cs
@@ -9,11 +9,11 @@
 namespace Microsoft.ML.Data
 {
     /// <summary>
-    /// A mapper that can be bound to a <see cref="RoleMappedSchema"/> (which is an ISchema, with mappings from column kinds
-    /// to columns). Binding an <see cref="ISchemaBindableMapper"/> to a <see cref="RoleMappedSchema"/> produces an
+    /// A mapper that can be bound to a <see cref="RoleMappedSchema"/> (which encapsulates a <see cref="DataViewSchema"/> and has mappings from column kinds
+    /// to columns of that schema). Binding an <see cref="ISchemaBindableMapper"/> to a <see cref="RoleMappedSchema"/> produces an
     /// <see cref="ISchemaBoundMapper"/>, which is an interface that has methods to return the names and indices of the input columns
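-    /// needed by the mapper to compute its output. The <see cref="ISchemaBoundRowMapper"/> is an extention to this interface, that
+    /// needed by the mapper to compute its output. The <see cref="ISchemaBoundRowMapper"/> is an extension to this interface that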
-    /// can also produce an output IRow given an input IRow. The IRow produced generally contains only the output columns of the mapper, and not
+    /// can also produce an output <see cref="DataViewRow"/> given an input <see cref="DataViewRow"/>. The <see cref="DataViewRow"/> produced generally contains only the output columns of the mapper, and not
     /// the input columns (but there is nothing preventing an <see cref="ISchemaBoundRowMapper"/> from mapping input columns directly to outputs).
     /// This interface is implemented by wrappers of IValueMapper based predictors, which are predictors that take a single
     /// features column. New predictors can implement <see cref="ISchemaBindableMapper"/> directly. Implementing <see cref="ISchemaBindableMapper"/>

diff --git a/src/Microsoft.ML.Core/Data/RoleMappedSchema.cs b/src/Microsoft.ML.Core/Data/RoleMappedSchema.cs
@@ -9,7 +9,7 @@
 namespace Microsoft.ML.Data
 {
     /// <summary>
-    /// Encapsulates an <see cref="Schema"/> plus column role mapping information. The purpose of role mappings is to
+    /// Encapsulates a <see cref="DataViewSchema"/> plus column role mapping information. The purpose of role mappings is to
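-    /// provide information on what the intended usage is for. That is: while a given data view may have a column named
+    /// provide information on the intended usage of each column. That is: while a given data view may have a column named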
     /// "Features", by itself that is insufficient: the trainer must be fed a role mapping that says that the role
     /// mapping for features is filled by that "Features" column. This allows things like columns not named "Features"
@@ -25,7 +25,7 @@ namespace Microsoft.ML.Data
     /// in this schema.
     /// </summary>
     /// <remarks>
-    /// Note that instances of this class are, like instances of <see cref="Schema"/>, immutable.
+    /// Note that instances of this class are, like instances of <see cref="DataViewSchema"/>, immutable.
     ///
     /// It is often the case that one wishes to bundle the actual data with the role mappings, not just the schema. For
     /// that case, please use the <see cref="RoleMappedData"/> class.