JuliaData · bkamins · Mar 19, 2020 · Jan 6, 2020
diff --git a/docs/src/lib/types.md b/docs/src/lib/types.md
@@ -114,6 +114,7 @@ without caution because:
 
 ```@docs
 AbstractDataFrame
+ByRow
 DataFrame
 DataFrameRow
 GroupedDataFrame
@@ -124,5 +125,4 @@ DataFrameRows
 DataFrameColumns
 RepeatedVector
 StackedVector
-Row
 ```
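This hunk adds `ByRow` to the documented types. The docstrings updated in `selection.jl` describe two modes. A bare `fun` receives the selected column vectors as separate arguments. `ByRow(fun)` calls `fun` once per row, with one argument per selected column, and may not select zero columns. The plain-Julia sketch below models only that contrast. `RowWise` and `apply_transform` are hypothetical names invented for this illustration; this is not how DataFrames.jl implements `ByRow`.

```julia
# Hypothetical stand-in for `ByRow`, written only to illustrate the documented
# semantics; this is NOT how DataFrames.jl implements `ByRow`.
struct RowWise{F}
    fun::F
end

# Whole-column form: `fun` receives each selected column vector as a separate
# argument and is called exactly once.
apply_transform(fun, cols::AbstractVector...) = fun(cols...)

# Row-wise form: `fun` is broadcast, so it is called once per row with one
# scalar argument per selected column.
function apply_transform(w::RowWise, cols::AbstractVector...)
    # Mirrors the docstring rule that `ByRow` may not select zero columns.
    isempty(cols) && throw(ArgumentError("RowWise needs at least one column"))
    return w.fun.(cols...)
end

x1 = [1, 2, 3]
x2 = [10, 20, 30]

# `b` is the whole second column here, so `sum(b)` is a column-level reduction.
colwise = apply_transform((a, b) -> a .+ sum(b), x1, x2)
# `+` is applied to the pairs (1, 10), (2, 20), (3, 30).
rowwise = apply_transform(RowWise(+), x1, x2)
```

Here `colwise` is `[61, 62, 63]` and `rowwise` is `[11, 22, 33]`. Wrapping a type works the same way: `apply_transform(RowWise(UInt8), [2])` returns `UInt8[0x02]`, analogous to the `ByRow(UInt8)` column in the getting-started example.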
diff --git a/docs/src/man/getting_started.md b/docs/src/man/getting_started.md
@@ -522,11 +522,14 @@ julia> df[in.(df.A, Ref([1, 5, 601])), :]
 │ 3   │ 601   │ 7     │ 301   │
 ```
 
-Equivalently, the `in` function can be called with a single argument to create a function object that tests whether each value belongs to the subset (partial application of `in`): `df[in([1, 5, 601]).(df.A), :]`.
+Equivalently, the `in` function can be called with a single argument to create
+a function object that tests whether each value belongs to the subset
+(partial application of `in`): `df[in([1, 5, 601]).(df.A), :]`.
 
 #### Column selection using `select` and `select!`
 
-You can also use the [`select`](@ref) and [`select!`](@ref) functions to select columns in a data frame.
+You can also use the [`select`](@ref) and [`select!`](@ref) functions to select,
+rename and transform columns in a data frame.
 
 The `select` function creates a new data frame:
 ```jldoctest dataframe
@@ -550,6 +553,27 @@ julia> select(df, r"x") # select columns containing 'x' character
 │     │ Int64 │ Int64 │
 ├─────┼───────┼───────┤
 │ 1   │ 1     │ 2     │
+
+julia> select(df, :x1 => :a1, :x2 => :a2) # rename columns
+1×2 DataFrame
+│ Row │ a1    │ a2    │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 2     │
+
+julia> select(df, :x1, :x2 => (x -> 2x) => :x2) # transform columns
+1×2 DataFrame
+│ Row │ x1    │ x2    │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 4     │
+
+julia> select(df, :x1, :x2 => ByRow(UInt8) => :x2) # transform columns by row
+1×2 DataFrame
+│ Row │ x1    │ x2    │
+│     │ Int64 │ UInt8 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 0x02  │
 ```
 
 It is important to note that `select` always returns a data frame,

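The examples above name every target column explicitly. For the short `old_column => fun` form, the `select`/`select!` docstrings updated in `selection.jl` state a naming rule. The generated name is `$(old_column)_$(fun)`. At most three source names are joined with `_`, and with more than three sources the first two names are followed by `etc`. The sketch below restates that rule as code. `generated_colname` is a hypothetical helper, not a DataFrames.jl function. Rendering `fun` through `nameof` is also an assumption; anonymous functions would get compiler-generated names.

```julia
# Hypothetical helper reproducing the naming rule stated in the updated
# docstrings; it is not part of the DataFrames.jl API.
function generated_colname(cols::AbstractVector{Symbol}, fun)
    isempty(cols) && throw(ArgumentError("at least one source column is required"))
    # Keep up to three source names; beyond that, keep the first two plus "etc".
    parts = length(cols) <= 3 ? string.(cols) : [string(cols[1]), string(cols[2]), "etc"]
    # `nameof` gives the function or type name, e.g. :sin or :UInt8 (assumption:
    # this is how `fun` is rendered in the generated name).
    return Symbol(join(parts, "_"), "_", nameof(fun))
end

fun(xs...) = 0  # placeholder named `fun` to match the docstring example

generated_colname([:a, :b, :c, :d], fun)
```

With the placeholder `fun`, the call yields `:a_b_etc_fun`, matching the docstring example. `generated_colname([:x1], sin)` gives `:x1_sin`.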
diff --git a/src/abstractdataframe/selection.jl b/src/abstractdataframe/selection.jl
@@ -1,9 +1,8 @@
 # TODO:
 # * add transform and transform! functions
-# * update documentation
-# * add tests
 # * add NT (or better name) to column selector passing NamedTuple
 #   (also in other places: filter, combine)
+# * add select/select!/transform/transform! for GroupedDataFrame
 
 # normalize_selection function makes sure that whatever input format of idx is it
 # will end up in one of four canonical forms
@@ -134,32 +133,32 @@ In particular, regular expressions, `All`, `Between`, and `Not` selectors are su
 
 Columns can be renamed using the `old_column => new_column_name` syntax,
 and transformed using the `old_column => fun => new_column_name` syntax.
-`new_column_name` must be a `Symbol`, and `fun` a function or a type.
-If `old_column` is a `Symbol` or an integer then `fun` is applied to the corresponding column vector.
+`new_column_name` must be a `Symbol`, and `fun` a function or a type. If `old_column`
+is a `Symbol` or an integer then `fun` is applied to the corresponding column vector.
 Otherwise `old_column` can be any column indexing syntax, in which case `fun`
 will be passed the column vectors specified by `old_column` as separate arguments.
 
-To apply `fun` to each row instead of whole columns, it can be wrapped in a `ByRow` struct. In this case
-if `old_column` is a `Symbol` or an integer then `fun` is applied to each element
-(row) of `old_column`. Otherwise `old_column` can be any column indexing syntax,
-in which case `fun` will be passed one argument for each of the columns specified by `old_column`.
-If `ByRow` is used it is not allowed
-that `old_column` selects an empty set of columns.
+To apply `fun` to each row instead of to whole columns, wrap it in a `ByRow`
+struct. In this case, if `old_column` is a `Symbol` or an integer, `fun` is applied
+to each element (row) of `old_column`. Otherwise `old_column` can be any column
+indexing syntax, in which case `fun` will be passed one argument for each of the
+columns specified by `old_column`. If `ByRow` is used, `old_column` must not select
+an empty set of columns.
 
 Column transformation can also be specified using the short `old_column => fun` form.
 In this case, `new_column_name` is automatically generated as `\$(old_column)_\$(fun)`.
 Up to three column names are used for multiple input columns and they are joined
 using `_`; if more than three columns are passed, the name consists of the
 first two names followed by an `etc` suffix, e.g. `[:a,:b,:c,:d] => fun` produces
-the new column name `a_b_etc_fun`.
+the new column name `:a_b_etc_fun`.
 
 If a collection of column names is passed to `select!`, then duplicate column
 names requested in the target data frame are accepted (e.g. `select!(df, [:a], :, r"a")` is allowed)
 and only the first occurrence is used. In particular, the syntax to move column `:col`
 to the first position in the data frame is `select!(df, :col, :)`.
 In contrast, output column names of renaming, transformation and single column
 selection operations must be unique, so e.g. `select!(df, :a, :a => :a)` or
-`select!(df, :a, :a => sin => :a)` are not allowed.
+`select!(df, :a, :a => ByRow(sin) => :a)` are not allowed.
 
 Note that including the same column several times in the data frame via renaming
 when `copycols=false` will create column aliases. An example of such a situation is
@@ -260,8 +259,7 @@ end
 """
     select(df::AbstractDataFrame, inds...; copycols::Bool=true)
 
-Create a new data frame that contains columns from `df`
-specified by `inds` and return it.
+Create a new data frame that contains columns from `df` specified by `inds` and return it.
 
 Arguments passed as `inds...` can be any index that is allowed for column indexing.
 In particular, regular expressions, `All`, `Between`, and `Not` selectors are supported.
@@ -271,36 +269,36 @@ are supported.
 
 Columns can be renamed using the `old_column => new_column_name` syntax,
 and transformed using the `old_column => fun => new_column_name` syntax.
-`new_column_name` must be a `Symbol`, and `fun` a function or a type.
-If `old_column` is a `Symbol` or an integer then `fun` is applied to a column `old_column`.
-Otherwise `old_column` can be any column indexing syntax, but in this case `fun`
-will be passed a `NamedTuple` holding only the columns specified by `old_column`.
-
-It is allowed to wrap `fun` in `ByRow` struct. In this case
-if `old_column` is a `Symbol` or an integer then `fun` is applied to each element
-(row) of `old_column`. Otherwise `old_column` can be any column indexing syntax,
-but in this case `fun` will be passed a `NamedTuple` representing each row, holding only
-the columns specified by `old_column`. If `ByRow` is used it is not allowed
-that `old_column` selects an empty set of columns.
+`new_column_name` must be a `Symbol`, and `fun` a function or a type. If `old_column`
+is a `Symbol` or an integer then `fun` is applied to the corresponding column vector.
+Otherwise `old_column` can be any column indexing syntax, in which case `fun`
+will be passed the column vectors specified by `old_column` as separate arguments.
+
+To apply `fun` to each row instead of to whole columns, wrap it in a `ByRow`
+struct. In this case, if `old_column` is a `Symbol` or an integer, `fun` is applied
+to each element (row) of `old_column`. Otherwise `old_column` can be any column
+indexing syntax, in which case `fun` will be passed one argument for each of the
+columns specified by `old_column`. If `ByRow` is used, `old_column` must not select
+an empty set of columns.
 
 Column transformation can also be specified using the short `old_column => fun` form.
 In this case, `new_column_name` is automatically generated as `\$(old_column)_\$(fun)`.
 Up to three column names are used for multiple input columns and they are joined
 using `_`; if more than three columns are passed, the name consists of the
 first two names followed by an `etc` suffix, e.g. `[:a,:b,:c,:d] => fun` produces
-the new column name `a_b_etc_fun`.
+the new column name `:a_b_etc_fun`.
 
-If a collection of column names is passed to `select` then requesting duplicate column
-names in target data frame are accepted (e.g. `select(df, [:a], :, r"a")` is allowed)
+If a collection of column names is passed to `select` or `select!`, then duplicate column
+names requested in the target data frame are accepted (e.g. `select!(df, [:a], :, r"a")` is allowed)
 and only the first occurrence is used. In particular, the syntax to move column `:col`
-to the first position in the data frame is `select(df, :col, :)`.
+to the first position in the data frame is `select!(df, :col, :)`.
 In contrast, output column names of renaming, transformation and single column
-selection operations must be unique, so e.g. `select(df, :a, :a => :a)` or
-`select(df, :a, :a => sin => :a)` are not allowed.
+selection operations must be unique, so e.g. `select!(df, :a, :a => :a)` or
+`select!(df, :a, :a => ByRow(sin) => :a)` are not allowed.
 
-If `df` is a `DataFrame` a new `DataFrame` is returned.
-If `copycols=true` (the default), then returned `DataFrame` is guaranteed not to share columns with `df`.
-If `copycols=false`, then returned `DataFrame` shares column vectors with `df` where possible.
+If `df` is a `DataFrame`, a new `DataFrame` is returned. If `copycols=true` (the default),
+then the returned `DataFrame` is guaranteed not to share columns with `df`. If
+`copycols=false`, then the returned `DataFrame` shares column vectors with `df` where possible.
 
 If `df` is a `SubDataFrame` then a `SubDataFrame` is returned if `copycols=false`
 and a `DataFrame` with freshly allocated columns otherwise.
@@ -385,11 +383,14 @@ function _select(df::AbstractDataFrame, normalized_cs, copycols::Bool)
     # the role of transformed_cols is the following
     # * make sure that we do not use the same target column name twice in transformations;
     #   note though that it can appear in no-transformation selection like
-    #    `select(df, :, :a => sin => :a), where :a is produced both by `:` and by `:a => sin => :a`
-    # * make sure that if some column is produced by transformation like `:a => sin => :a`
-    #   and it appears earlier or later in non-transforming selection like `:` or `:a`
-    #   then the transformation is computed and inserted in to the target data frame once and only once
-    #   the first time the target column is requested to be produced.
+    #   `select(df, :, :a => ByRow(sin) => :a)`, where `:a` is produced both by `:`
+    #   and by `:a => ByRow(sin) => :a`
+    # * make sure that if some column is produced by transformation like
+    #   `:a => ByRow(sin) => :a` and it appears earlier or later in non-transforming
+    #   selection like `:` or `:a`, then the transformation is computed and inserted
+    #   into the target data frame exactly once, the first time the target column
+    #   is requested to be produced.
+    #
     # For example in:
     #
     # julia> df = DataFrame(a=1:2, b=3:4)
@@ -400,7 +401,7 @@ function _select(df::AbstractDataFrame, normalized_cs, copycols::Bool)
     # │ 1   │ 1     │ 3     │
     # │ 2   │ 2     │ 4     │
     #
-    # julia> select(df, :, :a=>ByRow(sin)=>:a, :a, 1)
+    # julia> select(df, :, :a => ByRow(sin) => :a, :a, 1)
     # 2×2 DataFrame
     # │ Row │ a        │ b     │
     # │     │ Float64  │ Int64 │
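The `_select` comment above describes two rules for `transformed_cols`. First, a target name may be produced by at most one transformation. Second, a column targeted by a transformation is computed and inserted exactly once, at the first position where it is requested, even when `:` or `:a` also requests it. The minimal plain-Julia model below covers just those two rules. It uses a vector of `name => column` pairs in place of a data frame. `select_sketch` and `getcol` are hypothetical names, and `fun` is broadcast over the source column, as with `ByRow`. Other checks that DataFrames.jl performs are omitted, for example the docstring's uniqueness rules for renaming and single-column selection.

```julia
# Hypothetical model of the `transformed_cols` bookkeeping described in the
# `_select` comment; NOT the DataFrames.jl implementation. A "data frame" is
# an ordered vector of `name => column` pairs.

# Look up a source column by name.
function getcol(df, name::Symbol)
    i = findfirst(p -> first(p) == name, df)
    i === nothing && throw(ArgumentError("column :$name not found"))
    return last(df[i])
end

# Selectors: `:` (all columns), a `Symbol`, an integer position, or a
# transformation `src => fun => dst` (here `fun` is broadcast, as with `ByRow`).
function select_sketch(df, selectors...)
    # Rule 1: a target name may be produced by at most one transformation.
    transforms = Dict{Symbol,Function}()
    for s in selectors
        s isa Pair || continue
        src, fun, dst = first(s), first(last(s)), last(last(s))
        haskey(transforms, dst) && throw(ArgumentError("duplicate target :$dst"))
        transforms[dst] = () -> fun.(getcol(df, src))
    end

    out = Pair{Symbol,Vector}[]
    seen = Set{Symbol}()
    # Rule 2: each target column is emitted once, at the first position where it
    # is requested; if a transformation targets it, the transformed value is used.
    function emit!(name::Symbol)
        name in seen && return nothing
        push!(seen, name)
        col = haskey(transforms, name) ? transforms[name]() : copy(getcol(df, name))
        push!(out, name => col)
        return nothing
    end

    for s in selectors
        if s isa Colon
            foreach(p -> emit!(first(p)), df)
        elseif s isa Symbol
            emit!(s)
        elseif s isa Integer
            emit!(first(df[s]))
        elseif s isa Pair
            emit!(last(last(s)))
        else
            throw(ArgumentError("unsupported selector $s"))
        end
    end
    return out
end

df = Pair{Symbol,Vector}[:a => [1, 2], :b => [3, 4]]
# Mirrors `select(df, :, :a => ByRow(sin) => :a, :a, 1)` from the comment above.
result = select_sketch(df, :, :a => sin => :a, :a, 1)
```

As in the comment's example output, `result` has two columns, `:a` (holding `sin.([1, 2])`) followed by `:b`. The later `:a`, `1` and the transformation itself do not add further copies.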