Commit 97f4b2b

ogrisel, SebastienMelo, and ArturoAmorQ committed
[ci skip] MAINT Changed the use of ColumnTransformer to make_column_transformer (#831)
* changed besides to additionally for better phrasing
* Apply suggestions from code review
* Changed the use of ColumnTransformer to make_column_transformer
* fixed format
* fixed format
* changed additional mentions of ColumnTransformer
* Rerender notebooks

---------

Co-authored-by: SebastienMelo <seastien.melo@polytechnique.edu>
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

6625d0c
1 parent: 530362c. Commit: 97f4b2b

20 files changed: 316 additions and 351 deletions

_sources/appendix/glossary.md

Lines changed: 1 addition & 1 deletion
@@ -368,7 +368,7 @@ The dataset used to train the [model](#model).
 
 An [estimator](#estimator) (i.e. an object that has a `fit` method) supporting
 `transform` and/or `fit_transform`. Examples for transformers are
-`StandardScaler` or `ColumnTransformer`.
+`StandardScaler` or `OneHotEncoder`.
 
 ### underfitting
 

_sources/python_scripts/03_categorical_pipeline_column_transformer.py

Lines changed: 10 additions & 11 deletions
@@ -74,9 +74,10 @@
 # categories.
 # * **numerical scaling** numerical features which will be standardized.
 #
-# Now, we create our `ColumnTransformer` by specifying three values: the
-# preprocessor name, the transformer, and the columns. First, let's create the
-# preprocessors for the numerical and categorical parts.
+# Now, we create our `ColumnTransformer` using the helper function
+# `make_column_transformer`. We specify two values: the transformer, and the
+# columns. First, let's create the preprocessors for the numerical and
+# categorical parts.
 
 # %%
 from sklearn.preprocessing import OneHotEncoder, StandardScaler
@@ -89,13 +90,11 @@
 # their respective columns.
 
 # %%
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 
-preprocessor = ColumnTransformer(
-    [
-        ("one-hot-encoder", categorical_preprocessor, categorical_columns),
-        ("standard_scaler", numerical_preprocessor, numerical_columns),
-    ]
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
+    (numerical_preprocessor, numerical_columns),
 )
 
 # %% [markdown]
@@ -234,8 +233,8 @@
     handle_unknown="use_encoded_value", unknown_value=-1
 )
 
-preprocessor = ColumnTransformer(
-    [("categorical", categorical_preprocessor, categorical_columns)],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
     remainder="passthrough",
 )
 
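Note on the change in this file: `make_column_transformer` still returns a `ColumnTransformer`, but it derives each step name from the lowercased class name of the transformer instead of taking an explicit name. The sketch below compares the two constructions; the column names are hypothetical placeholders, not the course's dataset.

```python
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, used only for this illustration.
categorical_columns = ["workclass"]
numerical_columns = ["age"]

# Explicit form: each step is a (name, transformer, columns) triple.
explicit = ColumnTransformer(
    [
        ("one-hot-encoder", OneHotEncoder(), categorical_columns),
        ("standard_scaler", StandardScaler(), numerical_columns),
    ]
)

# Helper form: each step is a (transformer, columns) pair; names are generated.
generated = make_column_transformer(
    (OneHotEncoder(), categorical_columns),
    (StandardScaler(), numerical_columns),
)

explicit_names = [name for name, _, _ in explicit.transformers]
generated_names = [name for name, _, _ in generated.transformers]
print(explicit_names)
print(generated_names)
```

Code that looks up a step by its old explicit name would need to switch to the generated name.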

_sources/python_scripts/03_categorical_pipeline_ex_02.py

Lines changed: 4 additions & 3 deletions
@@ -58,18 +58,19 @@
 
 from sklearn.model_selection import cross_validate
 from sklearn.pipeline import make_pipeline
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 from sklearn.preprocessing import OrdinalEncoder
 from sklearn.ensemble import HistGradientBoostingClassifier
 
 categorical_preprocessor = OrdinalEncoder(
     handle_unknown="use_encoded_value", unknown_value=-1
 )
-preprocessor = ColumnTransformer(
-    [("categorical", categorical_preprocessor, categorical_columns)],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
     remainder="passthrough",
 )
 
+
 model = make_pipeline(preprocessor, HistGradientBoostingClassifier())
 
 start = time.time()

_sources/python_scripts/03_categorical_pipeline_sol_02.py

Lines changed: 12 additions & 16 deletions
@@ -52,18 +52,19 @@
 
 from sklearn.model_selection import cross_validate
 from sklearn.pipeline import make_pipeline
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 from sklearn.preprocessing import OrdinalEncoder
 from sklearn.ensemble import HistGradientBoostingClassifier
 
 categorical_preprocessor = OrdinalEncoder(
     handle_unknown="use_encoded_value", unknown_value=-1
 )
-preprocessor = ColumnTransformer(
-    [("categorical", categorical_preprocessor, categorical_columns)],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
     remainder="passthrough",
 )
 
+
 model = make_pipeline(preprocessor, HistGradientBoostingClassifier())
 
 start = time.time()
@@ -90,17 +91,12 @@
 
 from sklearn.preprocessing import StandardScaler
 
-preprocessor = ColumnTransformer(
-    [
-        ("numerical", StandardScaler(), numerical_columns),
-        (
-            "categorical",
-            OrdinalEncoder(
-                handle_unknown="use_encoded_value", unknown_value=-1
-            ),
-            categorical_columns,
-        ),
-    ]
+preprocessor = make_column_transformer(
+    (StandardScaler(), numerical_columns),
+    (
+        OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
+        categorical_columns,
+    ),
 )
 
 model = make_pipeline(preprocessor, HistGradientBoostingClassifier())
@@ -151,8 +147,8 @@
 categorical_preprocessor = OneHotEncoder(
     handle_unknown="ignore", sparse_output=False
 )
-preprocessor = ColumnTransformer(
-    [("one-hot-encoder", categorical_preprocessor, categorical_columns)],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
     remainder="passthrough",
 )
 

_sources/python_scripts/parameter_tuning_ex_02.py

Lines changed: 5 additions & 9 deletions
@@ -37,21 +37,15 @@
 )
 
 # %%
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 from sklearn.compose import make_column_selector as selector
 from sklearn.preprocessing import OrdinalEncoder
 
 categorical_preprocessor = OrdinalEncoder(
     handle_unknown="use_encoded_value", unknown_value=-1
 )
-preprocessor = ColumnTransformer(
-    [
-        (
-            "cat_preprocessor",
-            categorical_preprocessor,
-            selector(dtype_include=object),
-        )
-    ],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, selector(dtype_include=object)),
     remainder="passthrough",
 )
 
@@ -88,3 +82,5 @@
 
 # %%
 # Write your code here.
+
+# %%
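Note on the preprocessor in this file: a minimal sketch of how the `selector(dtype_include=object)` pair behaves together with `remainder="passthrough"`. The toy data frame and its column names are assumptions made for illustration; the exercise itself uses the course's dataset.

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OrdinalEncoder

# Toy frame (assumed): one object-dtype column and one numerical column.
data = pd.DataFrame(
    {
        "color": pd.Series(["red", "blue", "red"], dtype=object),
        "weight": [10, 20, 30],
    }
)

preprocessor = make_column_transformer(
    (
        OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
        selector(dtype_include=object),
    ),
    remainder="passthrough",
)

# The selector picks the object-dtype column at fit time. The encoded column
# comes first; the passthrough column is appended after it, unchanged.
transformed = preprocessor.fit_transform(data)
print(transformed)
```

Categories are encoded in sorted order, so here "blue" maps to 0 and "red" to 1.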

_sources/python_scripts/parameter_tuning_grid_search.py

Lines changed: 4 additions & 4 deletions
@@ -80,14 +80,14 @@
 )
 
 # %% [markdown]
-# We then use a `ColumnTransformer` to select the categorical columns and apply
+# We then use `make_column_transformer` to select the categorical columns and apply
 # the `OrdinalEncoder` to them.
 
 # %%
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 
-preprocessor = ColumnTransformer(
-    [("cat_preprocessor", categorical_preprocessor, categorical_columns)],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
     remainder="passthrough",
     # Silence a deprecation warning in scikit-learn v1.6 related to how the
     # ColumnTransformer stores an attribute that we do not use in this notebook

_sources/python_scripts/parameter_tuning_nested.py

Lines changed: 3 additions & 5 deletions
@@ -41,7 +41,7 @@
 # pipeline is identical to the one we used in the previous notebook.
 
 # %%
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 from sklearn.preprocessing import OrdinalEncoder
 from sklearn.compose import make_column_selector as selector
 
@@ -51,10 +51,8 @@
 categorical_preprocessor = OrdinalEncoder(
     handle_unknown="use_encoded_value", unknown_value=-1
 )
-preprocessor = ColumnTransformer(
-    [
-        ("cat_preprocessor", categorical_preprocessor, categorical_columns),
-    ],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
     remainder="passthrough",
     force_int_remainder_cols=False,  # Silence a warning in scikit-learn v1.6.
 )

_sources/python_scripts/parameter_tuning_randomized_search.py

Lines changed: 3 additions & 3 deletions
@@ -60,7 +60,7 @@
 # We create the same predictive pipeline as done for the grid-search section.
 
 # %%
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 from sklearn.preprocessing import OrdinalEncoder
 from sklearn.compose import make_column_selector as selector
 
@@ -70,8 +70,8 @@
 categorical_preprocessor = OrdinalEncoder(
     handle_unknown="use_encoded_value", unknown_value=-1
 )
-preprocessor = ColumnTransformer(
-    [("cat_preprocessor", categorical_preprocessor, categorical_columns)],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, categorical_columns),
     remainder="passthrough",
     force_int_remainder_cols=False,  # Silence a warning in scikit-learn v1.6.
 )

_sources/python_scripts/parameter_tuning_sol_02.py

Lines changed: 5 additions & 9 deletions
@@ -31,21 +31,15 @@
 )
 
 # %%
-from sklearn.compose import ColumnTransformer
+from sklearn.compose import make_column_transformer
 from sklearn.compose import make_column_selector as selector
 from sklearn.preprocessing import OrdinalEncoder
 
 categorical_preprocessor = OrdinalEncoder(
     handle_unknown="use_encoded_value", unknown_value=-1
 )
-preprocessor = ColumnTransformer(
-    [
-        (
-            "cat_preprocessor",
-            categorical_preprocessor,
-            selector(dtype_include=object),
-        )
-    ],
+preprocessor = make_column_transformer(
+    (categorical_preprocessor, selector(dtype_include=object)),
     remainder="passthrough",
 )
 
@@ -121,3 +115,5 @@
 test_score = model.score(data_test, target_test)
 
 print(f"Test score after the parameter tuning: {test_score:.3f}")
+
+# %%

appendix/glossary.html

Lines changed: 1 addition & 1 deletion
@@ -1040,7 +1040,7 @@ <h3>train set<a class="headerlink" href="#train-set" title="Link to this heading
 <h3>transformer<a class="headerlink" href="#transformer" title="Link to this heading">#</a></h3>
 <p>An <a class="reference internal" href="#estimator"><span class="xref myst">estimator</span></a> (i.e. an object that has a <code class="docutils literal notranslate"><span class="pre">fit</span></code> method) supporting
 <code class="docutils literal notranslate"><span class="pre">transform</span></code> and/or <code class="docutils literal notranslate"><span class="pre">fit_transform</span></code>. Examples for transformers are
-<code class="docutils literal notranslate"><span class="pre">StandardScaler</span></code> or <code class="docutils literal notranslate"><span class="pre">ColumnTransformer</span></code>.</p>
+<code class="docutils literal notranslate"><span class="pre">StandardScaler</span></code> or <code class="docutils literal notranslate"><span class="pre">OneHotEncoder</span></code>.</p>
 </section>
 <section id="underfitting">
 <h3>underfitting<a class="headerlink" href="#underfitting" title="Link to this heading">#</a></h3>
