Commit 01ca3a8

author
SebastienMelo
committed
added changes in other notebooks
1 parent bec1a4a commit 01ca3a8

2 files changed: +134 -0 lines changed

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# ✅ Quiz M4.01

```{admonition} Question
Imagine you work for a music streaming platform that hosts a vast library of
songs, playlists, and podcasts. You have access to detailed listening data from
millions of users. For each user, you know their most-listened genres, the
devices they use, their average session length, and how often they explore new
content.

You want to segment users based on their listening patterns to improve
personalized recommendations, without relying on rigid, predefined labels like
"pop fan" or "casual listener", which may fail to capture the complexity of
their behavior.

What kind of problem are you dealing with?

- a) a supervised task
- b) an unsupervised task
- c) a classification task
- d) a clustering task

_Select all answers that apply_
```

+++

```{admonition} Question
The plots below show the cluster labels found by k-means with 3 clusters,
differing only in the scaling step. Based on this, which conclusion can be drawn?

![K-means on original features](../../figures/evaluation_quiz_kmeans_not_scaled.svg)
![K-means on scaled features](../../figures/evaluation_quiz_kmeans_scaled.svg)

- a) without scaling, cluster assignment is dominated by the feature on the vertical axis
- b) without scaling, cluster assignment is dominated by the feature on the horizontal axis
- c) without scaling, both features contribute equally to cluster assignment

_Select a single answer_
```
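
One way to explore this question empirically is sketched below. It uses synthetic data (not the dataset behind the figures) with two features on very different scales, fits `KMeans` with 3 clusters with and without `StandardScaler`, and reports, for each feature, how far apart the cluster means are relative to that feature's overall spread. The data, the helper `between_cluster_spread`, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data (an assumption, not the quiz dataset):
# two independent features with very different ranges.
X = np.column_stack(
    [
        rng.normal(scale=1.0, size=300),  # feature 0: small range
        rng.normal(scale=1000.0, size=300),  # feature 1: large range
    ]
)


def fit_labels(data):
    """Cluster labels from k-means with 3 clusters."""
    return KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)


def between_cluster_spread(feature, labels):
    """Std of the per-cluster means of a feature, relative to its overall std.

    Values near 0 mean the clusters barely differ along this feature.
    """
    means = [feature[labels == k].mean() for k in np.unique(labels)]
    return np.std(means) / feature.std()


labels_raw = fit_labels(X)
labels_scaled = fit_labels(StandardScaler().fit_transform(X))

for name, labels in [("raw", labels_raw), ("scaled", labels_scaled)]:
    spreads = [between_cluster_spread(X[:, j], labels) for j in range(2)]
    print(name, [round(float(s), 2) for s in spreads])
```

Comparing the two printed rows shows which feature drives the assignment in each setting; swapping the two `scale` values is a quick sanity check.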

+++

```{admonition} Question
Which of the following statements correctly describe factors that affect the
stability of k-means clustering across different resampling iterations of the data?

- a) K-means can produce different results on resampled datasets due to
sensitivity to initialization.
- b) If data is unevenly distributed, the stability improves when increasing the
`n_init` parameter while using the "k-means++" initialization.
- c) Stability under resampling is guaranteed after feature scaling.
- d) Increasing the number of clusters always reduces the variability of
results across resamples.

_Select all answers that apply_
```

+++

```{admonition} Question
Which of the following statements correctly describe how WCSS (within-cluster
sum of squares, or inertia) behaves in k-means clustering?

- a) For a fixed number of clusters, WCSS is lower when clusters are compact.
- b) For a fixed number of clusters, WCSS is lower for wider clusters.
- c) For a fixed number of clusters, lower WCSS implies lower computational cost
during training.
- d) Assuming `n_init` is large enough to ensure convergence, WCSS always
decreases as the number of clusters increases.

_Select all answers that apply_
```
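
The behavior of WCSS can be inspected directly through the `inertia_` attribute of a fitted `KMeans`. The sketch below is a minimal example on synthetic blobs (an assumption, not course data): it records WCSS for several values of `n_clusters`, then compares blobs generated with a small and a large `cluster_std` at a fixed number of clusters. The helper `wcss_for` is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 4 Gaussian blobs in 2 dimensions.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=0)

# WCSS (inertia) as a function of the number of clusters.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, wcss in inertias.items():
    print(f"k={k}: WCSS={wcss:.1f}")


def wcss_for(cluster_std, n_clusters=4):
    """WCSS of k-means on blobs generated with the given spread."""
    X_std, _ = make_blobs(
        n_samples=500, centers=4, cluster_std=cluster_std, random_state=0
    )
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_std).inertia_


# Same number of clusters, compact versus wide blobs.
print(f"compact blobs: {wcss_for(0.5):.1f}")
print(f"wide blobs:    {wcss_for(2.0):.1f}")
```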

+++

```{admonition} Question
Which of the following statements correctly describe differences between
supervised and unsupervised clustering metrics?

- a) Supervised clustering metrics such as ARI and AMI require access to ground
truth labels to evaluate clustering performance.
- b) WCSS and the silhouette score evaluate internal cluster structure without
needing reference labels.
- c) V-measure is zero when labels are assigned completely at random.
- d) Supervised clustering metrics are not useful if the number of clusters does
not match the number of predefined classes.

_Select all answers that apply_
```
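
The metrics named in this question are all available in `sklearn.metrics`, so their inputs and behavior can be checked on a toy problem. The sketch below uses synthetic blobs with known labels (an assumption) and scores three labelings: k-means with as many clusters as classes, k-means with more clusters than classes, and labels drawn uniformly at random. The helper `evaluate` and the dictionary keys are hypothetical names.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    silhouette_score,
    v_measure_score,
)

# Synthetic data with known ground truth classes.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)

candidate_labels = {
    "k-means (k=3)": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "k-means (k=5)": KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X),
    "random": np.random.default_rng(0).integers(0, 3, size=len(X)),
}


def evaluate(labels):
    """Supervised metrics compare against y_true; silhouette only needs X."""
    return {
        "ARI": adjusted_rand_score(y_true, labels),
        "AMI": adjusted_mutual_info_score(y_true, labels),
        "V-measure": v_measure_score(y_true, labels),
        "silhouette": silhouette_score(X, labels),
    }


scores = {name: evaluate(labels) for name, labels in candidate_labels.items()}
for name, result in scores.items():
    print(name, {metric: round(float(value), 3) for metric, value in result.items()})
```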
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# ✅ Quiz M4.02

```{admonition} Question
If we increase `min_cluster_size` in HDBSCAN, what happens to the number of
points labeled as noise?

- a) It decreases.
- b) It increases.
- c) It stays the same.
- d) HDBSCAN fails to converge.

_Select a single answer_

```

+++

```{admonition} Question
What happens to k-means centroids in the presence of outliers?

- a) They move towards the outliers assigned to their cluster.
- b) They are not sensitive to outliers.
- c) If a centroid is initialized on an outlier, it may remain isolated in
subsequent iterations.

_Select all answers that apply_

```

+++

```{admonition} Question
A `KMeans` instance with `n_clusters=10` is used to transform the latitude and
longitude in a supervised learning pipeline. Provided the original dataset consists of
`n_features` features, including those two, how many features are passed to
the final estimator of the pipeline?

- a) `n_features` + 10
- b) `n_features` + 8
- c) `n_features` - 2
- d) `n_features`

_Select a single answer_

```
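
The feature count can be checked by building the preprocessing step and inspecting the shape of its output. The sketch below is one plausible reading of the setup, not necessarily the pipeline the quiz has in mind: it assumes the two geographic columns are routed to `KMeans` through a `ColumnTransformer` and that all other columns are passed through unchanged (`remainder="passthrough"`). The random data, the column indices in `geo_columns`, and the value of `n_features` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer

rng = np.random.default_rng(0)
n_samples, n_features = 200, 6  # arbitrary sizes chosen for the example
X = rng.normal(size=(n_samples, n_features))
geo_columns = [0, 1]  # stand-ins for latitude and longitude

# KMeans.transform maps each sample to its distances to the cluster centers.
preprocessor = ColumnTransformer(
    [("geo", KMeans(n_clusters=10, n_init=10, random_state=0), geo_columns)],
    remainder="passthrough",  # assumption: the other columns are kept as-is
)

X_out = preprocessor.fit_transform(X)
print("input shape: ", X.shape)
print("output shape:", X_out.shape)
```

Changing `remainder` to `"drop"` shows how strongly the count depends on what happens to the untouched columns.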
