This project applies Fuzzy C-Means (FCM) clustering (via scikit-fuzzy) to the Default of Credit Card Clients dataset using two features:
- `LIMIT_BAL`: credit limit
- `BILL TOTAL`: `BILL_AMT1 + … + BILL_AMT6` (sum of the six monthly bill amounts)
The pipeline:
- Load `data/credit_card_clients.csv` (adjust to your exact path).
- Create `BILL TOTAL`.
- Scale `['LIMIT_BAL', 'BILL TOTAL']` to $[0,1]$.
- Run FCM for a sweep of cluster counts ($c = 2, \dots, 10$).
- Plot FPC vs. $c$ and a grid of mini scatter plots (one per $c$).
- Pick the $c$ with the highest FPC and plot the final clustering.
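The first three pipeline steps can be sketched with pandas and NumPy as below. This sketch assumes the CSV has a single header row with columns `LIMIT_BAL` and `BILL_AMT1` through `BILL_AMT6`. The helper names `build_features` and `minmax_scale` are hypothetical and do not come from the project.

```python
import numpy as np
import pandas as pd

BILL_COLS = [f"BILL_AMT{i}" for i in range(1, 7)]  # BILL_AMT1 .. BILL_AMT6
FEATURES = ["LIMIT_BAL", "BILL TOTAL"]


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'BILL TOTAL' column (sum of the six bill amounts)."""
    out = df.copy()
    out["BILL TOTAL"] = out[BILL_COLS].sum(axis=1)
    return out


def minmax_scale(X) -> np.ndarray:
    """Scale each column of X (n_samples, n_features) to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return (X - lo) / span


# Usage (replace the path with your own):
# df = build_features(pd.read_csv("data/credit_card_clients.csv"))
# X = minmax_scale(df[FEATURES].to_numpy())   # shape (N, 2), values in [0, 1]
```

`sklearn.preprocessing.MinMaxScaler` performs the same per-column scaling if you prefer a fitted transformer that you can reuse on new data.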
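FCM finds $c$ cluster centers $v_1, \dots, v_c$ and a $c \times N$ membership matrix $U = [u_{ik}]$, where $u_{ik}$ is the degree to which point $x_k$ belongs to cluster $i$:
- $u_{ik}\in[0,1]$, and $\sum_{i=1}^c u_{ik}=1$ for each $k$ (every column of $U$ sums to 1).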
- Unlike K-Means’ hard labels, FCM tells you how much each point belongs to each cluster.
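A tiny hand-made membership matrix makes these conventions concrete. The values are illustrative and are not project output. The `(c, N)` orientation (clusters by points) matches the `u` that scikit-fuzzy's `cmeans` returns.

```python
import numpy as np

# Toy membership matrix U for c = 2 clusters and N = 4 points.
# Row i = cluster i, column k = point x_k.
U = np.array([
    [0.95, 0.80, 0.55, 0.10],
    [0.05, 0.20, 0.45, 0.90],
])

# Constraints: every entry in [0, 1], every column sums to 1.
assert np.all((U >= 0) & (U <= 1))
assert np.allclose(U.sum(axis=0), 1.0)

# A hard label (what K-Means would give) keeps only the argmax of each column...
hard = U.argmax(axis=0)          # -> [0, 0, 0, 1]
# ...while the soft memberships show that the third point (k = 2) is nearly a toss-up.
max_membership = U.max(axis=0)   # -> [0.95, 0.80, 0.55, 0.90]
```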
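FCM minimizes the fuzzy within-cluster SSE over the $N$ scaled points $x_k$, with fuzzifier $m > 1$:
$$J_m(U, V) = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m}\,\lVert x_k - v_i\rVert^2$$
It does this by alternating two updates.
Centers:
$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\,x_k}{\sum_{k=1}^{N} u_{ik}^{m}}$$
Memberships:
$$u_{ik} = \left(\sum_{j=1}^{c}\left(\frac{\lVert x_k - v_i\rVert}{\lVert x_k - v_j\rVert}\right)^{2/(m-1)}\right)^{-1}$$
Stopping: stop when the memberships change by less than a tolerance $\varepsilon$ (scikit-fuzzy's `error` argument), i.e. $\lVert U^{(t)} - U^{(t-1)}\rVert < \varepsilon$, or when `maxiter` is reached.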
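FCM alternates two updates until the stopping rule fires: membership-weighted centers, then inverse-distance memberships. The NumPy sketch below illustrates that loop. It is not scikit-fuzzy's implementation. The name `fcm`, the random initialization of $U$, and the defaults `m=2.0`, `error=1e-5` and `maxiter=1000` are assumptions chosen for illustration.

```python
import numpy as np


def fcm(X, c, m=2.0, error=1e-5, maxiter=1000, seed=0):
    """Fuzzy C-Means on X of shape (N, d).

    Returns centers V (c, d), memberships U (c, N) and the number of iterations run.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    rng = np.random.default_rng(seed)

    # Random initial memberships, normalized so each column sums to 1.
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)

    for it in range(1, maxiter + 1):
        Um = U ** m
        # Centers: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)

        # Distances ||x_k - v_i||, shape (c, N); the floor avoids division by zero.
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.fmax(D, np.finfo(float).eps)

        # Memberships: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)),
        # computed in the equivalent form d_ik^(-2/(m-1)) normalized over clusters.
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)

        # Stop when the memberships barely change (Frobenius norm), or after maxiter.
        converged = np.linalg.norm(U_new - U) < error
        U = U_new
        if converged:
            break

    return V, U, it
```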
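We report the Fuzzy Partition Coefficient (FPC) for each $c$, where $N$ is the number of data points:
$$\mathrm{FPC} = \frac{1}{N}\sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^2, \qquad \frac{1}{c} \le \mathrm{FPC} \le 1$$
- Higher is better. $\mathrm{FPC} = 1$ means a crisp (hard) partition, while $\mathrm{FPC}\approx 1/c$ means a very fuzzy, overlapping partition.
- In this 2-feature run, FPC peaks at $c=2$ and then decreases as $c$ grows.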
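The FPC is the mean over points of $\sum_i u_{ik}^2$. A direct NumPy version follows. For a `(c, N)` membership matrix it gives the same value as the `fpc` returned by scikit-fuzzy's `cmeans`, which computes it as `trace(u @ u.T) / N`. The function name `fpc` is only for illustration.

```python
import numpy as np


def fpc(U):
    """Fuzzy Partition Coefficient of a (c, N) membership matrix.

    FPC = (1/N) * sum_i sum_k u_ik^2, which lies in [1/c, 1].
    """
    U = np.asarray(U, dtype=float)
    return float(np.sum(U ** 2) / U.shape[1])


c, N = 3, 6
uniform = np.full((c, N), 1.0 / c)        # maximally fuzzy: every u_ik = 1/c
crisp = np.eye(c)[:, np.arange(N) % c]    # hard partition: a single 1 per column
print(fpc(uniform), fpc(crisp))           # about 0.3333 (= 1/c) and 1.0
```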
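(Another index you may see is Xie–Beni, $\mathrm{XB} = \dfrac{\sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m}\lVert x_k - v_i\rVert^2}{N\,\min_{i\neq j}\lVert v_i - v_j\rVert^2}$. It compares within-cluster compactness with the separation between centers, and lower is better.)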
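The sketch below computes the Xie–Beni index in the common form $\mathrm{XB} = \sum_{i,k} u_{ik}^m \lVert x_k - v_i\rVert^2 \,/\, \big(N \min_{i\neq j}\lVert v_i - v_j\rVert^2\big)$. scikit-fuzzy's `cmeans` does not return this index. The hypothetical helper `xie_beni` therefore computes it from the data, centers and memberships, with `m=2.0` as an assumed default.

```python
import numpy as np


def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index (lower is better).

    X: data (N, d); V: centers (c, d); U: memberships (c, N).
    """
    X, V, U = (np.asarray(a, dtype=float) for a in (X, V, U))
    N = X.shape[0]
    # Squared distances ||x_k - v_i||^2, shape (c, N).
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    compactness = np.sum((U ** m) * D2)  # the fuzzy within-cluster SSE
    # Smallest squared distance between two distinct centers.
    C2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    separation = C2[~np.eye(len(V), dtype=bool)].min()
    return float(compactness / (N * separation))
```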
How to read the FPC-vs-$c$ plot:
- The curve shows FPC for $c = 2, \dots, 10$.
- Pick the peak (here at $c = 2$).
- The downward slope after $c=2$ means that adding more clusters makes the partition fuzzier (less crisp separation) for these two features.
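The sweep and peak-picking can be sketched as follows. To keep the block self-contained, it bundles a compact NumPy FCM instead of calling scikit-fuzzy. The names `fcm_fpc` and `sweep`, the default `m=2.0` and the fixed `seed` are all hypothetical choices for this sketch.

```python
import numpy as np


def fcm_fpc(X, c, m=2.0, error=1e-5, maxiter=300, seed=0):
    """Compact FCM run on X (N, d); returns (centers, memberships, FPC)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                    # columns sum to 1
    for _ in range(maxiter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)      # weighted-mean centers
        D = np.fmax(np.linalg.norm(X[None] - V[:, None], axis=2), 1e-12)
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                     # membership update
        done = np.linalg.norm(U_new - U) < error
        U = U_new
        if done:
            break
    return V, U, float(np.sum(U ** 2) / U.shape[1])


def sweep(X, cs=range(2, 11), **kw):
    """Run FCM for each c; return ({c: FPC}, c with the highest FPC)."""
    scores = {c: fcm_fpc(X, c, **kw)[2] for c in cs}
    best_c = max(scores, key=scores.get)
    return scores, best_c
```

With scikit-fuzzy (`import skfuzzy as fuzz`), the per-$c$ call would be `cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(X.T, c, m, error, maxiter)`. Note the transpose: `cmeans` expects data shaped (features, samples). Fixing the seed keeps the FPC curve reproducible across reruns.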
What you’re seeing in the grid of mini scatter plots:
- Each panel is an FCM run for a specific $c$.
- Colors are hard labels derived from the fuzzy memberships (argmax across clusters).
- Black/red squares are the cluster centers (in scaled space).
- As $c$ increases, the algorithm keeps subdividing the dense region at low limits and low bill totals.
- The FPC shown in each title steadily declines with $c$, indicating that the split becomes less crisp.
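The grid of mini scatter plots can be sketched with matplotlib. The sketch assumes per-$c$ results are stored as `{c: (V, U, fpc)}`, with centers `V` of shape `(c, 2)` and memberships `U` of shape `(c, N)`. The helper `plot_grid`, the 3-column layout and the styling (black squares for centers) are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_grid(X, results, ncols=3):
    """One mini scatter per c: points colored by argmax membership, centers as squares.

    X: scaled data (N, 2); results: {c: (V, U, fpc)}.
    """
    cs = sorted(results)
    nrows = int(np.ceil(len(cs) / ncols))
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3.5 * nrows), squeeze=False)
    for ax, c in zip(axes.flat, cs):
        V, U, fpc = results[c]
        labels = U.argmax(axis=0)  # hard labels, for display only
        ax.scatter(X[:, 0], X[:, 1], c=labels, s=4, cmap="tab10")
        ax.scatter(V[:, 0], V[:, 1], marker="s", c="black", s=40)
        ax.set_title(f"c={c}, FPC={fpc:.3f}")
        ax.set_xlabel("LIMIT_BAL (scaled)")
        ax.set_ylabel("BILL TOTAL (scaled)")
    for ax in list(axes.flat)[len(cs):]:
        ax.set_visible(False)  # hide unused panels
    fig.tight_layout()
    return fig
```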
Takeaway: for these two features, a few clusters (especially $c=2$) summarize the structure best; a large $c$ just slices the same mass of points in arbitrary ways.
Interpretation of the final clustering:
- Two broad groups appear in scaled space:
  - lower limits and smaller bills;
  - higher limits and larger bills.
- The “X” markers are the fuzzy centers.
- Remember: points near the boundary have non-trivial memberships in both clusters; colors show the hard label for visualization only.
- Always scale features before FCM. It uses the Euclidean metric, so the feature with the larger numeric range would otherwise dominate the distances.
- If you switch to more features, avoid heavy collinearity (or use a compact subset).
- If you ever get FPC ≈ $1/c$ across all $c$, that is a degenerate run (near-uniform memberships). Rerun with different initializations (in scikit-fuzzy, a different `seed` or `init`) or adjust the features or scaling.
- Soft memberships are great for: thresholding borderline points, ranking “how typical” a point is for a cluster, and flagging outliers (low max-membership).
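The soft-membership uses listed above, plus the degenerate-run check, can be sketched on a `(c, N)` membership matrix. The uses are thresholding borderline points, ranking typicality and flagging low max-membership points. The function `membership_report` and its thresholds (`borderline=0.6`, `degenerate_tol=0.01`) are hypothetical choices, not values from the project.

```python
import numpy as np


def membership_report(U, borderline=0.6, degenerate_tol=0.01):
    """Summarize a (c, N) fuzzy membership matrix U.

    borderline and degenerate_tol are illustrative thresholds, not tuned values.
    """
    U = np.asarray(U, dtype=float)
    c, N = U.shape
    labels = U.argmax(axis=0)   # hard label per point
    top = U.max(axis=0)         # max membership: how typical a point is of its cluster

    # Per cluster: indices of its points, most typical first.
    ranking = {}
    for i in range(c):
        idx = np.flatnonzero(labels == i)
        ranking[i] = idx[np.argsort(-U[i, idx], kind="stable")]

    fpc = float(np.sum(U ** 2) / N)
    return {
        "labels": labels,
        # Low max-membership: borderline points or outlier candidates.
        "borderline_idx": np.flatnonzero(top < borderline),
        "ranking": ranking,
        "fpc": fpc,
        # FPC close to its floor 1/c suggests near-uniform memberships (degenerate run).
        "degenerate": abs(fpc - 1.0 / c) < degenerate_tol,
    }
```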


