jc-datarchitect/nordik_customer_segmentation

NORDIK Seguros: Customer Segment Intelligence Framework

"Decoding multi-regional customer behavioral archetypes through advanced unsupervised learning"



nordik_customer_segmentation

A Customer Segment Intelligence (CSI) framework that discovers and profiles latent customer behavioral archetypes to support cross-selling strategies for housing products.

This repository implements an 11-stage analytical pipeline that takes high-dimensional transactional data through outlier isolation, PCA variance optimization, $K$-Means clustering with multi-metric cluster validation, and generalization testing on unseen multi-regional datasets, producing segments that commercial decision-makers can act on.
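The outlier isolation and PCA stages can be illustrated with a minimal sketch. It is not the repository's implementation: the IQR rule, the 1.5 multiplier, the 95% variance threshold, the helper name `iqr_inlier_mask`, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def iqr_inlier_mask(X, k=1.5):
    """Hypothetical helper: True for rows whose every feature lies within
    k * IQR of that feature's first and third quartiles."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


# Synthetic stand-in for transactional features (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=500)  # one nearly redundant feature
X[:5] += 25.0                                    # five obvious outlier rows

# Stage A: isolate outliers before they distort scaling and PCA.
mask = iqr_inlier_mask(X)
X_clean = X[mask]

# Stage B: standardize, then keep the fewest components that explain
# at least 95% of the variance (assumed threshold).
X_scaled = StandardScaler().fit_transform(X_clean)
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)
retained = pca.explained_variance_ratio_.sum()

print(f"rows kept: {mask.sum()} of {len(X)}")
print(f"components kept: {X_reduced.shape[1]} of {X.shape[1]}")
print(f"variance retained: {retained:.3f}")
```

Passing a float to `n_components` asks scikit-learn's `PCA` to choose the number of components from the explained-variance ratio, so the threshold, not the component count, is the tunable parameter.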

Project Architecture & Insights

  • The Analytical Pipeline: An 11-stage workflow that transforms raw, high-dimensional transactional inputs into a standardized feature space for model training.
  • Advanced Governance: A DataPipeline class applies the same transformation logic (log-scaling, StandardScaler, and feature engineering) at training and inference time, which avoids training-serving skew when generalizing to unseen regional datasets.
  • Unsupervised Intelligence: $K$-Means clustering identifies five distinct behavioral archetypes. The number of clusters is validated with the Elbow and Silhouette methods, which favor high intra-cluster cohesion and inter-cluster separation.
  • Commercial Application: Maps cluster-specific conversion rates to identify high-propensity segments for housing insurance cross-selling.
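To make the training-serving skew point concrete, here is a minimal sketch of a fit-once, reuse-everywhere pipeline followed by a per-cluster conversion-rate summary. The class `SegmentationPipelineSketch` is a hypothetical stand-in, not the repository's DataPipeline; the column names, the synthetic regions, and the `converted` flag are likewise assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


class SegmentationPipelineSketch:
    """Hypothetical stand-in for a DataPipeline-style class: scaler and
    centroids are learned once in fit() and only reused in predict()."""

    def __init__(self, skewed_cols, n_clusters=5, random_state=42):
        self.skewed_cols = list(skewed_cols)
        self.scaler = StandardScaler()
        self.model = KMeans(n_clusters=n_clusters, n_init=10,
                            random_state=random_state)

    def _prepare(self, df):
        out = df.copy()
        # log1p compresses right-skewed monetary features and is safe at zero.
        out[self.skewed_cols] = np.log1p(out[self.skewed_cols])
        return out

    def fit(self, df):
        prepared = self._prepare(df)
        self.columns_ = list(prepared.columns)
        self.model.fit(self.scaler.fit_transform(prepared))
        return self

    def predict(self, df):
        # Reuse the training-time column order and scaler statistics.
        prepared = self._prepare(df)[self.columns_]
        return self.model.predict(self.scaler.transform(prepared))


rng = np.random.default_rng(0)


def make_region(n):
    """Synthetic region with illustrative (assumed) feature names."""
    return pd.DataFrame({
        "annual_premium": rng.lognormal(mean=6.0, sigma=0.8, size=n),
        "claims_count": rng.poisson(1.5, size=n).astype(float),
        "tenure_years": rng.uniform(0, 20, size=n),
    })


train_region = make_region(400)
unseen_region = make_region(150)

pipeline = SegmentationPipelineSketch(skewed_cols=["annual_premium"])
pipeline.fit(train_region)
unseen_labels = pipeline.predict(unseen_region)

# Synthetic conversion flag (assumption) summarized per cluster.
converted = pd.Series(rng.random(len(unseen_region)) < 0.2)
rates = converted.groupby(unseen_labels).mean()
print(rates.sort_values(ascending=False))
```

The key design choice is that `predict` never calls `fit` or `fit_transform`: the unseen region is scaled with the training region's mean and variance, so the same customer would land in the same cluster regardless of which batch they arrive in.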

Key Visualizations

  • Clustering Diagnostics: Elbow & Silhouette validation for optimal k-selection.
  • Behavioral Heatmap: Correlation matrix highlighting multi-regional feature relationships.
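The numbers behind an Elbow and Silhouette diagnostic can be computed as in the sketch below. The synthetic blobs, the candidate range of 2 to 8 clusters, and the random seeds are assumptions for illustration; the repository's own data and settings may differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the scaled customer features (assumption).
X, _ = make_blobs(n_samples=600, centers=5, cluster_std=0.6, random_state=7)
X = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
    inertias[k] = model.inertia_                         # Elbow: within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, model.labels_)  # cohesion vs. separation, in [-1, 1]

best_k = max(silhouettes, key=silhouettes.get)

for k in sorted(inertias):
    print(f"k={k}  inertia={inertias[k]:10.2f}  silhouette={silhouettes[k]:.3f}")
print("k with the highest silhouette score:", best_k)
```

Inertia tends to fall as k grows, so the Elbow plot is read for the point where the decrease flattens, while the Silhouette score gives a direct maximum to compare against it. Plotting both dictionaries against k with matplotlib gives the two diagnostic curves.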

Technical Stack

  • Language: Python 3.x
  • Core Libraries: pandas, numpy, scikit-learn
  • Visualization: seaborn, matplotlib
  • Architecture: Modularized pipeline via custom DataPipeline class for production-ready inference.

Project Structure

The project follows a modular structure designed for maintainability and scalability in data science workflows:

nordik_customer_segmentation/
├── LICENSE                         # License information
├── README.md                       # Project documentation
├── requirements.txt                # Essential Python dependencies
├── data/                           # Raw datasets and intermediate processed files
├── notebooks/                      # Exploratory data analysis and model experimentation
└── src/                            # Core modular processing logic and ML pipeline
    ├── __init__.py                 # Package initialization
    └── nordik_seguros_pipeline.py  # DataPipeline class (cleaning, feature engineering, and model inference)

Data Privacy & Confidentiality Notice

To adhere to strict industry compliance standards and protect corporate confidentiality, the datasets and metadata within this repository have been subjected to a rigorous anonymization process:

  • Customer Identity Masking: Personally Identifiable Information (PII), such as specific client names, contact details, and sensitive demographics, has been replaced with randomized synthetic placeholders to ensure full compliance with global data protection standards.
  • Geospatial & Account Protection: Original regional identifiers and specific insurance account numbers have been abstracted to prevent the inference of individual policyholder behavior or proprietary market distribution strategies.
  • Product & Revenue Normalization: Transactional values and product-specific labels have been normalized to protect the firm's competitive commercial intelligence and internal financial reporting structures.

The operational pipeline logic, feature engineering, and unsupervised clustering methodology remain 100% faithful to the original business intelligence requirements, ensuring full analytical reproducibility and pedagogical integrity while safeguarding the privacy of the original stakeholders.

About

An unsupervised ML customer segmentation framework designed for NORDIK Seguros. It implements an 11-stage pipeline integrating custom outlier isolation, categorical binning, PCA variance optimization, and $K$-Means clustering to translate high-dimensional transactional data into commercial strategies for C-level decision-makers.
