An advanced mean-reversion trading strategy for ETF baskets using Bayesian Optimization to maximize Sharpe Ratio. Features walk-forward analysis, cointegration testing, and comprehensive backtesting reports.

digantk31/Basket-Trading

🎯 Basket Trading with Bayesian Optimization

A Python project that improves basket trading strategies using Bayesian Optimization (BO) to find optimal cointegrating weights, outperforming traditional Johansen test-based approaches.


📋 Project Overview

Traditional cointegration methods (like the Johansen test) generate in-sample cointegrating weights, but these often fail to generalize out-of-sample. This project uses Bayesian Optimization — a global optimization technique — to find parameter configurations that maximize out-of-sample profitability.

Key Features

  • Johansen Cointegration Test - Baseline weight calculation
  • Bayesian Optimization - Finds weights that maximize Sharpe ratio
  • Walk-Forward Analysis - Rolling window out-of-sample testing
  • Multi-Objective Optimization - Balance multiple criteria (Sharpe + Calmar)
  • Performance Metrics - Sharpe, Sortino, Calmar, Max Drawdown, Profit Factor
  • Visualization - Equity curves, drawdowns, rolling Sharpe, convergence plots

📊 Results Summary

Metric          Bayesian Optimization   Johansen Baseline   Improvement
Sharpe Ratio    -0.003                  -0.415              +0.411
Total Return    -11.29%                 -43.10%             +31.81%
Max Drawdown    -48.38%                 -56.83%             +8.45%
Profit Factor   1.027                   0.893               +0.134

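Bayesian Optimization beats the Johansen baseline on all four metrics in this backtest, although both strategies still show a negative total return and a negative Sharpe ratio.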


🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Step 1: Clone and Navigate to Project Directory

git clone https://github.com/digantk31/Basket-Trading.git
cd Basket-Trading

Step 2: Create Virtual Environment

python -m venv venv

Step 3: Activate Virtual Environment

Windows (Command Prompt):

venv\Scripts\activate

Windows (PowerShell):

.\venv\Scripts\Activate.ps1

Linux/Mac:

source venv/bin/activate

Step 4: Install Dependencies

pip install -r requirements.txt

Step 5: Run the Project

python main.py

Step 6: Deactivate Virtual Environment (when done)

deactivate

📁 Project Structure

Basket-Trading/
├── src/                           # Core source modules
│   ├── __init__.py               # Package initialization
│   ├── data_loader.py            # Data fetching from Yahoo Finance
│   ├── cointegration.py          # Johansen cointegration test
│   ├── bayesian_optimizer.py     # Bayesian Optimization engine
│   ├── trading_strategy.py       # Mean reversion trading signals
│   ├── backtester.py             # Backtesting & walk-forward analysis
│   ├── metrics.py                # Performance metrics calculation
│   └── utils.py                  # Visualization & utilities
├── notebooks/                     # Jupyter notebooks for analysis
│   ├── 01_data_exploration.ipynb
│   ├── 02_cointegration_analysis.ipynb
│   ├── 03_bayesian_optimization.ipynb
│   └── 04_strategy_comparison.ipynb
├── tests/                         # Unit tests
│   ├── test_cointegration.py
│   ├── test_optimizer.py
│   └── test_backtester.py
├── config/
│   └── config.yaml               # Configuration parameters
├── data/                          # Data storage (auto-generated)
│   ├── raw/
│   └── processed/
├── results/                       # Output files
│   ├── plots/                    # Generated visualizations
│   └── reports/                  # Performance reports
├── main.py                        # Main execution script
├── requirements.txt               # Python dependencies
└── README.md                      # This file

📐 Mathematical Formulas

1. Cointegrating Spread

The spread is a linear combination of asset prices using cointegrating weights:

Spread(t) = Σ (wᵢ × Pᵢ(t))

Where:

  • wᵢ = Weight for asset i (from Johansen test or BO)
  • Pᵢ(t) = Price of asset i at time t

Code Reference: get_spread() in src/cointegration.py
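
As a rough illustration of the formula above, the spread can be computed in a few lines of plain Python. This is a minimal sketch, not the project's get_spread(); the function name spread_series, the dictionary layout, and the sample prices and weights are all made up for the example.

```python
from typing import Dict, List


def spread_series(prices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Return Spread(t) = sum_i w_i * P_i(t) for every time step t.

    Hypothetical helper: `prices` maps ticker -> price list, all of equal length.
    """
    n_steps = len(next(iter(prices.values())))
    return [
        sum(weights[ticker] * prices[ticker][t] for ticker in weights)
        for t in range(n_steps)
    ]


# Illustrative numbers only, not real market data.
prices = {"XLF": [30.0, 31.0, 32.0], "XLK": [100.0, 102.0, 101.0]}
weights = {"XLF": 1.0, "XLK": -0.25}
spread = spread_series(prices, weights)
print(spread)
```

A negative weight means the asset is held short inside the basket, so the spread is the value of a long/short portfolio.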


2. Z-Score (Standardized Spread)

The z-score measures how many standard deviations the spread is from its mean:

Z-Score(t) = (Spread(t) - μ) / σ

Where:

  • μ = Rolling mean of spread (20-day lookback)
  • σ = Rolling standard deviation of spread

Trading Rules:

  • LONG when Z-Score < -2.0 (expect reversion up)
  • SHORT when Z-Score > +2.0 (expect reversion down)
  • EXIT when |Z-Score| < 0.5 (spread reverted)

Code Reference: calculate_zscore() in src/trading_strategy.py
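
The z-score and the trading rules above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's calculate_zscore(): it assumes the rolling window includes the current observation and uses the sample standard deviation, and the names rolling_zscore and positions_from_z are hypothetical. The stop-loss threshold from the configuration is left out to keep the example short.

```python
from statistics import mean, stdev
from typing import List, Optional


def rolling_zscore(spread: List[float], lookback: int = 20) -> List[Optional[float]]:
    """Z-score of each point against its trailing window (current point included)."""
    z: List[Optional[float]] = []
    for t in range(len(spread)):
        if t + 1 < lookback:
            z.append(None)  # not enough history yet
            continue
        window = spread[t + 1 - lookback : t + 1]
        sd = stdev(window)
        z.append((spread[t] - mean(window)) / sd if sd > 0 else 0.0)
    return z


def positions_from_z(z: List[Optional[float]], entry: float = 2.0, exit_: float = 0.5) -> List[int]:
    """Map z-scores to positions: +1 long, -1 short, 0 flat."""
    pos, out = 0, []
    for value in z:
        if value is None:
            out.append(0)
            continue
        if pos == 0:
            if value < -entry:
                pos = 1       # spread is cheap: go long, expect reversion up
            elif value > entry:
                pos = -1      # spread is rich: go short, expect reversion down
        elif abs(value) < exit_:
            pos = 0           # spread has reverted: close the position
        out.append(pos)
    return out


print(positions_from_z([None, 0.0, -2.5, -1.0, -0.3, 2.2, 0.4]))
```

Note that a position is held between entry and exit even when the z-score sits inside the ±2.0 band, which is why the rules need state rather than a simple threshold test.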


3. Sharpe Ratio

Measures risk-adjusted return (higher is better):

Sharpe Ratio = (E[R] - Rᶠ) / σᴿ × √252

Where:

  • E[R] = Mean daily return
  • Rᶠ = Risk-free rate (default: 2% annual)
  • σᴿ = Standard deviation of returns
  • √252 = Annualization factor (trading days/year)

Code Reference: sharpe_ratio() in src/metrics.py
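
A minimal sketch of this calculation follows. It is not the project's sharpe_ratio(); in particular it assumes the 2% annual risk-free rate is converted to a daily rate by dividing by 252 so that it matches the daily mean return, and that a zero-volatility series is reported as 0.0.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(daily_returns: Sequence[float],
                 risk_free_annual: float = 0.02,
                 periods: int = 252) -> float:
    """Annualized Sharpe ratio from daily returns (illustrative version)."""
    rf_daily = risk_free_annual / periods      # assumption: simple de-annualization
    excess = [r - rf_daily for r in daily_returns]
    sd = stdev(excess)                         # same as stdev of the raw returns
    if sd == 0:
        return 0.0                             # assumption: undefined ratio reported as 0
    return mean(excess) / sd * math.sqrt(periods)


print(round(sharpe_ratio([0.01, -0.01, 0.02, 0.0], risk_free_annual=0.0), 3))
```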


4. Sortino Ratio

Like Sharpe, but only penalizes downside volatility:

Sortino Ratio = (E[R] - Rᶠ) / σᴰ × √252

Where:

  • σᴰ = Standard deviation of negative returns only

Code Reference: sortino_ratio() in src/metrics.py
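
The following sketch follows the definition given here, taking the standard deviation over the negative returns only. It is an illustration rather than the project's sortino_ratio(): the daily risk-free conversion and the choice to return NaN when there are fewer than two losing days are assumptions.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sortino_ratio(daily_returns: Sequence[float],
                  risk_free_annual: float = 0.02,
                  periods: int = 252) -> float:
    """Annualized Sortino ratio using the std dev of negative returns only."""
    rf_daily = risk_free_annual / periods          # assumption: simple de-annualization
    downside = [r for r in daily_returns if r < 0]
    if len(downside) < 2:
        return float("nan")                        # assumption: too few losses to estimate
    sd_down = stdev(downside)
    if sd_down == 0:
        return float("nan")
    return (mean(daily_returns) - rf_daily) / sd_down * math.sqrt(periods)


print(round(sortino_ratio([0.02, -0.01, 0.01, -0.03], risk_free_annual=0.0), 3))
```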


5. Maximum Drawdown

Largest peak-to-trough decline in portfolio value:

Max Drawdown = min((Cumulative(t) - Peak(t)) / Peak(t))

Where:

  • Cumulative(t) = Cumulative return at time t
  • Peak(t) = Running maximum of cumulative returns

Code Reference: max_drawdown() in src/metrics.py
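
One way to compute this in a single pass is sketched below. It assumes Cumulative(t) is the compounded equity curve starting at 1.0, which the text does not spell out, and it is not the project's max_drawdown().

```python
from typing import Sequence


def max_drawdown(daily_returns: Sequence[float]) -> float:
    """Largest peak-to-trough decline, returned as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r                          # compounded equity curve (assumption)
        peak = max(peak, equity)                   # running maximum
        worst = min(worst, (equity - peak) / peak)
    return worst


# Equity goes 1.10 -> 0.55 -> 0.66, so the worst decline from the peak is -50%.
print(max_drawdown([0.10, -0.50, 0.20]))
```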


6. Calmar Ratio

Annual return divided by maximum drawdown:

Calmar Ratio = Annualized Return / |Max Drawdown|

Code Reference: calmar_ratio() in src/metrics.py
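
A small sketch of the ratio, assuming the annualized return is the geometric (compounded) return scaled to 252 periods and that a series with no drawdown is reported as NaN. Neither choice is confirmed by the project code, and the function below is illustrative only.

```python
from typing import Sequence


def calmar_ratio(daily_returns: Sequence[float], periods: int = 252) -> float:
    """Annualized return divided by the absolute maximum drawdown."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, (equity - peak) / peak)
    if worst == 0.0:
        return float("nan")                        # assumption: undefined without a drawdown
    # Geometric annualization (assumption): scale total growth to one year.
    annualized = equity ** (periods / len(daily_returns)) - 1.0
    return annualized / abs(worst)


# Toy case with two periods per "year": +20% annualized return, -20% drawdown.
print(round(calmar_ratio([0.5, -0.2], periods=2), 6))
```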


7. Profit Factor

Ratio of gross profits to gross losses:

Profit Factor = Σ(Positive Returns) / |Σ(Negative Returns)|

Code Reference: profit_factor() in src/metrics.py
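
For illustration, the ratio can be written as follows. This is a sketch rather than the project's profit_factor(); returning infinity when there are no losing periods is an assumption.

```python
from typing import Sequence


def profit_factor(returns: Sequence[float]) -> float:
    """Sum of positive returns divided by the absolute sum of negative returns."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    if losses == 0:
        return float("inf")                        # assumption: no losses at all
    return gains / losses


print(round(profit_factor([0.03, -0.01, -0.01]), 6))
```

A value above 1.0 means gross gains exceed gross losses when returns are summed without compounding.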


8. Mean Reversion Half-Life

Time for spread to revert halfway to its mean (Ornstein-Uhlenbeck process):

Half-Life = -ln(2) / θ

Where θ is estimated from:

ΔSpread(t) = θ × (Spread(t-1) - μ) + ε

Interpretation:

  • Smaller half-life = Faster mean reversion = Better for trading
  • Half-life < 30 days: Suitable for trading
  • Half-life > 60 days: Too slow for mean reversion strategy

Code Reference: get_half_life() in src/cointegration.py
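
The regression above can be sketched with ordinary least squares written out by hand. This is an illustrative version, not the project's get_half_life(): it regresses ΔSpread(t) on Spread(t-1) with an intercept, which gives the same slope θ as regressing on Spread(t-1) - μ, and it returns infinity when θ is not negative (no mean reversion detected).

```python
import math
from statistics import mean
from typing import Sequence


def half_life(spread: Sequence[float]) -> float:
    """Half-life = -ln(2) / theta, with theta from an OLS fit of the spread changes."""
    lagged = list(spread[:-1])
    delta = [b - a for a, b in zip(spread[:-1], spread[1:])]
    mx, my = mean(lagged), mean(delta)
    sxx = sum((x - mx) ** 2 for x in lagged)
    if sxx == 0:
        return math.inf                            # constant spread: nothing to fit
    sxy = sum((x - mx) * (y - my) for x, y in zip(lagged, delta))
    theta = sxy / sxx                              # OLS slope
    if theta >= 0:
        return math.inf                            # not mean-reverting
    return -math.log(2) / theta


# Noise-free toy series that decays 10% per step toward zero, so theta = -0.1.
decaying = [0.9 ** k for k in range(50)]
print(round(half_life(decaying), 3))
```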


9. Strategy Returns

Daily strategy return calculation:

Return(t) = Position(t-1) × Spread_Return(t) - Transaction_Cost × |Position_Change|

Where:

  • Position = +1 (long), -1 (short), or 0 (flat)
  • Spread_Return(t) = (Spread(t) - Spread(t-1)) / Spread(t-1)
  • Transaction_Cost = 0.1% per trade

Code Reference: calculate_returns() in src/trading_strategy.py
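
The return formula can be illustrated as below. This is a sketch, not the project's calculate_returns(). It assumes Position_Change at time t means Position(t) - Position(t-1), and that opening a position on the very first bar is charged as a change from flat.

```python
from typing import List, Sequence


def strategy_returns(spread: Sequence[float],
                     positions: Sequence[int],
                     cost: float = 0.001) -> List[float]:
    """Daily returns: yesterday's position times today's spread return, minus costs."""
    out = [-cost * abs(positions[0])]              # assumption: entry from flat on bar 0
    for t in range(1, len(spread)):
        spread_ret = (spread[t] - spread[t - 1]) / spread[t - 1]
        turnover = abs(positions[t] - positions[t - 1])
        out.append(positions[t - 1] * spread_ret - cost * turnover)
    return out


# Long for two bars, then flat: pay cost on entry and on exit.
rets = strategy_returns([100.0, 110.0, 99.0], [1, 1, 0])
print([round(r, 4) for r in rets])
```

Using Position(t-1) rather than Position(t) avoids look-ahead: the position must already be held before the spread moves. The percentage spread return is undefined when Spread(t-1) is zero, so this form only works while the spread stays away from zero.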


10. Normalized Weights

Weights are normalized so that their absolute values sum to 1:

wᵢ_normalized = wᵢ / Σ|wⱼ|

This ensures consistent position sizing across different weight configurations.

Code Reference: _normalize_weights() in src/bayesian_optimizer.py
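
A short sketch of this normalization follows. It is illustrative only and not the project's _normalize_weights(); raising an error for an all-zero vector is an assumption.

```python
from typing import List, Sequence


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Scale weights so that the sum of their absolute values equals 1."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return [w / total for w in weights]


normalized = normalize_weights([2.0, -1.0, 1.0])
print(normalized)
```

Signs are preserved, so short legs stay short, and two weight vectors that differ only by a positive scale factor map to the same normalized basket.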


⚙️ Configuration

Edit config/config.yaml to customize:

# Asset Configuration
data:
  tickers: ['XLF', 'XLK', 'XLE', 'XLV', 'XLI']  # Sector ETFs
  start_date: '2018-01-01'
  end_date: '2024-01-01'
  train_ratio: 0.7

# Trading Strategy
strategy:
  entry_threshold: 2.0      # Z-score for entry
  exit_threshold: 0.5       # Z-score for exit
  stop_loss: 3.0            # Stop loss threshold
  lookback_period: 20       # Rolling window for z-score
  transaction_cost: 0.001   # 0.1% per trade

# Bayesian Optimization
optimizer:
  n_calls: 100              # Total optimization iterations
  n_random_starts: 20       # Random initial evaluations
  objective: 'sharpe'       # Optimization target
  random_state: 42          # For reproducibility

# Walk-Forward Analysis
walk_forward:
  in_sample_size: 252       # Training window (1 year)
  out_sample_size: 63       # Testing window (3 months)
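
To show how the walk_forward settings translate into rolling windows, here is a minimal sketch. It assumes the window advances by out_sample_size each step and that the training window keeps a fixed length; the function name walk_forward_windows is hypothetical, and 1509 is an assumed count of trading days between the configured start and end dates.

```python
from typing import Iterator, Tuple

Window = Tuple[Tuple[int, int], Tuple[int, int]]


def walk_forward_windows(n_obs: int,
                         in_sample: int = 252,
                         out_sample: int = 63) -> Iterator[Window]:
    """Yield ((train_start, train_end), (test_start, test_end)) index pairs.

    End indices are exclusive, so they can be used directly as slice bounds.
    """
    start = 0
    while start + in_sample + out_sample <= n_obs:
        train = (start, start + in_sample)
        test = (start + in_sample, start + in_sample + out_sample)
        yield train, test
        start += out_sample                        # assumption: step by the test size


windows = list(walk_forward_windows(1509))         # assumed number of trading days
print(len(windows), windows[0], windows[-1])
```

Under these assumptions the generator produces 19 windows, each trained on one year of data and tested on the following three months.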

🔄 Execution Flow

┌─────────────────────────────────────────────────────────────┐
│                    main.py Execution Flow                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  [STEP 1] Load Data                                         │
│     └── Fetch prices from Yahoo Finance                     │
│     └── Data for: XLF, XLK, XLE, XLV, XLI                   │
│                                                             │
│  [STEP 2] Split Data                                        │
│     └── 70% Training, 30% Testing                           │
│                                                             │
│  [STEP 3] Johansen Cointegration (Baseline)                 │
│     └── Calculate cointegrating weights                     │
│     └── Compute mean reversion half-life                    │
│     └── Backtest on test data                               │
│                                                             │
│  [STEP 4] Bayesian Optimization                             │
│     └── Run 100 optimization iterations                     │
│     └── Find weights that maximize Sharpe ratio             │
│     └── Backtest optimized weights                          │
│                                                             │
│  [STEP 5] Walk-Forward Analysis                             │
│     └── 19 rolling windows                                  │
│     └── Compare BO vs Johansen on each window               │
│     └── Calculate combined performance                      │
│                                                             │
│  [STEP 6] Generate Visualizations                           │
│     └── Equity curves                                       │
│     └── Drawdown charts                                     │
│     └── Rolling Sharpe ratio                                │
│     └── Optimization convergence                            │
│                                                             │
│  [STEP 7] Generate Report                                   │
│     └── Save comparison report to results/reports/          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

📈 Core Concepts

Cointegration

Assets are cointegrated if a linear combination of their prices is stationary (mean-reverting). The Johansen test identifies these relationships and provides cointegrating weights.

Mean Reversion Strategy

  • Entry: When |z-score| exceeds 2.0 (spread deviates from mean)
  • Exit: When |z-score| falls below 0.5 (spread reverts to mean)
  • Long spread: When z-score < -2.0 (expect reversion up)
  • Short spread: When z-score > +2.0 (expect reversion down)

Bayesian Optimization

Instead of using statistically-derived weights (Johansen), BO searches for weights that maximize actual trading performance (Sharpe ratio) by:

  1. Building a probabilistic model of the objective function
  2. Using acquisition functions to balance exploration vs exploitation
  3. Efficiently finding optimal weights with fewer evaluations

🧪 Running Tests

# Activate virtual environment first (Windows shown; on Linux/Mac use: source venv/bin/activate)
venv\Scripts\activate

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_cointegration.py -v
pytest tests/test_optimizer.py -v
pytest tests/test_backtester.py -v

📓 Jupyter Notebooks

# Activate virtual environment (Windows shown; on Linux/Mac use: source venv/bin/activate)
venv\Scripts\activate

# Install Jupyter (if not installed)
pip install jupyter

# Launch Jupyter
jupyter notebook notebooks/

Available Notebooks:

  1. 01_data_exploration.ipynb - Explore price data and correlations
  2. 02_cointegration_analysis.ipynb - Johansen test and spread analysis
  3. 03_bayesian_optimization.ipynb - BO weight optimization
  4. 04_strategy_comparison.ipynb - Walk-forward comparison

📦 Dependencies

Package           Purpose
numpy             Numerical computing
pandas            Data manipulation
scipy             Scientific computing
statsmodels       Johansen cointegration test
scikit-optimize   Bayesian Optimization
yfinance          Fetch stock data
matplotlib        Plotting
seaborn           Statistical visualization
plotly            Interactive plots
pyyaml            Configuration management
pytest            Testing framework

📊 Output Files

After running python main.py, the following files are generated:

results/
├── plots/
│   ├── equity_curves.png      # Cumulative returns comparison
│   ├── drawdowns.png          # Drawdown analysis
│   ├── rolling_sharpe.png     # Rolling Sharpe ratio over time
│   └── convergence.png        # BO optimization convergence
└── reports/
    └── comparison_report.txt  # Detailed performance comparison

🔧 Customization

Change Assets

Edit config/config.yaml:

data:
  tickers: ['GLD', 'SLV', 'USO']  # Commodities example

Adjust Trading Parameters

strategy:
  entry_threshold: 1.5   # More frequent trades
  exit_threshold: 0.3    # Tighter exits

Increase Optimization Iterations

optimizer:
  n_calls: 200           # More thorough search

📄 License

MIT License


👨‍💻 Author

Built with ❤️ using Python and Bayesian Optimization
