🎯 Basket Trading with Bayesian Optimization

A Python project that uses Bayesian Optimization (BO) to find cointegrating weights for basket trading strategies, with the aim of outperforming the traditional Johansen test-based approach out-of-sample.

📋 Project Overview

Traditional cointegration methods (like the Johansen test) generate in-sample cointegrating weights, but these often fail to generalize out-of-sample. This project uses Bayesian Optimization, a global optimization technique, to find weight configurations that maximize out-of-sample profitability.

Key Features

✅ Johansen Cointegration Test - Baseline weight calculation
✅ Bayesian Optimization - Finds weights that maximize Sharpe ratio
✅ Walk-Forward Analysis - Rolling window out-of-sample testing
✅ Multi-Objective Optimization - Balance multiple criteria (Sharpe + Calmar)
✅ Performance Metrics - Sharpe, Sortino, Calmar, Max Drawdown, Profit Factor
✅ Visualization - Equity curves, drawdowns, rolling Sharpe, convergence plots

📊 Results Summary

Metric	Bayesian Optimization	Johansen Baseline	Improvement (BO minus Johansen; pp = percentage points)
Sharpe Ratio	-0.003	-0.415	+0.411
Total Return	-11.29%	-43.10%	+31.81 pp
Max Drawdown	-48.38%	-56.83%	+8.45 pp
Profit Factor	1.027	0.893	+0.134

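In this run, Bayesian Optimization beats the Johansen baseline on every reported metric, but both strategies lose money: the BO Sharpe ratio is roughly zero and its total return is still negative.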

🚀 Quick Start

Prerequisites

Python 3.8 or higher
pip package manager

Step 1: Clone and Navigate to Project Directory

git clone https://github.com/digantk31/Basket-Trading.git
cd Basket-Trading

Step 2: Create Virtual Environment

python -m venv venv

Step 3: Activate Virtual Environment

Windows (Command Prompt):

venv\Scripts\activate

Windows (PowerShell):

.\venv\Scripts\Activate.ps1

Linux/Mac:

source venv/bin/activate

Step 4: Install Dependencies

pip install -r requirements.txt

Step 5: Run the Project

python main.py

Step 6: Deactivate Virtual Environment (when done)

deactivate

📁 Project Structure

Basket-Trading/
├── src/                           # Core source modules
│   ├── __init__.py               # Package initialization
│   ├── data_loader.py            # Data fetching from Yahoo Finance
│   ├── cointegration.py          # Johansen cointegration test
│   ├── bayesian_optimizer.py     # Bayesian Optimization engine
│   ├── trading_strategy.py       # Mean reversion trading signals
│   ├── backtester.py             # Backtesting & walk-forward analysis
│   ├── metrics.py                # Performance metrics calculation
│   └── utils.py                  # Visualization & utilities
├── notebooks/                     # Jupyter notebooks for analysis
│   ├── 01_data_exploration.ipynb
│   ├── 02_cointegration_analysis.ipynb
│   ├── 03_bayesian_optimization.ipynb
│   └── 04_strategy_comparison.ipynb
├── tests/                         # Unit tests
│   ├── test_cointegration.py
│   ├── test_optimizer.py
│   └── test_backtester.py
├── config/
│   └── config.yaml               # Configuration parameters
├── data/                          # Data storage (auto-generated)
│   ├── raw/
│   └── processed/
├── results/                       # Output files
│   ├── plots/                    # Generated visualizations
│   └── reports/                  # Performance reports
├── main.py                        # Main execution script
├── requirements.txt               # Python dependencies
└── README.md                      # This file

📐 Mathematical Formulas

1. Cointegrating Spread

The spread is a linear combination of asset prices using cointegrating weights:

Spread(t) = Σ (wᵢ × Pᵢ(t))

Where:

wᵢ = Weight for asset i (from Johansen test or BO)
Pᵢ(t) = Price of asset i at time t

Code Reference: src/cointegration.py → get_spread()
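As a minimal sketch of the spread formula above, the weighted sum can be written as a matrix-vector product. The function name mirrors get_spread(), but the signature, the NumPy array inputs, and the sample prices are assumptions for illustration and may not match the code in src/cointegration.py.

```python
import numpy as np

def get_spread(prices, weights):
    """Weighted sum of asset prices: Spread(t) = sum_i w_i * P_i(t).

    prices  -- array of shape (T, N): one row per day, one column per asset
    weights -- array of shape (N,): one cointegrating weight per asset
    """
    prices = np.asarray(prices, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if prices.shape[1] != weights.shape[0]:
        raise ValueError("need exactly one weight per asset column")
    return prices @ weights

# Two assets over three days (made-up prices): long 1 unit of A, short 0.5 of B.
prices = np.array([[10.0, 20.0],
                   [11.0, 21.0],
                   [12.0, 26.0]])
spread = get_spread(prices, [1.0, -0.5])
print(spread)
```

Negative weights correspond to short legs of the basket, which is why the spread can be far smaller than any individual price.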

2. Z-Score (Standardized Spread)

The z-score measures how many standard deviations the spread is from its mean:

Z-Score(t) = (Spread(t) - μ) / σ

Where:

μ = Rolling mean of spread (20-day lookback)
σ = Rolling standard deviation of spread

Trading Rules:

LONG when Z-Score < -2.0 (expect reversion up)
SHORT when Z-Score > +2.0 (expect reversion down)
EXIT when |Z-Score| < 0.5 (spread reverted)

Code Reference: src/trading_strategy.py → calculate_zscore()
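The z-score and the three trading rules above can be sketched as a rolling computation followed by a small state machine. This is an illustrative version under stated assumptions: the function names, the inclusion of the current day in the lookback window, and the use of the sample standard deviation are choices made here, and the stop-loss threshold from the configuration is left out.

```python
import numpy as np

def rolling_zscore(spread, lookback=20):
    """Z-score of each point against the trailing `lookback` observations."""
    spread = np.asarray(spread, dtype=float)
    z = np.full(spread.shape, np.nan)          # NaN until enough history exists
    for t in range(lookback - 1, len(spread)):
        window = spread[t - lookback + 1 : t + 1]
        sigma = window.std(ddof=1)
        if sigma > 0:
            z[t] = (spread[t] - window.mean()) / sigma
    return z

def generate_positions(z, entry=2.0, exit_=0.5):
    """Turn z-scores into positions: +1 long, -1 short, 0 flat."""
    positions = np.zeros(len(z))
    current = 0
    for t, value in enumerate(z):
        if np.isnan(value):
            current = 0                        # no signal without a z-score
        elif current == 0:
            if value < -entry:
                current = 1                    # spread is low: expect reversion up
            elif value > entry:
                current = -1                   # spread is high: expect reversion down
        elif abs(value) < exit_:
            current = 0                        # spread reverted: close the trade
        positions[t] = current
    return positions

example_z = np.array([np.nan, 0.0, -2.5, -1.0, -0.3, 2.1, 0.4])
print(generate_positions(example_z))
```

Once a position is open it is held until the exit rule fires, so a z-score of -1.0 keeps an existing long open but would not open a new one.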

3. Sharpe Ratio

Measures risk-adjusted return (higher is better):

Sharpe Ratio = (E[R] - Rᶠ) / σᴿ × √252

Where:

E[R] = Mean daily return
Rᶠ = Risk-free rate per day (default: 2% annual, which has to be divided by 252 to match the daily E[R])
σᴿ = Standard deviation of returns
√252 = Annualization factor (trading days/year)

Code Reference: src/metrics.py → sharpe_ratio()
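A minimal sketch of the Sharpe formula follows. It assumes the 2% annual risk-free rate is converted to a daily rate by dividing by 252, and it returns 0 for a series with zero volatility; both are assumptions made here, and sharpe_ratio() in src/metrics.py may handle these cases differently.

```python
import numpy as np

TRADING_DAYS = 252

def sharpe_ratio(daily_returns, risk_free_rate=0.02):
    """Annualized Sharpe ratio from a series of daily returns.

    risk_free_rate is annual; it is converted to a daily rate before being
    subtracted so that numerator and denominator use the same time unit.
    """
    returns = np.asarray(daily_returns, dtype=float)
    excess = returns - risk_free_rate / TRADING_DAYS
    sigma = excess.std(ddof=1)
    if sigma == 0:
        return 0.0          # assumption: a flat series is reported as Sharpe 0
    return excess.mean() / sigma * np.sqrt(TRADING_DAYS)

sample = [0.01, -0.01, 0.02, 0.0]
print(round(sharpe_ratio(sample, risk_free_rate=0.0), 3))
```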

4. Sortino Ratio

Like Sharpe, but only penalizes downside volatility:

Sortino Ratio = (E[R] - Rᶠ) / σᴰ × √252

Where:

σᴰ = Standard deviation of negative returns only

Code Reference: src/metrics.py → sortino_ratio()
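The Sortino definition above can be sketched as follows, taking σᴰ literally as the standard deviation of the negative daily returns. Returning NaN when there are fewer than two losing days is an assumption of this sketch, not necessarily what sortino_ratio() in src/metrics.py does.

```python
import numpy as np

def sortino_ratio(daily_returns, risk_free_rate=0.02, periods=252):
    """Annualized Sortino ratio using the std of negative returns only."""
    returns = np.asarray(daily_returns, dtype=float)
    excess = returns - risk_free_rate / periods     # daily excess return
    downside = returns[returns < 0]                 # losing days only
    if downside.size < 2:
        return float("nan")     # assumption: undefined without two losing days
    sigma_d = downside.std(ddof=1)
    if sigma_d == 0:
        return float("nan")
    return excess.mean() / sigma_d * np.sqrt(periods)

sample = [0.02, -0.01, 0.01, -0.03]
print(round(sortino_ratio(sample, risk_free_rate=0.0), 3))
```

Another common convention computes downside deviation as the root mean square of shortfalls below a target return over all days. The two conventions give different numbers, so figures are only comparable when the same one is used throughout.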

5. Maximum Drawdown

Largest peak-to-trough decline in portfolio value:

Max Drawdown = min((Cumulative(t) - Peak(t)) / Peak(t))

Where:

Cumulative(t) = Cumulative return at time t
Peak(t) = Running maximum of cumulative returns

Code Reference: src/metrics.py → max_drawdown()
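A short sketch of the drawdown calculation, assuming that Cumulative(t) is the compounded value of one unit of starting capital. Including the starting capital in the running peak is a choice made here so that a loss on the very first day is counted.

```python
import numpy as np

def max_drawdown(daily_returns):
    """Most negative peak-to-trough decline of the compounded equity curve."""
    returns = np.asarray(daily_returns, dtype=float)
    # Equity curve starting from 1 unit of capital (the start counts as a peak).
    equity = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    peak = np.maximum.accumulate(equity)        # running maximum
    drawdown = (equity - peak) / peak           # always <= 0
    return drawdown.min()

# +10%, then -50%, then +20%: equity goes 1.10 -> 0.55 -> 0.66.
print(max_drawdown([0.10, -0.50, 0.20]))
```

The result is reported as a negative fraction, which matches the sign convention of the Results Summary table.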

6. Calmar Ratio

Annual return divided by maximum drawdown:

Calmar Ratio = Annualized Return / |Max Drawdown|

Code Reference: src/metrics.py → calmar_ratio()
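The Calmar ratio combines two quantities, so a sketch helps show how they fit together. This version assumes a geometric (compounded) annualized return and 252 trading days per year; the handling of the zero-drawdown case is also an assumption of this sketch.

```python
import numpy as np

def calmar_ratio(daily_returns, periods=252):
    """Annualized compounded return divided by the absolute max drawdown."""
    returns = np.asarray(daily_returns, dtype=float)
    if returns.size == 0:
        return 0.0
    equity = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    years = returns.size / periods
    annualized_return = equity[-1] ** (1.0 / years) - 1.0
    peak = np.maximum.accumulate(equity)
    mdd = ((equity - peak) / peak).min()
    if mdd == 0:
        # assumption: no drawdown at all gives an unbounded ratio
        return float("inf") if annualized_return > 0 else 0.0
    return annualized_return / abs(mdd)

# One year of flat returns, then a 20% loss followed by a 50% gain.
returns = np.zeros(252)
returns[-2], returns[-1] = -0.2, 0.5
print(round(calmar_ratio(returns), 6))
```

In this example the year ends 20% up after a 20% drawdown, so the ratio is 1.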

7. Profit Factor

Ratio of gross profits to gross losses:

Profit Factor = Σ(Positive Returns) / |Σ(Negative Returns)|

Code Reference: src/metrics.py → profit_factor()
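A minimal sketch of the profit factor on daily returns. Returning infinity when there are no losing days is an assumption made here for illustration.

```python
import numpy as np

def profit_factor(daily_returns):
    """Sum of positive returns divided by the absolute sum of negative returns."""
    returns = np.asarray(daily_returns, dtype=float)
    gross_profit = returns[returns > 0].sum()
    gross_loss = -returns[returns < 0].sum()
    if gross_loss == 0:
        # assumption: no losing days means an unbounded profit factor
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss

two_days = np.array([0.5, -0.4])
print(profit_factor(two_days))            # sum-based measure
print(np.prod(1.0 + two_days) - 1.0)      # compounded total return
```

Because the profit factor sums returns while total return compounds them, the two can disagree in sign: a +50% day followed by a -40% day has a profit factor above 1 but a compounded loss. This may be one reason a profit factor slightly above 1 can appear next to a negative total return.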

8. Mean Reversion Half-Life

Time for spread to revert halfway to its mean (Ornstein-Uhlenbeck process):

Half-Life = -ln(2) / θ

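Where θ is the mean-reversion coefficient (negative for a mean-reverting spread, which makes the half-life positive), estimated from: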

ΔSpread(t) = θ × (Spread(t-1) - μ) + ε

Interpretation:

Smaller half-life = Faster mean reversion = Better for trading
Half-life < 30 days: Suitable for trading
Half-life > 60 days: Too slow for mean reversion strategy

Code Reference: src/cointegration.py → get_half_life()
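The regression behind the half-life can be sketched with ordinary least squares. The sketch fits an intercept alongside θ, which is equivalent to estimating μ, and it returns infinity when θ is not negative; both choices are assumptions here and get_half_life() in src/cointegration.py may differ.

```python
import numpy as np

def get_half_life(spread):
    """Half-life of mean reversion from a least-squares fit.

    Fits  dS(t) = a + theta * S(t-1) + eps.  The intercept a absorbs
    -theta * mu, so this matches  dS(t) = theta * (S(t-1) - mu) + eps.
    """
    spread = np.asarray(spread, dtype=float)
    lagged = spread[:-1]                        # S(t-1)
    delta = np.diff(spread)                     # S(t) - S(t-1)
    design = np.column_stack([np.ones_like(lagged), lagged])
    (intercept, theta), *_ = np.linalg.lstsq(design, delta, rcond=None)
    if theta >= 0:
        return float("inf")                     # no mean reversion detected
    return -np.log(2.0) / theta

# Noise-free example: the gap to the mean (5.0) shrinks by 10% per day.
decaying = 5.0 + 10.0 * 0.9 ** np.arange(50)
print(round(get_half_life(decaying), 2))
```

For this series θ is -0.1, so the formula gives ln(2) / 0.1, or about 6.9 days.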

9. Strategy Returns

Daily strategy return calculation:

Return(t) = Position(t-1) × Spread_Return(t) - Transaction_Cost × |Position_Change|

Where:

Position = +1 (long), -1 (short), or 0 (flat)
Spread_Return(t) = (Spread(t) - Spread(t-1)) / Spread(t-1)
Transaction_Cost = 0.1% per trade

Code Reference: src/trading_strategy.py → calculate_returns()
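The return formula above can be sketched in vectorized form. The function name follows calculate_returns(), but the signature and the convention of starting flat before the first day are assumptions of this sketch.

```python
import numpy as np

def calculate_returns(spread, positions, transaction_cost=0.001):
    """Daily strategy returns net of transaction costs.

    Return(t) = Position(t-1) * Spread_Return(t) - cost * |Position(t) - Position(t-1)|
    """
    spread = np.asarray(spread, dtype=float)
    positions = np.asarray(positions, dtype=float)
    spread_return = np.zeros_like(spread)
    spread_return[1:] = np.diff(spread) / spread[:-1]
    # Yesterday's position earns today's return; assume flat before day 0.
    prev_position = np.concatenate(([0.0], positions[:-1]))
    position_change = np.abs(positions - prev_position)
    return prev_position * spread_return - transaction_cost * position_change

spread = [100.0, 102.0, 101.0, 101.0]
positions = [1, 1, 0, 0]          # enter long on day 0, exit on day 2
print(calculate_returns(spread, positions))
```

Two caveats apply to this sketch: the percentage return divides by Spread(t-1), so it assumes the spread stays away from zero and does not change sign, and a direct flip from long to short changes the position by 2 and is therefore charged twice the per-trade cost.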

10. Normalized Weights

Weights are scaled so that their absolute values sum to 1:

wᵢ_normalized = wᵢ / Σ|wⱼ|

This ensures consistent position sizing across different weight configurations.

Code Reference: src/bayesian_optimizer.py → _normalize_weights()
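A minimal sketch of the normalization. The public function name and the error raised for an all-zero weight vector are assumptions for illustration.

```python
import numpy as np

def normalize_weights(weights):
    """Scale weights so that the sum of their absolute values equals 1."""
    weights = np.asarray(weights, dtype=float)
    total = np.abs(weights).sum()
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return weights / total

print(normalize_weights([2.0, -1.0, 1.0]))
```

Signs are preserved, so short legs stay short; only the overall scale changes.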

⚙️ Configuration

Edit config/config.yaml to customize:

# Asset Configuration
data:
  tickers: ['XLF', 'XLK', 'XLE', 'XLV', 'XLI']  # Sector ETFs
  start_date: '2018-01-01'
  end_date: '2024-01-01'
  train_ratio: 0.7

# Trading Strategy
strategy:
  entry_threshold: 2.0      # Z-score for entry
  exit_threshold: 0.5       # Z-score for exit
  stop_loss: 3.0            # Stop loss threshold
  lookback_period: 20       # Rolling window for z-score
  transaction_cost: 0.001   # 0.1% per trade

# Bayesian Optimization
optimizer:
  n_calls: 100              # Total optimization iterations
  n_random_starts: 20       # Random initial evaluations
  objective: 'sharpe'       # Optimization target
  random_state: 42          # For reproducibility

# Walk-Forward Analysis
walk_forward:
  in_sample_size: 252       # Training window (1 year)
  out_sample_size: 63       # Testing window (3 months)
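To illustrate how in_sample_size and out_sample_size could translate into walk-forward windows, here is a small sketch. It assumes fixed-length rolling windows that step forward by the test size; the configuration lists only the two sizes, so the actual scheme in src/backtester.py may differ. The function name and the figure of 1,500 daily bars are hypothetical.

```python
def walk_forward_windows(n_obs, in_sample_size=252, out_sample_size=63):
    """Yield (train_start, train_end, test_end) index triples.

    Train on [train_start, train_end), test on [train_end, test_end), then
    slide forward by out_sample_size so that test periods never overlap.
    """
    start = 0
    while start + in_sample_size + out_sample_size <= n_obs:
        train_end = start + in_sample_size
        yield start, train_end, train_end + out_sample_size
        start += out_sample_size

# Roughly six years of daily bars (assumed round number).
windows = list(walk_forward_windows(1500))
print(len(windows), windows[0], windows[-1])
```

With about 1,500 observations this scheme yields 19 windows, which is consistent with the window count listed in the Execution Flow section.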

🔄 Execution Flow

┌─────────────────────────────────────────────────────────────┐
│                    main.py Execution Flow                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  [STEP 1] Load Data                                         │
│     └── Fetch prices from Yahoo Finance                     │
│     └── Data for: XLF, XLK, XLE, XLV, XLI                   │
│                                                             │
│  [STEP 2] Split Data                                        │
│     └── 70% Training, 30% Testing                           │
│                                                             │
│  [STEP 3] Johansen Cointegration (Baseline)                 │
│     └── Calculate cointegrating weights                     │
│     └── Compute mean reversion half-life                    │
│     └── Backtest on test data                               │
│                                                             │
│  [STEP 4] Bayesian Optimization                             │
│     └── Run 100 optimization iterations                     │
│     └── Find weights that maximize Sharpe ratio             │
│     └── Backtest optimized weights                          │
│                                                             │
│  [STEP 5] Walk-Forward Analysis                             │
│     └── 19 rolling windows                                  │
│     └── Compare BO vs Johansen on each window               │
│     └── Calculate combined performance                      │
│                                                             │
│  [STEP 6] Generate Visualizations                           │
│     └── Equity curves                                       │
│     └── Drawdown charts                                     │
│     └── Rolling Sharpe ratio                                │
│     └── Optimization convergence                            │
│                                                             │
│  [STEP 7] Generate Report                                   │
│     └── Save comparison report to results/reports/          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

📈 Core Concepts

Cointegration

Assets are cointegrated if a linear combination of their prices is stationary (mean-reverting). The Johansen test identifies these relationships and provides cointegrating weights.
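The idea can be illustrated with synthetic data. The sketch below does not run the Johansen test; it builds two price series that are cointegrated by construction and shows that the right linear combination is far more stable than either price. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# x is a random walk: non-stationary, it wanders without a fixed mean.
x = 50.0 + np.cumsum(rng.normal(0.0, 1.0, n))
# y tracks 2 * x plus stationary noise, so (y - 2x) is stationary by construction.
y = 2.0 * x + rng.normal(0.0, 1.0, n)

weights = np.array([-2.0, 1.0])       # known here; estimated from data in practice
spread = np.column_stack([x, y]) @ weights

print(f"std of x:      {x.std():.2f}")
print(f"std of spread: {spread.std():.2f}")
```

In real data the weights are unknown, which is the gap the Johansen test (or the optimizer) is meant to fill.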

Mean Reversion Strategy

Entry: When z-score exceeds ±2.0 (spread deviates from mean)
Exit: When z-score returns to ±0.5 (spread reverts to mean)
Long spread: When z-score < -2.0 (expect reversion up)
Short spread: When z-score > +2.0 (expect reversion down)

Bayesian Optimization

Instead of using statistically derived weights (Johansen), BO searches for weights that maximize actual trading performance (Sharpe ratio) by:

Building a probabilistic model of the objective function
Using acquisition functions to balance exploration vs exploitation
Finding good weights with fewer objective evaluations than grid or random search typically requires
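The three steps above can be sketched with a toy optimizer written in plain NumPy: a Gaussian-process surrogate with a fixed RBF kernel, and expected improvement as the acquisition function. This is a hand-rolled illustration, not scikit-optimize's implementation, and every name, the kernel length scale, the search box of [-1, 1] per weight, and the quadratic toy objective are assumptions. In the project, the objective would be a trading metric such as the Sharpe ratio for a given weight vector.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_obs = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_cross = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_obs, k_cross)            # K^-1 k_*
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_cross * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected amount by which each candidate beats the best value so far."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def maximize(objective, dim, n_calls=30, n_random_starts=10, seed=42):
    """Maximize objective over the box [-1, 1]^dim."""
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(-1.0, 1.0, size=(n_random_starts, dim))
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_calls - n_random_starts):
        # Standardize targets so the unit-variance GP prior is reasonable.
        y_scaled = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-12)
        candidates = rng.uniform(-1.0, 1.0, size=(500, dim))
        mean, std = gp_posterior(x_obs, y_scaled, candidates)
        scores = expected_improvement(mean, std, y_scaled.max())
        x_next = candidates[np.argmax(scores)]          # most promising candidate
        x_obs = np.vstack([x_obs, x_next])
        y_obs = np.append(y_obs, objective(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], float(y_obs[best])

def toy_objective(w):
    """Stand-in for a trading metric; peaks at w = (0.3, 0.3)."""
    return -float(np.sum((w - 0.3) ** 2))

best_w, best_val = maximize(toy_objective, dim=2)
print(best_w, best_val)
```

The random starts play the same role as n_random_starts in the configuration: they seed the surrogate model before the acquisition function begins choosing points.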

🧪 Running Tests

# Activate virtual environment first (Windows; on Linux/Mac use: source venv/bin/activate)
venv\Scripts\activate

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_cointegration.py -v
pytest tests/test_optimizer.py -v
pytest tests/test_backtester.py -v

📓 Jupyter Notebooks

# Activate virtual environment (Windows; on Linux/Mac use: source venv/bin/activate)
venv\Scripts\activate

# Install Jupyter (if not installed)
pip install jupyter

# Launch Jupyter
jupyter notebook notebooks/

Available Notebooks:

01_data_exploration.ipynb - Explore price data and correlations
02_cointegration_analysis.ipynb - Johansen test and spread analysis
03_bayesian_optimization.ipynb - BO weight optimization
04_strategy_comparison.ipynb - Walk-forward comparison

📦 Dependencies

Package	Purpose
numpy	Numerical computing
pandas	Data manipulation
scipy	Scientific computing
statsmodels	Johansen cointegration test
scikit-optimize	Bayesian Optimization
yfinance	Fetch stock data
matplotlib	Plotting
seaborn	Statistical visualization
plotly	Interactive plots
pyyaml	Configuration management
pytest	Testing framework

📊 Output Files

After running python main.py, the following files are generated:

results/
├── plots/
│   ├── equity_curves.png      # Cumulative returns comparison
│   ├── drawdowns.png          # Drawdown analysis
│   ├── rolling_sharpe.png     # Rolling Sharpe ratio over time
│   └── convergence.png        # BO optimization convergence
└── reports/
    └── comparison_report.txt  # Detailed performance comparison

🔧 Customization

Change Assets

Edit config/config.yaml:

data:
  tickers: ['GLD', 'SLV', 'USO']  # Commodities example

Adjust Trading Parameters

strategy:
  entry_threshold: 1.5   # More frequent trades
  exit_threshold: 0.3    # Tighter exits

Increase Optimization Iterations

optimizer:
  n_calls: 200           # More thorough search

📄 License

MIT License

👨‍💻 Author

Built with ❤️ using Python and Bayesian Optimization
