A Python project that improves basket trading strategies using Bayesian Optimization (BO) to find optimal cointegrating weights, outperforming traditional Johansen test-based approaches.
Traditional cointegration methods (like the Johansen test) generate in-sample cointegrating weights, but these often fail to generalize out-of-sample. This project uses Bayesian Optimization — a global optimization technique — to find parameter configurations that maximize out-of-sample profitability.
- ✅ Johansen Cointegration Test - Baseline weight calculation
- ✅ Bayesian Optimization - Finds weights that maximize Sharpe ratio
- ✅ Walk-Forward Analysis - Rolling window out-of-sample testing
- ✅ Multi-Objective Optimization - Balance multiple criteria (Sharpe + Calmar)
- ✅ Performance Metrics - Sharpe, Sortino, Calmar, Max Drawdown, Profit Factor
- ✅ Visualization - Equity curves, drawdowns, rolling Sharpe, convergence plots
| Metric | Bayesian Optimization | Johansen Baseline | Improvement |
|---|---|---|---|
| Sharpe Ratio | -0.003 | -0.415 | +0.411 |
| Total Return | -11.29% | -43.10% | +31.81% |
| Max Drawdown | -48.38% | -56.83% | +8.45% |
| Profit Factor | 1.027 | 0.893 | +0.134 |
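In the results above, Bayesian Optimization beats the Johansen baseline on all four metrics, although both strategies still finish with a negative total return and a negative Sharpe ratio.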
- Python 3.8 or higher
- pip package manager
git clone https://github.com/digantk31/Basket-Trading.git
cd Basket-Trading
python -m venv venv
Windows (Command Prompt):
venv\Scripts\activate
Windows (PowerShell):
.\venv\Scripts\Activate.ps1
Linux/Mac:
source venv/bin/activate
pip install -r requirements.txt
python main.py
deactivate
Basket Trading/
├── src/ # Core source modules
│ ├── __init__.py # Package initialization
│ ├── data_loader.py # Data fetching from Yahoo Finance
│ ├── cointegration.py # Johansen cointegration test
│ ├── bayesian_optimizer.py # Bayesian Optimization engine
│ ├── trading_strategy.py # Mean reversion trading signals
│ ├── backtester.py # Backtesting & walk-forward analysis
│ ├── metrics.py # Performance metrics calculation
│ └── utils.py # Visualization & utilities
├── notebooks/ # Jupyter notebooks for analysis
│ ├── 01_data_exploration.ipynb
│ ├── 02_cointegration_analysis.ipynb
│ ├── 03_bayesian_optimization.ipynb
│ └── 04_strategy_comparison.ipynb
├── tests/ # Unit tests
│ ├── test_cointegration.py
│ ├── test_optimizer.py
│ └── test_backtester.py
├── config/
│ └── config.yaml # Configuration parameters
├── data/ # Data storage (auto-generated)
│ ├── raw/
│ └── processed/
├── results/ # Output files
│ ├── plots/ # Generated visualizations
│ └── reports/ # Performance reports
├── main.py # Main execution script
├── requirements.txt # Python dependencies
└── README.md # This file
The spread is a linear combination of asset prices using cointegrating weights:
Spread(t) = Σ (wᵢ × Pᵢ(t))
Where:
- wᵢ = Weight for asset i (from Johansen test or BO)
- Pᵢ(t) = Price of asset i at time t
Code Reference: src/cointegration.py → get_spread()
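As a minimal sketch of the spread formula above: the function name mirrors the code reference, but its signature and the sample prices are assumptions made for illustration, not the project's actual implementation.

```python
import numpy as np

def get_spread(prices, weights):
    """Spread(t) = sum_i w_i * P_i(t) for a (T, N) price matrix (hypothetical signature)."""
    prices = np.asarray(prices, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if prices.ndim != 2 or prices.shape[1] != weights.shape[0]:
        raise ValueError("expected one weight per asset column")
    # Matrix-vector product applies the weighted sum to every time step at once
    return prices @ weights

# Made-up prices: three days, two assets
prices = np.array([[10.0, 20.0],
                   [11.0, 21.0],
                   [12.0, 23.0]])
weights = np.array([1.0, -0.5])
spread = get_spread(prices, weights)  # array([0. , 0.5, 0.5])
```

A negative weight simply means the asset is held short within the basket.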
The z-score measures how many standard deviations the spread is from its mean:
Z-Score(t) = (Spread(t) - μ) / σ
Where:
- μ = Rolling mean of spread (20-day lookback)
- σ = Rolling standard deviation of spread
Trading Rules:
- LONG when Z-Score < -2.0 (expect reversion up)
- SHORT when Z-Score > +2.0 (expect reversion down)
- EXIT when |Z-Score| < 0.5 (spread reverted)
Code Reference: src/trading_strategy.py → calculate_zscore()
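The z-score and the three trading rules can be sketched as below. The function names, the use of pandas, and the simple state machine are assumptions for illustration; the sketch also leaves out the stop-loss threshold and does not flip directly from long to short.

```python
import math
import pandas as pd

def rolling_zscore(spread, lookback=20):
    """Z-score of the spread against its rolling mean and standard deviation."""
    mean = spread.rolling(lookback).mean()
    std = spread.rolling(lookback).std()
    return (spread - mean) / std

def positions_from_zscore(zscores, entry=2.0, exit_=0.5):
    """Return +1 (long), -1 (short) or 0 (flat) for each z-score value."""
    position = 0
    out = []
    for z in zscores:
        if math.isnan(z):
            pass                      # warm-up period: keep the current position
        elif position == 0:
            if z < -entry:
                position = 1          # spread is low, expect reversion up
            elif z > entry:
                position = -1         # spread is high, expect reversion down
        elif abs(z) < exit_:
            position = 0              # spread has reverted, close the trade
        out.append(position)
    return out

# Hand-written z-scores to show the state transitions
demo_z = [float("nan"), 0.0, -2.5, -1.0, -0.3, 2.2, 0.6, 0.1]
demo_positions = positions_from_zscore(demo_z)  # [0, 0, 1, 1, 0, -1, -1, 0]
```

The first `lookback - 1` z-scores are NaN because the rolling window is not yet full, so the sketch stays flat until then.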
Measures risk-adjusted return (higher is better):
Sharpe Ratio = (E[R] - Rᶠ) / σᴿ × √252
Where:
- E[R] = Mean daily return
- Rᶠ = Risk-free rate (default: 2% annual)
- σᴿ = Standard deviation of returns
- √252 = Annualization factor (trading days/year)
Code Reference: src/metrics.py → sharpe_ratio()
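A minimal sketch of the Sharpe formula follows. It assumes the annual risk-free rate is converted to a daily rate by dividing by 252 and that the sample standard deviation is used; the source does not state either choice, so treat both as assumptions.

```python
import numpy as np

def sharpe_ratio(returns, risk_free_annual=0.02, periods=252):
    """Annualized Sharpe ratio from periodic (daily) returns."""
    returns = np.asarray(returns, dtype=float)
    # Assumption: simple division converts the annual rate to a per-period rate
    excess_mean = returns.mean() - risk_free_annual / periods
    std = returns.std(ddof=1)
    if std == 0:
        return 0.0  # no variation in returns, so the ratio is undefined; report 0
    return excess_mean / std * np.sqrt(periods)

sample_returns = [0.01, -0.005, 0.002, 0.004]
sharpe_no_rf = sharpe_ratio(sample_returns, risk_free_annual=0.0)
sharpe_with_rf = sharpe_ratio(sample_returns)  # lower, since the hurdle rate is higher
```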
Like Sharpe, but only penalizes downside volatility:
Sortino Ratio = (E[R] - Rᶠ) / σᴰ × √252
Where:
- σᴰ = Standard deviation of negative returns only
Code Reference: src/metrics.py → sortino_ratio()
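The Sortino definition above can be sketched the same way. The daily risk-free conversion and the NaN fallback when there are too few losing days are assumptions, not documented behavior of the project.

```python
import numpy as np

def sortino_ratio(returns, risk_free_annual=0.02, periods=252):
    """Annualized Sortino ratio using the std of negative returns only."""
    returns = np.asarray(returns, dtype=float)
    downside = returns[returns < 0]
    if downside.size < 2:
        return float("nan")  # not enough losing periods to estimate downside risk
    downside_std = downside.std(ddof=1)
    if downside_std == 0:
        return float("nan")
    excess_mean = returns.mean() - risk_free_annual / periods
    return excess_mean / downside_std * np.sqrt(periods)

sortino_demo = sortino_ratio([0.01, -0.02, 0.005, -0.01, 0.003])
```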
Largest peak-to-trough decline in portfolio value:
Max Drawdown = min((Cumulative(t) - Peak(t)) / Peak(t))
Where:
- Cumulative(t) = Cumulative return at time t
- Peak(t) = Running maximum of cumulative returns
Code Reference: src/metrics.py → max_drawdown()
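A short sketch of the drawdown formula, assuming Cumulative(t) is an equity curve that starts at 1 and compounds the daily returns (the source does not say whether returns are compounded or summed):

```python
import numpy as np

def max_drawdown(returns):
    """Most negative peak-to-trough decline of the compounded equity curve."""
    returns = np.asarray(returns, dtype=float)
    # Prepend the starting equity of 1.0 so a loss on day one counts as a drawdown
    equity = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    peak = np.maximum.accumulate(equity)
    return float(np.min((equity - peak) / peak))

# Equity path: 1.0 -> 1.1 -> 0.88 -> 0.924 -> 0.8316, peak stays at 1.1
mdd_demo = max_drawdown([0.10, -0.20, 0.05, -0.10])  # about -0.244
```

The result is zero or negative, matching the sign convention in the results table.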
Annual return divided by maximum drawdown:
Calmar Ratio = Annualized Return / |Max Drawdown|
Code Reference: src/metrics.py → calmar_ratio()
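The Calmar ratio combines the two quantities above. This sketch assumes a geometric annualized return and a compounded equity curve starting at 1; both are illustrative choices rather than the project's confirmed method.

```python
import numpy as np

def calmar_ratio(returns, periods=252):
    """Annualized return divided by the absolute maximum drawdown."""
    returns = np.asarray(returns, dtype=float)
    equity = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    peak = np.maximum.accumulate(equity)
    mdd = np.min(equity / peak - 1.0)
    if mdd == 0:
        return float("nan")  # no drawdown observed, ratio is undefined
    # Geometric annualization: scale the total growth to one year of periods
    annualized = equity[-1] ** (periods / len(returns)) - 1.0
    return annualized / abs(mdd)

# With periods=4 the four returns below count as exactly one "year"
calmar_demo = calmar_ratio([0.10, -0.20, 0.05, -0.10], periods=4)
```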
Ratio of gross profits to gross losses:
Profit Factor = Σ(Positive Returns) / |Σ(Negative Returns)|
Code Reference: src/metrics.py → profit_factor()
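As a quick illustration of the profit factor formula, here is a plain-Python sketch. Returning infinity when there are no losing periods is an assumption made for the example.

```python
def profit_factor(returns):
    """Gross profits divided by the absolute value of gross losses."""
    gross_profit = sum(r for r in returns if r > 0)
    gross_loss = -sum(r for r in returns if r < 0)
    if gross_loss == 0:
        return float("inf")  # assumption: no losses means an unbounded ratio
    return gross_profit / gross_loss

# Profits of 0.02 + 0.03 against losses of 0.01 + 0.02 give roughly 1.67
pf_demo = profit_factor([0.02, -0.01, 0.03, -0.02])
```

A value above 1 means gross profits exceed gross losses before any other considerations.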
Time for spread to revert halfway to its mean (Ornstein-Uhlenbeck process):
Half-Life = -ln(2) / θ
Where θ is estimated from:
ΔSpread(t) = θ × (Spread(t-1) - μ) + ε
Interpretation:
- Smaller half-life = Faster mean reversion = Better for trading
- Half-life < 30 days: Suitable for trading
- Half-life > 60 days: Too slow for mean reversion strategy
Code Reference: src/cointegration.py → get_half_life()
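The regression above can be sketched with an ordinary least-squares fit of ΔSpread(t) on Spread(t-1); the slope is θ, which is negative for a mean-reverting spread, so the half-life comes out positive. The function signature and the infinite return for non-reverting series are assumptions.

```python
import numpy as np

def get_half_life(spread):
    """Half-life = -ln(2) / theta, with theta from regressing the change on the lagged level."""
    spread = np.asarray(spread, dtype=float)
    lagged = spread[:-1]
    delta = np.diff(spread)
    # Degree-1 fit returns (slope, intercept); the intercept absorbs -theta * mu
    theta, _intercept = np.polyfit(lagged, delta, 1)
    if theta >= 0:
        return float("inf")  # no mean reversion detected
    return -np.log(2) / theta

# Noise-free series that closes 10% of its gap to the mean (5.0) every step
series = [10.0]
for _ in range(49):
    series.append(series[-1] - 0.1 * (series[-1] - 5.0))
half_life_demo = get_half_life(series)  # close to ln(2) / 0.1, about 6.93 steps
```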
Daily strategy return calculation:
Return(t) = Position(t-1) × Spread_Return(t) - Transaction_Cost × |Position_Change|
Where:
- Position = +1 (long), -1 (short), or 0 (flat)
- Spread_Return(t) = (Spread(t) - Spread(t-1)) / Spread(t-1)
- Transaction_Cost = 0.1% per trade
Code Reference: src/trading_strategy.py → calculate_returns()
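A minimal vectorized sketch of the return formula follows. It assumes the strategy starts flat, so the first position change is charged a transaction cost; the signature is hypothetical.

```python
import numpy as np

def calculate_returns(spread, positions, transaction_cost=0.001):
    """Return(t) = Position(t-1) * Spread_Return(t) - cost * |Position(t) - Position(t-1)|."""
    spread = np.asarray(spread, dtype=float)
    positions = np.asarray(positions, dtype=float)
    spread_return = np.zeros_like(spread)
    spread_return[1:] = np.diff(spread) / spread[:-1]
    # Yesterday's position earns today's spread return; assume flat before day 0
    prev_position = np.concatenate(([0.0], positions[:-1]))
    position_change = np.abs(positions - prev_position)
    return prev_position * spread_return - transaction_cost * position_change

demo_spread = [100.0, 102.0, 101.0, 103.0]
demo_positions = [1, 1, 0, 0]          # enter long on day 0, exit on day 2
demo_returns = calculate_returns(demo_spread, demo_positions)
```

Because Spread_Return divides by Spread(t-1), the percentage form is only well behaved while the spread stays away from zero.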
Weights are normalized to sum to 1 in absolute terms:
wᵢ_normalized = wᵢ / Σ|wⱼ|
This ensures consistent position sizing across different weight configurations.
Code Reference: src/bayesian_optimizer.py → _normalize_weights()
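For illustration, the normalization can be written in a few lines. The public function name and the error on an all-zero vector are assumptions; the project's `_normalize_weights()` may handle that case differently.

```python
import numpy as np

def normalize_weights(weights):
    """Scale weights so that their absolute values sum to 1, keeping each sign."""
    weights = np.asarray(weights, dtype=float)
    total = np.abs(weights).sum()
    if total == 0:
        raise ValueError("cannot normalize an all-zero weight vector")
    return weights / total

normalized = normalize_weights([2.0, -1.0, 1.0])  # array([ 0.5 , -0.25,  0.25])
```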
Edit config/config.yaml to customize:
# Asset Configuration
data:
  tickers: ['XLF', 'XLK', 'XLE', 'XLV', 'XLI']  # Sector ETFs
  start_date: '2018-01-01'
  end_date: '2024-01-01'
  train_ratio: 0.7
# Trading Strategy
strategy:
  entry_threshold: 2.0     # Z-score for entry
  exit_threshold: 0.5      # Z-score for exit
  stop_loss: 3.0           # Stop loss threshold
  lookback_period: 20      # Rolling window for z-score
  transaction_cost: 0.001  # 0.1% per trade
# Bayesian Optimization
optimizer:
  n_calls: 100             # Total optimization iterations
  n_random_starts: 20      # Random initial evaluations
  objective: 'sharpe'      # Optimization target
  random_state: 42         # For reproducibility
# Walk-Forward Analysis
walk_forward:
  in_sample_size: 252      # Training window (1 year)
  out_sample_size: 63      # Testing window (3 months)
┌─────────────────────────────────────────────────────────────┐
│ main.py Execution Flow │
├─────────────────────────────────────────────────────────────┤
│ │
│ [STEP 1] Load Data │
│ └── Fetch prices from Yahoo Finance │
│ └── Data for: XLF, XLK, XLE, XLV, XLI │
│ │
│ [STEP 2] Split Data │
│ └── 70% Training, 30% Testing │
│ │
│ [STEP 3] Johansen Cointegration (Baseline) │
│ └── Calculate cointegrating weights │
│ └── Compute mean reversion half-life │
│ └── Backtest on test data │
│ │
│ [STEP 4] Bayesian Optimization │
│ └── Run 100 optimization iterations │
│ └── Find weights that maximize Sharpe ratio │
│ └── Backtest optimized weights │
│ │
│ [STEP 5] Walk-Forward Analysis │
│ └── 19 rolling windows │
│ └── Compare BO vs Johansen on each window │
│ └── Calculate combined performance │
│ │
│ [STEP 6] Generate Visualizations │
│ └── Equity curves │
│ └── Drawdown charts │
│ └── Rolling Sharpe ratio │
│ └── Optimization convergence │
│ │
│ [STEP 7] Generate Report │
│ └── Save comparison report to results/reports/ │
│ │
└─────────────────────────────────────────────────────────────┘
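The rolling windows in Step 5 can be sketched as index ranges. This assumes each window advances by the out-of-sample size, so test periods do not overlap; the source does not confirm the step size, and the function name is hypothetical.

```python
def walk_forward_windows(n_obs, in_sample=252, out_sample=63):
    """Return ((train_start, train_end), (test_start, test_end)) index pairs, end-exclusive."""
    windows = []
    start = 0
    while start + in_sample + out_sample <= n_obs:
        train = (start, start + in_sample)
        test = (start + in_sample, start + in_sample + out_sample)
        windows.append((train, test))
        start += out_sample  # assumption: roll forward by one test period
    return windows

# 1000 observations is an arbitrary example length, not the project's data size
demo_windows = walk_forward_windows(1000)
first_train, first_test = demo_windows[0]  # (0, 252) and (252, 315)
```

Each pair would be used to fit weights on the training slice and evaluate both methods on the test slice that follows it.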
Assets are cointegrated if a linear combination of their prices is stationary (mean-reverting). The Johansen test identifies these relationships and provides cointegrating weights.
- Entry: When the z-score moves beyond ±2.0 (spread deviates from its mean)
- Exit: When the z-score returns to within ±0.5 (spread has reverted toward its mean)
- Long spread: When z-score < -2.0 (expect reversion up)
- Short spread: When z-score > +2.0 (expect reversion down)
Instead of using statistically-derived weights (Johansen), BO searches for weights that maximize actual trading performance (Sharpe ratio) by:
- Building a probabilistic model of the objective function
- Using acquisition functions to balance exploration vs exploitation
- Finding good weights in fewer objective evaluations than grid or random search would typically need
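To make the three ideas above concrete, here is a NumPy-only toy sketch: a Gaussian-process surrogate with an RBF kernel, and an upper-confidence-bound acquisition evaluated on random candidates. It is not the project's scikit-optimize engine, and the objective is a stand-in with a known peak rather than a backtested Sharpe ratio. All names and hyperparameters are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between two sets of points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_seen, y_seen, x_new, noise=1e-6):
    """Posterior mean and std of a zero-mean, unit-variance GP at x_new."""
    k_ss = rbf_kernel(x_seen, x_seen) + noise * np.eye(len(x_seen))
    k_sn = rbf_kernel(x_seen, x_new)
    solved = np.linalg.solve(k_ss, k_sn)
    mean = solved.T @ y_seen
    var = 1.0 - np.einsum("ij,ij->j", k_sn, solved)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def bayes_opt_maximize(objective, dim, n_calls=30, n_random_starts=8,
                       kappa=2.0, seed=42):
    """Maximize objective over [-1, 1]^dim; returns best x, best y, all y values."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(-1.0, 1.0, size=(n_random_starts, dim))
    ys = np.array([objective(x) for x in xs])
    for _ in range(n_calls - n_random_starts):
        candidates = rng.uniform(-1.0, 1.0, size=(500, dim))
        # Standardize observations so the unit-variance prior is reasonable
        y_scaled = (ys - ys.mean()) / (ys.std() + 1e-12)
        mean, std = gp_posterior(xs, y_scaled, candidates)
        # UCB: high mean exploits, high std explores
        pick = candidates[np.argmax(mean + kappa * std)]
        xs = np.vstack([xs, pick])
        ys = np.append(ys, objective(pick))
    best = int(np.argmax(ys))
    return xs[best], float(ys[best]), ys

# Stand-in objective: peaks at hypothetical "ideal" weights
target = np.array([0.3, -0.4])
def toy_objective(w):
    return -float(np.sum((np.asarray(w) - target) ** 2))

best_w, best_score, history = bayes_opt_maximize(toy_objective, dim=2)
```

In the project's setting the objective would instead backtest a weight vector and return its Sharpe ratio, which is far more expensive per call and is the reason a sample-efficient search is attractive.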
# Activate virtual environment first
venv\Scripts\activate
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_cointegration.py -v
pytest tests/test_optimizer.py -v
pytest tests/test_backtester.py -v
# Activate virtual environment
venv\Scripts\activate
# Install Jupyter (if not installed)
pip install jupyter
# Launch Jupyter
jupyter notebook notebooks/
Available Notebooks:
- 01_data_exploration.ipynb - Explore price data and correlations
- 02_cointegration_analysis.ipynb - Johansen test and spread analysis
- 03_bayesian_optimization.ipynb - BO weight optimization
- 04_strategy_comparison.ipynb - Walk-forward comparison
| Package | Purpose |
|---|---|
| numpy | Numerical computing |
| pandas | Data manipulation |
| scipy | Scientific computing |
| statsmodels | Johansen cointegration test |
| scikit-optimize | Bayesian Optimization |
| yfinance | Fetch stock data |
| matplotlib | Plotting |
| seaborn | Statistical visualization |
| plotly | Interactive plots |
| pyyaml | Configuration management |
| pytest | Testing framework |
After running python main.py, the following files are generated:
results/
├── plots/
│ ├── equity_curves.png # Cumulative returns comparison
│ ├── drawdowns.png # Drawdown analysis
│ ├── rolling_sharpe.png # Rolling Sharpe ratio over time
│ └── convergence.png # BO optimization convergence
└── reports/
└── comparison_report.txt # Detailed performance comparison
Edit config/config.yaml:
data:
  tickers: ['GLD', 'SLV', 'USO']  # Commodities example
strategy:
  entry_threshold: 1.5  # More frequent trades
  exit_threshold: 0.3   # Tighter exits
optimizer:
  n_calls: 200          # More thorough search
MIT License
Built with ❤️ using Python and Bayesian Optimization