Yoontae6719/Signature-Informed-Transformer-For-Asset-Allocation

Signature‑Informed Transformer (SIT) for Asset Allocation

This is the original PyTorch implementation of SIT, introduced in the paper "Signature-Informed Transformer for Asset Allocation".

🚩 News (Aug 08, 2025): We have released SIT.

Repository structure

  • asset_data/full_dataset.csv – a CSV of daily prices/returns used to reproduce the paper’s experiments. It contains date‑indexed closing prices for a universe of up to 50 assets. The data are split chronologically: training covers 2000‑01‑01 to 2016‑12‑31, validation covers 2017‑01‑01 to 2019‑12‑31 and testing spans 2020‑01‑01 to 2024‑12‑31. Only the first data_pool columns (assets) are used during training.

  • 0_get_sig_data_all.py – pre‑computes signature and cross‑signature features. It reads full_dataset.csv, splits it into the train/val/test ranges above and saves the signature tensors and future returns for multiple asset pools and window/horizon configurations. Running this script is optional but speeds up training.

  • run.py – entry point for training and evaluation. It wraps the experiment class in exp/ and exposes many hyper‑parameters, such as number of assets (--data_pool), lookback window (--window_size), horizon (--horizon), model dimension (--d_model), number of transformer layers and heads, maximum position, trade cost etc.

  • runfile/test.sh – example shell script that trains SIT on three different asset pools (30, 40 and 50 assets) with different hyper‑parameter settings. Adjust the script or construct your own command lines using run.py.

  • results/ – contains equity curves (*_test_equity_curve.png), portfolio statistics (*_test_metrics.csv) and positions (*_test_positions.csv) generated by the example script.
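
The chronological split and the data_pool column selection described above can be sketched as follows. This is a minimal illustration with pandas, not the repository's loader: the helper name split_by_date, the "Date" index column and the tiny inline CSV are assumptions made for the example.

```python
import io
import pandas as pd

# Date boundaries as stated in this README.
SPLITS = {
    "train": ("2000-01-01", "2016-12-31"),
    "val": ("2017-01-01", "2019-12-31"),
    "test": ("2020-01-01", "2024-12-31"),
}

def split_by_date(df: pd.DataFrame, data_pool: int) -> dict:
    """Keep the first `data_pool` asset columns, then slice each date range."""
    pool = df.iloc[:, :data_pool]
    # .loc slicing on a sorted DatetimeIndex is inclusive at both ends.
    return {name: pool.loc[start:end] for name, (start, end) in SPLITS.items()}

# Tiny synthetic stand-in for asset_data/full_dataset.csv (hypothetical values).
csv_text = """Date,A,B,C
2016-12-30,1.0,2.0,3.0
2017-01-03,1.1,2.1,3.1
2019-12-31,1.2,2.2,3.2
2020-01-02,1.3,2.3,3.3
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
parts = split_by_date(df, data_pool=2)
print({name: part.shape for name, part in parts.items()})
```

Splitting on fixed calendar dates rather than on row fractions keeps the validation and test periods strictly after the training period, which avoids look-ahead leakage.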

Requirements and installation

SIT requires Python 3.8+ and PyTorch 1.10+. To install the dependencies, clone the repository and run:

# clone the project (replace with your fork if necessary)
git clone https://github.com/Yoontae6719/Signature-Informed-Transformer-For-Asset-Allocation.git
cd Signature-Informed-Transformer-For-Asset-Allocation

# install python packages
pip install -r requirements.txt  # installs PyTorch, pandas, numpy, tqdm, joblib, etc

  1. Obtain the dataset. A sample full_dataset.csv is provided under asset_data/. If you wish to experiment with your own assets, create a CSV with a Date column and one column per asset containing daily returns or prices. Missing values should be forward‑filled.

  2. Generate signatures (MUST). Running signature extraction ahead of time speeds up training. Use:

   # create signature caches for pools of 30, 40 and 50 assets with window=60 and horizon=20
   python 0_get_sig_data_all.py

The script iterates over DATA_POOLS = [40, 50, 30] and saves pre‑computed training, validation and test tensors to signature_cache_6020/pool_{n}. If you change the --window_size and --horizon values in run.py, re‑generate the cache accordingly. Alternatively, you can download the pre‑processed dataset from the link provided in the repository.
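
To give a feel for what a signature feature is, here is a minimal NumPy sketch that computes the depth-2 signature of a time-augmented price path over rolling lookback windows. It is an illustration only and does not reproduce 0_get_sig_data_all.py: the truncation depth, the time augmentation, and the function names signature_depth2 and rolling_signatures are assumptions.

```python
import numpy as np

def signature_depth2(path: np.ndarray):
    """Depth-2 signature of a piecewise-linear path with shape (T, d)."""
    increments = np.diff(path, axis=0)
    d = path.shape[1]
    level1 = np.zeros(d)          # total increment of the path
    level2 = np.zeros((d, d))     # iterated integrals of order two
    for dx in increments:
        # Chen's identity for appending one linear segment.
        level2 += np.outer(level1, dx) + 0.5 * np.outer(dx, dx)
        level1 += dx
    return level1, level2

def rolling_signatures(series: np.ndarray, window_size: int) -> np.ndarray:
    """Signature of the path (t, x_t) over each lookback window."""
    t = np.linspace(0.0, 1.0, window_size)
    feats = []
    for end in range(window_size, len(series) + 1):
        window = series[end - window_size:end]
        s1, s2 = signature_depth2(np.column_stack([t, window]))
        feats.append(np.concatenate([s1, s2.ravel()]))
    return np.asarray(feats)

# Synthetic log-price series (hypothetical data).
rng = np.random.default_rng(0)
log_prices = np.cumsum(rng.normal(0.0, 0.01, size=100))
features = rolling_signatures(log_prices, window_size=60)
print(features.shape)  # (41, 6): 41 windows, 2 + 4 signature terms each
```

Because every window of length window_size yields its own feature vector, changing the window or the horizon invalidates previously cached tensors, which is why the cache has to be regenerated.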

Training and evaluation

To train SIT from scratch and evaluate it on the test set, execute:

python run.py \
    --is_training 1 \
    --model_id dp30 \
    --model SIT \
    --data FULL \
    --root_path ./asset_data/ \
    --data_path full_dataset.csv \
    --data_pool 30 \
    --window_size 60 \
    --horizon 20 \
    --d_model 8 \
    --n_heads 8 \
    --num_layers 1 \
    --sig_input_dim 2 \
    --cross_sig_dim 1 \
    --hidden_c 64 \
    --ff_dim 64 \
    --temperature 1.3 \
    --trade_cost_bps 0.0 \
    --itr 3

Alternatively, run the provided script:

bash ./runfile/test.sh

which trains three configurations sequentially. Training results and test performance are saved under results/.
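
If you prefer to construct your own command lines from Python rather than editing the shell script, something like the following sketch may help. It only uses flags shown in the command above; the helper name build_command and the dp{n} naming for pools other than 30 are assumptions, and any flag not listed falls back to whatever run.py defines.

```python
import shlex
import sys

def build_command(data_pool: int, **overrides) -> list:
    """Assemble a run.py command line for one asset pool."""
    args = {
        "is_training": 1,
        "model_id": f"dp{data_pool}",   # naming pattern assumed from "dp30"
        "model": "SIT",
        "data": "FULL",
        "root_path": "./asset_data/",
        "data_path": "full_dataset.csv",
        "data_pool": data_pool,
        "window_size": 60,
        "horizon": 20,
    }
    args.update(overrides)
    cmd = [sys.executable, "run.py"]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    return cmd

for pool in (30, 40, 50):
    # Print the command; inside the repository you could instead call
    # subprocess.run(build_command(pool, temperature=1.3), check=True).
    print(shlex.join(build_command(pool, temperature=1.3)))
```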

Important command‑line flags

| Flag | Description |
|---|---|
| --data_pool | Number of assets to include in the portfolio (e.g., 30, 40, 50). |
| --window_size | Length of the historical window used to compute path signatures. The script 0_get_sig_data_all.py uses a default of 60. |
| --horizon | Prediction horizon (in trading days). Default is 20. |
| --temperature | Softmax temperature used when converting predicted returns into portfolio weights; a higher temperature produces more uniform allocations. |
| --trade_cost_bps | Transaction cost in basis points (1 bp = 0.01 %, so 0.05 % = 5 bps). |
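
The effect of --temperature and --trade_cost_bps can be illustrated with a short NumPy sketch. It assumes a plain long-only softmax over predicted scores and a cost proportional to turnover; the actual weighting in the repository may differ (for example, it also exposes a maximum position), and the function names are hypothetical.

```python
import numpy as np

def softmax_weights(scores: np.ndarray, temperature: float) -> np.ndarray:
    """Convert predicted scores into weights that sum to one."""
    z = scores / temperature
    z = z - z.max()               # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def trading_cost(prev_w: np.ndarray, new_w: np.ndarray, trade_cost_bps: float) -> float:
    """Cost as a fraction of portfolio value; one basis point is 0.01 %."""
    turnover = np.abs(new_w - prev_w).sum()
    return float(turnover * trade_cost_bps / 10_000.0)

scores = np.array([0.02, 0.00, -0.01])   # hypothetical predicted returns
sharp = softmax_weights(scores, temperature=0.01)
flat = softmax_weights(scores, temperature=1.3)
print(sharp.round(3), flat.round(3))
print(trading_cost(sharp, flat, trade_cost_bps=5.0))
```

With a low temperature almost all weight goes to the highest score, while a temperature of 1.3 yields weights close to equal for scores of this magnitude.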

Results and metrics

After training, SIT evaluates the portfolio on the validation and test sets. The experiment class computes the conditional value‑at‑risk (CVaR) and other metrics and saves:

  • Equity curves – .png plots showing cumulative returns on the test set.
  • Metrics CSV – summary statistics such as annualised return, volatility, Sharpe ratio and CVaR.
  • Positions CSV – the predicted positions for each rebalancing date.
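
For readers who want to recompute the summary statistics from a return series, the sketch below shows one common set of definitions. It is not the experiment class's code: the 252-period annualisation, a zero risk-free rate, the 5 % CVaR level and the function name portfolio_metrics are all assumptions.

```python
import numpy as np

def portfolio_metrics(returns, periods_per_year: int = 252, alpha: float = 0.05) -> dict:
    """Annualised return, volatility, Sharpe ratio and CVaR of simple returns."""
    r = np.asarray(returns, dtype=float)
    equity = np.cumprod(1.0 + r)                  # equity curve starting at 1
    years = len(r) / periods_per_year
    ann_return = equity[-1] ** (1.0 / years) - 1.0
    ann_vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)
    # CVaR: average loss over the worst alpha-fraction of periods.
    k = max(1, int(np.ceil(alpha * len(r))))
    cvar = -np.sort(r)[:k].mean()
    return {
        "annual_return": ann_return,
        "annual_volatility": ann_vol,
        "sharpe": sharpe,
        "cvar": cvar,
    }

# Synthetic daily returns (hypothetical data).
rng = np.random.default_rng(1)
daily = rng.normal(0.0004, 0.01, size=1260)
print(portfolio_metrics(daily))
```

If the returns are measured per rebalancing period rather than per day, periods_per_year should be changed accordingly.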

Results generated by test.sh can be found under results/.

Citation

To be updated.

License

This project is open‑sourced under the MIT License. See LICENSE for details.