This is the original PyTorch implementation of SIT from the following paper: SIGNATURE-INFORMED TRANSFORMER FOR ASSET ALLOCATION.
🚩 News (Aug 08, 2025): We have released SIT.
- `asset_data/full_dataset.csv` – a CSV of daily prices/returns used to reproduce the paper's experiments. It contains date-indexed closing prices for a universe of up to 50 assets. The data are split chronologically: training covers 2000-01-01 to 2016-12-31, validation covers 2017-01-01 to 2019-12-31 and testing spans 2020-01-01 to 2024-12-31. Only the first `data_pool` columns (assets) are used during training.
- `0_get_sig_data_all.py` – pre-computes signature and cross-signature features. It reads `full_dataset.csv`, splits it into the train/val/test ranges above and saves the signature tensors and future returns for multiple asset pools and window/horizon configurations. Running this script is optional but speeds up training.
- `run.py` – entry point for training and evaluation. It wraps the experiment class in `exp/` and exposes many hyper-parameters, such as number of assets (`--data_pool`), lookback window (`--window_size`), horizon (`--horizon`), model dimension (`--d_model`), number of transformer layers and heads, maximum position, trade cost, etc.
- `runfile/test.sh` – example shell script that trains SIT on three different asset pools (30, 40 and 50 assets) with different hyper-parameter settings. Adjust the script or construct your own command lines using `run.py`.
- `results/` – contains equity curves (`*_test_equity_curve.png`), portfolio statistics (`*_test_metrics.csv`) and positions (`*_test_positions.csv`) generated by the example script.
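The chronological split described for `asset_data/full_dataset.csv` can be sketched as follows. This is only an illustration of the date ranges and of keeping the first `data_pool` columns; the helper names (`assign_split`, `split_rows`) and the toy rows are hypothetical and are not taken from the repository code.

```python
from datetime import date

# Date boundaries quoted from the description of full_dataset.csv.
SPLITS = {
    "train": (date(2000, 1, 1), date(2016, 12, 31)),
    "val": (date(2017, 1, 1), date(2019, 12, 31)),
    "test": (date(2020, 1, 1), date(2024, 12, 31)),
}


def assign_split(day):
    """Return the split whose inclusive date range contains `day`, else None."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None


def split_rows(rows, data_pool):
    """Group (date, prices) rows by split, keeping only the first `data_pool` assets."""
    out = {name: [] for name in SPLITS}
    for day, prices in rows:
        name = assign_split(day)
        if name is not None:
            out[name].append((day, prices[:data_pool]))
    return out


# Made-up rows: three dates, four hypothetical assets, a pool of two.
rows = [
    (date(2016, 12, 30), [10.0, 20.0, 30.0, 40.0]),
    (date(2017, 1, 3), [10.1, 19.8, 30.5, 40.2]),
    (date(2020, 1, 2), [11.0, 21.0, 31.0, 41.0]),
]
parts = split_rows(rows, data_pool=2)
print({name: len(chunk) for name, chunk in parts.items()})
```

Splitting by date rather than at random keeps every validation and test observation strictly after the training period.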
SIT requires Python 3.8+ and PyTorch 1.10+. To install the dependencies, clone the repository and run:
# clone the project (replace with your fork if necessary)
git clone https://github.com/Yoontae6719/Signature-Informed-Transformer-For-Asset-Allocation.git
cd Signature-Informed-Transformer-For-Asset-Allocation
# install python packages
pip install -r requirements.txt  # installs PyTorch, pandas, numpy, tqdm, joblib, etc.

- Obtain the dataset. A sample `full_dataset.csv` is provided under `asset_data/`. If you wish to experiment with your own assets, create a CSV with a `Date` column and one column per asset containing daily returns or prices. Missing values should be forward-filled.
- Generate signatures (MUST). Running signature extraction ahead of time speeds up training. Use:
# create signature caches for pools of 30, 40 and 50 assets with window=60 and horizon=20
python 0_get_sig_data_all.py

The script iterates over `DATA_POOLS = [40, 50, 30]` and saves pre-computed training, validation and test tensors to `signature_cache_6020/pool_{n}`. If you change the `--window_size` and `--horizon` values in `run.py`, re-generate the cache accordingly.
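For readers unfamiliar with path signatures, the sketch below computes a depth-2 signature of a short piecewise-linear path in plain Python. It only illustrates what a truncated signature is. It is not the code in `0_get_sig_data_all.py`, whose truncation depth, cross-signature definition and libraries are not described here; the function name `signature_depth2` and the toy path are assumptions made for this example.

```python
def signature_depth2(path):
    """Depth-2 signature of the piecewise-linear path through `path`.

    `path` is a list of equal-length points. Returns (level1, level2), where
    level1[i] is the total increment of channel i and level2[i][j] is the
    iterated integral of channel i against channel j.
    """
    dim = len(path[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for start, end in zip(path, path[1:]):
        dx = [b - a for a, b in zip(start, end)]
        # Chen's identity for appending one linear segment:
        # level2 += outer(level1, dx) + 0.5 * outer(dx, dx)
        for i in range(dim):
            for j in range(dim):
                level2[i][j] += level1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        # Update level 1 only after level 2 has used the previous value.
        for i in range(dim):
            level1[i] += dx[i]
    return level1, level2


# Toy two-channel path: move right, then up.
path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
level1, level2 = signature_depth2(path)

# The antisymmetric part of level 2 is the Levy area between the two channels.
levy_area = 0.5 * (level2[0][1] - level2[1][0])
print(level1, level2, levy_area)
```

The off-diagonal level-2 terms depend on the order in which the channels move, which is the kind of lead-lag information a signature captures beyond plain increments.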
You can also download the pre-processed dataset. Please click this one.
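When preparing a custom CSV with a `Date` column and one column per asset, the forward-fill step might look like the sketch below. It uses only the standard library; the asset names `AAA` and `BBB`, the sample values and the helper `forward_fill` are hypothetical, and leaving leading gaps empty is a choice made for this example.

```python
import csv
import io


def forward_fill(rows, date_column="Date"):
    """Replace empty cells with the most recent earlier value in the same column.

    Cells that are empty before any value has been seen stay empty.
    """
    last_seen = {}
    filled = []
    for row in rows:
        clean = {date_column: row[date_column]}
        for name, value in row.items():
            if name == date_column:
                continue
            if value not in ("", None):
                last_seen[name] = value
            clean[name] = last_seen.get(name, "")
        filled.append(clean)
    return filled


# Made-up sample with two gaps.
raw = (
    "Date,AAA,BBB\n"
    "2020-01-02,100.0,50.0\n"
    "2020-01-03,,50.5\n"
    "2020-01-06,101.2,\n"
)
rows = list(csv.DictReader(io.StringIO(raw)))
filled = forward_fill(rows)

# Write the cleaned table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Date", "AAA", "BBB"])
writer.writeheader()
writer.writerows(filled)
print(out.getvalue())
```

Forward-filling only carries past values forward, so it does not leak future prices into earlier rows.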
To train SIT from scratch and evaluate it on the test set, execute:
python run.py \
--is_training 1 \
--model_id dp30 \
--model SIT \
--data FULL \
--root_path ./asset_data/ \
--data_path full_dataset.csv \
--data_pool 30 \
--window_size 60 \
--horizon 20 \
--d_model 8 \
--n_heads 8 \
--num_layers 1 \
--sig_input_dim 2 \
--cross_sig_dim 1 \
--hidden_c 64 \
--ff_dim 64 \
--temperature 1.3 \
--trade_cost_bps 0.0 \
--itr 3

Alternatively, run the provided script:
bash ./runfile/test.sh

This script trains three configurations sequentially. Training results and test performance are saved under `results/`.
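If you prefer to build your own command lines rather than edit `runfile/test.sh`, a small helper such as the one below can assemble them. It is a sketch: it reuses the flag values from the single-run example for every pool, whereas `test.sh` is described as using different hyper-parameters per pool, and the `dp{n}` naming for `--model_id` is an assumption extrapolated from `dp30`.

```python
import shlex

# Flag values copied from the single-run example; only the pool size varies here.
BASE_ARGS = {
    "--is_training": "1",
    "--model": "SIT",
    "--data": "FULL",
    "--root_path": "./asset_data/",
    "--data_path": "full_dataset.csv",
    "--window_size": "60",
    "--horizon": "20",
    "--d_model": "8",
    "--n_heads": "8",
    "--num_layers": "1",
    "--sig_input_dim": "2",
    "--cross_sig_dim": "1",
    "--hidden_c": "64",
    "--ff_dim": "64",
    "--temperature": "1.3",
    "--trade_cost_bps": "0.0",
    "--itr": "3",
}


def build_command(data_pool):
    """Return the argv list for one run.py invocation with the given pool size."""
    args = ["python", "run.py",
            "--model_id", f"dp{data_pool}",  # assumed naming pattern
            "--data_pool", str(data_pool)]
    for flag, value in BASE_ARGS.items():
        args += [flag, value]
    return args


commands = [build_command(pool) for pool in (30, 40, 50)]
for cmd in commands:
    print(shlex.join(cmd))
    # To launch a run from the repository root, one could instead call:
    # subprocess.run(cmd, check=True)
```

Remember that the signature cache must match the `--window_size` and `--horizon` values used in each command.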
| Flag | Description |
|---|---|
| `--data_pool` | Number of assets to include in the portfolio (e.g., 30, 40, 50). |
| `--window_size` | Length of the historical window used to compute path signatures. The script `0_get_sig_data_all.py` uses a default of 60. |
| `--horizon` | Prediction horizon (in trading days). Default is 20. |
| `--temperature` | Softmax temperature used when converting predicted returns into portfolio weights; higher temperature produces more uniform allocations. |
| `--trade_cost_bps` | Transaction cost in basis points (e.g., 0.05 % = 5 bps). |
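The effect of `--temperature` and the unit of `--trade_cost_bps` can be illustrated with a short, self-contained sketch. It assumes the weights are obtained as `softmax(score / temperature)` over per-asset scores, which matches the description in the table but is not confirmed against the model code; `softmax_weights`, `bps_to_fraction` and the example scores are hypothetical.

```python
import math


def softmax_weights(scores, temperature=1.0):
    """Turn per-asset scores into long-only weights that sum to one."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def bps_to_fraction(bps):
    """Convert basis points to a plain fraction (1 bp = 0.01 % = 0.0001)."""
    return bps / 10_000.0


# Made-up predicted returns for three assets.
scores = [0.03, 0.01, -0.02]
sharp = softmax_weights(scores, temperature=0.01)
smooth = softmax_weights(scores, temperature=1.3)

print([round(w, 3) for w in sharp])
print([round(w, 3) for w in smooth])
print(bps_to_fraction(5))  # 5 bps expressed as a fraction
```

With the low temperature the weight concentrates on the highest-scoring asset, while the higher temperature leaves the allocation close to equal weights.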
After training, SIT evaluates the portfolio on the validation and test sets. The experiment class computes the conditional value‑at‑risk (CVaR) and other metrics and saves:
- Equity curves – `.png` plots showing cumulative returns on the test set.
- Metrics CSV – summary statistics such as annualised return, volatility, Sharpe ratio and CVaR.
- Positions CSV – the predicted positions for each rebalancing date.
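To make the reported statistics concrete, the sketch below computes them from a list of per-period portfolio returns. The conventions are assumptions, not necessarily those of the experiment class: 252 periods per year, a zero risk-free rate, and historical CVaR taken as the mean of the worst `alpha` fraction of returns. The function names and sample returns are hypothetical.

```python
import math
import statistics


def cvar(returns, alpha=0.05):
    """Historical CVaR: mean of the worst `alpha` fraction of returns."""
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def summarize(returns, periods_per_year=252, alpha=0.05):
    """Annualised return, volatility, Sharpe ratio (zero risk-free rate) and CVaR."""
    mean = statistics.fmean(returns)
    vol = statistics.stdev(returns)
    growth = math.prod(1.0 + r for r in returns)
    years = len(returns) / periods_per_year
    return {
        "annualised_return": growth ** (1.0 / years) - 1.0,
        "annualised_volatility": vol * math.sqrt(periods_per_year),
        "sharpe": mean / vol * math.sqrt(periods_per_year) if vol > 0 else float("nan"),
        "cvar": cvar(returns, alpha),
    }


# Made-up daily returns.
sample = [0.01, -0.02, 0.005, 0.015, -0.01, 0.02, -0.03, 0.0, 0.01, 0.005]
stats = summarize(sample, alpha=0.2)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

If returns are measured once per rebalancing period rather than daily, `periods_per_year` should be changed to the number of rebalancing periods in a year.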
Results generated by test.sh can be found under results/.
will be updated
This project is open‑sourced under the MIT License. See LICENSE for details.