Code for the KDD 2026 paper Native Hierarchical and Compositional Representations with Subspace Embeddings.
The WordNet noun hierarchy is fetched from nltk.corpus, so no manual download is required.
Train (128×128 projection matrices, ridge λ = 0.2, noun synset):
python train_wordnet_reconstruction.py --N 128 --D 128 --lbd 0.2 --synset n

The resulting ReconstructionData (optimized embeddings + optimization config) is saved to:
./wn_r_embeddings/{synset}_{N}x{D}_{lbd}_{group_size}/
Evaluate:
python eval_wordnet_reconstruction.py \
--embed-path <path to the ReconstructionData saved above> \
--device cuda

Download HyperLex from https://github.com/cambridgeltl/hyperlex, then:
python eval_hyperlex.py \
--embed-path <path to the ReconstructionData> \
--hyperlex-path <hyperlex>/nouns-verbs/hyperlex-nouns.txt

Download the WordNet splits from https://github.com/lapras-inc/disk-embedding/tree/master/data/maxn. The directory should contain:
noun_closure.tsv.vocab
noun_closure.tsv.train_{percent}percent
noun_closure.tsv.valid
noun_closure.tsv.test
noun_closure.tsv.full_neg
noun_closure.tsv.valid_neg
noun_closure.tsv.test_neg
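
Before training, it can help to confirm that the directory is complete. The following is a minimal sketch built only from the file names listed above. `missing_split_files` is a hypothetical helper that is not part of this repository, and reading `{percent}` as the integer closure percentage (for example 10 for `--closure 0.1`) is an assumption.

```python
from pathlib import Path

# File suffixes taken from the list above; "{percent}" is filled in per run.
EXPECTED_SUFFIXES = [
    "vocab",
    "train_{percent}percent",
    "valid",
    "test",
    "full_neg",
    "valid_neg",
    "test_neg",
]


def missing_split_files(root, percent):
    """Return the expected file names that are absent from `root`.

    Hypothetical helper: `percent` is assumed to be the integer closure
    percentage (e.g. 10), which the README does not state explicitly.
    """
    root = Path(root)
    names = [
        "noun_closure.tsv." + suffix.format(percent=percent)
        for suffix in EXPECTED_SUFFIXES
    ]
    return [name for name in names if not (root / name).is_file()]
```

An empty list from `missing_split_files("<root folder of the files above>", 10)` would indicate that all seven files are in place under that assumption.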
Train (10% closure coverage, 128×128 projection matrices, ridge 0.2, γ⁺ = 0.8, γ⁻ = 0.1):
python train_wordnet_lp.py \
--dataset-path <root folder of the files above> \
--closure 0.1 \
--gamma-pos 0.8 --gamma-neg 0.1 \
--N 128 --D 128 --lbd 0.2

The resulting LinkPredictionData is saved to:
./wn_lp_embeddings/{seed}_{int(100*closure)}_wordnet_subspace_{N}x{D}_{lbd}_{group_size}/
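
To locate the output of a given run, the template above can be filled in by hand. The sketch below is illustrative only: `lp_output_dir` is a hypothetical helper, the `seed` and `group_size` values are placeholders (the README does not state their defaults), and the way the training script formats `lbd` is assumed to match Python's default float formatting.

```python
def lp_output_dir(seed, closure, N, D, lbd, group_size):
    """Fill in the directory template above (hypothetical helper).

    Mirrors the template literally, including int(100 * closure),
    which truncates rather than rounds.
    """
    return (
        f"./wn_lp_embeddings/{seed}_{int(100 * closure)}"
        f"_wordnet_subspace_{N}x{D}_{lbd}_{group_size}/"
    )


# Placeholder seed (0) and group size (1); the other values come from the
# training command above.
example_dir = lp_output_dir(seed=0, closure=0.1, N=128, D=128, lbd=0.2, group_size=1)
```

Because `int()` truncates, a closure value whose product with 100 falls just below an integer in floating point would map to the lower number if the script follows the template literally (for instance, `int(100 * 0.29)` evaluates to 28 in Python).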
Evaluate:
python eval_wordnet_lp.py \
--embed-path <path to the LinkPredictionData saved above> \
--dataset-path <same root folder used for training>

Train (sentence-transformers/all-mpnet-base-v2, 128×128 projection matrices, two-way):
python train_nli.py \
--base-model-name sentence-transformers/all-mpnet-base-v2 \
--N 128 --D 128 --two-way

The NLITrainingData (state dict + training config) is saved to:
./nli_models/{base_model}_{N}x{D}_lbd{lbd}_context{max_length}_seed{seed}[_2way][_benchmark]/
Evaluate:
python eval_snli.py --root ./nli_models --model-name <name generated above>

@inproceedings{moreira2026native,
author = {Moreira, Gabriel and Marinho, Zita and Marques, Manuel and Costeira, Jo{\~a}o Paulo and Xiong, Chenyan},
title = {Native Hierarchical and Compositional Representations with Subspace Embeddings},
booktitle = {Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '26)},
year = {2026},
}