Codes for Tatsuya Haga, Yohei Oseki, Tomoki Fukai, "A unified neurocomputational model for spatial and linguistic representations"
We used a UNIX environment (Ubuntu 20.04), a GPU (NVIDIA GeForce RTX 3090) with CUDA 12.1, Python 3.10, and the following Python libraries: numpy 1.23, scipy 1.10, nltk 3.7, gensim 4.3, networkx 2.8, cupy 12.0, matplotlib 3.6, pygraphviz 1.9, pandas 1.5, scikit-learn 1.2.
An Anaconda/Miniconda environment is provided in the file `env_cupy_nlp.yml`:

```
conda env create -f env_cupy_nlp.yml
conda activate cupy_nlp
```
Note that the codes use parallel processing with 12-16 threads.
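After activating the environment, you may want to confirm that the dependencies resolve and that CuPy can see the GPU. The snippet below is a minimal sanity check we suggest, not part of the repository:

```python
# Hypothetical sanity check (not part of the repository): confirms that the
# main dependencies import and that CuPy detects a CUDA device.
import numpy, scipy, nltk, gensim, networkx, matplotlib, pandas, sklearn
import cupy as cp

print("CUDA devices visible to CuPy:", cp.cuda.runtime.getDeviceCount())
print("CuPy smoke test:", cp.arange(10).sum())  # runs a trivial kernel on the GPU
```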
Codes for Figures 2 and 3.
Execute `bash batch_all.sh` in each directory, or manually run the individual commands in the shell scripts. Outputs are figures and txt files containing the quantitative values. Run time was several minutes.
Codes for Figure 7.
Execute `bash batch_multitrial.sh` in each directory, or `bash batch_all.sh` in the directory `code`. Summary figures are generated by `batch_multitrial.sh`. Run time was about 1 hour per simulation (`batch_multitrial.sh` repeats five simulations).
Preprocessing codes for Figures 4, 5, and 6.
- Download the Wikipedia dump file `enwiki-latest-pages-articles.xml.bz2` from https://dumps.wikimedia.org/enwiki/latest/ . The version we used (22-May-2020) is available upon request.
- Apply WikiExtractor ( https://github.com/attardi/wikiextractor ) to the dump file. Put the output directory `text` in `text_data_preprocessing` and execute `bash batch.sh` (a rough sketch of this step is given after this list).
- Move the output files in `enwiki_corpus_files` to the `DSI_word_embedding` directory.
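As a rough illustration of this step, the snippet below flattens WikiExtractor output into a single plain-text file. It is a hypothetical sketch, not the repository's actual preprocessing (which is implemented in `batch.sh`); the output file name `corpus_flat.txt` is ours, and the input layout (`text/AA/wiki_00`, one `<doc>` block per article) follows WikiExtractor's usual conventions.

```python
# Hypothetical sketch of the extraction step; the real pipeline is batch.sh.
# WikiExtractor writes files like text/AA/wiki_00 in which each article is
# wrapped in <doc ...> ... </doc> tags.
import pathlib

# Typical WikiExtractor invocation (run in a shell beforehand; flags may
# differ by version):
#   python -m wikiextractor.WikiExtractor enwiki-latest-pages-articles.xml.bz2 -o text

with open("corpus_flat.txt", "w", encoding="utf-8") as out:
    for path in sorted(pathlib.Path("text").rglob("wiki_*")):
        for line in path.open(encoding="utf-8"):
            line = line.strip()
            if not line or line.startswith("<doc") or line.startswith("</doc"):
                continue  # skip document delimiters and blank lines
            out.write(line.lower() + "\n")
```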
Main codes for Figures 4, 5, and 6.
- Download the WS353 dataset from http://alfonseca.org/eng/research/wordsim353.html and Mikolov's analogy dataset from https://aclweb.org/aclwiki/Google_analogy_test_set_(State_of_the_art) . Put them in the `dataset_eval` directory.
- Execute `bash batch.sh` in each directory. Outputs are figures and txt files containing the top-10 words for each unit and the quantitative values. Run time was approximately 1 hour for each condition (see the evaluation sketch after this list).
- Codes for GloVe are not included. Apply the codes at https://github.com/stanfordnlp/GLoVe to `enwiki_filtered_1d.txt`, and put the output file `vectors.txt` in the directory `glove_eval`. Then execute `bash batch.sh` in `glove_eval` (see the loading sketch after this list).
- `result_summary` is for plotting a summary of all methods. We put the data that we plotted in the article there.