Codes for Tatsuya Haga, Yohei Oseki, Tomoki Fukai, "A unified neurocomputational model for spatial and linguistic representations"
We used a UNIX environment (Ubuntu 20.04), a GPU (NVIDIA GeForce RTX 3090) with CUDA 12.1, Python 3.10, and the following Python libraries: numpy 1.23, scipy 1.10, nltk 3.7, gensim 4.3, networkx 2.8, cupy 12.0, matplotlib 3.6, pygraphviz 1.9, pandas 1.5, scikit-learn 1.2.
An Anaconda/Miniconda environment is provided in the file `env_cupy_nlp.yml`:

```
conda env create -f env_cupy_nlp.yml
conda activate cupy_nlp
```
Note that the codes use parallel processing with 12-16 threads.
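After activating the environment, you may want to confirm that the dependencies resolve and that CuPy can see the GPU. The snippet below is a minimal sanity check we suggest, not part of the repository:

```python
# Hypothetical sanity check (not part of the repository): confirms that the
# main dependencies import and that CuPy detects a CUDA device.
import numpy, scipy, nltk, gensim, networkx, matplotlib, pandas, sklearn
import cupy as cp

print("CUDA devices visible to CuPy:", cp.cuda.runtime.getDeviceCount())
print("CuPy smoke test:", cp.arange(10).sum())  # runs a trivial kernel on the GPU
```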
Codes for Figures 2 and 3.
Execute `bash batch_all.sh` in each directory, or manually run the individual commands in the shell scripts. Outputs are figures and txt files containing the quantitative values. Run time was several minutes.
Codes for Figure 7.
Execute `bash batch_multitrial.sh` in each directory, or `bash batch_all.sh` in the directory `code`. Summary figures are generated by `batch_multitrial.sh`. Run time was about 1 hour per simulation (`batch_multitrial.sh` repeats five simulations).
Preprocessing codes for Figures 4, 5, and 6.
- Download the Wikipedia dump file `enwiki-latest-pages-articles.xml.bz2` from https://dumps.wikimedia.org/enwiki/latest/ . The version we used (22-May-2020) is available upon request.
- Apply WikiExtractor ( https://github.com/attardi/wikiextractor ) to the dump file. Put the output directory `text` in `text_data_preprocessing` and execute `bash batch.sh` (a rough sketch of this step is given after this list).
- Move the output files in `enwiki_corpus_files` to the `DSI_word_embedding` directory.
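As a rough illustration of this step, the snippet below flattens WikiExtractor output into a single plain-text file. It is a hypothetical sketch, not the repository's actual preprocessing (which is implemented in `batch.sh`); the output file name `corpus_flat.txt` is ours, and the input layout (`text/AA/wiki_00`, one `<doc>` block per article) follows WikiExtractor's usual conventions.

```python
# Hypothetical sketch of the extraction step; the real pipeline is batch.sh.
# WikiExtractor writes files like text/AA/wiki_00 in which each article is
# wrapped in <doc ...> ... </doc> tags.
import pathlib

# Typical WikiExtractor invocation (run in a shell beforehand; flags may
# differ by version):
#   python -m wikiextractor.WikiExtractor enwiki-latest-pages-articles.xml.bz2 -o text

with open("corpus_flat.txt", "w", encoding="utf-8") as out:
    for path in sorted(pathlib.Path("text").rglob("wiki_*")):
        for line in path.open(encoding="utf-8"):
            line = line.strip()
            if not line or line.startswith("<doc") or line.startswith("</doc"):
                continue  # skip document delimiters and blank lines
            out.write(line.lower() + "\n")
```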
Main codes for Figures 4, 5, and 6.
- Download the WS353 dataset from http://alfonseca.org/eng/research/wordsim353.html and Mikolov's analogy dataset from https://aclweb.org/aclwiki/Google_analogy_test_set_(State_of_the_art) . Put them in the `dataset_eval` directory.
- Execute `bash batch.sh` in each directory. Outputs are figures and txt files containing the top-10 words for each unit and the quantitative values. Run time was approximately 1 hour for each condition (see the evaluation sketch after this list).
- Codes for GloVe are not included. Apply the codes at https://github.com/stanfordnlp/GLoVe to `enwiki_filtered_1d.txt`, and put the output file `vectors.txt` in the directory `glove_eval`. Then execute `bash batch.sh` in `glove_eval` (see the loading sketch after this list).
- `result_summary` is for plotting a summary of all methods. We put the data that we plotted in the article there.