- 32 GB Memory
- CUDA-enabled GPU
- 12 GB VRAM
- 64 GB Storage (subject to change)
- Python $\geq$ 3.11
- pip $\geq$ 21.3 with PEP 660 (see https://pip.pypa.io/en/stable/news/#v21-3)
- Ubuntu 22.04.3 LTS
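Before installing, a quick sanity check of the toolchain can save time. The commands below only report versions and GPU visibility; they are not part of the official setup:

```bash
# Quick sanity check of the toolchain (not part of the official setup).
python --version   # should report 3.11 or newer
pip --version      # should report 21.3 or newer
nvidia-smi         # should list a CUDA-enabled GPU
```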
Clone the repository:
git clone https://github.com/psaegert/flash-ansr
cd flash-ansr
Optional: Create a virtual environment, for example with conda:
conda create -n ansr python=3.11 ipykernel ipywidgets
conda activate ansr
Then, install the package via
pip install -e .
pip install -e nsrops
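To confirm the editable install succeeded, the package should be importable and, assuming the distribution is registered under the name flash-ansr, visible to pip:

```bash
# Minimal post-install check; the distribution name flash-ansr is an assumption.
pip show flash-ansr            # prints package metadata if the install worked
python -c "import flash_ansr"  # the import should succeed without errors
```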
Clone the Hugging Face model repository:
git clone https://huggingface.co/psaegert/ansr-models models/ansr-models
The package can then be imported in Python:
import flash_ansr
Use, copy, or modify a config in ./configs:
./configs
├── my_config
│   ├── dataset_train.yaml       # Link to skeleton pool and padding for training
│   ├── dataset_val.yaml         # Link to skeleton pool and padding for validation
│   ├── evaluation.yaml          # Evaluation settings
│   ├── expression_space.yaml    # Operators and variables
│   ├── nsr.yaml                 # Model settings and link to expression space
│   ├── skeleton_pool_train.yaml # Sampling and holdout settings for training
│   ├── skeleton_pool_val.yaml   # Sampling and holdout settings for validation
│   └── train.yaml               # Data and schedule for training
Run the training and evaluation pipeline with
./scripts/run.sh my_config
For more information see below.
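Conceptually, the pipeline chains the skeleton pool generation, training, and evaluation steps documented below. The sketch assumes the config lives in ./configs/my_config and that the {{ROOT}} placeholders resolve to the repository root; the actual contents of ./scripts/run.sh may differ:

```bash
# Rough sketch of the steps the pipeline chains together for one config;
# the actual contents of ./scripts/run.sh may differ.
CONFIG=my_config
ROOT=$(pwd)

flash_ansr generate-skeleton-pool -c $ROOT/configs/$CONFIG/skeleton_pool_val.yaml \
    -o $ROOT/data/ansr-data/$CONFIG/skeleton_pool_val -s 5000 -v

flash_ansr train -c $ROOT/configs/$CONFIG/train.yaml \
    -o $ROOT/models/ansr-models/$CONFIG -v -ci 100000 -vi 10000

flash_ansr evaluate -c $ROOT/configs/$CONFIG/evaluation.yaml \
    -m "$ROOT/models/ansr-models/$CONFIG" \
    -d "$ROOT/configs/$CONFIG/dataset_val.yaml" -n 5000 \
    -o $ROOT/results/evaluation/$CONFIG/val.pickle -v
```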
The test data is structured as follows:
./data/ansr-data/test_set
├── feynman
│   └── FeynmanEquations.csv
├── nguyen
│   └── nguyen.csv
└── soose_nc
    └── nc.csv
The test data can be cloned from the Hugging Face data repository:
git clone https://huggingface.co/psaegert/ansr-data data/ansr-data
External datasets must be imported into the ANSR format:
flash_ansr import-data -i "{{ROOT}}/data/ansr-data/test_set/soose_nc/nc.csv" -p "soose" -e "{{ROOT}}/configs/test_set_base/expression_space.yaml" -b "{{ROOT}}/configs/test_set_base/skeleton_pool.yaml" -o "{{ROOT}}/data/ansr-data/test_set/soose_nc/skeleton_pool" -v
flash_ansr import-data -i "{{ROOT}}/data/ansr-data/test_set/feynman/FeynmanEquations.csv" -p "feynman" -e "{{ROOT}}/configs/test_set_base/expression_space.yaml" -b "{{ROOT}}/configs/test_set_base/skeleton_pool.yaml" -o "{{ROOT}}/data/ansr-data/test_set/feynman/skeleton_pool" -v
flash_ansr import-data -i "{{ROOT}}/data/ansr-data/test_set/nguyen/nguyen.csv" -p "nguyen" -e "{{ROOT}}/configs/test_set_base/expression_space.yaml" -b "{{ROOT}}/configs/test_set_base/skeleton_pool.yaml" -o "{{ROOT}}/data/ansr-data/test_set/nguyen/skeleton_pool" -v
with
- `-i` the input file
- `-p` the name of the parser implemented in ./src/flash_ansr/compat/convert_data.py
- `-e` the expression space
- `-b` the config of a base skeleton pool to add the data to
- `-o` the output directory for the resulting skeleton pool
- `-v` verbose output
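In the commands above, {{ROOT}} is a placeholder for the repository root. If it is resolved manually (it may also be substituted by the project's own tooling), a concrete invocation for the Nguyen test set from inside the repository might look like this:

```bash
# Concrete import of the Nguyen test set; here {{ROOT}} is resolved manually
# to the repository root (it may also be handled by the project's own tooling).
ROOT=$(pwd)
flash_ansr import-data \
    -i "$ROOT/data/ansr-data/test_set/nguyen/nguyen.csv" \
    -p "nguyen" \
    -e "$ROOT/configs/test_set_base/expression_space.yaml" \
    -b "$ROOT/configs/test_set_base/skeleton_pool.yaml" \
    -o "$ROOT/data/ansr-data/test_set/nguyen/skeleton_pool" \
    -v
```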
This creates and saves a skeleton pool with the parsed and imported skeletons in the specified directory:
./data/ansr-data/test_set/<test_set>
└── skeleton_pool
    ├── expression_space.yaml
    ├── skeleton_pool.yaml
    └── skeletons.pkl
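The stored skeletons can be inspected directly, assuming skeletons.pkl is a standard Python pickle file; the snippet below only reports the type (and size, if available) of its contents, since their exact structure is not documented here:

```bash
# Peek into an imported skeleton pool; assumes skeletons.pkl is a standard
# pickle file and only reports the type and size of the stored object.
python - <<'EOF'
import pickle

with open("data/ansr-data/test_set/nguyen/skeleton_pool/skeletons.pkl", "rb") as f:
    skeletons = pickle.load(f)

print(type(skeletons))
try:
    print(f"{len(skeletons)} skeletons")
except TypeError:
    pass
EOF
```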
Validation data is generated by randomly sampling according to the settings in the skeleton pool config:
flash_ansr generate-skeleton-pool -c {{ROOT}}/configs/${CONFIG}/skeleton_pool_val.yaml -o {{ROOT}}/data/ansr-data/${CONFIG}/skeleton_pool_val -s 5000 -v
with
- `-c` the skeleton pool config
- `-o` the output directory to save the skeleton pool
- `-s` the number of unique skeletons to sample
- `-v` verbose output
Train a model with
flash_ansr train -c {{ROOT}}/configs/${CONFIG}/train.yaml -o {{ROOT}}/models/ansr-models/${CONFIG} -v -ci 100000 -vi 10000
with
- `-c` the training config
- `-o` the output directory to save the model and checkpoints
- `-v` verbose output
- `-ci` the interval to save checkpoints
- `-vi` the interval for validation
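For long runs it can help to keep a log of the console output alongside the checkpoints. This is plain shell redirection, not a feature of the flash_ansr CLI itself, with {{ROOT}} resolved manually and CONFIG set to an example config name:

```bash
# Capture the training log alongside the checkpoints; plain shell
# redirection, not a feature of the flash_ansr CLI itself.
CONFIG=my_config
ROOT=$(pwd)
mkdir -p $ROOT/models/ansr-models/$CONFIG
flash_ansr train -c $ROOT/configs/$CONFIG/train.yaml \
    -o $ROOT/models/ansr-models/$CONFIG -v -ci 100000 -vi 10000 \
    2>&1 | tee $ROOT/models/ansr-models/$CONFIG/train.log
```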
Evaluate the model on each test set, the validation data, and the training data with
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/soose_nc/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/soose_nc.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/feynman/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/feynman.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/nguyen/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/nguyen.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/configs/${CONFIG}/dataset_val.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/val.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/pool_15/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/pool_15.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/configs/${CONFIG}/dataset_train.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/train.pickle -v
with
- `-c` the evaluation config
- `-m` the model to evaluate
- `-d` the dataset to evaluate on
- `-n` the number of samples to evaluate
- `-o` the output file for results
- `-v` verbose output
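Each evaluation writes a single pickle file. The snippet below only loads one of them and reports its top-level structure, since the exact layout of the results object is not documented here (the path assumes a config named my_config):

```bash
# Inspect an evaluation result file; the layout of the pickled object is not
# documented here, so only its type and top-level keys (if any) are printed.
python - <<'EOF'
import pickle

with open("results/evaluation/my_config/soose_nc.pickle", "rb") as f:
    results = pickle.load(f)

print(type(results))
if isinstance(results, dict):
    print(list(results.keys()))
EOF
```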
To evaluate the NeSymRes baseline:
- Clone NeuralSymbolicRegressionThatScales to a directory of your choice.
- Download the 100M model as described there.
- Move the 100M model into flash-ansr/models/nesymres/.
- Create a Python 3.10 (!) environment and install flash-ansr as in the previous steps.
- Install NeSymRes in the same environment:
cd NeuralSymbolicRegressionThatScales
pip install -e src/
pip install lightning
- Navigate back to this repository and run the evaluation:
cd flash-ansr
./scripts/evaluate_nesymres <test_set>
To evaluate the PySR baseline:
- Install PySR in the same environment as flash-ansr (see the install note below).
- Run the evaluation:
./scripts/evaluate_pysr <test_set>
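PySR is available on PyPI, so a minimal install into the same environment looks like the following; recent versions manage their Julia backend automatically, but consult the PySR documentation if a separate Julia setup is required:

```bash
# Install the PySR baseline into the active environment.
pip install pysr
```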
To set up the development environment, run the following commands:
pip install -e .[dev]
pip install -e ./nsrops
pre-commit install
Test the package with
./scripts/pytest.sh
for convenience.
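Alternatively, if you prefer to invoke pytest directly and the tests live under a conventional tests/ directory (an assumption about the layout), the equivalent is roughly:

```bash
# Direct pytest invocation; the tests/ path is an assumption about the layout,
# and ./scripts/pytest.sh remains the canonical entry point.
pytest tests/ -q
```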
To cite this repository:
@software{flash-ansr2024,
author = {Paul Saegert},
title = {Flash Amortized Neural Symbolic Regression},
year = 2024,
publisher = {GitHub},
version = {0.1.0},
url = {https://github.com/psaegert/flash-ansr}
}