🏗️Work In Progress🏗️

⚡ANSR:
Flash Amortized Neural Symbolic Regression


Introduction

Requirements

Hardware

  • 32 GB Memory
  • CUDA-enabled GPU
  • 12 GB VRAM
  • 64 GB Storage (subject to change)

Software

Getting Started

1. Clone the repository

git clone https://github.com/psaegert/flash-ansr
cd flash-ansr

2. Install the package

Optional: Create a virtual environment:

conda:

conda create -n ansr python=3.11 ipykernel ipywidgets
conda activate ansr

Then, install the package via

pip install -e .
pip install -e nsrops

Usage

Use a pre-trained model

Clone the Hugging Face model repository:

git clone https://huggingface.co/psaegert/ansr-models models/ansr-models

Then import the package in Python:

import flash_ansr

Training

Express

Use, copy or modify a config in ./configs:

./configs
├── my_config
│   ├── dataset_train.yaml          # Link to skeleton pool and padding for training
│   ├── dataset_val.yaml            # Link to skeleton pool and padding for validation
│   ├── evaluation.yaml             # Evaluation settings
│   ├── expression_space.yaml       # Operators and variables
│   ├── nsr.yaml                    # Model settings and link to expression space
│   ├── skeleton_pool_train.yaml    # Sampling and holdout settings for training
│   ├── skeleton_pool_val.yaml      # Sampling and holdout settings for validation
│   └── train.yaml                  # Data and schedule for training
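
Copying an existing config is one way to start a new experiment. The following is a minimal sketch and not part of flash-ansr: the helper name `copy_config` and the target name `my_experiment` are hypothetical, and the file list is taken from the tree above. Several of the files link to one another (for example, nsr.yaml links to the expression space), so paths inside the copied files may still point at the source config and should be checked by hand.

```python
import shutil
from pathlib import Path

# File names taken from the config tree above.
EXPECTED_FILES = [
    "dataset_train.yaml",
    "dataset_val.yaml",
    "evaluation.yaml",
    "expression_space.yaml",
    "nsr.yaml",
    "skeleton_pool_train.yaml",
    "skeleton_pool_val.yaml",
    "train.yaml",
]


def copy_config(configs_dir, source, target):
    """Copy configs_dir/source to configs_dir/target and report missing files.

    Hypothetical helper for illustration: it only copies files and does not
    rewrite any paths inside the YAML files.
    """
    src = Path(configs_dir) / source
    dst = Path(configs_dir) / target
    shutil.copytree(src, dst)  # raises FileExistsError if the target exists
    missing = [name for name in EXPECTED_FILES if not (dst / name).is_file()]
    return dst, missing
```

A call such as `copy_config("./configs", "my_config", "my_experiment")` returns the new directory and the list of expected files that were not found in it.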

Run the training and evaluation pipeline with

./scripts/run.sh my_config

For more information, see the Manual section below.

Manual

0. Prerequisites

The test data is expected to be structured as follows:

./data/ansr-data/test_set
├── feynman
│   └── FeynmanEquations.csv
├── nguyen
│   └── nguyen.csv
└── soose_nc
    └── nc.csv

The test data can be cloned from the Hugging Face data repository:

git clone https://huggingface.co/psaegert/ansr-data data/ansr-data
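
Before importing the data, it can help to confirm that the expected files are in place. This is a small sketch using only the standard library. The helper name `missing_test_files` is hypothetical, and the default path assumes the commands are run from the repository root.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_TEST_FILES = [
    "feynman/FeynmanEquations.csv",
    "nguyen/nguyen.csv",
    "soose_nc/nc.csv",
]


def missing_test_files(test_set_dir="./data/ansr-data/test_set"):
    """Return the expected test files that are absent under test_set_dir."""
    root = Path(test_set_dir)
    return [rel for rel in EXPECTED_TEST_FILES if not (root / rel).is_file()]
```

An empty list means all three CSV files were found.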

1. Import test data

External datasets must be imported into the ANSR format:

flash_ansr import-data -i "{{ROOT}}/data/ansr-data/test_set/soose_nc/nc.csv" -p "soose" -e "{{ROOT}}/configs/test_set_base/expression_space.yaml" -b "{{ROOT}}/configs/test_set_base/skeleton_pool.yaml" -o "{{ROOT}}/data/ansr-data/test_set/soose_nc/skeleton_pool" -v
flash_ansr import-data -i "{{ROOT}}/data/ansr-data/test_set/feynman/FeynmanEquations.csv" -p "feynman" -e "{{ROOT}}/configs/test_set_base/expression_space.yaml" -b "{{ROOT}}/configs/test_set_base/skeleton_pool.yaml" -o "{{ROOT}}/data/ansr-data/test_set/feynman/skeleton_pool" -v
flash_ansr import-data -i "{{ROOT}}/data/ansr-data/test_set/nguyen/nguyen.csv" -p "nguyen" -e "{{ROOT}}/configs/test_set_base/expression_space.yaml" -b "{{ROOT}}/configs/test_set_base/skeleton_pool.yaml" -o "{{ROOT}}/data/ansr-data/test_set/nguyen/skeleton_pool" -v

with

  • -i the input file

  • -p the name of the parser implemented in ./src/flash_ansr/compat/convert_data.py

  • -e the expression space

  • -b the config of a base skeleton pool to add the data to

  • -o the output directory for the resulting skeleton pool

  • -v verbose output
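
The three import commands differ only in the test set directory, the input file, and the parser name, so they can be generated in a loop. The sketch below only builds the argument lists, using the flags described above. The function name `import_data_commands` is hypothetical, and passing "." as the root assumes the working directory is the repository root.

```python
import shlex

# Test set directory, input file name, and parser name, as in the commands above.
TEST_SETS = [
    ("soose_nc", "nc.csv", "soose"),
    ("feynman", "FeynmanEquations.csv", "feynman"),
    ("nguyen", "nguyen.csv", "nguyen"),
]


def import_data_commands(root):
    """Build one `flash_ansr import-data` argument list per test set."""
    base = f"{root}/configs/test_set_base"
    commands = []
    for directory, filename, parser in TEST_SETS:
        data_dir = f"{root}/data/ansr-data/test_set/{directory}"
        commands.append([
            "flash_ansr", "import-data",
            "-i", f"{data_dir}/{filename}",
            "-p", parser,
            "-e", f"{base}/expression_space.yaml",
            "-b", f"{base}/skeleton_pool.yaml",
            "-o", f"{data_dir}/skeleton_pool",
            "-v",
        ])
    return commands


# Print the commands instead of running them.
for command in import_data_commands("."):
    print(shlex.join(command))
```

To execute a command rather than print it, each list can be passed to `subprocess.run(command, check=True)`.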

This creates a skeleton pool from the parsed skeletons and saves it in the specified output directory:

./data/ansr-data/test_set/<test_set>
└── skeleton_pool
    ├── expression_space.yaml
    ├── skeleton_pool.yaml
    └── skeletons.pkl
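
For a first look at a generated file such as skeletons.pkl, the standard `pickle` module can be used. This is a generic sketch and not part of flash-ansr. `describe_pickle` is a hypothetical helper, and the internal structure of the file is not documented here. Loading it may also require flash_ansr to be importable in the same environment.

```python
import pickle


def describe_pickle(path):
    """Load a pickle file and return (type name, length or None)."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)  # only unpickle files you trust
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

For example, `describe_pickle("./data/ansr-data/test_set/nguyen/skeleton_pool/skeletons.pkl")` reports the container type and, where available, the number of entries.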

2. Generate validation data

Validation data is generated by randomly sampling according to the settings in the skeleton pool config:

flash_ansr generate-skeleton-pool -c {{ROOT}}/configs/${CONFIG}/skeleton_pool_val.yaml -o {{ROOT}}/data/ansr-data/${CONFIG}/skeleton_pool_val -s 5000 -v

with

  • -c the skeleton pool config
  • -o the output directory to save the skeleton pool
  • -s the number of unique skeletons to sample
  • -v verbose output

3. Train the model

flash_ansr train -c {{ROOT}}/configs/${CONFIG}/train.yaml -o {{ROOT}}/models/ansr-models/${CONFIG} -v -ci 100000 -vi 10000

with

  • -c the training config
  • -o the output directory to save the model and checkpoints
  • -v verbose output
  • -ci the interval to save checkpoints
  • -vi the interval for validation

4. Evaluate the model

flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/soose_nc/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/soose_nc.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/feynman/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/feynman.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/nguyen/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/nguyen.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/configs/${CONFIG}/dataset_val.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/val.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/data/ansr-data/test_set/pool_15/dataset.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/pool_15.pickle -v
flash_ansr evaluate -c {{ROOT}}/configs/${CONFIG}/evaluation.yaml -m "{{ROOT}}/models/ansr-models/${MODEL}" -d "{{ROOT}}/configs/${CONFIG}/dataset_train.yaml" -n 5000 -o {{ROOT}}/results/evaluation/${CONFIG}/train.pickle -v

with

  • -c the evaluation config
  • -m the model to evaluate
  • -d the dataset to evaluate on
  • -n the number of samples to evaluate
  • -o the output file for results
  • -v verbose output
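
The six evaluation commands share everything except the dataset and the output file, so they can also be generated from a mapping. This sketch only builds the argument lists from the flags listed above. The function name `evaluate_commands` is hypothetical, and the example call assumes that the model directory is named after the config, as in the training step.

```python
import shlex


def evaluate_commands(root, config, model, n_samples=5000):
    """Build one `flash_ansr evaluate` argument list per dataset."""
    test_sets = f"{root}/data/ansr-data/test_set"
    # Result name -> dataset config, as in the commands above.
    datasets = {
        "soose_nc": f"{test_sets}/soose_nc/dataset.yaml",
        "feynman": f"{test_sets}/feynman/dataset.yaml",
        "nguyen": f"{test_sets}/nguyen/dataset.yaml",
        "val": f"{root}/configs/{config}/dataset_val.yaml",
        "pool_15": f"{test_sets}/pool_15/dataset.yaml",
        "train": f"{root}/configs/{config}/dataset_train.yaml",
    }
    return [
        [
            "flash_ansr", "evaluate",
            "-c", f"{root}/configs/{config}/evaluation.yaml",
            "-m", f"{root}/models/ansr-models/{model}",
            "-d", dataset,
            "-n", str(n_samples),
            "-o", f"{root}/results/evaluation/{config}/{name}.pickle",
            "-v",
        ]
        for name, dataset in datasets.items()
    ]


# Print the commands instead of running them.
for command in evaluate_commands(".", "my_config", "my_config"):
    print(shlex.join(command))
```

Each result is written to its own .pickle file under results/evaluation/<config>, named after the dataset it was evaluated on.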

4.1 Evaluate NeSymRes

  1. Clone NeuralSymbolicRegressionThatScales to a directory of your choice.
  2. Download the 100M model as described here
  3. Move the 100M model into flash-ansr/models/nesymres/
  4. Create a Python 3.10 (!) environment and install flash-ansr as in the previous steps.
  5. Install NeSymRes in the same environment:
cd NeuralSymbolicRegressionThatScales
pip install -e src/
pip install lightning
  6. Navigate back to this repository and run the evaluation:
cd flash-ansr
./scripts/evaluate_nesymres <test_set>

4.2 Evaluate PySR

  1. Install PySR in the same environment as flash-ansr.
  2. Run the evaluation
./scripts/evaluate_pysr <test_set>

Development

Setup

To set up the development environment, run the following commands:

pip install -e .[dev]
pip install -e ./nsrops
pre-commit install

Tests

For convenience, run the test suite with

./scripts/pytest.sh

Citation

@software{flash-ansr2024,
    author = {Paul Saegert},
    title = {Flash Amortized Neural Symbolic Regression},
    year = 2024,
    publisher = {GitHub},
    version = {0.1.0},
    url = {https://github.com/psaegert/flash-ansr}
}
