Skip to content

PaulsonLab/TorchSISSO

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 

Repository files navigation

TorchSISSO: A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator (SISSO) for Efficient and Interpretable Model Discovery

torchsisso3

What is SISSO?

The Sure Independence Screening and Sparsifying Operator (SISSO) is a symbolic regression (SR) method that searches for interpretable models by exploring large feature spaces. It has proven particularly effective in fields like materials science, where it helps to uncover simple, accurate models that describe complex physical phenomena.

TorchSISSO is a native Python implementation of SISSO, built using the PyTorch framework. It addresses the limitations of the original FORTRAN-based SISSO by providing a faster, more flexible, and easier-to-use solution.

Installation

You can install TorchSISSO via pip:

pip install TorchSisso

Import your data and use the following code to fit a SISSO model

# import TorchSisso model class along with other useful packages
from TorchSisso import SissoModel
import numpy as np
import pandas as pd
import sympy
from sympy import symbols

# create dataframe composed of targets "y" and primary features "X"
df = pd.DataFrame(data)

#define unary and binary operators of interest
operators = ['+','-','*','/','exp','ln','sin','pow(2)']

# create SISSO model object with relevant user-defined inputs
sm = SissoModel(df= df #Takes the dataframe as input with first variable as target variable
                operators = operators #Takes the user-defined operators to perform the feature engineering
                n_expansion = 3 #Defines the number of feature expansions need to be considered
                n_term = 1 #Defines the number of terms in the final equation
                k = 20 #Defines the number of SIS features to be screened for $L_0$ regularization
                initial_screening = ["mi" or "spearman", quantile value] #Defines the feature screening option for high dimensional and 1-quantile_value defines
                                                                          #the features within this quantile range should be kept for feature expansion.
                use_gpu = True or False #Defines the flag whether to consider GPU or not (For efficient computation we consider using GPU only for $L_0$ Regularization.
                dimensionality = ['u1','u2','u3'] #Defines the units of the feature variables in string representation which later converted into sympy format to do the                                                         meaningful feature construction.
                relational_units = [(symbols('u1')*symbols('u2'),symbols('u3')] #Defines the list of tuples where each tuple represents the relational transformation.
                output_dim = (symbols('u1')*symbols('u1')) #Defines the units of the target variable which helps in narrowing down the space for Regularization.
                custom_unary_functions = [('cos', '-sin', 'exp', 'pow(2)'), ('tanh', 'pow(2)','sin')] #list of tuples (Note only torch supported functions can be customized)
                custom_binary_functions = [
                                          {
                                              'functions1': ['pow(2)'],  # Function to apply on var1 (X1)
                                              'functions2': ['sin'],  # No function to apply on var2 (X2)
                                              'operation': torch.add,  # Operation to apply between var1 and var2
                                              'final_functions': []  # Final function to apply to the result of addition
                                          }
                                      ]
#List of dicts where transformations of variables and operation is provided.. (For example the given code above gives sin(x2) + X^2)

                )

# Run the SISSO algorithm to get the interpretable model with the highest accuracy
rmse, equation, r2,_ = sm.fit()

Usage Examples

Usage of TorchSisso can be found in Open In Colab

About

Created and maintained by Madhav Muthyala. Please feel free to open issues in the Github or contact Madhav
(muthyala.7@osu.edu/mmuthyala@wisc.edu) in case of any problems/comments/suggestions in using the code.

Known Limitations and Future Work

We are aware of current limitations in the code and are actively working on addressing them in future updates. If you encounter any issues or have suggestions for improvements, please feel free to report them. Your feedback is valuable and helps us enhance the project.

Citation

@article{MUTHYALA2024100198,
title = {TorchSISSO: A PyTorch-based implementation of the sure independence screening and sparsifying operator for efficient and interpretable model discovery},
journal = {Digital Chemical Engineering},
volume = {13},
pages = {100198},
year = {2024},
issn = {2772-5081},
doi = {https://doi.org/10.1016/j.dche.2024.100198},
url = {https://www.sciencedirect.com/science/article/pii/S2772508124000607},
author = {Madhav Muthyala and Farshud Sorourifar and Joel A. Paulson},
keywords = {Symbolic regression, Interpretable machine learning, Sparsity},
abstract = {Symbolic regression (SR) is a powerful machine learning approach that searches for both the structure and parameters of algebraic models, offering interpretable and compact representations of complex data. Unlike traditional regression methods, SR explores progressively complex feature spaces, which can uncover simple models that generalize well, even from small datasets. Among SR algorithms, the Sure Independence Screening and Sparsifying Operator (SISSO) has proven particularly effective in the natural sciences, helping to rediscover fundamental physical laws as well as discover new interpretable equations for materials property modeling. However, its widespread adoption has been limited by performance inefficiencies and the challenges posed by its FORTRAN-based implementation, especially in modern computing environments. In this work, we introduce TorchSISSO, a native Python implementation built in the PyTorch framework. TorchSISSO leverages GPU acceleration, easy integration, and extensibility, offering a significant speed-up and improved accuracy over the original. We demonstrate that TorchSISSO matches or exceeds the performance of the original SISSO across a range of tasks, while dramatically reducing computational time and improving accessibility for broader scientific applications.}
}```

About

No description or website provided.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages