
Scaling PySR to very large and high-dimensional datasets #774

Answered by MilesCranmer
muhlbach asked this question in Q&A


For these settings, I like to fit another machine learning model to the dataset first and treat its predictions as "denoised", since that model will effectively average out the noise.

This is actually what the denoise and Xresampled options do!
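For reference, a minimal sketch of how those options are typically passed (based on recent PySR versions, where `denoise` is a constructor option and `Xresampled` is a `fit()` argument; placement may differ in older releases, and the data here is a toy placeholder):

```python
import numpy as np
from pysr import PySRRegressor

# Toy noisy dataset standing in for your real data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(10_000, 2))
y = np.cos(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=len(X))

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=["cos"],
    denoise=True,  # fit a Gaussian process first and train on its predictions
)

# Optionally evaluate the denoising model on a smaller resampled set
# instead of the full X:
X_resampled = rng.uniform(-3, 3, size=(1_000, 2))
model.fit(X, y, Xresampled=X_resampled)
```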

However, this uses a Gaussian process as the secondary machine learning model, and Gaussian processes take O(N^3) compute for a dataset with N points. So for this problem I think I would try something like a neural network or even XGBoost. Then, take a grid of input features, evaluate the model over that grid, and that becomes your "denoised" y vector that you can feed to PySR.
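A rough sketch of that workflow, using XGBoost as the surrogate (the dataset, grid size, and hyperparameters below are placeholder assumptions, not recommendations):

```python
import numpy as np
import xgboost as xgb
from pysr import PySRRegressor

# Large, noisy dataset standing in for your real data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1_000_000, 2))
y = np.cos(X[:, 0]) * X[:, 1] + 0.5 * rng.normal(size=len(X))

# 1. Fit a fast surrogate model; its predictions average out the noise.
surrogate = xgb.XGBRegressor(n_estimators=500, max_depth=6)
surrogate.fit(X, y)

# 2. Build a grid over the input features and evaluate the surrogate on it.
grid_1d = np.linspace(-3, 3, 100)
g0, g1 = np.meshgrid(grid_1d, grid_1d)
X_grid = np.column_stack([g0.ravel(), g1.ravel()])  # 10,000 grid points
y_denoised = surrogate.predict(X_grid)

# 3. Run symbolic regression on the small, denoised dataset.
model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=["cos"],
)
model.fit(X_grid, y_denoised)
```

The key point is that PySR then only sees the small, denoised grid rather than the full million-point dataset.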

The high-dimensional feature space is a bit trickier since it’s not a good se…
