10 changes: 5 additions & 5 deletions docs/guides/extend.md
@@ -20,7 +20,7 @@ from amisc.transform import Transform

class CustomTransform(Transform):
"""A transform that adds 1."""

def _transform(self, values, inverse=False):
return values - 1 if inverse else values + 1
```
@@ -150,13 +150,13 @@ class CustomInterpolatorState(InterpolatorState):
pass
```

- The `Lagrange` polynomial interpolation class is the only available method currently. The state of a `Lagrange` polynomial includes the 1d grids and barycentric weights for each input dimension. See [Lagrange][amisc.interpolator.Lagrange] for more details.
+ Currently, the available methods are `Lagrange` polynomial interpolation and `Linear` regression. The state of a `Lagrange` polynomial includes the 1d grids and barycentric weights for each input dimension. The state of a `Linear` regression includes the underlying [scikit-learn](https://scikit-learn.org/stable/) linear model. See [Lagrange][amisc.interpolator.Lagrange] and [Linear][amisc.interpolator.Linear] for more details. Note that linear regression also includes options for polynomial features.
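As a minimal sketch of what a custom state could hold, the hypothetical `RBFState` below stores centers and weights for a radial-basis-function interpolator. The class name, the fields, and the assumption that `InterpolatorState` can be subclassed as a plain dataclass are all illustrative; only the `serialize`/`deserialize` requirement comes from the [serialization](#serialization) section below:

```python
from dataclasses import dataclass, field

import numpy as np

from amisc.interpolator import InterpolatorState  # import path assumed from the links above


@dataclass
class RBFState(InterpolatorState):
    """Hypothetical state for a radial-basis-function interpolator."""
    centers: np.ndarray = field(default_factory=lambda: np.empty((0, 0)))  # (N, dim) training centers
    weights: np.ndarray = field(default_factory=lambda: np.empty(0))       # (N,) fitted weights

    def serialize(self) -> dict:
        # Convert to built-in types so the state can be written to file
        return {'centers': self.centers.tolist(), 'weights': self.weights.tolist()}

    @classmethod
    def deserialize(cls, data: dict) -> 'RBFState':
        return cls(centers=np.array(data['centers']), weights=np.array(data['weights']))
```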

## Model keyword arguments
The `ModelKwarg` interface provides a dataclass for passing extra options to the underlying component models. The default is a simple `dict` that gets passed as a set of `key=value` pairs. The primary reason to override this class is to handle complicated arguments that require custom serialization. See the [serialization](#serialization) section below.
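As a sketch of that default behavior (the model function and option names here are made up for illustration):

```python
def my_model(inputs, config_file='solver.yml', tolerance=1e-6):
    """A component model whose extra options arrive as plain keyword arguments."""
    return {'y': inputs['x'] * tolerance}  # placeholder computation

# With the default dict-based kwargs, options are forwarded as key=value pairs
extra_options = {'config_file': 'high_fidelity.yml', 'tolerance': 1e-8}
my_model({'x': 1.0}, **extra_options)
```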

## Serialization
The [Serializable][amisc.serialize.Serializable] interface defines the `serialize` and `deserialize` methods for converting `amisc` objects to/from built-in Python types (such as `strings`, `floats`, and `dicts`).

!!! Note "Important"
All custom objects implementing any of the interfaces above must also implement the `serialize` and `deserialize` mixin methods to allow saving and loading the custom objects from file.
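For example, the `CustomTransform` from the top of this guide stores no parameters, so its implementation could be the minimal sketch below (the exact method signatures are assumed):

```python
from amisc.transform import Transform


class CustomTransform(Transform):
    """A transform that adds 1."""

    def _transform(self, values, inverse=False):
        return values - 1 if inverse else values + 1

    def serialize(self) -> dict:
        # No parameters to store -- an empty dict fully describes this transform
        return {}

    @classmethod
    def deserialize(cls, data: dict) -> 'CustomTransform':
        return cls(**data)
```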
@@ -185,7 +185,7 @@ However, more generally, you may define a custom `FileLoader` if you prefer to w

class JSONLoader(FileLoader):
"""Save and load amisc objects from JSON"""

def load(self, file):
"""Load an amisc.System object (for example)"""
with open(file, 'r') as fd:
@@ -196,4 +196,4 @@ However, more generally, you may define a custom `FileLoader` if you prefer to w
"""Dump an amisc.System object (for example)"""
with open(file, 'w') as fd:
json.dump(obj.serialize(), fd)
```
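A usage sketch for the loader above (the file name is a placeholder, and an `amisc.System.deserialize` that accepts the dumped `dict` is assumed):

```python
loader = JSONLoader()
loader.dump(system, 'system.json')   # 'system' is an existing amisc.System object
system = loader.load('system.json')  # round-trips through built-in JSON types
```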
2 changes: 1 addition & 1 deletion docs/guides/interface_models.md
@@ -228,7 +228,7 @@ As one last note, if your model returns field quantity data, this will be stored
## Specifying a surrogate method
The `Component.interpolator` attribute provides the underlying surrogate "interpolation" method, i.e. the specific mathematical relationship that approximates the model's outputs as a function of its inputs. In this sense, we use the terms "interpolator" and "surrogate" interchangeably to mean the underlying approximation method -- the `Component.interpolator` does not necessarily have to "interpolate" the output by passing through all the training data directly. The naming convention mostly arises from the usage of polynomial interpolation in sparse grids.

- Currently, the only available interpolation method is the [Lagrange][amisc.interpolator.Lagrange] polynomial interpolation, which is set by default. Multivariate Lagrange polynomials are formed by a tensor-product of univariate Lagrange polynomials in each input dimension, and integrate well with the `SparseGrid` data structure. Lagrange polynomials work well up to an input dimension of around 12-15 for sufficiently smooth functions. More details on how they work can be found in the [theory](../theory/polynomials.md) section.
+ Currently, the available methods are [Lagrange][amisc.interpolator.Lagrange] polynomial interpolation, which is set by default, and [Linear][amisc.interpolator.Linear] regression. Multivariate Lagrange polynomials are formed by a tensor-product of univariate Lagrange polynomials in each input dimension, and integrate well with the `SparseGrid` data structure. Lagrange polynomials work well up to an input dimension of around 12-15 for sufficiently smooth functions. More details on how they work can be found in the [theory](../theory/polynomials.md) section. Linear regression is implemented through the [scikit-learn](https://scikit-learn.org/stable/) library, and may optionally include polynomial features.
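For reference, "linear regression with polynomial features" corresponds to a scikit-learn construction like the sketch below; this is illustrative only, and the exact pipeline `amisc` builds internally may differ:

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Degree-2 polynomial features feeding an ordinary least-squares fit
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
```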

You may configure the interpolation method via:
```python
# ...
```
61 changes: 59 additions & 2 deletions pdm.lock


1 change: 1 addition & 0 deletions pyproject.toml
@@ -13,6 +13,7 @@ dependencies = [
"pyyaml>=6.0.2",
"pydantic>=2.9.1",
"dill>=0.3.9",
"scikit-learn>=1.6.1",
]
requires-python = ">=3.11"
readme = "README.md"
7 changes: 4 additions & 3 deletions src/amisc/__init__.py
@@ -18,9 +18,10 @@
Variables additionally use `Transform`, `Distribution`, and `Compression` interfaces to manage normalization, PDFs,
and field quantity compression, respectively.

- Currently, only Lagrange polynomial interpolation is implemented as the underlying surrogate method with a
- sparse grid data structure. SVD is also the only currently implemented method for compression. However, interfaces
- are provided for `Interpolator`, `TrainingData`, and `Compression` to allow for easy extension to other methods.
+ Currently, Lagrange polynomial interpolation and Linear regression are the only implemented surrogate methods,
+ paired with a sparse grid data structure for the training data. SVD is the only currently implemented compression
+ method. However, interfaces are provided for `Interpolator`, `TrainingData`, and `Compression` to allow for easy
+ extension to other methods.

Here is a class diagram summary of this workflow:

59 changes: 36 additions & 23 deletions src/amisc/component.py
@@ -1207,6 +1207,13 @@ def activate_index(self, alpha: MultiIndex, beta: MultiIndex, model_dir: str | P
weight_fcns = self.inputs.get_pdfs()

for a, b in indices:
+ if ((a, b[:len(self.data_fidelity)] + (0,) * len(self.surrogate_fidelity)) in
+ self.active_set.union(self.candidate_set)):
+ # Don't refine training data if only updating surrogate fidelity indices
+ # Training data is the same for all surrogate fidelity indices, given constant data fidelity
+ design_list.append([])
+ continue
+
design_coords, design_pts = self.training_data.refine(a, b[:len(self.data_fidelity)],
domains, weight_fcns)
design_pts, fc = to_model_dataset(design_pts, self.inputs, del_latent=True, **field_coords)
@@ -1247,29 +1254,35 @@ def activate_index(self, alpha: MultiIndex, beta: MultiIndex, model_dir: str | P
for i, (a, b) in enumerate(indices):
num_train_pts = len(design_list[i])
end_idx = start_idx + num_train_pts # Ensure loop dim of 1 gets its own axis (might have been squeezed)
- yi_dict = {var: arr[np.newaxis, ...] if len(alpha_list) == 1 and arr.shape[0] != 1 else
- arr[start_idx:end_idx, ...] for var, arr in model_outputs.items()}
-
- # Check for errors and store
- err_coords = []
- err_list = []
- for idx in list(errors.keys()):
- if idx < end_idx:
- err_info = errors.pop(idx)
- err_info['index'] = idx - start_idx
- err_coords.append(design_list[i][idx - start_idx])
- err_list.append(err_info)
- if len(err_list) > 0:
- self.logger.warning(f"Model errors occurred while adding candidate ({a}, {b}) for component "
- f"{self.name}. Leaving NaN values in training data...")
- self.training_data.set_errors(a, b[:len(self.data_fidelity)], err_coords, err_list)
-
- # Compress field quantities and normalize
- yi_dict, y_vars = to_surrogate_dataset(yi_dict, self.outputs, del_fields=False, **field_coords)
-
- # Store training data, computational cost, and new interpolator state
- self.training_data.set(a, b[:len(self.data_fidelity)], design_list[i], yi_dict)
- self.training_data.impute_missing_data(a, b[:len(self.data_fidelity)])

+ if num_train_pts > 0:
+ yi_dict = {var: arr[np.newaxis, ...] if len(alpha_list) == 1 and arr.shape[0] != 1 else
+ arr[start_idx:end_idx, ...] for var, arr in model_outputs.items()}
+
+ # Check for errors and store
+ err_coords = []
+ err_list = []
+ for idx in list(errors.keys()):
+ if idx < end_idx:
+ err_info = errors.pop(idx)
+ err_info['index'] = idx - start_idx
+ err_coords.append(design_list[i][idx - start_idx])
+ err_list.append(err_info)
+ if len(err_list) > 0:
+ self.logger.warning(f"Model errors occurred while adding candidate ({a}, {b}) for component "
+ f"{self.name}. Leaving NaN values in training data...")
+ self.training_data.set_errors(a, b[:len(self.data_fidelity)], err_coords, err_list)
+
+ # Compress field quantities and normalize
+ yi_dict, y_vars = to_surrogate_dataset(yi_dict, self.outputs, del_fields=False, **field_coords)
+
+ # Store training data, computational cost, and new interpolator state
+ self.training_data.set(a, b[:len(self.data_fidelity)], design_list[i], yi_dict)
+ self.training_data.impute_missing_data(a, b[:len(self.data_fidelity)])
+
+ else:
+ y_vars = self._surrogate_outputs()

self.misc_costs[a, b] = num_train_pts
self.misc_states[a, b] = self.interpolator.refine(b[len(self.data_fidelity):],
self.training_data.get(a, b[:len(self.data_fidelity)],