
What happens when using three kernels and corresponding three descriptors, but only adding a two-dimensional sparse environment? #425

Open
rbjiawen opened this issue Nov 1, 2024 · 1 comment



rbjiawen commented Nov 1, 2024

Hi, in my recent test:

B1_descriptor = B1(radial_basis, cutoff_name, radial_hyps, cutoff_hyps,
                   [n_species, 10])
B2_descriptor = B2(radial_basis, cutoff_name, radial_hyps, cutoff_hyps,
                   [n_species, 8, 3])
B3_descriptor = B2(radial_basis, cutoff_name, radial_hyps, cutoff_hyps,
                   [n_species, 8, 3])
descriptors = [B1_descriptor, B2_descriptor, B3_descriptor]
kernels = [dot_product_kernel_B1, dot_product_kernel_B2, dot_product_kernel_B3]
sparse_gp.sparse_gp.add_uncertain_environments(struc_pp, [sparse_size, sparse_size])
# sparse_gp.sparse_gp.add_uncertain_environments(struc_pp, [sparse_size, sparse_size, sparse_size])

With the kernel and descriptor lists both containing three elements, passing a two-element list of sparse environment counts instead of a three-element one significantly lowers the mean absolute error on the training set, and the call runs without raising any error. I'm confused by this: is it a valid way to add sparse environments?

jonpvandermause (Collaborator) commented Nov 1, 2024

The add_uncertain_environments method of the SparseGP class begins as follows:

void SparseGP ::add_uncertain_environments(const Structure &structure,
                                           const std::vector<int> &n_added) {

  initialize_sparse_descriptors(structure);
  // Compute cluster uncertainties.
  std::vector<std::vector<int>> sorted_indices =
      sort_clusters_by_uncertainty(structure);

  std::vector<std::vector<int>> n_sorted_indices;
  for (int i = 0; i < n_kernels; i++) {
    // Take the first N indices.
    int n_curr = n_added[i];
    if (n_curr > sorted_indices[i].size())
      n_curr = sorted_indices[i].size();
    std::vector<int> n_indices(n_curr);
    for (int j = 0; j < n_curr; j++) {
      n_indices[j] = sorted_indices[i][j];
    }
    n_sorted_indices.push_back(n_indices);
  }

Notice that n_added[i] is out of bounds for i = 2 in your example, so the behavior of this function is undefined. It's possible you're getting a very large positive integer for n_added[2], in which case all environments are getting added for the third kernel, possibly explaining the improvement in MAE. Hard to say for sure.

I'm going to add an assertion that throws an error when there is a mismatch between the number of kernels and the size of n_added, since we really shouldn't be allowing them to be different.
