
What happens when using three kernels and corresponding three descriptors, but only adding a two-dimensional sparse environment? #425

Open
rbjiawen opened this issue Nov 1, 2024 · 1 comment



rbjiawen commented Nov 1, 2024

Hi, in my recent test:

B1_descriptor = B1(radial_basis, cutoff_name, radial_hyps, cutoff_hyps,
                   [n_species, 10])
B2_descriptor = B2(radial_basis, cutoff_name, radial_hyps, cutoff_hyps,
                   [n_species, 8, 3])
B3_descriptor = B2(radial_basis, cutoff_name, radial_hyps, cutoff_hyps,
                   [n_species, 8, 3])
descriptors = [B1_descriptor, B2_descriptor, B3_descriptor]
kernels = [dot_product_kernel_B1, dot_product_kernel_B2, dot_product_kernel_B3]
sparse_gp.sparse_gp.add_uncertain_environments(struc_pp, [sparse_size, sparse_size])
# sparse_gp.sparse_gp.add_uncertain_environments(struc_pp, [sparse_size, sparse_size, sparse_size])

With the kernel and descriptor lists both containing three elements, passing a two-element list of sparse environment counts instead of a three-element one significantly lowers the mean absolute error on the training set, and the call runs without raising any error. I'm confused by this: is it a valid way to add sparse environments?

jonpvandermause (Collaborator) commented Nov 1, 2024

The add_uncertain_environments method of the SparseGP class begins as follows:

void SparseGP ::add_uncertain_environments(const Structure &structure,
                                           const std::vector<int> &n_added) {

  initialize_sparse_descriptors(structure);
  // Compute cluster uncertainties.
  std::vector<std::vector<int>> sorted_indices =
      sort_clusters_by_uncertainty(structure);

  std::vector<std::vector<int>> n_sorted_indices;
  for (int i = 0; i < n_kernels; i++) {
    // Take the first N indices.
    int n_curr = n_added[i];
    if (n_curr > sorted_indices[i].size())
      n_curr = sorted_indices[i].size();
    std::vector<int> n_indices(n_curr);
    for (int j = 0; j < n_curr; j++) {
      n_indices[j] = sorted_indices[i][j];
    }
    n_sorted_indices.push_back(n_indices);
  }

Notice that n_added[i] is out of bounds for i = 2 in your example, so the behavior of this function is undefined. It's possible you're getting a very large positive integer for n_added[2], in which case all environments are getting added for the third kernel, possibly explaining the improvement in MAE. Hard to say for sure.

I'm going to add an assertion that throws an error when there is a mismatch between the number of kernels and the size of n_added, since we really shouldn't be allowing them to be different.
