Skip to content

Conversation

banks
Copy link
Contributor

@banks banks commented Oct 30, 2023

I'm not sure we'll be able to reproduce the panic but it has been observed internally when seal happens to race with a background metrics collection:

 "panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x3c7f811]

goroutine 11485 [running]:
github.com/hashicorp/vault/vault.(*PolicyStore).getACLView(0x0, 0x0?)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/policy_store_util_ent.go:161 +0x31
github.com/hashicorp/vault/vault.(*PolicyStore).policiesByNamespace(0x0?, {0xc149050, 0xc1d222d2c0}, 0x0, 0x0?)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/policy_store.go:687 +0xaf
github.com/hashicorp/vault/vault.(*PolicyStore).policiesByNamespaces(0xc1490f8?, {0xc149050, 0xc1d222d2c0}, 0x7850de80?, {0xc7b9964068, 0x1, 0x16d9d43?})
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/policy_store.go:721 +0xf6
github.com/hashicorp/vault/vault.(*Core).configuredPoliciesGaugeCollector(0xc002dc9200, {0xc1490f8, 0xc2950c8690})
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/core_metrics.go:596 +0x1a5
github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).collectAndFilterGauges(0xc1fc6fabe0)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/helper/metricsutil/gauge_process.go:186 +0x155
github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).Run(0xc1fc6fabe0)
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/helper/metricsutil/gauge_process.go:270 +0x8a
created by github.com/hashicorp/vault/vault.(*Core).emitMetricsActiveNode in goroutine 10785
	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/core_metrics.go:335 +0xdf6"

After some analysis the only explanation I've been able to fit is around this racey nil check.

	func (c *Core) configuredPoliciesGaugeCollector(ctx context.Context) ([]metricsutil.GaugeLabelValues, error) {
		if c.policyStore == nil {
			return []metricsutil.GaugeLabelValues{}, nil
		}
                // ^^^ Above check is racey as it read c.policyStore without a lock.
                // Assume that seal runs concurrently and `teardownPolicyStore` 
                // sets c.policyStore to nil at this point. The `Rlock` below will 
                // wait for the sealing Goroutine to release stateLock and then will
                // read policyStore := nil and use it blindly below.
                // The first place this is dereferenced is the line indicated in the 
                // stack trace where the panic occurred.
		c.stateLock.RLock()
		policyStore := c.policyStore
		c.stateLock.RUnlock()

@banks banks added this to the 1.16.0-rc1 milestone Oct 30, 2023
@github-actions github-actions bot added the hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed label Oct 30, 2023
Copy link
Contributor

@hghaf099 hghaf099 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

@github-actions
Copy link

Build Results:
All builds succeeded! ✅

@github-actions
Copy link

CI Results:
All Go tests succeeded! ✅

Copy link
Collaborator

@raskchanky raskchanky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@banks banks merged commit 0634564 into main Oct 30, 2023
@banks banks deleted the fix/VAULT-21072-metrics-panic branch October 30, 2023 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants