Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a Rego Policy Enforcer #1493

Merged
merged 1 commit into from
Aug 31, 2022
Merged

Adding a Rego Policy Enforcer #1493

merged 1 commit into from
Aug 31, 2022

Conversation

matajoh
Copy link
Member

@matajoh matajoh commented Aug 25, 2022

The standard policy enforcer is a domain-specific logical language implemented in go over JSON. In the future, policy enforcement will need to increase in complexity and coverage. Instead of increasing the complexity of the DSL in turn to address these needs, it is preferable to use a policy language designed to express these constraints, such as Rego. This PR adds an alternate Rego policy enforcer which expresses the same logic as the StandardPolicyEnforcer entirely in Rego.

We've added a RegoPolicy which implements SecurityPolicyEnforcer in securitypolicyenforcer_rego.go and a full test suite in regopolicy_test.go. The main design idea is that there are three elements which are used for evaluating the policy:

  1. Policy objects (i.e., containers) which are translated into Rego from our existing JSON format.
  2. Policy behavior, which is Rego that is in source control (policy.rego) which translates the enforcement logic currently in securitypolicyenforcer.go.
  3. Policy data (i.e., state) which is maintained over the course of enforcement and fed into the Rego during a query

There is an additional element, namely the Microsoft Policy Framework (in framework.rego) which contains the majority of the enforcement logic. While at the moment this is used by the static policy.rego file, in the future it can be made available to customers who can choose to author their own policies.

To add some details on the implementation, for each Enforce* method, we query the Rego policy with a set input. On success, we modify the data object. For a simple example, see how EnforceDeviceMount works:

func (policy *RegoEnforcer) EnforceDeviceMountPolicy(target string, deviceHash string) error {
	policy.mutex.Lock()
	defer policy.mutex.Unlock()

	input := map[string]interface{}{
		"name":       "mount_device",
		"target":     target,
		"deviceHash": deviceHash,
	}
	result, err := policy.Query(input)
	if err != nil {
		return err
	}

	if !result.Allowed() {
		input_json, err := json.Marshal(input)
		if err != nil {
			return fmt.Errorf("Unable to marshal the Rego input data.")
		}

		return fmt.Errorf("device mount not allowed by policy.\ninput: %s", string(input_json))
	}

	deviceMap := policy.data["devices"].(map[string]string)
	if _, found := deviceMap[target]; found {
		input_json, err := json.Marshal(input)
		if err != nil {
			return fmt.Errorf("Unable to marshal the Rego input data.")
		}

		return fmt.Errorf("device %s already mounted.\ninput: %s", target, string(input_json))
	}

	deviceMap[target] = deviceHash
	return nil
}

The corresponding Rego for this is:

default mount_device := false
mount_device := true {
    some container in data.policy.containers
    some layer in container.layers
    input.deviceHash == layer
}

We believe that this logic is much easier to reason about and maintain over time as opposed to the current system, while also allowing for a straightforward expansion in coverage over time.

Note
This PR was intended to be additive and non-invasive, but in order to add the Rego dependencies
multiple package revs appeared to be required. This is the source of the vast majority of the files
that have been touched (i.e., in the vendor directory).

@matajoh matajoh requested a review from a team as a code owner August 25, 2022 09:31
@matajoh matajoh changed the title Adding the Rego Policy Enforcer Adding a Rego Policy Enforcer Aug 25, 2022
ctx := context.Background()
results, err := query.Eval(ctx)
if err != nil {
log.G(ctx).WithError(err).WithFields(logrus.Fields{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this line is actually logging anything, as it doesn't have a call to Error, Info, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matajoh is this supposed to be .Error() ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for nit PR: objects is the user's input policy, right? So I think we shouldn't bother logging it here at all (maybe just log it once as Debug when we read it in).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change based on a recommendation from @anmaxvl. Whatever is the desired convention for this repo should be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fairly certain as written this line does not log anything. It creates a temporary logger object that has an error and fields associated with it, but then we don't call anything like Error on it, so that logger just gets thrown away.

Personally I'd be more in favor of just removing this line and logging the generated Rego when we construct the enforcer (as I commented above).


output := buf.String()
if len(output) > 0 {
log.G(ctx).Debug(output)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to write this as:

log.G(ctx).WithField("output", output).Debug("rego query output")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it more clear in the logs what is being written.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this to @matajoh as I'm not sure of the correct change. Your suggestion looks reasonable to me, but I'm not positive it is correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can take this to the nit PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perfectly fine as an alternate. I have no strong opinion here.

@kevpar
Copy link
Member

kevpar commented Aug 27, 2022

@anmaxvl when did we decide we're going to work on splitting out stuff from pkg/securitypolicy that can live in internal instead?

return fmt.Errorf("Unable to marshal the Rego input data.")
}

return fmt.Errorf("device mount not allowed by policy.\ninput: %s", string(input_json))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than printing the serialized input JSON, I think we should just print the target and deviceHash inputs. That will look better in an error string, where we want to avoid newlines.

e.g. fmt.Errorf("device mount of %s to %s not allowed by policy", deviceHash, target)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My worry with such a change is that the input could be updated and not reflected in the error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can take this to the nit PR.

Comment on lines 391 to 380
return fmt.Errorf("overlay mount not allowed by policy.\ninput: %s", string(input_json))
}

// we store the mapping of container ID -> layerPaths for later
// use in EnforceCreateContainerPolicy here.
containerMap := policy.data["containers"].(map[string]interface{})
if _, found := containerMap[containerID]; found {
input_json, err := json.Marshal(input)
if err != nil {
return fmt.Errorf("Unable to marshal the Rego input data.")
}

return fmt.Errorf("container %s already mounted.\ninput: %s", containerID, string(input_json))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this "thing didn't work, here's the input to figure it out: %s" workflow is used a bunch, we might be able to just make a custom error type. Naming is up to you:

type RegoError struct {
     Reason, Input string
}

func (re *RegoError) Error() string {
    return fmt.Errorf("%s. Input: %s")
}

func NewRegoError(reason, input string) error {
        return &RegoError{reason, input}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. It conflicts with @kevpar's idea of limiting the values in the field and not printing all of input. I'm a fan of the "we don't miss updating an error message" aspect of the current way we print input. But Kevin has a good point about formatting. I'm leaving this til tomorrow to get Matt's thoughts as there are a couple other points on this PR I can't address as I lack knowledge to know "the why" of a couple choices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a huge fan of typed errors. If we want to go down that route, I believe we should have errors for each kind of failure and populate the objects with all the data in input. However, a question worth asking is what we gain for the additional maintenance burden (i.e. having to update all the error types over time as the input changes). Whatever we do, it should be motivated by the needs of the people who will be reading and fixing the errors being reported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more specific we can get about what broke the better in addition to the full input. Or log the full input once after listing all of the specific errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of typed errors for specific domains like this, but it sounds like Kevin's suggestion for just printing the devicehash and target in errors was moved to the follow up nit PR. Let's figure out a story for this if we want to go this route there then.

Comment on lines +8 to +34
github.com/cenkalti/backoff/v4 v4.1.3
github.com/containerd/cgroups v1.0.3
github.com/containerd/console v1.0.2
github.com/containerd/containerd v1.5.13
github.com/containerd/console v1.0.3
github.com/containerd/containerd v1.6.6
github.com/containerd/go-runc v1.0.0
github.com/containerd/ttrpc v1.1.0
github.com/containerd/typeurl v1.0.2
github.com/gogo/protobuf v1.3.2
github.com/golang/mock v1.6.0
github.com/google/go-cmp v0.5.6
github.com/google/go-cmp v0.5.7
github.com/google/go-containerregistry v0.5.1
github.com/google/uuid v1.3.0
github.com/linuxkit/virtsock v0.0.0-20201010232012-f8cee7dfc7a3
github.com/mattn/go-shellwords v1.0.6
github.com/opencontainers/runc v1.0.3
github.com/mattn/go-shellwords v1.0.12
github.com/open-policy-agent/opa v0.42.2
github.com/opencontainers/runc v1.1.2
github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417
github.com/pkg/errors v0.9.1
github.com/sirupsen/logrus v1.8.1
github.com/urfave/cli v1.22.2
github.com/vishvananda/netlink v1.1.1-0.20201029203352-d40f9887b852
github.com/vishvananda/netns v0.0.0-20200728191858-db3c7e526aae
github.com/vishvananda/netlink v1.1.1-0.20210330154013-f5de75959ad5
github.com/vishvananda/netns v0.0.0-20210104183010-2eb08e3e575f
go.etcd.io/bbolt v1.3.6
go.opencensus.io v0.22.3
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c
go.opencensus.io v0.23.0
golang.org/x/sync v0.0.0-20220601150217-0de741cfad7f
golang.org/x/sys v0.0.0-20220412211240-33da011f77ad
google.golang.org/grpc v1.40.0
google.golang.org/grpc v1.47.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these were bumped just for the new open-policy-agent dependency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its the new dependency. Given we merely run "go mod" to update this, I can't comment on what specifically causes a given line to update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it looks like that dep has quite the dependency tree.. it's definitely the one bumping containerd and grpc, and the others are likely a cascade from the new containerd version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you also run go mod tidy? I've found on past projects that there is a reduced likelihood of merge conflicts between different branches and PRs if go mod tidy is never run since the old versions are always captured and can be compared against rather than removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore my comment, got myself turned around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea we have a CI job that should validate that the vendor dir/go.mod is matching after a go mod tidy/vendor flow. Just crazy to see so many bumps from a single dependency

@anmaxvl
Copy link
Contributor

anmaxvl commented Aug 29, 2022

@anmaxvl when did we decide we're going to work on splitting out stuff from pkg/securitypolicy that can live in internal instead?

we can do this right after all the Rego PRs are in - in a few weeks from now.

@SeanTAllen
Copy link
Contributor

I've done a number of updates here, hopefully they dont cause a lot of cascading issues with work we've already done.

If they do, then for future PRs, I'd like to do save up the small nits and do after all the core feature PRs are in. I hope that is ok with everyone. Very happy to make changes, but unless they are hugely important, I'd like to do them later so they don't conflict with code that is already written and is waiting on branches.

@kevpar
Copy link
Member

kevpar commented Aug 29, 2022

I've done a number of updates here, hopefully they dont cause a lot of cascading issues with work we've already done.

If they do, then for future PRs, I'd like to do save up the small nits and do after all the core feature PRs are in. I hope that is ok with everyone. Very happy to make changes, but unless they are hugely important, I'd like to do them later so they don't conflict with code that is already written and is waiting on branches.

Thanks Sean, I think we can try to delay nits (such as formatting of error messages) to another PR, as long as we get that in soon. However, I'd prefer we address exporting of symbols here. The reason being that since pkg/securitypolicy is external, anything we export from there could be picked up and used elsewhere, which can make it harder to change code in the future.

That said, this package is still early, so I think we can get away with some API changes, but we should make an effort to not change exported symbols once they are introduced.

I'm pretty sure this change mostly doesn't need any exported symbols changes, but I did notice the following, can you comment on if these all need to be done (or need to be exported)?

  • Renamed ErrInvalidAllowAllPolicy to ErrInvalidOpenDoorPolicy
  • Removed SecurityPolicyEnforcer.EnforceMountPolicy
  • Changed args of SecurityPolicyEnforcer.EnforceCreateContainerPolicy
  • Added Mounts.Append
  • Added StringArray
  • Added a bunch of MarshalRego
  • Added EnvRuleArray

@matajoh
Copy link
Member Author

matajoh commented Aug 30, 2022

  • Renamed ErrInvalidAllowAllPolicy to ErrInvalidOpenDoorPolicy
    This was done in consultation with Maksim, and I believe it best to have the error naming line up with the actual name of the policy enforcement class.
  • Removed SecurityPolicyEnforcer.EnforceMountPolicy
    @SeanTAllen can speak to the details here, but this change needs to happen.
  • Changed args of SecurityPolicyEnforcer.EnforceCreateContainerPolicy
    This is part of the change above, and they come together.
  • Added Mounts.Append
    This doesn't need to be exported
  • Added StringArray
    No need to be exported, we can make this change
  • Added a bunch of MarshalRego
    We may be able to get away with not exporting this either, need to check a few things
  • Added EnvRuleArray
    No need to export

@matajoh
Copy link
Member Author

matajoh commented Aug 30, 2022

@kevpar @anmaxvl @dcantah I believe all the comments have been addressed at this point. We'd appreciate another review with an eye to squash and merge tomorrow AM (GMT).

@SeanTAllen
Copy link
Contributor

Removed SecurityPolicyEnforcer.EnforceMountPolicy
Changed args of SecurityPolicyEnforcer.EnforceCreateContainerPolicy

EnforceMountPolicy was incorrectly implemented previously. As a result, there was a bug in the existing version. There was also a bug in the new version when Matt originally implemented following the old model.

Mount policy should be enforced as part of create not as a 2nd step after the create enforcement while still being part of the create step. It is part of the gating for create, not its own separate check. So, it was changed to fix bugs in both versions and get them both correctly having a single gating for create.

The standard policy enforcer is a domain-specific logical language implemented in go over JSON. In the future, policy enforcement will need to increase in complexity and coverage. Instead of increasing the complexity of the DSL in turn to address these needs, it is preferable to use a policy language designed to express these constraints, such as Rego. This PR adds an alternate Rego policy enforcer which expresses the same logic as the StandardPolicyEnforcer entirely in Rego.

We've added a RegoPolicy which implements SecurityPolicyEnforcer in securitypolicyenforcer_rego.go and a full test suite in regopolicy_test.go. The main design idea is that there are three elements which are used for evaluating the policy:

Policy objects (i.e., containers) which are translated into Rego from our existing JSON format.
Policy behavior, which is Rego that is in source control (policy.rego) which translates the enforcement logic currently in securitypolicyenforcer.go.
Policy data (i.e., state) which is maintained over the course of enforcement and fed into the Rego during a query
There is an additional element, namely the Microsoft Policy Framework (in framework.rego) which contains the majority of the enforcement logic. While at the moment this is used by the static policy.rego file, in the future it can be made available to customers who can choose to author their own policies.

To add some details on the implementation, for each Enforce* method, we query the Rego policy with a set input. On success, we modify the data object. For a simple example, see how EnforceDeviceMount works:

```golang
func (policy *RegoEnforcer) EnforceDeviceMountPolicy(target string, deviceHash string) error {
	policy.mutex.Lock()
	defer policy.mutex.Unlock()

	input := map[string]interface{}{
		"name":       "mount_device",
		"target":     target,
		"deviceHash": deviceHash,
	}
	result, err := policy.Query(input)
	if err != nil {
		return err
	}

	if !result.Allowed() {
		input_json, err := json.Marshal(input)
		if err != nil {
			return fmt.Errorf("Unable to marshal the Rego input data.")
		}

		return fmt.Errorf("device mount not allowed by policy.\ninput: %s", string(input_json))
	}

	deviceMap := policy.data["devices"].(map[string]string)
	if _, found := deviceMap[target]; found {
		input_json, err := json.Marshal(input)
		if err != nil {
			return fmt.Errorf("Unable to marshal the Rego input data.")
		}

		return fmt.Errorf("device %s already mounted.\ninput: %s", target, string(input_json))
	}

	deviceMap[target] = deviceHash
	return nil
}
```

The corresponding Rego for this is:

```rego
default mount_device := false
mount_device := true {
    some container in data.policy.containers
    some layer in container.layers
    input.deviceHash == layer
}
```

We believe that this logic is much easier to reason about and maintain over time as opposed to the current system, while also allowing for a straightforward expansion in coverage over time.

Note:  This PR was intended to be additive and non-invasive, but in order to add the Rego dependencies
multiple package revs appeared to be required. This is the source of the vast majority of the files
that have been touched (i.e., in the vendor directory).

Signed-off-by: Sean T. Allen <seanallen@microsoft.com>
if _, ok := deviceMap[target]; ok {
input_json, err := json.Marshal(input)
if err != nil {
return errors.New("unable to marshal the Rego input data")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making notes for a nit PR: Be nice to wrap this error so we know what Marshal actually failed with

if !result.Allowed() {
input_json, err := json.Marshal(input)
if err != nil {
return errors.New("unable to marshal the Rego input data")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Wrap this error so we know what Marshal actually failed with

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You get the idea for the rest of the file

func (RegoEnforcer) ExtendDefaultMounts(_ []oci.Mount) error {
return ErrNotImplemented
func writeContainer(builder *strings.Builder, container *securityPolicyContainer, indent string) error {
if _, err := builder.WriteString(fmt.Sprintf("%s{\n", indent)); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine to leave this as-is, but just so you know, (*strings.Builder).WriteString is guaranteed to always return a nil error, so I think we can remove all error handling from the marshalRego path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really the error handling should be that the length of the written string is non-zero.

EnablePrintStatements: false,
}

if compiled, err := ast.CompileModulesWithOpt(modules, options); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit PR: More idiomatic to do:

compiled, err := ast.CompileModulesWithOpt(modules, options)
if err != nil {
    return nil, fmt.Errorf("rego compilation failed: %w", err)
}
policy.compiledModules = compiled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always kinda liked this inline style when the results of the function call don't need to be in scope longer than the if statement.


defaultMountData := make([]interface{}, 0, len(defaultMounts))
privilegedMountData := make([]interface{}, 0, len(privilegedMounts))
policy.data = map[string]interface{}{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a comment somewhere (maybe here) describing what each of the data fields is used for.

policy := new(regoEnforcer)
policy.behavior = policyCode
var err error
policy.objects, err = securityPolicy.marshalRego()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to log policy.objects as a debug message? Thinking of cases where we want to know exactly what policy was produced to troubleshoot.

Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving as-is with the understanding that the Rego policy enforcer is still a work-in-progress, and will have more revisions coming soon (including the nit PR).

Copy link
Contributor

@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also alright with this on the pretext some of the nits get addressed in a follow-up. LGTM

@KenGordon KenGordon merged commit af624d9 into microsoft:main Aug 31, 2022

mountConstraint_ok(constraint, mount) {
mount.type == constraint.type
mountSource_ok(constraint.source, mount.source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requiring the mount source in the customer rego seems strange since it is forced to be a suffix wildcard with a few compiled in options for the prefix. Any change in this implementation will break all policies, and there isn't any choice the customer can make.

data.framework.mountList_ok(container)
}

reason["invalid mount list"] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can differentiate between an invalid mount and a layer hash not matching?

@SeanTAllen SeanTAllen deleted the rego-policy-enforcer branch September 1, 2022 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants