-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a Rego Policy Enforcer #1493
Conversation
ctx := context.Background() | ||
results, err := query.Eval(ctx) | ||
if err != nil { | ||
log.G(ctx).WithError(err).WithFields(logrus.Fields{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this line is actually logging anything, as it doesn't have a call to Error
, Info
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matajoh is this supposed to be .Error()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for nit PR: objects
is the user's input policy, right? So I think we shouldn't bother logging it here at all (maybe just log it once as Debug
when we read it in).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change based on a recommendation from @anmaxvl. Whatever is the desired convention for this repo should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fairly certain as written this line does not log anything. It creates a temporary logger object that has an error and fields associated with it, but then we don't call anything like Error
on it, so that logger just gets thrown away.
Personally I'd be more in favor of just removing this line and logging the generated Rego when we construct the enforcer (as I commented above).
|
||
output := buf.String() | ||
if len(output) > 0 { | ||
log.G(ctx).Debug(output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to write this as:
log.G(ctx).WithField("output", output).Debug("rego query output")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it more clear in the logs what is being written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave this to @matajoh as I'm not sure of the correct change. Your suggestion looks reasonable to me, but I'm not positive it is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can take this to the nit PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perfectly fine as an alternate. I have no strong opinion here.
@anmaxvl when did we decide we're going to work on splitting out stuff from |
return fmt.Errorf("Unable to marshal the Rego input data.") | ||
} | ||
|
||
return fmt.Errorf("device mount not allowed by policy.\ninput: %s", string(input_json)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than printing the serialized input JSON, I think we should just print the target
and deviceHash
inputs. That will look better in an error string, where we want to avoid newlines.
e.g. fmt.Errorf("device mount of %s to %s not allowed by policy", deviceHash, target)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My worry with such a change is that the input could be updated and not reflected in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can take this to the nit PR.
return fmt.Errorf("overlay mount not allowed by policy.\ninput: %s", string(input_json)) | ||
} | ||
|
||
// we store the mapping of container ID -> layerPaths for later | ||
// use in EnforceCreateContainerPolicy here. | ||
containerMap := policy.data["containers"].(map[string]interface{}) | ||
if _, found := containerMap[containerID]; found { | ||
input_json, err := json.Marshal(input) | ||
if err != nil { | ||
return fmt.Errorf("Unable to marshal the Rego input data.") | ||
} | ||
|
||
return fmt.Errorf("container %s already mounted.\ninput: %s", containerID, string(input_json)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this "thing didn't work, here's the input to figure it out: %s" workflow is used a bunch, we might be able to just make a custom error type. Naming is up to you:
type RegoError struct {
Reason, Input string
}
func (re *RegoError) Error() string {
return fmt.Errorf("%s. Input: %s")
}
func NewRegoError(reason, input string) error {
return &RegoError{reason, input}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. It conflicts with @kevpar's idea of limiting the values in the field and not printing all of input. I'm a fan of the "we don't miss updating an error message" aspect of the current way we print input. But Kevin has a good point about formatting. I'm leaving this til tomorrow to get Matt's thoughts as there are a couple other points on this PR I can't address as I lack knowledge to know "the why" of a couple choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a huge fan of typed errors. If we want to go down that route, I believe we should have errors for each kind of failure and populate the objects with all the data in input. However, a question worth asking is what we gain for the additional maintenance burden (i.e. having to update all the error types over time as the input changes). Whatever we do, it should be motivated by the needs of the people who will be reading and fixing the errors being reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the more specific we can get about what broke the better in addition to the full input. Or log the full input once after listing all of the specific errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of typed errors for specific domains like this, but it sounds like Kevin's suggestion for just printing the devicehash and target in errors was moved to the follow up nit PR. Let's figure out a story for this if we want to go this route there then.
github.com/cenkalti/backoff/v4 v4.1.3 | ||
github.com/containerd/cgroups v1.0.3 | ||
github.com/containerd/console v1.0.2 | ||
github.com/containerd/containerd v1.5.13 | ||
github.com/containerd/console v1.0.3 | ||
github.com/containerd/containerd v1.6.6 | ||
github.com/containerd/go-runc v1.0.0 | ||
github.com/containerd/ttrpc v1.1.0 | ||
github.com/containerd/typeurl v1.0.2 | ||
github.com/gogo/protobuf v1.3.2 | ||
github.com/golang/mock v1.6.0 | ||
github.com/google/go-cmp v0.5.6 | ||
github.com/google/go-cmp v0.5.7 | ||
github.com/google/go-containerregistry v0.5.1 | ||
github.com/google/uuid v1.3.0 | ||
github.com/linuxkit/virtsock v0.0.0-20201010232012-f8cee7dfc7a3 | ||
github.com/mattn/go-shellwords v1.0.6 | ||
github.com/opencontainers/runc v1.0.3 | ||
github.com/mattn/go-shellwords v1.0.12 | ||
github.com/open-policy-agent/opa v0.42.2 | ||
github.com/opencontainers/runc v1.1.2 | ||
github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417 | ||
github.com/pkg/errors v0.9.1 | ||
github.com/sirupsen/logrus v1.8.1 | ||
github.com/urfave/cli v1.22.2 | ||
github.com/vishvananda/netlink v1.1.1-0.20201029203352-d40f9887b852 | ||
github.com/vishvananda/netns v0.0.0-20200728191858-db3c7e526aae | ||
github.com/vishvananda/netlink v1.1.1-0.20210330154013-f5de75959ad5 | ||
github.com/vishvananda/netns v0.0.0-20210104183010-2eb08e3e575f | ||
go.etcd.io/bbolt v1.3.6 | ||
go.opencensus.io v0.22.3 | ||
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c | ||
go.opencensus.io v0.23.0 | ||
golang.org/x/sync v0.0.0-20220601150217-0de741cfad7f | ||
golang.org/x/sys v0.0.0-20220412211240-33da011f77ad | ||
google.golang.org/grpc v1.40.0 | ||
google.golang.org/grpc v1.47.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these were bumped just for the new open-policy-agent dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its the new dependency. Given we merely run "go mod" to update this, I can't comment on what specifically causes a given line to update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea it looks like that dep has quite the dependency tree.. it's definitely the one bumping containerd and grpc, and the others are likely a cascade from the new containerd version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you also run go mod tidy
? I've found on past projects that there is a reduced likelihood of merge conflicts between different branches and PRs if go mod tidy
is never run since the old versions are always captured and can be compared against rather than removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore my comment, got myself turned around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea we have a CI job that should validate that the vendor dir/go.mod is matching after a go mod tidy/vendor flow. Just crazy to see so many bumps from a single dependency
we can do this right after all the Rego PRs are in - in a few weeks from now. |
I've done a number of updates here, hopefully they dont cause a lot of cascading issues with work we've already done. If they do, then for future PRs, I'd like to do save up the small nits and do after all the core feature PRs are in. I hope that is ok with everyone. Very happy to make changes, but unless they are hugely important, I'd like to do them later so they don't conflict with code that is already written and is waiting on branches. |
Thanks Sean, I think we can try to delay nits (such as formatting of error messages) to another PR, as long as we get that in soon. However, I'd prefer we address exporting of symbols here. The reason being that since That said, this package is still early, so I think we can get away with some API changes, but we should make an effort to not change exported symbols once they are introduced. I'm pretty sure this change mostly doesn't need any exported symbols changes, but I did notice the following, can you comment on if these all need to be done (or need to be exported)?
|
|
EnforceMountPolicy was incorrectly implemented previously. As a result, there was a bug in the existing version. There was also a bug in the new version when Matt originally implemented following the old model. Mount policy should be enforced as part of create not as a 2nd step after the create enforcement while still being part of the create step. It is part of the gating for create, not its own separate check. So, it was changed to fix bugs in both versions and get them both correctly having a single gating for create. |
The standard policy enforcer is a domain-specific logical language implemented in go over JSON. In the future, policy enforcement will need to increase in complexity and coverage. Instead of increasing the complexity of the DSL in turn to address these needs, it is preferable to use a policy language designed to express these constraints, such as Rego. This PR adds an alternate Rego policy enforcer which expresses the same logic as the StandardPolicyEnforcer entirely in Rego. We've added a RegoPolicy which implements SecurityPolicyEnforcer in securitypolicyenforcer_rego.go and a full test suite in regopolicy_test.go. The main design idea is that there are three elements which are used for evaluating the policy: Policy objects (i.e., containers) which are translated into Rego from our existing JSON format. Policy behavior, which is Rego that is in source control (policy.rego) which translates the enforcement logic currently in securitypolicyenforcer.go. Policy data (i.e., state) which is maintained over the course of enforcement and fed into the Rego during a query There is an additional element, namely the Microsoft Policy Framework (in framework.rego) which contains the majority of the enforcement logic. While at the moment this is used by the static policy.rego file, in the future it can be made available to customers who can choose to author their own policies. To add some details on the implementation, for each Enforce* method, we query the Rego policy with a set input. On success, we modify the data object. For a simple example, see how EnforceDeviceMount works: ```golang func (policy *RegoEnforcer) EnforceDeviceMountPolicy(target string, deviceHash string) error { policy.mutex.Lock() defer policy.mutex.Unlock() input := map[string]interface{}{ "name": "mount_device", "target": target, "deviceHash": deviceHash, } result, err := policy.Query(input) if err != nil { return err } if !result.Allowed() { input_json, err := json.Marshal(input) if err != nil { return fmt.Errorf("Unable to marshal the Rego input data.") } return fmt.Errorf("device mount not allowed by policy.\ninput: %s", string(input_json)) } deviceMap := policy.data["devices"].(map[string]string) if _, found := deviceMap[target]; found { input_json, err := json.Marshal(input) if err != nil { return fmt.Errorf("Unable to marshal the Rego input data.") } return fmt.Errorf("device %s already mounted.\ninput: %s", target, string(input_json)) } deviceMap[target] = deviceHash return nil } ``` The corresponding Rego for this is: ```rego default mount_device := false mount_device := true { some container in data.policy.containers some layer in container.layers input.deviceHash == layer } ``` We believe that this logic is much easier to reason about and maintain over time as opposed to the current system, while also allowing for a straightforward expansion in coverage over time. Note: This PR was intended to be additive and non-invasive, but in order to add the Rego dependencies multiple package revs appeared to be required. This is the source of the vast majority of the files that have been touched (i.e., in the vendor directory). Signed-off-by: Sean T. Allen <seanallen@microsoft.com>
if _, ok := deviceMap[target]; ok { | ||
input_json, err := json.Marshal(input) | ||
if err != nil { | ||
return errors.New("unable to marshal the Rego input data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making notes for a nit PR: Be nice to wrap this error so we know what Marshal actually failed with
if !result.Allowed() { | ||
input_json, err := json.Marshal(input) | ||
if err != nil { | ||
return errors.New("unable to marshal the Rego input data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Wrap this error so we know what Marshal actually failed with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You get the idea for the rest of the file
func (RegoEnforcer) ExtendDefaultMounts(_ []oci.Mount) error { | ||
return ErrNotImplemented | ||
func writeContainer(builder *strings.Builder, container *securityPolicyContainer, indent string) error { | ||
if _, err := builder.WriteString(fmt.Sprintf("%s{\n", indent)); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine to leave this as-is, but just so you know, (*strings.Builder).WriteString
is guaranteed to always return a nil
error, so I think we can remove all error handling from the marshalRego
path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really the error handling should be that the length of the written string is non-zero.
EnablePrintStatements: false, | ||
} | ||
|
||
if compiled, err := ast.CompileModulesWithOpt(modules, options); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit PR: More idiomatic to do:
compiled, err := ast.CompileModulesWithOpt(modules, options)
if err != nil {
return nil, fmt.Errorf("rego compilation failed: %w", err)
}
policy.compiledModules = compiled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always kinda liked this inline style when the results of the function call don't need to be in scope longer than the if statement.
|
||
defaultMountData := make([]interface{}, 0, len(defaultMounts)) | ||
privilegedMountData := make([]interface{}, 0, len(privilegedMounts)) | ||
policy.data = map[string]interface{}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a comment somewhere (maybe here) describing what each of the data fields is used for.
policy := new(regoEnforcer) | ||
policy.behavior = policyCode | ||
var err error | ||
policy.objects, err = securityPolicy.marshalRego() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to log policy.objects
as a debug message? Thinking of cases where we want to know exactly what policy was produced to troubleshoot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving as-is with the understanding that the Rego policy enforcer is still a work-in-progress, and will have more revisions coming soon (including the nit PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also alright with this on the pretext some of the nits get addressed in a follow-up. LGTM
|
||
mountConstraint_ok(constraint, mount) { | ||
mount.type == constraint.type | ||
mountSource_ok(constraint.source, mount.source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requiring the mount source in the customer rego seems strange since it is forced to be a suffix wildcard with a few compiled in options for the prefix. Any change in this implementation will break all policies, and there isn't any choice the customer can make.
data.framework.mountList_ok(container) | ||
} | ||
|
||
reason["invalid mount list"] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can differentiate between an invalid mount and a layer hash not matching?
The standard policy enforcer is a domain-specific logical language implemented in go over JSON. In the future, policy enforcement will need to increase in complexity and coverage. Instead of increasing the complexity of the DSL in turn to address these needs, it is preferable to use a policy language designed to express these constraints, such as Rego. This PR adds an alternate Rego policy enforcer which expresses the same logic as the
StandardPolicyEnforcer
entirely in Rego.We've added a
RegoPolicy
which implementsSecurityPolicyEnforcer
insecuritypolicyenforcer_rego.go
and a full test suite inregopolicy_test.go
. The main design idea is that there are three elements which are used for evaluating the policy:policy.rego
) which translates the enforcement logic currently insecuritypolicyenforcer.go
.There is an additional element, namely the Microsoft Policy Framework (in
framework.rego
) which contains the majority of the enforcement logic. While at the moment this is used by the staticpolicy.rego
file, in the future it can be made available to customers who can choose to author their own policies.To add some details on the implementation, for each
Enforce*
method, we query the Rego policy with a set input. On success, we modify thedata
object. For a simple example, see howEnforceDeviceMount
works:The corresponding Rego for this is:
We believe that this logic is much easier to reason about and maintain over time as opposed to the current system, while also allowing for a straightforward expansion in coverage over time.