Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent data tree #1207

Closed
sdementen opened this issue Feb 11, 2019 · 5 comments
Closed

inconsistent data tree #1207

sdementen opened this issue Feb 11, 2019 · 5 comments
Labels

Comments

@sdementen
Copy link
Contributor

I create some data via a policy (https://www.openpolicyagent.org/docs/rest-api.html#create-or-update-a-policy), I overwrite the data via the API (https://www.openpolicyagent.org/docs/rest-api.html#create-or-overwrite-a-document) and then I look at the data tree.
The data tree shows different inconsistent data depending on the path used to access the data, both looks to me incorrect. The first one gives a merge of the original and the change data, the second one gives the original data. I would expect to see the new data.

Expected Behavior

# http://127.0.0.1:8181/v1/data/foo/baz
{'result': {'my_data': {"dan": ["engineering"], "alice": ["hr"], "john": ["baz"]}}} 
# http://127.0.0.1:8181/v1/data/foo/baz/my_data
{'result': {"dan": ["engineering"], "alice": ["hr"], "john": ["baz"]}} 

Actual Behavior

# http://127.0.0.1:8181/v1/data/foo/baz
{'result': {'my_data': {'alice': ['hr'],
                        'bob': ['hr'],
                        'dan': ['engineering'],
                        'john': ['baz']}}}
# http://127.0.0.1:8181/v1/data/foo/baz/my_data
{'result': {'alice': ['engineering', 'webdev'], 'bob': ['hr']}}

Steps to Reproduce the Problem

  1. run opa run -s
  2. upload the policy (done in python so here pseudo syntax)
PUT /v1/policies/my_policy

package foo.baz

my_data = {
    "alice": ["engineering", "webdev"],
    "bob": ["hr"]
}
  1. change my_data via the API
PUT http://127.0.0.1:8181/v1/data/foo/baz/my_data
{"dan": ["engineering"], "alice": ["hr"], "john": ["baz"]}
  1. See what's on http://127.0.0.1:8181/v1/data/foo/baz
  2. See what's on http://127.0.0.1:8181/v1/data/foo/baz/my_data

Additional Info

I am not sure if it is a bug or me misunderstanding the OPA API or the OPA document logic (with base & virtual documents)

@tsandall tsandall added the bug label Feb 12, 2019
@tsandall
Copy link
Member

@sdementen apologies for the delayed response. Thanks for reporting this and providing all of the info to reproduce.

The current behaviour has base docs overriding the virtual docs unless the query refers into the virtual document, in which case the virtual document defines the value completely.

Typically base docs don't overlap with virtual docs. However, if users do load data like this, the answers should be consistent for all paths. I need to think about different options to resolve the inconsistency. Making virtual docs always override base docs in all cases seems like the simplest solution, but I need to think about this a bit more before committing.

@sdementen
Copy link
Contributor Author

@tsandall that is indeed what I was suspecting (having some overlay of different trees. Is it also linked to the namespace topic for bundles ?
There is maybe two issues in my issue:

  1. overlaying trees
  2. PUT operation that does not "overwrite" the tree by the one in the PUT but that "update" the tree.

For the first issue, I have not yet understood well the interactions between

  • "data/docs coming from files" (with the file watcher)
  • "data/docs updated via the API"
    Who "win" over the other if I both update the file and then update via the API and then again via the file ? and is it "overwrite" or "update" operations (see below the second issue).
    You also speak about virtual docs but I though virtuals docs were "generated implicitly" by rules while base docs were "raw statements" (https://www.openpolicyagent.org/docs/how-does-opa-work.html#data-and-policies). Your explanation confuses me as it seems you are saying there is a link between "data file" (base docs) and "data via API" (virtual docs), or at least this is what I understand with my yet nascent understanding of OPA.

For this second issue, is the doc slightly wrong (Create or Overwrite a Document should be Create or Update a Document) or is my use slightly wrong (for instance, I should provide a If-Match header) ?

@tsandall
Copy link
Member

The v1/data API combines base and virtual documents into a single global document: data. When you load JSON data into OPA (e.g., via PUT v1/data/some/path), it's namespaced under data. When you load Rego rules into OPA, the decisions they generate are also namespaced under data (e.g., a file containing package x.y and rule p = 1 would be accessible at data.x.y.p or v1/data/x/y/p in the HTTP API.

The question is what to do when two documents (base or virtual) overlap. In my last comment, I was considering making virtual docs override base docs always. This would resolve the consistency issue but it could be hard to debug if you load data into OPA that gets shadowed by a rule (where's my data?!).

The good news is that we can catch this problem when JSON and policy are loaded into OPA. We can detect the conflict/overlap and return an error to the caller so that OPA never gets into this state. I'm planning to submit a PR today that addresses this.

@sdementen
Copy link
Contributor Author

I am puzzled by the base vs virtual difference: you seem to say that base=data and virtual=rego rules.
In my example, I have just data (no rego rules are involved) so it is not a matter of base vs virtual. Could you clarify why you mention virtual docs while I am only using base docs (=data)?

@tsandall
Copy link
Member

Base documents just represent external data that gets loaded into OPA. The data can be loaded via the command-line, HTTP API, etc.

Virtual documents just represent data that gets generated by rules loaded into OPA. The rules can be loaded via the command-line, HTTP API, etc.

The original problem that this issue raises is that you can write a rule that defines a virtual document at path a/b/c and load JSON data at path a/b/c. When you do this, the evaluation results are inconsistent. If you ask for the the full tree, e.g., / or a/b you see the merge of the base and virtual documents. If you ask for a path defined by a rule, you see the document generated by the rule (no merge, the base documents are shadowed.)

The change I'm making will prevent this situation from happening in the first place.

For the first issue, I have not yet understood well the interactions between

  • "data/docs coming from files" (with the file watcher)
  • "data/docs updated via the API"
    Who "win" over the other if I both update the file and then update via the API and then again via the file ? and is it "overwrite" or "update" operations (see below the second issue).
    You also speak about virtual docs but I though virtuals docs were "generated implicitly" by rules while base docs were "raw statements" (https://www.openpolicyagent.org/docs/how-does-opa-work.html#data-and-policies). Your explanation confuses me as it seems you are saying there is a link between "data file" (base docs) and "data via API" (virtual docs), or at least this is what I understand with my yet nascent understanding of OPA.

To your question about file watching vs. API updates, the file watch should trigger and overwrite the updated data. That said, I said should because file watches are notoriously unreliable and don't work on some platforms. The file watching feature is primarily for development purposes. E.g., you can start the server/REPL locally and make changes to files and then see the results reflected when you evaluate the policy.

If you're wondering about how to load data and policy into OPA in a running system, see https://www.openpolicyagent.org/docs/bundles.html.

tsandall added a commit to tsandall/opa that referenced this issue Feb 14, 2019
Previously there were no checks in place to ensure that base and
virtual documents do not overlap. As a result, if users loaded raw
JSON and rules into OPA that overlapped, the evaluation results were
not well defined. With these changes, we can detect the overlap and
reject updates (to policies or data) that would cause inconsistent
results.

Fixes open-policy-agent#1207

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
tsandall added a commit that referenced this issue Feb 15, 2019
Previously there were no checks in place to ensure that base and
virtual documents do not overlap. As a result, if users loaded raw
JSON and rules into OPA that overlapped, the evaluation results were
not well defined. With these changes, we can detect the overlap and
reject updates (to policies or data) that would cause inconsistent
results.

Fixes #1207

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants