Refactor core data model section #67

msporny · 2017-08-03T21:43:06Z

This is an initial refactoring of the core data model section in an attempt to clarify a number of confusions related to the data model.

You can preview the changes to section 3: Core Data Model here:

http://manu.sporny.org/tmp/vc-data-model/

dlongley

May have more comments later, just getting these in to get things rolling.

dlongley · 2017-08-03T22:01:05Z

index.html

+
+        <p>
+A <a>subject</a> is an <a>entity</a> about which <a>claims</a> may be made.
+A <a>claim</a> is statement made by an <a>entity</a> about a <a>subject</a>.


"made by an entity" is not in the model for a claim (see the diagram) so we should omit it, i.e. "A claim is a statement about a subject." This has been one of the challenges between mapping the "claim" language to implementations. Where "claim" is used in implementations is as a specific property of the vocabulary. It is used in a statement about a credential:

The credential C "includes the claim" that subject S is related to object O via property P.

So that looks like this as a graph:

[ C ] --claim--> [ S ] --P--> [ O ]
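
A minimal sketch of that graph as plain (subject, property, object) statements may help when comparing the prose to implementations. The `Triple` type and `credential_triples` helper are hypothetical names used only for illustration; the identifiers are borrowed from the examples in the spec.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


def credential_triples(credential_id, subject_id, claims):
    """Expand a credential into graph statements (illustrative helper, not spec API)."""
    # [ C ] --claim--> [ S ]
    triples = [Triple(credential_id, "claim", subject_id)]
    # [ S ] --P--> [ O ], one statement per property in the claim
    for prop, value in claims.items():
        triples.append(Triple(subject_id, prop, str(value)))
    return triples


triples = credential_triples(
    "http://example.gov/credentials/3732",
    "did:example:ebfeb1f712ebc6f1c276e12ec21",
    {"ageOver": 21},
)
for t in triples:
    print(f"[ {t.subject} ] --{t.prop}--> [ {t.obj} ]")
```

Note that no statement records who made the claim; the issuer is attached to the credential, not to the claim itself.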

Fixed in e1f1b82

dlongley · 2017-08-03T22:02:13Z

index.html

+        <img style="margin: auto; display: block;"
+          width="75%" src="diagrams/claim-extended.svg">
+        <figcaption style="text-align: center;">
+Multiple claims may be combined to express a more complex mathematical graph.


mathematical is unnecessary

What about mathemagic instead?

Fixed in e1f1b82

dlongley · 2017-08-03T22:02:26Z

index.html

+A <a>credential</a> is set of one or more <a>claims</a> about a <a>subject</a>.
+It typically includes am identifier to uniquely identify the
+credential. Credential metadata may also be included to express things like
+when the credential expries. A digital signature is almost always appended by


expries => expires

Fixed in e1f1b82

dlongley · 2017-08-03T22:02:35Z

index.html

+
+      <p>
+A <a>credential</a> is set of one or more <a>claims</a> about a <a>subject</a>.
+It typically includes am identifier to uniquely identify the


am => an

Fixed in e1f1b82

msporny · 2017-08-04T13:59:44Z

Please provide a review of this PR before next week's VCWG meeting so we can discuss on the call.

You can preview the changes to section 3: Core Data Model here: http://manu.sporny.org/tmp/vc-data-model/

The old Data Model section can be found here: https://www.w3.org/TR/verifiable-claims-data-model/#data-model

/cc @ChristopherA @kimdhamilton @stonematt @jandrieu @burnburn @dlongley

jandrieu · 2017-08-06T11:06:42Z

Related to the conversation with @dlongley on issue #66, we still have ambiguity about credentials, largely from language relying on the assumption that credentials contain a single claim.

My understanding is that credentials as issued are not what is presented to inspectors-verifiers in a profile.

In particular, there is a strong desire that holders can selectively disclose claims without revealing entire credentials as issued. This is captured in these requirements:

"holders should decide how to aggregate and manage verifiable claims",
"holders must control which verifiable claims to use and when, " and
"Verifiable claims must be able to be independently issued, stored, and verified."

Defining a profile as a set of credentials does not clearly provide the affordance required for minimal disclosure. Yes, a profile may contain credentials from multiple issuers, but there is no language describing how a single claim from a multi-claim credential is included in a profile.

I realize this may be a transitional artifact from the shift from "verifiable claims" as the core focus of the spec to include "verifiable credentials" (a new term). However, there remains some work to align how claims are extracted from credentials and packaged into profiles.

As I understand the rewrite, it may make sense to define credentials as "a set of one or more claims which are collectively verifiable". In other words, I think what we are trying to say is that what makes a claim verifiable is the meta-data in the credential in which it is issued. This is a shift from earlier language where each claim could have its own signature, but it's not an unreasonable evolution of the data model.

However, what is missing is the notion of slicing & dicing a given credential into something that can be combined into a profile.

Previously, we had:

credentials contained claims
claims were independently verifiable
profiles contained selected claims from various credentials. verifying each claim verified the profile.

Now we have

credentials contain claims
claims are only verifiable in the context of their credential
profiles contain selected credentials--including all claims in each credential. verifying the credentials verify the profile.

Yes, every issuer could break their credentials into multiple, independent subsets of claims, but this is counter to the approach of using zero-knowledge proofs to generate independently verifiable claims from a credential containing many claims. These independent claims would then be packaged into a profile for presentation to an inspector-verifier.

My biggest concern is not that we define how zero-knowledge proofs work or require them for credentials. Rather, I think we need to find terminology and a data model that allows the kind of remixing of claims that such technologies enable.

The current language--as I attempt to follow the gestalt of the proposed changes--suggests that a credential is the only independently verifiable payload and that any given claim is only verifiable when the credential is verifiable. We need work to clarify the data model if that's correct. We also still need to describe how we handle selective disclosure.

jandrieu · 2017-08-06T11:13:50Z

Example 6: A simple verifiable profile is clearly NOT consistent with the previous definition:

EXAMPLE 6: A simple verifiable profile
{
  "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
  "type": ["Entity", "Person"],
  "name": "Alice Bobman",
  "email": "alice@example.com",
  "birthDate": "1985-12-14",
  "telephone": "12345678910"
}

This does not appear to be a collection of credentials. Neither does Example 11: A simple verifiable profile

EXAMPLE 11: A simple verifiable profile
{
  "@context": "https://w3id.org/identity/v1",
  "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
  "type": ["Entity", "Person"],
  "name": "Alice Bobman",
  "email": "alice@example.com",
  "birthDate": "1985-12-14",
  "telephone": "12345678910"
}

Finally, the section 7.2.2 Expressing an Verifiable Credential in JSON-LD actually describes a Verifiable Claim:

EXAMPLE 13: A simple verifiable claim
{
  "@context": [
    "https://w3id.org/identity/v1",
    "https://w3id.org/security/v1"
  ],
  "id": "http://example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov",
  "issued": "2010-01-01",
  "claim": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },
  "signature": {
    "type": "LinkedDataSignature2015",
    "created": "2016-06-18T21:10:38Z",
    "creator": "https://example.com/jdoe/keys/1",
    "domain": "json-ld.org",
    "nonce": "6165d7e8",
    "signatureValue": "g4j9UrpHM4/uu32NlTw0HDaSaYF2sykskfuByD7UbuqEcJIKa+IoLJLrLjqDnMz0adwpBCHWaqqpnd47r0NKZbnJarGYrBFcRTwPQSeqGwac8E2SqjylTBbSGwKZkprEXTywyV7gILlC8a+naA7lBRi4y29FtcUJBTFQq4R5XzI="
  }
}

This still needs some work. The last item feels mostly like simple editorial (completing the shift from verifiable claims through to verifiable credentials). However, the Verifiable Profiles are a bigger issue. Hopefully illustrating a profile that is actually a collection of credentials will illuminate some of my concerns about selective disclosure in my previous comment.

dlongley · 2017-08-06T16:20:33Z

Yes, a profile may contain credentials from multiple issuers, but there is no language describing how a single claim from a multi-claim credential is included in a profile.

Yes, we need to add language and examples showing how this could be done. We should show an example with atomized credentials where there is no "editing" to be done and an example where properties from the claim section are omitted -- with a note indicating that the signature type on the credential must support this method of selective disclosure.
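
As a rough illustration of the two cases, here is a sketch over a credential shaped like the JSON in this thread. The function names `atomize` and `omit_properties` are hypothetical, and nothing here signs or verifies anything: an issuer would still have to sign each atomized credential (and give it its own identifier), and the omission case only makes sense with a signature type that supports it.

```python
import copy


def atomize(credential):
    """Split a multi-property credential into one unsigned credential per property.

    Assumption: the issuer does this at issuance time and then signs each result,
    so the holder never needs to edit anything.
    """
    subject_id = credential["claim"]["id"]
    atoms = []
    for prop, value in credential["claim"].items():
        if prop == "id":
            continue
        # Copy the credential metadata, but not the claim or the old signature.
        atom = {k: copy.deepcopy(v) for k, v in credential.items()
                if k not in ("claim", "signature")}
        atom["claim"] = {"id": subject_id, prop: copy.deepcopy(value)}
        atoms.append(atom)
    return atoms


def omit_properties(credential, hidden):
    """Return a copy of the credential with some claim properties left out.

    Assumption: the signature on the credential must come from a suite that
    tolerates omission; an ordinary signature would no longer verify.
    """
    redacted = copy.deepcopy(credential)
    for prop in hidden:
        if prop != "id":
            redacted["claim"].pop(prop, None)
    return redacted


credential = {
    "id": "http://example.gov/credentials/3732",
    "issuer": "https://dmv.example.gov",
    "claim": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "name": "Alice Bobman",
        "ageOver": 21,
    },
}
print(len(atomize(credential)))
print(omit_properties(credential, ["name"])["claim"])
```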

msporny · 2017-08-07T13:49:38Z

Hey @jandrieu, you wrote a lot :) - but in general, I agree w/ what you're asking for.

We need to clean up the prose to align it. To be clear, your new view on the spec has always been our view of the spec... we have just failed to explain it properly in the spec text (you'll see the implementations do track what you describe as your "Now we have" view of things).

I'm also noting that there is some miscommunication based on a gap in understanding the technical implementation specifics - that is, things like: you don't need zero knowledge proofs to do selective disclosure, and you shouldn't allow every credential to be selectively disclosed (sometimes two attributes MUST be bound together by the issuer).

So, there is much more to talk about, but I do think we can clean up many of the misleading examples you pointed out above. Thanks for the comments, I'll make this my next priority after we get the data model section sorted out. If any other editor or WG member wants to jump in and take a crack at it, that'd speed things up.

msporny · 2017-08-07T14:54:13Z

Responding to specifics raised by @jandrieu below...

we still have ambiguity about credentials, largely from language relying on the assumption that credentials contain a single claim.

I think this is an unfortunate misunderstanding. The data model was NEVER ONLY "credentials contain a single claim"... it was always credentials contain one or more claims.

My understanding is that credentials as issued are not what is presented to inspectors-verifiers in a profile.

This is correct.

In particular, there is a strong desire that holders can selectively disclose claims without revealing entire credentials as issued.

There is a desire to support it. The "strongness" of the desire is dependent on the use case.

I should also point out that I am very skeptical that we can implement this at scale in a way that is easy to integrate into most developer workflows. I know others in the group are claiming that it's possible, but I have yet to see a simple zero knowledge proof approach to address this problem. There is too much hand waving going on related to how this will work in the ecosystem, so we really need to get this sorted sooner rather than later.

Defining a profile as a set of credentials does not clearly provide the affordance required for minimal disclosure. Yes, a profile may contain credentials from multiple issuers, but there is no language describing how a single claim from a multi-claim credential is included in a profile.

Yes, and this is going to be a point of contention in the next several months as there are several ways to do this, ranging from selective disclosure of credentials, to redacted signatures, to zero knowledge proofs.

However, there remains some work to align how claims are extracted from credentials and packaged into profiles.

In reality, we do not have a single technical proposal where one extracts claims from credentials and packages them into a profile. I expect Evernym to submit something at some point related to this.

As I understand the rewrite, it may make sense to define credentials as "a set of one or more claims which are collectively verifiable".

Yes, that is a correct definition.

In other words, I think what we are trying to say is that what makes a claim verifiable is the meta-data in the credential in which it is issued.

Yes.

This is a shift from earlier language where each claim could have its own signature, but it's not an unreasonable evolution of the data model.

Hmm, you can have a credential with one claim that has a signature. So, in that case, does the claim have its own signature? I'm picking at the edges of your definition to see if you've thought those edges through.

However, what is missing is the notion of slicing & dicing a given credential into something that can be combined into a profile.

I'd argue that slicing and dicing probably belongs more in a particular signature suite than it does in the main spec. I admit that this should probably be a topic of debate for the next several weeks/months.

Now we have

credentials contain claims

claims are only verifiable in the context of their credential

profiles contain selected credentials--including all claims in each credential. verifying the credentials verify the profile.

This was always the case, but I can understand why others may have come to different conclusions since the ground has been shifting under us for a while now. My hope is that the ground is solidifying a bit and we can now focus on what you wrote above as the central message in the spec.

Yes, every issuer could break their credentials into multiple, independent subsets of claims, but this is counter to the approach of using zero-knowledge proofs to generate independently verifiable claims from a credential containing many claims. These independent claims would then be packaged into a profile for presentation to an inspector-verifier.

This is where I start to get very concerned about what is meant by "zero-knowledge proof" and "independent claims packaged into a profile". We need to be very specific about how we're achieving this from a technical perspective, as there are many scenarios where this technology just does not exist, or if it's built incorrectly, it completely violates one's privacy by just flat out not working when deployed at scale.

That said, we do need to start teasing out the appropriate language and specific way we're modeling this. Digital Bazaar has a few proposals on how one can support selective disclosure w/o using ZKPs, but we do admit that we can't do it in a way that's fully anonymous (signature regenerated on every transmission), due to the RSA/Ed25519 signature mechanisms we're proposing.

My biggest concern is not that we define how zero-knowledge proofs work or require them for credentials. Rather, I think we need to find terminology and a data model that allows the kind of remixing of claims that such technologies enable.

+1, I think we have that right now, but clearly we're not explaining that to an acceptable degree. I've also heard the Evernym folks state that they don't think we're there yet, so this is a discussion all of us have to have.

The current language--as I attempt to follow the gestalt of the proposed changes--suggests that a credential is the only independently verifiable payload and that any given claim is only verifiable when the credential is verifiable. We need work to clarify the data model if that's correct. We also still need to describe how we handle selective disclosure.

What you say above is /mostly/ correct.

One minor clarification:

a profile is also verifiable in that the subject/holder specifies the profile's scope (domain-locked, use-locked) and then counter-signs it before sending it to an issuer-inspector in order to prevent replay attacks.
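
To make the replay-protection idea concrete, here is a small sketch of a domain-locked, nonce-locked counter-signature. Everything in it is an assumption for illustration: `counter_sign` and `Verifier` are hypothetical names, sorted-key JSON stands in for real canonicalization, and an HMAC over a shared key stands in for the holder's public-key signature (the Python standard library has no asymmetric signing).

```python
import hashlib
import hmac
import json
import secrets


def _canonical(payload):
    # Stand-in for real canonicalization: sorted keys, compact separators.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def counter_sign(profile, domain, nonce, key):
    """Bind a profile to one verifier (domain) and one exchange (nonce)."""
    payload = {"profile": profile, "domain": domain, "nonce": nonce}
    # HMAC is only a placeholder for the holder's digital signature.
    proof = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {**payload, "proof": proof}


class Verifier:
    def __init__(self, domain, key):
        self.domain = domain
        self.key = key
        self.outstanding = set()  # nonces issued but not yet used

    def challenge(self):
        nonce = secrets.token_hex(16)
        self.outstanding.add(nonce)
        return nonce

    def accept(self, presentation):
        payload = {k: presentation.get(k) for k in ("profile", "domain", "nonce")}
        expected = hmac.new(self.key, _canonical(payload), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, str(presentation.get("proof", ""))):
            return False  # contents or scope were altered after signing
        if payload["domain"] != self.domain:
            return False  # signed for a different verifier
        if payload["nonce"] not in self.outstanding:
            return False  # unknown or already-used nonce: a replay
        self.outstanding.discard(payload["nonce"])
        return True


key = b"demo-key-not-for-real-use"
verifier = Verifier("verifier.example", key)
nonce = verifier.challenge()
presentation = counter_sign({"id": "did:example:ebfeb1f712ebc6f1c276e12ec21"},
                            "verifier.example", nonce, key)
print(verifier.accept(presentation))  # first use
print(verifier.accept(presentation))  # same presentation sent again
```

The second call is rejected because the nonce has already been consumed, which is the property the counter-signature is meant to provide.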

Thanks for the feedback @jandrieu, that you're able to tease all of this out from the mess of prose in the current spec is impressive. :)

jandrieu · 2017-08-07T16:03:00Z

I see where I missed the detail. Credentials were defined as one or more claims and then in the Verifiable Claims Model, credentials could be verified by adding a signature. I totally read that whole section as adding a signature to a claim since it was in the Verifiable Claims Model section. Yet, clearly stated both there and elsewhere, was that the signature is not added to the claim--as I expected--but added to the Credential. That was the core of the disconnect.

If I understand your point, allowing each claim to have its own signature is computationally/size-wise untenable. However, it would have been the simplest slice & dice solution. Since it seemed like such a reasonable interpretation, that's how I've always thought it was intended. My mistake.

FWIW, I definitely don't see ZKPs as the only way to atomize claims and I definitely agree not all claims are appropriate for atomization.

Also FWIW, the Verifiable Profile example and definition are woefully inadequate. I like the scope of use and the signature by the holder, but that is not yet present in either the prose definition or the data model example. As long as it is optional for use cases that don't need that level of assurance, we're good. I talk about this more in issue #68. Clearly the profile has quite a bit more than just a set of credentials.

Glad to see where I was wrong on this one. I had totally misread that and ran off with a completely different mental model.

dlongley · 2017-08-07T18:28:36Z

@msporny, @jandrieu,

My understanding is that credentials as issued are not what is presented to inspectors-verifiers in a profile.

This is correct.

I think we're going to continue to be confused about issues like this without examples. We'll need to get some examples into the spec to demonstrate what is presented in order to be sure we aren't miscommunicating.

dlongley · 2017-08-07T18:34:59Z

@jandrieu,

I see where I missed the detail. Credentials were defined as one or more claims and then in the Verifiable Claims Model, credentials could be verified by adding a signature. I totally read that whole section as adding a signature to a claim since it was in the Verifiable Claims Model section. Yet, clearly stated both there and elsewhere, was that the signature is not added to the claim--as I expected--but added to the Credential. That was the core of the disconnect.

Yes -- and this disconnect and confusion in the spec is what I'm referring to in #66.

@jandrieu -- please reread the OP for #66 and see if it makes more sense now.

ChristopherA · 2017-08-07T18:58:16Z

A couple of naming issues we need to address:

First off we have to be very careful about the use of the term selective disclosure. It should only refer to the cryptographic methods that are used by tools like U-Prove and Identity Mixer and not for other uses.

Instead, we should be using the term data minimization.

Of the various signature standards that we have been talking about I personally believe the low hanging fruit is the hash tree signature method which is a form of data minimization rather than a form of selective disclosure. In fact we may want in many cases to require it for multi-claims in entity profiles.
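
For readers unfamiliar with the hash tree idea, a generic Merkle-tree sketch follows. It is not taken from any of the signature proposals under discussion; the leaf encoding, the per-leaf salts, the domain-separation bytes, and all function names are assumptions. The issuer would sign only the root, and the holder would reveal one claim plus its proof path.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(prop: str, value: str, salt: bytes) -> bytes:
    # The salt keeps low-entropy values (e.g. a birth date) from being guessed.
    return _h(b"\x00" + salt + f"{prop}={value}".encode("utf-8"))


def _next_level(level):
    if len(level) % 2:
        level = level + [level[-1]]  # duplicate the last node (sketch-level simplification)
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root hash over a non-empty list of leaf hashes; this is what gets signed."""
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes needed to recompute the root from the leaf at `index`."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    node = leaf
    for sibling, sibling_is_left in proof:
        node = _h(b"\x01" + sibling + node) if sibling_is_left else _h(b"\x01" + node + sibling)
    return node == root


claims = [("name", "Alice Bobman"), ("email", "alice@example.com"), ("birthDate", "1985-12-14")]
salts = [bytes([i]) * 16 for i in range(len(claims))]  # fixed salts for the demo only
leaves = [leaf_hash(p, v, s) for (p, v), s in zip(claims, salts)]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 2)  # disclose only birthDate
print(verify_proof(leaves[2], proof, root))
```

Only the disclosed claim, its salt, and the proof path are transmitted; the other claims stay hidden behind their hashes.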

I was just looking in the w3c-dvcg repo README and I don't see that spec proposal listed.

msporny · 2017-08-07T19:05:49Z

I was just looking in the w3c-dvcg repo README and I don't see that spec proposal listed.

This is the one we have right now to do data minimization via redaction:

https://w3c-dvcg.github.io/lds-redaction2016/

It doesn't use hash trees, but rather uses hash lists to redact specific claims. This particular signature mechanism is not resistant to tracking, as the signature on the credential doesn't change (thus providing a unique, trackable identifier), even though the minimized data transmitted does change (attributes are replaced w/ salts+hashes).
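
A rough sketch of the hash-list idea, for discussion only. This is not the algorithm in the redaction spec linked above; the encoding, the names (`issue`, `list_digest`, `redact`, `verify`), and the use of a plain SHA-256 digest in place of the issuer's signature are all assumptions.

```python
import hashlib
import os


def attr_hash(prop, value, salt: bytes) -> str:
    return hashlib.sha256(salt + f"{prop}={value}".encode("utf-8")).hexdigest()


def issue(claims, salts=None):
    """Issuer side: one salt and one salted hash per attribute."""
    salts = salts or {p: os.urandom(16) for p in claims}
    hashes = {p: attr_hash(p, v, salts[p]) for p, v in claims.items()}
    return salts, hashes


def list_digest(hashes) -> str:
    """Digest over the ordered hash list; the issuer's signature would cover this."""
    joined = "".join(hashes[p] for p in sorted(hashes))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()


def redact(claims, salts, hashes, hidden):
    """Holder side: reveal value+salt for kept attributes, only the hash for hidden ones."""
    disclosed = {p: {"value": v, "salt": salts[p].hex()}
                 for p, v in claims.items() if p not in hidden}
    redacted = {p: hashes[p] for p in hidden}
    return {"disclosed": disclosed, "redacted": redacted}


def verify(presentation, signed_digest) -> bool:
    """Verifier side: rebuild the hash list and compare with the signed digest."""
    hashes = dict(presentation["redacted"])
    for p, entry in presentation["disclosed"].items():
        hashes[p] = attr_hash(p, entry["value"], bytes.fromhex(entry["salt"]))
    return list_digest(hashes) == signed_digest


claims = {"name": "Alice Bobman", "email": "alice@example.com", "telephone": "12345678910"}
salts, hashes = issue(claims)
signed_digest = list_digest(hashes)
presentation = redact(claims, salts, hashes, {"telephone"})
print(verify(presentation, signed_digest))
```

The same `signed_digest` is checked no matter which attributes are hidden, which is why this style of redaction gives a verifier a stable value to correlate on.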

kimdhamilton · 2017-08-07T23:03:23Z

In general, looks excellent. I had only 1 issue, which I think is covered by @jandrieu's and @msporny's discussion.

Here is the verifiable profile definition:

A verifiable profile is a profile that is tamper-resistant and whose contents are typically counter-signed by the holder or subject.

Example 6 doesn't have a signature, which is allowed by the above definition (i.e. "typically", not always, counter-signed). But the definition says it is tamper-resistant. I can see how this is achievable, i.e. if this is in a content-addressable store, but we may need some more clarity about the expectations for tamper-resistance?

EXAMPLE 6: A simple verifiable profile
{
  "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
  "type": ["Entity", "Person"],
  "name": "Alice Bobman",
  "email": "alice@example.com",
  "birthDate": "1985-12-14",
  "telephone": "12345678910"
}
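
One way to picture tamper-resistance without a signature is a content-addressable store, where the address is derived from the bytes of the profile. The sketch below is an assumption-laden illustration: `content_address` and `ContentStore` are hypothetical names, and sorted-key JSON is a stand-in for whatever canonicalization a real system would use.

```python
import hashlib
import json


def content_address(doc) -> str:
    """Derive an address from the document's canonical bytes."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


class ContentStore:
    def __init__(self):
        self._blobs = {}

    def put(self, doc) -> str:
        address = content_address(doc)
        self._blobs[address] = doc
        return address

    def get(self, address):
        doc = self._blobs[address]
        # Any change to the stored document changes its hash, so it is detected here.
        if content_address(doc) != address:
            raise ValueError("content does not match its address")
        return doc


profile = {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "type": ["Entity", "Person"],
    "name": "Alice Bobman",
}
store = ContentStore()
address = store.put(profile)
print(address == content_address(store.get(address)))
```

This detects modification of the stored profile, but it says nothing about who created it, which is the part a counter-signature would add.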

I'm still getting up to speed in the Verifiable Claims data model, so I apologize if this is off course.

msporny · 2017-08-07T23:16:07Z

I'm still getting up to speed in the Verifiable Claims data model, so I apologize if this is off course.

No, you're right, that example is bad. Perhaps I should align the examples in the spec for this PR (I was going to do it later) to match the new data model section as those examples are CRAZY old and are clearly causing problems.

burnburn · 2017-08-08T13:09:07Z

On Mon, Aug 7, 2017 at 2:58 PM, Christopher Allen wrote: A couple of naming issues we need to address: First off we have to be very careful about the use of the term selective disclosure. It should only refer to the cryptographic methods that are used by tools like U-Prove and Identity Mixer and not for other uses.

Why? If I only want to disclose a subset, the term "selective disclosure" intuitively sounds just right. The term "data minimization" sounds like we are creating a binary serialization, a "compiled" version.

…

Instead, we should be using the term data minimization. Of the various signature standards that we have been talking about I personally believe the low hanging fruit is the hash tree signature method which is a form of data minimization rather than a form of selective disclosure. In fact we may want in many cases to require it for multi-claims in entity profiles. I was just looking in the w3c-dvcg repo README and I don't see that spec proposal listed.

msporny · 2017-08-08T17:33:32Z

Thanks for the review all, merging.

@jandrieu and @ChristopherA - could you please raise issues related to the examples and graphics and I'll try to work through those in the next PRs.

msporny added 3 commits August 3, 2017 16:38

Add new data model diagrams.

e3d63dc

Update diagrams associated with new data model section.

cee0f8e

Initial refactor of data model section.

cf48624

dlongley requested changes Aug 3, 2017


msporny added 2 commits August 4, 2017 09:25

Fix a number of issues raised by @dlongley.

e1f1b82

Minor cleanup for prose.

1da0c24

msporny requested a review from stonematt August 4, 2017 13:55

jandrieu mentioned this pull request Aug 7, 2017

Why is the Identity Profile not included in the list of "standardization work"? #25

Closed

msporny added 5 commits August 8, 2017 10:29

Fix verifiable profile example.

c0cd7b5

Switch order of syntax sections to be more readable.

2a74bc2

Fix more Verifiable Profile examples.

a6003fb

Fix statement to question.

6d235e2

Minor fix to example for simple verifiable profile.

28df46a

msporny merged commit 6a84b3d into gh-pages Aug 8, 2017

msporny deleted the msporny-core-data-model branch August 13, 2017 14:05
