
Commit 9429e17

pamandrejko authored and Joe Alewine committed
[FAB-10450] Private Data Architecture
Added new content for a topic on Private Data in the Architecture section.

Change-Id: I82e0c5e994024616abe8c84c204ede57c78a856d
Signed-off-by: pama-ibm <pama@ibm.com>
1 parent ae6f84c commit 9429e17


2 files changed (+245, -0 lines)


docs/source/architecture.rst

Lines changed: 1 addition & 0 deletions
@@ -13,5 +13,6 @@ Architecture Reference
 capability_requirements
 couchdb_as_state_database
 peer_event_services
+private-data-arch
 readwrite
 gossip

docs/source/private-data-arch.rst

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
Private Data
============

.. note:: This topic assumes an understanding of the conceptual material in the
          `documentation on private data <private-data.html>`_.

Private data collection definition
----------------------------------

A collection definition contains one or more collections, each having a policy
definition listing the organizations in the collection, as well as properties
used to control dissemination at endorsement time and, optionally, whether the
data will be purged.

The collection definition is deployed to the channel at the time of chaincode
instantiation. If you use the peer CLI to instantiate the chaincode, pass the
collection definition file to the chaincode instantiation using the
``--collections-config`` flag. If you use a client SDK, check the
`SDK documentation <https://fabric-sdk-node.github.io/>`_ for information on
providing the collection definition.

Collection definitions are composed of five properties:

* ``name``: Name of the collection.

* ``policy``: Defines the organizations whose peers are allowed to persist the
  collection data, expressed using the ``Signature`` policy syntax, with each
  member included in an ``OR`` signature policy list.

* ``requiredPeerCount``: Minimum number of peers that the endorsing peer must
  successfully disseminate private data to before the peer signs the
  endorsement and returns the proposal response back to the client. When
  ``requiredPeerCount`` is ``0``, no distribution is **required**, but there
  may be some distribution if ``maxPeerCount`` is greater than zero. A
  ``requiredPeerCount`` of ``0`` is typically not recommended, as it could
  lead to loss of private data. Typically you would want to require at least
  some distribution of the private data at endorsement time to ensure
  redundancy of the private data on multiple peers in the network.

* ``maxPeerCount``: For data redundancy purposes, the number of other peers
  that the current endorsing peer will attempt to distribute the data to. If an
  endorsing peer becomes unavailable between endorsement time and commit time,
  other peers that are collection members but did not yet receive the private
  data will be able to pull the private data from the peers it was
  disseminated to. If this value is set to ``0``, the private data is not
  disseminated at endorsement time, forcing private data pulls on all
  authorized peers.

* ``blockToLive``: How long the data should live in the private database, in
  terms of blocks. The data lives in the private database for this number of
  blocks, after which it is purged, making the data obsolete on the network.
  To keep private data indefinitely, that is, to never purge private data,
  set the ``blockToLive`` property to ``0``.
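
As a minimal sketch of the ``blockToLive`` arithmetic, the Go function here
computes when a key is purged. It assumes the convention described in the
comments of Fabric's collection configuration protobuf, where a key last
modified in block ``N`` with ``blockToLive`` ``B`` is purged at block
``N + B + 1``; treat that boundary as an assumption to check against your
Fabric version. ``purgeBlock`` is a hypothetical helper, not a Fabric API.

.. code:: go

   package main

   import "fmt"

   // purgeBlock returns the block number at which private data last modified
   // in block lastModified is purged, or false if it never expires.
   // Assumption: purge happens at lastModified + blockToLive + 1, and a
   // blockToLive of 0 means "keep forever".
   func purgeBlock(lastModified, blockToLive uint64) (uint64, bool) {
       if blockToLive == 0 {
           return 0, false
       }
       return lastModified + blockToLive + 1, true
   }

   func main() {
       // blockToLive of 3, as used by collectionMarblePrivateDetails in the sample.
       if b, ok := purgeBlock(100, 3); ok {
           fmt.Println("purged at block", b) // purged at block 104
       }
       if _, ok := purgeBlock(100, 0); !ok {
           fmt.Println("never purged")
       }
   }
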

Here is a sample collection definition JSON file, containing an array of two
collection definitions:

.. code:: json

   [
     {
       "name": "collectionMarbles",
       "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
       "requiredPeerCount": 0,
       "maxPeerCount": 3,
       "blockToLive": 1000000
     },
     {
       "name": "collectionMarblePrivateDetails",
       "policy": "OR('Org1MSP.member')",
       "requiredPeerCount": 0,
       "maxPeerCount": 3,
       "blockToLive": 3
     }
   ]

This example uses the organizations from the BYFN sample network, ``Org1`` and
``Org2``. The policy in the ``collectionMarbles`` definition grants both
organizations access to the private data. This is a typical configuration when
the chaincode data needs to remain private from the ordering service nodes.
However, the policy in the ``collectionMarblePrivateDetails`` definition
restricts access to a subset of organizations in the channel (in this case
``Org1``). In a real scenario, there would be many organizations in the
channel, with two or more organizations in each collection sharing private data
between them.
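
To make the five properties concrete, the following sketch decodes the sample
file above into Go structs using only the standard library. The
``CollectionConfig`` type and ``parseCollections`` function are hypothetical,
and the checks it applies (unique, non-empty names and
``0 <= requiredPeerCount <= maxPeerCount``) are assumptions about sensible
input rather than a copy of Fabric's own validation.

.. code:: go

   package main

   import (
       "encoding/json"
       "fmt"
   )

   // CollectionConfig mirrors the five properties of one collection entry.
   // The Go type itself is illustrative, not a Fabric type.
   type CollectionConfig struct {
       Name              string `json:"name"`
       Policy            string `json:"policy"`
       RequiredPeerCount int    `json:"requiredPeerCount"`
       MaxPeerCount      int    `json:"maxPeerCount"`
       BlockToLive       uint64 `json:"blockToLive"`
   }

   // sampleJSON is the sample collection definition from this topic.
   const sampleJSON = `[
     {"name": "collectionMarbles", "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
      "requiredPeerCount": 0, "maxPeerCount": 3, "blockToLive": 1000000},
     {"name": "collectionMarblePrivateDetails", "policy": "OR('Org1MSP.member')",
      "requiredPeerCount": 0, "maxPeerCount": 3, "blockToLive": 3}
   ]`

   // parseCollections decodes a collection definition file and applies a few
   // illustrative sanity checks (assumed, not Fabric's exact rules).
   func parseCollections(data []byte) ([]CollectionConfig, error) {
       var cols []CollectionConfig
       if err := json.Unmarshal(data, &cols); err != nil {
           return nil, err
       }
       seen := make(map[string]bool)
       for _, c := range cols {
           if c.Name == "" {
               return nil, fmt.Errorf("collection with empty name")
           }
           if seen[c.Name] {
               return nil, fmt.Errorf("duplicate collection %q", c.Name)
           }
           seen[c.Name] = true
           if c.RequiredPeerCount < 0 || c.MaxPeerCount < c.RequiredPeerCount {
               return nil, fmt.Errorf("collection %q: need 0 <= requiredPeerCount <= maxPeerCount", c.Name)
           }
       }
       return cols, nil
   }

   func main() {
       cols, err := parseCollections([]byte(sampleJSON))
       if err != nil {
           fmt.Println("invalid:", err)
           return
       }
       for _, c := range cols {
           fmt.Printf("%s: policy=%s blockToLive=%d\n", c.Name, c.Policy, c.BlockToLive)
       }
   }
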

How private data is committed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When an authorized peer does not have a copy of the private data in its
transient data store at commit time, it will attempt to pull the private data
from another authorized peer, *for a configurable amount of time* based on the
peer property ``peer.gossip.pvtData.pullRetryThreshold`` in the peer
configuration file ``core.yaml``.

.. note:: The peers being asked for private data will only return the private data
          if the requesting peer is a member of the collection as defined by the
          policy.
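
The membership check in the note above can be sketched for the simple
``OR('<MSP>.member', ...)`` policies used in this topic. This is an
illustration only: it assumes a flat ``OR`` of ``member`` principals and
matches MSP IDs with a regular expression, whereas Fabric evaluates the policy
against the requesting peer's identity. ``collectionMembers`` and
``shouldServe`` are hypothetical names.

.. code:: go

   package main

   import (
       "fmt"
       "regexp"
   )

   // memberRe matches principals such as 'Org1MSP.member' inside a policy string.
   var memberRe = regexp.MustCompile(`'([^'.]+)\.member'`)

   // collectionMembers extracts the MSP IDs listed in a flat OR(...) policy.
   // Assumption: only the simple form used in this topic is handled.
   func collectionMembers(policy string) map[string]bool {
       members := make(map[string]bool)
       for _, m := range memberRe.FindAllStringSubmatch(policy, -1) {
           members[m[1]] = true
       }
       return members
   }

   // shouldServe reports whether a peer from requesterMSP may be given the
   // collection's private data when it asks for it.
   func shouldServe(policy, requesterMSP string) bool {
       return collectionMembers(policy)[requesterMSP]
   }

   func main() {
       policy := "OR('Org1MSP.member')" // collectionMarblePrivateDetails
       fmt.Println(shouldServe(policy, "Org1MSP")) // true
       fmt.Println(shouldServe(policy, "Org2MSP")) // false
   }
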

Considerations when using ``pullRetryThreshold``:

* If the requesting peer is able to retrieve the private data within the
  ``pullRetryThreshold``, it will commit the transaction to its ledger
  (including the private data hash), and store the private data in its
  state database, logically separated from other channel state data.

* If the requesting peer is not able to retrieve the private data within
  the ``pullRetryThreshold``, it will commit the transaction to its ledger
  (including the private data hash), without the private data.

* If the peer was entitled to the private data but it is missing, then
  the peer will not be able to endorse future transactions that reference
  the missing private data. A chaincode query for a missing key will be
  detected (based on the presence of the key’s hash in the state database),
  and the chaincode will receive an error.
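
The three outcomes above can be modeled as a retry loop bounded by a
threshold. In this sketch, ``fetch`` is a hypothetical stand-in for asking
other authorized peers over gossip, ``threshold`` plays the role of
``pullRetryThreshold``, and ``commitResult`` is an illustrative type; the
retry interval and the stop rule are assumptions, not Fabric's implementation.

.. code:: go

   package main

   import (
       "errors"
       "fmt"
       "time"
   )

   // errNotFound stands in for "no authorized peer returned the data yet".
   var errNotFound = errors.New("private data not available")

   // commitResult records what a peer ends up committing for one transaction.
   type commitResult struct {
       HashCommitted bool   // the private data hash is committed either way
       PrivateData   []byte // nil when the data could not be pulled in time
       Missing       bool   // true when the peer lacks data it is entitled to
   }

   // pullAndCommit retries fetch until it succeeds or threshold elapses.
   func pullAndCommit(fetch func() ([]byte, error), threshold, interval time.Duration) commitResult {
       deadline := time.Now().Add(threshold)
       for {
           if data, err := fetch(); err == nil {
               return commitResult{HashCommitted: true, PrivateData: data}
           }
           if time.Now().Add(interval).After(deadline) {
               // Out of time: commit the transaction with only the hash.
               return commitResult{HashCommitted: true, Missing: true}
           }
           time.Sleep(interval)
       }
   }

   func main() {
       attempts := 0
       fetch := func() ([]byte, error) {
           attempts++
           if attempts < 3 {
               return nil, errNotFound
           }
           return []byte(`{"price":99}`), nil
       }
       r := pullAndCommit(fetch, 100*time.Millisecond, time.Millisecond)
       fmt.Println(r.Missing, string(r.PrivateData))
   }

In both branches the hash is committed; only the presence of the private data
differs, which is what lets a later query for a missing key be detected.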

Therefore, it is important to set the ``requiredPeerCount`` and ``maxPeerCount``
properties large enough to ensure the availability of private data in your
channel. For example, if every endorsing peer becomes unavailable before the
transaction commits, the ``requiredPeerCount`` and ``maxPeerCount`` properties
will have ensured that the private data is available on other peers.

.. note:: For collections to work, it is important to have cross-organizational
          gossip configured correctly. Refer to our documentation on :doc:`gossip`,
          paying particular attention to the section on "anchor peers".

Endorsement
~~~~~~~~~~~

The endorsing peer plays an important role in disseminating private data to
other authorized peers, ensuring the availability of private data on the
channel. The ``maxPeerCount`` and ``requiredPeerCount`` properties in the
collection definition control this dissemination behavior.

If the endorsing peer cannot successfully disseminate the private data to at
least ``requiredPeerCount`` peers, it will return an error back to the client.
The endorsing peer will attempt to disseminate the private data to peers of
different organizations, in an effort to ensure that each authorized
organization has a copy of the private data. Since transactions are not
committed at chaincode execution time, the endorsing peer and recipient peers
store a copy of the private data in a local transient data store alongside
their blockchain until the transaction is committed.
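
The dissemination rules above can be sketched as follows. The ``Peer`` type
and ``disseminate`` function are hypothetical. The sketch assumes that the
endorsing peer tries at most ``maxPeerCount`` other peers, visits one peer from
each organization before a second peer from any organization, and fails when
fewer than ``requiredPeerCount`` peers acknowledge; Fabric's actual peer
selection is more involved.

.. code:: go

   package main

   import (
       "errors"
       "fmt"
   )

   // Peer is a hypothetical view of another collection member peer.
   type Peer struct {
       Name string
       Org  string
       Up   bool // whether the peer acknowledges the push
   }

   // disseminate pushes private data to at most maxPeers peers and fails when
   // fewer than required peers acknowledge (simplified model).
   func disseminate(peers []Peer, required, maxPeers int) ([]string, error) {
       // Group peers by organization, preserving input order.
       byOrg := map[string][]Peer{}
       var orgs []string
       for _, p := range peers {
           if _, ok := byOrg[p.Org]; !ok {
               orgs = append(orgs, p.Org)
           }
           byOrg[p.Org] = append(byOrg[p.Org], p)
       }
       // Round-robin across organizations so each one is tried early.
       var order []Peer
       for i := 0; len(order) < len(peers); i++ {
           for _, o := range orgs {
               if i < len(byOrg[o]) {
                   order = append(order, byOrg[o][i])
               }
           }
       }
       var acked []string
       attempts := 0
       for _, p := range order {
           if attempts == maxPeers {
               break
           }
           attempts++
           if p.Up {
               acked = append(acked, p.Name)
           }
       }
       if len(acked) < required {
           return acked, errors.New("requiredPeerCount not reached; endorsement fails")
       }
       return acked, nil
   }

   func main() {
       peers := []Peer{
           {Name: "peer0.org1", Org: "Org1MSP", Up: true},
           {Name: "peer1.org1", Org: "Org1MSP", Up: false},
           {Name: "peer0.org2", Org: "Org2MSP", Up: true},
       }
       acked, err := disseminate(peers, 1, 3)
       fmt.Println(acked, err)
   }
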

Referencing collections from chaincode
--------------------------------------

A set of `shim APIs <https://godoc.org/github.com/hyperledger/fabric/core/chaincode/shim>`_
is available for setting and retrieving private data.

The same chaincode data operations can be applied to channel state data and
private data, but in the case of private data, a collection name is specified
along with the data in the chaincode APIs, for example
``PutPrivateData(collection,key,value)`` and ``GetPrivateData(collection,key)``.

A single chaincode can reference multiple collections.
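
The following sketch shows one chaincode function writing to both collections
from the sample definition. To keep it runnable without a Fabric peer, it
declares a local ``privateDataStub`` interface whose two method signatures
follow the shim's ``PutPrivateData`` and ``GetPrivateData``, plus an in-memory
``memStub`` standing in for the real stub; both types, and the ``storeMarble``
function with its JSON layout, are hypothetical.

.. code:: go

   package main

   import (
       "encoding/json"
       "fmt"
   )

   // privateDataStub lists the two shim methods this sketch needs.
   type privateDataStub interface {
       PutPrivateData(collection string, key string, value []byte) error
       GetPrivateData(collection string, key string) ([]byte, error)
   }

   // memStub is a hypothetical in-memory stand-in for the chaincode stub.
   type memStub struct{ data map[string][]byte }

   func newMemStub() *memStub { return &memStub{data: map[string][]byte{}} }

   func (m *memStub) PutPrivateData(collection, key string, value []byte) error {
       m.data[collection+"/"+key] = value
       return nil
   }

   func (m *memStub) GetPrivateData(collection, key string) ([]byte, error) {
       return m.data[collection+"/"+key], nil // nil means "not found" in this sketch
   }

   // storeMarble writes one asset across two collections: owner details are
   // shared by Org1 and Org2, while the price is visible to Org1 only.
   func storeMarble(stub privateDataStub, name, owner string, price int) error {
       details, err := json.Marshal(map[string]string{"name": name, "owner": owner})
       if err != nil {
           return err
       }
       if err := stub.PutPrivateData("collectionMarbles", name, details); err != nil {
           return err
       }
       priv, err := json.Marshal(map[string]int{"price": price})
       if err != nil {
           return err
       }
       return stub.PutPrivateData("collectionMarblePrivateDetails", name, priv)
   }

   func main() {
       stub := newMemStub()
       if err := storeMarble(stub, "marble1", "tom", 99); err != nil {
           fmt.Println(err)
           return
       }
       v, _ := stub.GetPrivateData("collectionMarblePrivateDetails", "marble1")
       fmt.Println(string(v)) // {"price":99}
   }

In real chaincode, a ``shim.ChaincodeStubInterface`` value has both methods
with these signatures, so it satisfies this small interface and could be passed
to a function written this way.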

How to pass private data in a chaincode proposal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the chaincode proposal gets stored on the blockchain, it is also important
not to include private data in the main part of the chaincode proposal. A special
field in the chaincode proposal called the ``transient`` field can be used to pass
private data from the client (or data that chaincode will use to generate private
data) to the chaincode invocation on the peer. The chaincode can retrieve the
``transient`` field by calling the `GetTransient() API <https://github.com/hyperledger/fabric/blob/13447bf5ead693f07285ce63a1903c5d0d25f096/core/chaincode/shim/interfaces_stable.go>`_.
This ``transient`` field is excluded from the channel transaction.
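
A minimal sketch of reading private input from the transient field follows.
The ``transientStub`` interface mirrors the shim's
``GetTransient() (map[string][]byte, error)`` method, ``fakeTransient`` is a
hypothetical in-memory stand-in, and the ``marble`` transient key and its JSON
fields are assumptions made for the example.

.. code:: go

   package main

   import (
       "encoding/json"
       "errors"
       "fmt"
   )

   // transientStub declares the one shim method this sketch relies on.
   type transientStub interface {
       GetTransient() (map[string][]byte, error)
   }

   // fakeTransient is a hypothetical stand-in holding the client's transient map.
   type fakeTransient map[string][]byte

   func (f fakeTransient) GetTransient() (map[string][]byte, error) { return f, nil }

   // marbleInput is the private payload a client might send (illustrative).
   type marbleInput struct {
       Name  string `json:"name"`
       Price int    `json:"price"`
   }

   // readMarbleInput takes private input from the transient field instead of
   // the proposal arguments, so it is not written into the channel transaction.
   func readMarbleInput(stub transientStub) (marbleInput, error) {
       var in marbleInput
       tm, err := stub.GetTransient()
       if err != nil {
           return in, err
       }
       raw, ok := tm["marble"]
       if !ok {
           return in, errors.New(`"marble" must be a key in the transient map`)
       }
       if err := json.Unmarshal(raw, &in); err != nil {
           return in, err
       }
       return in, nil
   }

   func main() {
       stub := fakeTransient{"marble": []byte(`{"name":"marble1","price":99}`)}
       in, err := readMarbleInput(stub)
       fmt.Println(in.Name, in.Price, err) // marble1 99 <nil>
   }
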

Considerations when using private data
--------------------------------------

Querying Private Data
~~~~~~~~~~~~~~~~~~~~~

Private collection data can be queried just like normal channel data, using
shim APIs:

* ``GetPrivateDataByRange(collection, startKey, endKey string)``
* ``GetPrivateDataByPartialCompositeKey(collection, objectType string, keys []string)``

For the CouchDB state database, JSON content queries can be passed using the
shim API:

* ``GetPrivateDataQueryResult(collection, query string)``

Limitations:

* Clients that call chaincode that executes queries should be aware that they
  may receive a subset of the result set if the peer they query has missing
  private data, as explained in `How private data is committed`_ above.
  Clients can query multiple peers and compare the results to determine if a
  peer may be missing some of the result set.
* Chaincode that executes queries and updates data in a single transaction
  is not supported, as the query results cannot be validated on the peers
  that don’t have access to the private data, or on peers that are missing the
  private data that they have access to. If a chaincode invocation both queries
  and updates private data, the proposal request will return an error.
* Note that private data collections only define which organizations’ peers
  are authorized to receive and store private data, and consequently which
  peers can be used to query private data. Private data collections do not
  by themselves limit access control within chaincode. For example, if
  non-authorized clients are able to invoke chaincode on peers that have access
  to the private data, the chaincode logic still needs a means to enforce access
  control as usual, for example by calling the ``GetCreator()`` chaincode API or
  using the client identity `chaincode library <https://github.com/hyperledger/fabric/tree/master/core/chaincode/lib/cid>`__.
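
The suggestion to compare results from several peers can be sketched as a set
difference over the returned keys. The peer names and the ``missingKeys``
helper are hypothetical, and a difference does not prove missing private data:
peers at different block heights can also return different results.

.. code:: go

   package main

   import (
       "fmt"
       "sort"
   )

   // missingKeys returns, for each peer, the keys that some other peer
   // returned but that peer did not. Peers missing nothing are omitted.
   func missingKeys(results map[string][]string) map[string][]string {
       union := map[string]bool{}
       for _, keys := range results {
           for _, k := range keys {
               union[k] = true
           }
       }
       out := map[string][]string{}
       for peer, keys := range results {
           have := map[string]bool{}
           for _, k := range keys {
               have[k] = true
           }
           for k := range union {
               if !have[k] {
                   out[peer] = append(out[peer], k)
               }
           }
           sort.Strings(out[peer]) // deterministic order
       }
       return out
   }

   func main() {
       res := map[string][]string{
           "peer0.org1": {"marble1", "marble2", "marble3"},
           "peer1.org1": {"marble1", "marble3"},
       }
       fmt.Println(missingKeys(res)) // map[peer1.org1:[marble2]]
   }
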

Using Indexes with collections
------------------------------

The topic :doc:`couchdb_as_state_database` describes indexes that can be
applied to the channel’s state database to enable JSON content queries, by
packaging indexes in a ``META-INF/statedb/couchdb/indexes`` directory at chaincode
installation time. Similarly, indexes can also be applied to private data
collections, by packaging indexes in a ``META-INF/statedb/couchdb/collections/<collection_name>/indexes``
directory. An example index is available `here <https://github.com/hyperledger/fabric-samples/blob/master/chaincode/marbles02_private/go/META-INF/statedb/couchdb/collections/collectionMarbles/indexes/indexOwner.json>`_.
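
As a sketch of the packaging layout, the following builds the directory path
for a collection index and serializes an index definition in CouchDB's index
JSON format (``index.fields``, ``ddoc``, ``name``, ``type``). The helper name,
the field list, and the design document and index names are illustrative
assumptions, not the contents of the linked sample.

.. code:: go

   package main

   import (
       "encoding/json"
       "fmt"
       "path"
   )

   // couchIndex follows the JSON shape of a CouchDB index definition.
   type couchIndex struct {
       Index struct {
           Fields []string `json:"fields"`
       } `json:"index"`
       Ddoc string `json:"ddoc"`
       Name string `json:"name"`
       Type string `json:"type"`
   }

   // collectionIndexDir returns where an index for the given collection is
   // packaged inside the chaincode, per the directory layout described above.
   func collectionIndexDir(collection string) string {
       return path.Join("META-INF", "statedb", "couchdb", "collections", collection, "indexes")
   }

   func main() {
       var idx couchIndex
       idx.Index.Fields = []string{"docType", "owner"} // illustrative fields
       idx.Ddoc = "indexOwnerDoc"
       idx.Name = "indexOwner"
       idx.Type = "json"
       b, err := json.Marshal(idx)
       if err != nil {
           panic(err)
       }
       fmt.Println(path.Join(collectionIndexDir("collectionMarbles"), "indexOwner.json"))
       fmt.Println(string(b))
   }
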

Private Data Purging
~~~~~~~~~~~~~~~~~~~~

To keep private data indefinitely, that is, to never purge private data,
set the ``blockToLive`` property to ``0``.

Recall that prior to commit, peers store private data in a local
transient data store. This data is automatically purged when the transaction
commits. But if a transaction was never submitted to the channel and
therefore never committed, the private data would remain in each peer’s
transient store. This data is purged from the transient store after a
configurable number of blocks, set by the
``peer.gossip.pvtData.transientstoreMaxBlockRetention`` property in the peer
``core.yaml`` file.
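
The transient store cleanup can be modeled as dropping entries received too
many blocks ago. This simplified sketch assumes each entry records the block
height at which it was received and keeps an entry only while
``receivedAt + retention >= currentHeight``; ``purgeTransient`` and
``transientEntry`` are hypothetical, and Fabric's exact purge boundary and
schedule may differ.

.. code:: go

   package main

   import "fmt"

   // transientEntry is private data held for a transaction that has not
   // committed, tagged with the block height at which it was received.
   type transientEntry struct {
       TxID       string
       ReceivedAt uint64
   }

   // purgeTransient keeps only entries received within retention blocks of
   // currentHeight. retention plays the role of
   // peer.gossip.pvtData.transientstoreMaxBlockRetention (simplified rule).
   func purgeTransient(entries []transientEntry, currentHeight, retention uint64) []transientEntry {
       var kept []transientEntry
       for _, e := range entries {
           if e.ReceivedAt+retention >= currentHeight {
               kept = append(kept, e)
           }
       }
       return kept
   }

   func main() {
       entries := []transientEntry{{"tx-old", 10}, {"tx-new", 995}}
       fmt.Println(purgeTransient(entries, 1100, 1000)) // [{tx-new 995}]
   }
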

Upgrading a collection definition
---------------------------------

If a collection is referenced by a chaincode, the chaincode will use the prior
collection definition unless a new collection definition is specified at upgrade
time. If a collection configuration is specified during the upgrade, a definition
for each of the existing collections must be included, and you can add new
collection definitions.

Collection updates become effective when a peer commits the block that
contains the chaincode upgrade transaction. Note that collections cannot be
deleted, as there may be prior private data hashes on the channel’s blockchain
that cannot be removed.
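
The upgrade rules above can be checked by name before submitting an upgrade.
In this sketch, ``validateUpgrade`` is a hypothetical helper that only compares
collection names: it rejects a proposal that omits an existing collection and
reports any new ones. ``collectionAudit`` is a made-up collection name.

.. code:: go

   package main

   import "fmt"

   // validateUpgrade checks proposed collection names against the deployed
   // ones. Every existing collection must still be defined, since collections
   // cannot be deleted; extra names are new collections. Names only: a real
   // check would also compare the definitions themselves.
   func validateUpgrade(existing, proposed []string) ([]string, error) {
       inProposed := map[string]bool{}
       for _, n := range proposed {
           inProposed[n] = true
       }
       inExisting := map[string]bool{}
       for _, n := range existing {
           if !inProposed[n] {
               return nil, fmt.Errorf("collection %q is missing from the upgrade; collections cannot be deleted", n)
           }
           inExisting[n] = true
       }
       var added []string
       for _, n := range proposed {
           if !inExisting[n] {
               added = append(added, n)
           }
       }
       return added, nil
   }

   func main() {
       existing := []string{"collectionMarbles", "collectionMarblePrivateDetails"}
       added, err := validateUpgrade(existing, append(existing, "collectionAudit"))
       fmt.Println(added, err) // [collectionAudit] <nil>
       _, err = validateUpgrade(existing, []string{"collectionMarbles"})
       fmt.Println(err)
   }
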

.. Licensed under Creative Commons Attribution 4.0 International License
   https://creativecommons.org/licenses/by/4.0/
