Edge #50
# Conflicts: # package-lock.json
…s/devices and gateway, plus signing (WIP)
Great bit of work; very comprehensive, covering a lot of bases. Having reviewed, I have 3 observations and 1 question:
Thanks for comments @Angus-McAllister
Sorry, I wasn't clear about this PR's purpose. It's not meant to be merged yet (it's a draft), as I intend to get back to working on
The vulnerability definitely exists in this prototype. Timesheet owners having exclusive access just affects the risk likelihood. Timesheet ownership is centrally managed in timeld, and requires access to the gateway's private domain and the Ably control API. This aspect of security is therefore implemented in a conventional way. Modulo bugs, it should be comparable to other cloud apps – but note of course that timeld should not be treated as a production app (yet!).
Scope of review restricted to narrative describing work conducted; see previous comment and responses for details.
# Conflicts: # architecture/security/accounts.class.puml # architecture/security/img/register-cli.seq.svg # architecture/security/register-cli.seq.puml # package-lock.json # package.json # packages/cli/lib/AdminSession.mjs # packages/cli/lib/DomainConfigurator.mjs # packages/cli/lib/GatewayClient.mjs # packages/cli/package.json # packages/cli/test/GatewayClient.test.mjs # packages/common/index.mjs # packages/common/lib/AccountOwnedId.mjs # packages/common/lib/clone.mjs # packages/common/lib/util.mjs # packages/common/package.json # packages/gateway/README.md # packages/gateway/deploy.sh # packages/gateway/lib/Account.mjs # packages/gateway/lib/Authorization.mjs # packages/gateway/lib/Gateway.mjs # packages/gateway/package.json # packages/gateway/rest/index.mjs # packages/gateway/secrets.mjs # packages/gateway/server.mjs # packages/gateway/test/Account.test.mjs # packages/gateway/test/Gateway.test.mjs # packages/gateway/test/rest.test.mjs # packages/mite/package.json # publish.sh
- Upgrade to m-ld@edge
- Secret keys no longer available to server
- Authorisation JWTs use asymmetric crypto
- Key generation utility genkey.mjs
AuditLogger converted to generic HTTP log shipping
Removed redundant mite package NPM update
- Use version bundle from npm
- Update instructions for root keys
- Include docker network creation
@mcalligator please review the last commit, c92ff5a
Looks good! As per review call, let's exclude `.run/timeld-gateway.run.xml` from the commit.
# Conflicts: # package-lock.json # package.json # packages/caldav/package.json # packages/cli/package.json # packages/common/package.json # packages/gateway/package.json # packages/mite/package.json # packages/prejournal/package.json # packages/tiki/package.json
Signed journal entries prototype
m-ld/m-ld-security-spec#3 (part of the "Securing Shared Decentralised Live Information with m-ld" Project – 2021-02-035)
abstract
User identity and signatures within the m-ld protocol have been explored in the Integrity prototype. This milestone has combined these signatures with the use of the new Journal API prototype into an application, to drive the production of an audit log according to the traceability design.
This prototype proceeded as expected with no changes required in the prototyped APIs. However, the work has served to further highlight important concerns that must be addressed in a secure production system.
moving parts
application
The chosen application to which we are applying audit traceability is timeld (this repository). It is a hybrid centralised-decentralised application, similar in its deployment and security model to those analysed in our threat modelling.
The other prototyping approach we considered was to construct system or compliance tests for m-ld, as we have done in previous milestones. This option was rejected primarily because there are no further changes proposed to m-ld itself in this milestone. Using a real application will also highlight issues in the design and the protocol prototype from a usefully different perspective.
Since timeld is a live online application, we chose to conduct this prototype on a new branch, `edge` (corresponding to the edge branch of m-ld, having the security prototype). This branch has been deployed online to `timeld-edge.fly.dev`. Note that to interact with this service (called a timeld "gateway"), you must use the timeld CLI version tagged as `edge`, i.e. `npm install timeld-cli@edge -g` (version correspondence checking in timeld is an open ticket).
The main change to timeld for this prototype is to introduce an asymmetric cryptographic key pair for each user/device, and also for the gateway itself (captured in the `UserKey` class). The key pair is used by the "app" (in both CLI and gateway) to sign and verify m-ld protocol messages relating to Timesheets, the main application data resource. This is encapsulated in the `TimeldApp` class, which is used in the initialisation of every local m-ld clone.
The public keys are stored in the m-ld domain (the distributed data set) itself, so that any app having a clone of the domain is able to verify signatures. Adding users and their public keys is done exclusively by the gateway (see new method `writePrincipalToTimesheet`).
The private keys are not stored in the shared Timesheet data, only locally in user devices and gateway secrets. They are also stored in the gateway's private domain, specifically for the purpose of signing imported timesheet data (although we did not complete this function, and imported data are currently signed with the gateway's own key).
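The sign-and-verify flow described above can be sketched with Node's built-in `crypto` module. This is only an illustration: the choice of RSA with SHA-256 and the shape of the `message` object are assumptions, and the sketch does not reproduce the actual `UserKey` or `TimeldApp` code.

```javascript
// Sketch only: RSA/SHA-256 is an assumed algorithm, not necessarily timeld's.
const { generateKeyPairSync, sign, verify } = require('crypto');

// One key pair per user device (and one for the gateway).
const { publicKey, privateKey } = generateKeyPairSync('rsa', { modulusLength: 2048 });

// Hypothetical stand-in for the bytes of a m-ld protocol message.
const message = Buffer.from(JSON.stringify({ timesheet: 'm-ld-security', op: 'add-entry' }));

// The originating app signs with its private key, which never leaves the device.
const signature = sign('sha256', message, privateKey);

// Any app with a clone of the domain holds the public key and can verify.
const isValid = verify('sha256', message, publicKey, signature);
const isTamperedValid = verify('sha256', Buffer.from('tampered'), publicKey, signature);

console.log({ isValid, isTamperedValid }); // { isValid: true, isTamperedValid: false }
```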
The need for an asymmetric key pair in this app is complicated by its pre-existing use of Ably keys for security. These are keys provided by the messaging provider Ably, and associated with user accounts in timeld by the gateway. This mechanism was chosen because it allows the gateway to easily manage access to Ably channels, and thereby provide read access control to timesheet domains as a whole. However, Ably keys do not have a public component suitable for offline signature verification.
The solution arrived at is to generate the asymmetric key pairs for users during the device registration step. Each private key is encrypted using the Ably key so that it can only be used for signing by components that also have access to the Ably key. The details of this can be seen in the changes to the CLI registration sequence diagram.
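One way to picture the encrypted private key is Node's passphrase-protected PKCS#8 export. The sketch below is built on assumptions: using the Ably key string directly as the passphrase, the AES-256-CBC cipher, and the placeholder `ablyKey` value are all illustrative, and the real registration sequence may differ.

```javascript
// Sketch only: passphrase scheme, cipher and key format are assumptions.
const { generateKeyPairSync, createPrivateKey, sign, verify } = require('crypto');

// Placeholder value; a real Ably key is issued by Ably via the gateway.
const ablyKey = 'appId.keyId:keySecret';

// At registration: generate the pair and export the private key encrypted.
const { publicKey, privateKey: encryptedPem } = generateKeyPairSync('rsa', {
  modulusLength: 2048,
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem', cipher: 'aes-256-cbc', passphrase: ablyKey }
});

// Signing requires decrypting the private key, hence knowing the Ably key.
function signWith(passphrase, data) {
  const key = createPrivateKey({ key: encryptedPem, format: 'pem', passphrase });
  return sign('sha256', data, key);
}

const data = Buffer.from('timesheet update');
const signatureOk = verify('sha256', data, publicKey, signWith(ablyKey, data));

// A component without the right Ably key cannot use the private key.
let wrongKeyRejected = false;
try {
  signWith('some.other:key', data);
} catch (e) {
  wrongKeyRejected = true;
}
```

Verification needs only the public key, so it still works offline and without any Ably credentials.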
auditing
The chosen system to which audit log entries are persisted in this prototype is the online logging service Logz.io.
This is a system with which we already have familiarity, and a community account; it is based on the commonly-used open-source ELK (Elasticsearch, Logstash, and Kibana) software stack; it has a built-in user interface with the ability to securely share logs; and has an easy-to-use NodeJS library for integration.
We also considered using the permissioned ledger Iroha, as used in a previous milestone as a consensus mechanism. However it does not have a suitable user interface for sharing logs, and would need to be deployed and operated by us online. Since timeld has a central trusted component, the gateway, Iroha's ability to operate fully decentralised does not weigh in its favour here.
The actual integration of Logz.io logging in the gateway simply ships each timesheet update, as seen by the gateway, to the online service via the integration library, augmented with the timestamp and identity of the gateway and timesheet, for filtering during audit.
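As a rough sketch of that augmentation step, the function below wraps each update with a timestamp and the gateway and timesheet identities before handing it to a shipping function. The field names, the `ship` callback, the example timesheet identifier and the update contents are all hypothetical; in the gateway, `ship` would be backed by the logging integration.

```javascript
// Sketch only: field names and the injected `ship` callback are hypothetical.
function createAuditShipper({ gatewayId, ship, now = () => new Date() }) {
  return function auditUpdate(timesheetId, update) {
    const entry = {
      '@timestamp': now().toISOString(), // when the gateway saw the update
      gateway: gatewayId,                // which gateway produced the entry
      timesheet: timesheetId,            // used to filter logs during audit
      update                             // the timesheet update as seen by the gateway
    };
    ship(entry);
    return entry;
  };
}

// Usage with an in-memory stand-in for the log shipping library.
const shipped = [];
const audit = createAuditShipper({
  gatewayId: 'timeld-edge.fly.dev',
  ship: entry => shipped.push(entry),
  now: () => new Date('2022-01-01T00:00:00Z') // fixed clock for the example
});
audit('test/m-ld-security', { '@insert': [{ '@id': 'entry-1' }], '@delete': [] });
```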
Filtered logs can be made available via the Logz.io user interface (currently using the m-ld.io account; but this could be automated using the Logz API and so made available to any timeld account holder).
As an example, here is today's timeld session with the `m-ld-security` timesheet, in which I corrected yesterday's "PR finalisation task" and added today's "add example to PR" task:
Here are the audit log entries for that timesheet. Note that:
/hits/hits[*]/_source
m-ld-security.audit.json
Here is the live filtered view in the Logz.io user interface (this may be empty if I have not done anything on the project in the last two days!):
Project timesheet audit entries
analysis
The following is a preliminary list of production-system concerns highlighted during the construction of this prototype; these will be folded into the next steps, and related to other research findings in the project write-up.
audit entry signatures
An app typically has no justification for re-verifying the signatures provided from its local clone via the API, because the clone is a component of the app – if there is malware in the clone implementation, the app itself may just as well be malware. Further, as noted in the design, if the clone/app is deployed as a service in a trusted environment (as it is in this prototype, and in our threat-modelled systems), then it must necessarily also be trusted to produce accurate logs.
In some systems, it may be required to provide externally-verifiable signatures. This is not just a case of exposing the signatures themselves (as would be trivial to do even in this prototype, since they are available to the API); there must also be an externally-verifiable binding of user identities to public keys. This is the problem that Public Key Infrastructure (PKI) solves, and is more recently addressed in self-sovereign identity systems like Decentralized Identifiers (DIDs). Our solution is compatible with these by simple virtue of exposing the binary signatures, with minimal requirements on what they contain – simply, that they are verifiable.
timestamp trust
Similarly, if the online service is trusted to produce accurate logs then it is trusted not to lie about the timestamps of those logs. This makes the use of trusted timestamping solutions like RFC3161 timestamp servers redundant if the timestamps are requested by the already-trusted service components. However, we may be more interested in whether the users are faithfully representing the time at which they made an edit. An example scenario might be a user who is offline at the time they made an edit – the server-generated timestamp would then mark the time when they came back online and the clones re-synchronised. Let's assume that we insist the original edit time is the one recorded.
For an online user, RFC3161 servers (or alternatives such as blockchain-based timestamps) are a known solution; and the externally-verifiable timestamp could be incorporated into the signature in the normal way. First of all though, our user is offline. Furthermore, in a real-time synchronised system like one using m-ld, service latency or request throttling may preclude signing every message.
The ideal solution may be to have trusted timestamps be generated on some sensible schedule while the user is online, so that offline or high-frequency timestamps can at least be trusted to be within some verifiable bounds.
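The bounding idea can be sketched as follows. It assumes (none of this is implemented in the prototype) that a device's entries form a verifiable sequence, that some entries carry a `trustedTime` obtained from a timestamp service, and that each trusted timestamp covers all earlier entries in the sequence. The property names are hypothetical.

```javascript
// Sketch only: `trustedTime` and `claimedTime` are hypothetical properties,
// both in milliseconds since the epoch. Entries are in signing order.
function timeBounds(entries, index) {
  let notBefore = -Infinity;
  let notAfter = Infinity;
  // Nearest trusted timestamp at or before the entry gives the lower bound.
  for (let i = index; i >= 0; i--) {
    if (entries[i].trustedTime != null) { notBefore = entries[i].trustedTime; break; }
  }
  // Nearest trusted timestamp at or after the entry gives the upper bound.
  for (let i = index; i < entries.length; i++) {
    if (entries[i].trustedTime != null) { notAfter = entries[i].trustedTime; break; }
  }
  return { notBefore, notAfter };
}

// A claimed (device-local) time is plausible if it lies within the bounds.
function isClaimPlausible(entries, index) {
  const { notBefore, notAfter } = timeBounds(entries, index);
  const { claimedTime } = entries[index];
  return claimedTime >= notBefore && claimedTime <= notAfter;
}

const entries = [
  { claimedTime: 1000, trustedTime: 1000 }, // online: trusted timestamp obtained
  { claimedTime: 1500 },                    // offline edit, within bounds
  { claimedTime: 9000 },                    // offline edit claiming a time past the next anchor
  { claimedTime: 2000, trustedTime: 2000 }  // back online
];
```

Here `isClaimPlausible(entries, 1)` holds while `isClaimPlausible(entries, 2)` does not, because the third entry claims a time later than the trusted timestamp that follows it.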
authority over principals
Once a user on a device has obtained write access to a timesheet domain, they are able to edit any data in that domain – including the records of other users and their public keys. This is a vulnerability in our prototype (although not as bad as it sounds, because only the owners of a timesheet have write access in the first place). However, we already know the solution – to add write access controls on the data, per the Integrity design. We will do so in the Verification milestone, in the context of testing the security of our overall solution (see next steps).
key revocation
As mentioned, timeld manages its active users and devices primarily by allocating them Ably keys. This means that addition or revocation of credentials involves three distributed data locations: user clones, the gateway, and Ably. It will be critical in production apps to be able to reliably revoke credentials. In the security project we have a way to coordinate changes to credentials between clones (thus, between user devices and the gateway) by use of agreements; in the Verification milestone we will implement this. The existence of the third location, Ably, would complicate this if timeld were fully decentralised, because Ably isn't running m-ld; for our purposes, though, it will suffice to centralise the revocation of credentials on the gateway.
audit log integrity
As noted in the design, audit log integrity strictly requires that only one clone pushes log entries to the log system – not just at a time, but ever, because switching from one clone to another can skip or re-order operations (although in practice this should be a tiny risk, especially for an application with a low frequency of updates like timeld).
At present timeld's gateway is deployed with a single server; however it's important for us to demonstrate that we can scale out to multiple servers. Doing so would require that only one of those servers ever pushes audit logs. In the deployment platform it is straightforward to do this, by manually telling one region that it is in charge of audit logging – the platform will automatically restart a deployment if it fails, and m-ld's normal rev-up process will kick in to recover missing operations, ensuring continuity of the audit logs. An alternative approach which may be appropriate in other apps is to add a leader election process among the candidate auditing clones.
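A minimal sketch of the manual designation, assuming the gateway runs on Fly.io (which sets `FLY_REGION` at runtime) with a single instance per region. `AUDIT_REGION` is a hypothetical configuration setting invented for this example.

```javascript
// Sketch only: AUDIT_REGION is a hypothetical setting; FLY_REGION is set by Fly.io.
function isAuditor(env = process.env) {
  // If no audit region is configured, no server pushes audit logs, which
  // fails safe with respect to "only one clone ever pushes entries".
  return Boolean(env.AUDIT_REGION) && env.FLY_REGION === env.AUDIT_REGION;
}

// Only the designated server would attach the audit shipper to its clones.
const inCharge = isAuditor({ FLY_REGION: 'lhr', AUDIT_REGION: 'lhr' });
const notInCharge = isAuditor({ FLY_REGION: 'ams', AUDIT_REGION: 'lhr' });
```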
guaranteed log delivery
The integrity of the audit log does not rest solely on the reliability of the auditing clone (the gateway); it also requires that the audit system itself is available. Logz.io's NodeJS integration library incorporates a simple, practical resilience approach, involving batching and retries. However, this is not resilient to a full process or node crash of the gateway, as the in-memory batches will be lost.
Solving this fully requires a way to persist the outbound batches, for example using a local database and an outbox queue. This is a common problem with well-characterised solutions.
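As an illustration of the outbox idea, the sketch below persists each entry to a local file before any delivery attempt and forgets entries only after shipping succeeds. The `FileOutbox` class, the JSON-lines file format and the synchronous `ship` callback are assumptions made for brevity; a real gateway would ship asynchronously and might use a local database instead.

```javascript
// Sketch only: a file-backed outbox with a synchronous, injected `ship` function.
const fs = require('fs');
const os = require('os');
const path = require('path');

class FileOutbox {
  constructor(file) {
    this.file = file;
  }
  // Persist the entry to disk before attempting delivery.
  add(entry) {
    fs.appendFileSync(this.file, JSON.stringify(entry) + '\n');
  }
  // Entries that have been persisted but not yet confirmed as shipped.
  pending() {
    if (!fs.existsSync(this.file)) return [];
    return fs.readFileSync(this.file, 'utf8')
      .split('\n').filter(Boolean).map(line => JSON.parse(line));
  }
  // Ship everything pending; the file is removed only if `ship` did not throw.
  flush(ship) {
    const batch = this.pending();
    if (batch.length === 0) return 0;
    ship(batch);
    fs.unlinkSync(this.file);
    return batch.length;
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'outbox-'));
const outbox = new FileOutbox(path.join(dir, 'audit.jsonl'));
outbox.add({ n: 1 });
outbox.add({ n: 2 });

// A failed delivery leaves the entries on disk for a later retry.
let failed = false;
try {
  outbox.flush(() => { throw new Error('log service unavailable'); });
} catch (e) {
  failed = true;
}
const stillPending = outbox.pending().length;

// A later successful flush delivers and clears them.
const delivered = [];
const sent = outbox.flush(batch => delivered.push(...batch));
```

Delivery in this scheme is at-least-once: a crash after shipping but before the file is removed would re-send the batch on restart, so the audit system would need to tolerate or de-duplicate repeats.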
next steps
This prototype has provided a first step towards verification of the security solutions proposed in this project, by integrating them into a live app. The following milestones will further pursue Verification of security properties from two angles: formal analysis and testing. The latter will expand on the timeld integration to include write access controls, and add system tests.