Edge #50

Merged: gsvarovsky merged 27 commits into main from edge on Jan 11, 2023


@gsvarovsky gsvarovsky commented Aug 5, 2022

Signed journal entries prototype

m-ld/m-ld-security-spec#3 (part of the "Securing Shared Decentralised Live Information with m-ld" Project – 2021-02-035)

Signed journal entries. This is the ability to cryptographically bind the user identity and timestamp to write operations in the domain, such that their actions can be traced (and not repudiated).

abstract

User identity and signatures within the m-ld protocol have been explored in the Integrity prototype. This milestone has combined these signatures with the use of the new Journal API prototype into an application, to drive the production of an audit log according to the traceability design.

This prototype proceeded as expected with no changes required in the prototyped APIs. However, the work has served to further highlight important concerns that must be addressed in a secure production system.

moving parts

application

The chosen application to which we are applying audit traceability is timeld (this repository). It is a hybrid centralised-decentralised application, similar in its deployment and security model to those analysed in our threat modelling.

The other prototyping approach we considered was to construct system or compliance tests for m-ld, as we have done in previous milestones. This option was rejected primarily because there are no further changes proposed to m-ld itself in this milestone. Using a real application will also highlight issues in the design and the protocol prototype from a usefully different perspective.

Since timeld is a live online application, we chose to conduct this prototype on a new branch, edge (corresponding to the edge branch of m-ld, having the security prototype). This branch has been deployed online to timeld-edge.fly.dev. Note that to interact with this service (called a timeld "gateway"), you must use the timeld CLI version tagged as edge, i.e. npm install timeld-cli@edge -g (version correspondence checking in timeld is an open ticket).

The main change to timeld for this prototype is to introduce an asymmetric cryptographic key pair for each user/device, and also for the gateway itself (captured in the UserKey class). The key pair is used by the "app" (in both CLI and gateway) to sign and verify m-ld protocol messages relating to Timesheets, the main application data resource. This is encapsulated in the TimeldApp class, which is used in the initialisation of every local m-ld clone.
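As a rough illustration of the sign-and-verify step described above, here is a minimal sketch using Node's built-in crypto module. It is not the UserKey or TimeldApp implementation: the function names, the RSA key size and the SHA-256 digest are assumptions made for the example.

```javascript
const crypto = require('node:crypto');

// Hypothetical stand-in for a per-user/device key pair. The algorithm and
// key size are assumptions, not taken from the timeld source.
function generateUserKey() {
  return crypto.generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: { type: 'spki', format: 'der' },
    privateKeyEncoding: { type: 'pkcs8', format: 'pem' }
  });
}

// Sign the bytes of an outgoing message with the sender's private key.
function signMessage(data, privateKeyPem) {
  return crypto.sign('sha256', data, privateKeyPem);
}

// Verify with a public key supplied as base64-encoded SPKI DER, which is
// one plausible way to hold a public key as data in a shared domain.
function verifyMessage(data, signature, publicKeyBase64) {
  const publicKey = crypto.createPublicKey({
    key: Buffer.from(publicKeyBase64, 'base64'),
    format: 'der',
    type: 'spki'
  });
  return crypto.verify('sha256', data, publicKey, signature);
}

const { publicKey, privateKey } = generateUserKey();
const message = Buffer.from(JSON.stringify({ '@insert': [{ '@id': 'entry/1' }] }));
const signature = signMessage(message, privateKey);
const publicKeyBase64 = publicKey.toString('base64');
```

Calling `verifyMessage(message, signature, publicKeyBase64)` succeeds for the original bytes and fails if the message has been altered.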

The public keys are stored in the m-ld domain (the distributed data set) itself, so that any app having a clone of the domain is able to verify signatures. Adding users and their public keys is done exclusively by the gateway (see new method writePrincipalToTimesheet).

The private keys are not stored in the shared Timesheet data, only locally in user devices and gateway secrets. They are also stored in the gateway's private domain, specifically for the purpose of signing imported timesheet data (although we did not complete this function, and imported data are currently signed with the gateway's own key).

The need for an asymmetric key pair in this app is complicated by its pre-existing use of Ably keys for security. These are keys provided by the messaging provider Ably, which the gateway associates with user accounts in timeld. This mechanism was chosen because it allows the gateway to easily manage access to Ably channels, and thereby provide read access control to timesheet domains as a whole. However, Ably keys do not have a public component suitable for offline signature verification.

The solution arrived at is to generate the asymmetric key pairs for users during the device registration step. Each private key is encrypted using the Ably key, so that it can only be used for signing by components that also have access to the Ably key. The details of this can be seen in the changes to the CLI registration sequence diagram.
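To make the idea of protecting the private key with the Ably key concrete, here is a minimal sketch using Node's crypto module. The function names, the `aes-256-cbc` cipher and the placeholder key string are assumptions for illustration; the actual registration sequence is the one in the diagram referred to above.

```javascript
const crypto = require('node:crypto');

// Generate a key pair whose private half is encrypted with a secret.
// The secret stands in for the Ably key; the cipher is an assumption.
function generateProtectedKeyPair(ablyKey) {
  return crypto.generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: { type: 'spki', format: 'pem' },
    privateKeyEncoding: {
      type: 'pkcs8',
      format: 'pem',
      cipher: 'aes-256-cbc',
      passphrase: ablyKey
    }
  });
}

// Signing needs the same secret to unlock the private key.
function signWith(ablyKey, encryptedPrivateKey, data) {
  return crypto.sign('sha256', data, { key: encryptedPrivateKey, passphrase: ablyKey });
}

const ablyKey = 'appid.keyid:secret'; // placeholder, not a real key
const keys = generateProtectedKeyPair(ablyKey);
const data = Buffer.from('timesheet update');
const sig = signWith(ablyKey, keys.privateKey, data);
const ok = crypto.verify('sha256', data, keys.publicKey, sig);

// A component without the right secret cannot sign.
let wrongKeyRejected = false;
try {
  signWith('wrong-secret', keys.privateKey, data);
} catch (e) {
  wrongKeyRejected = true;
}
```

Verification needs only the public key, so a clone can check signatures without ever holding the Ably key.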

auditing

The chosen system to which audit log entries are persisted in this prototype is the online logging service Logz.io.

This is a system with which we already have familiarity, and a community account; it is based on the commonly-used open-source ELK (Elasticsearch, Logstash, and Kibana) software stack; it has a built-in user interface with the ability to securely share logs; and has an easy-to-use NodeJS library for integration.

We also considered using the permissioned ledger Iroha, as used in a previous milestone as a consensus mechanism. However it does not have a suitable user interface for sharing logs, and would need to be deployed and operated by us online. Since timeld has a central trusted component, the gateway, Iroha's ability to operate fully decentralised does not weigh in its favour here.

The actual integration of Logz.io logging in the gateway simply ships each timesheet update, as seen by the gateway, to the online service via the integration library, augmented with the timestamp and identity of the gateway and timesheet, for filtering during audit.
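The augmentation step can be pictured roughly as follows. This is a sketch only: the function names are hypothetical, the field names mirror the sample audit entries in this description, and `ship` is a placeholder for the call into the log shipping library rather than that library's API. Whether the timestamp is set by the gateway or by the logging service is also an assumption here.

```javascript
// Wrap a timesheet update in an audit entry carrying gateway-side metadata.
function toAuditEntry(update, { gateway, account, name }, now = new Date()) {
  return {
    '@timestamp': now.toISOString(),
    type: 'timesheet',
    gateway,
    account,
    name,
    update
  };
}

// `ship` is a placeholder for whatever sends the entry to the log service.
function auditUpdate(update, context, ship) {
  const entry = toAuditEntry(update, context);
  ship(entry);
  return entry;
}

// Usage: collect shipped entries in memory instead of sending them anywhere.
const shipped = [];
const entry = auditUpdate(
  { '@principal': { '@id': 'http://timeld-edge.fly.dev/gsvarovsky' }, '@insert': [], '@delete': [], '@ticks': 26 },
  { gateway: 'timeld-edge.fly.dev', account: 'gsvarovsky', name: 'm-ld-security' },
  e => shipped.push(e)
);
```

Keeping the gateway, account and timesheet name as top-level fields is what makes the entries easy to filter during audit.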

Filtered logs can be made available via the Logz.io user interface (currently using the m-ld.io account; but this could be automated using the Logz API and so made available to any timeld account holder).

As an example, here is today's timeld session with the m-ld-security timesheet, in which I corrected yesterday's "PR finalisation task" and added today's "add example to PR" task:
[image: timeld session with the m-ld-security timesheet]

Here are the audit log entries for that timesheet. Note that:

  • The format is an Elasticsearch query result. Each audit entry is found in /hits/hits[*]/_source
  • Log retention in our community account is only two days
m-ld-security.audit.json
{
  "took": 69,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": null,
    "hits": [
      {
        "_index": "logzioCustomerIndex220811_v2",
        "_type": "doc",
        "_id": "zIH7i4IBpWnvXYf5C1Z6.account-131907",
        "_version": 1,
        "_score": null,
        "_source": {
          "@timestamp": "2022-08-11T08:17:35.568Z",
          "name": "m-ld-security",
          "update": {
            "@principal": {
              "@id": "http://timeld-edge.fly.dev/gsvarovsky"
            },
            "@insert": [
              {
                "activity": "add example to PR",
                "@type": "Entry",
                "session": {
                  "@id": "cl6op75ja0000ald7bghwdfqa"
                },
                "start": {
                  "@value": "2022-08-11T08:17:35.421Z",
                  "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
                },
                "vf:provider": {
                  "@id": "http://timeld-edge.fly.dev/gsvarovsky"
                },
                "@id": "cl6op75ja0000ald7bghwdfqa/1"
              },
              {
                "@type": "Session",
                "start": {
                  "@value": "2022-08-11T07:07:32.548Z",
                  "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
                },
                "@id": "cl6op75ja0000ald7bghwdfqa"
              }
            ],
            "@ticks": 26,
            "@delete": []
          },
          "type": "timesheet",
          "gateway": "timeld-edge.fly.dev",
          "account": "gsvarovsky",
          "tags": [
            "_logz_http_bulk_json_8070"
          ]
        },
        "fields": {
          "@timestamp": [
            "2022-08-11T08:17:35.568Z"
          ]
        },
        "sort": [
          1660205855568
        ]
      },
      {
        "_index": "logzioCustomerIndex220811_v2",
        "_type": "doc",
        "_id": "xPK7i4IBpTzgyn1DjVoB.account-131907",
        "_version": 1,
        "_score": null,
        "_source": {
          "@timestamp": "2022-08-11T07:08:11.581Z",
          "name": "m-ld-security",
          "update": {
            "@principal": {
              "@id": "http://timeld-edge.fly.dev/gsvarovsky"
            },
            "@insert": [
              {
                "duration": 60,
                "activity": "PR finalisation",
                "@type": "Entry",
                "session": {
                  "@id": "cl6ncyn48000016d74y8ygck5"
                },
                "start": {
                  "@value": "2022-08-10T08:00:00.000Z",
                  "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
                },
                "vf:provider": {
                  "@id": "http://timeld-edge.fly.dev/gsvarovsky"
                },
                "@id": "cl6ncyn48000016d74y8ygck5/1"
              }
            ],
            "@ticks": 24,
            "@delete": []
          },
          "type": "timesheet",
          "gateway": "timeld-edge.fly.dev",
          "account": "gsvarovsky",
          "tags": [
            "_logz_http_bulk_json_8070"
          ]
        },
        "fields": {
          "@timestamp": [
            "2022-08-11T07:08:11.581Z"
          ]
        },
        "sort": [
          1660201691581
        ]
      },
      {
        "_index": "logzioCustomerIndex220811_v2",
        "_type": "doc",
        "_id": "vFm6i4IBdh3PXF8m8CTD.account-131907",
        "_version": 1,
        "_score": null,
        "_source": {
          "@timestamp": "2022-08-11T07:07:32.402Z",
          "name": "m-ld-security",
          "update": {
            "@principal": {
              "@id": "http://timeld-edge.fly.dev/"
            },
            "@insert": [
              {
                "@type": "Account",
                "@id": "http://timeld-edge.fly.dev/gsvarovsky",
                "key": {
                  "@id": ".m4b4iQ"
                }
              },
              {
                "public": {
                  "@value": "MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvedJMx1ZsGWNaxZALB05sjikKUXqNJ5ZktYRroRjS4n8Hvll39oigx016qcd+3C2usy8HKZunkmTQh7TNMgXcnBuBOwCxvwcvw7AJwmXL5xcqEqIlnBwP1f0igcm7oklbjst+oeaewayL9PAOzKtpOfPEcvXsLP0dzp7uK1TUxwIDAQAB",
                  "@type": "http://www.w3.org/2001/XMLSchema#base64Binary"
                },
                "@type": "UserKey",
                "@id": ".m4b4iQ"
              }
            ],
            "@ticks": 23,
            "@delete": []
          },
          "type": "timesheet",
          "gateway": "timeld-edge.fly.dev",
          "account": "gsvarovsky",
          "tags": [
            "_logz_http_bulk_json_8070"
          ]
        },
        "fields": {
          "@timestamp": [
            "2022-08-11T07:07:32.402Z"
          ]
        },
        "sort": [
          1660201652402
        ]
      },
      {
        "_index": "logzioCustomerIndex220810_v2",
        "_type": "doc",
        "_id": "EuznhoIB4hZzgGntfMqs.account-131907",
        "_version": 1,
        "_score": null,
        "_source": {
          "@timestamp": "2022-08-10T08:38:04.244Z",
          "name": "m-ld-security",
          "update": {
            "@principal": {
              "@id": "http://timeld-edge.fly.dev/gsvarovsky"
            },
            "@insert": [
              {
                "activity": "PR finalisation",
                "@type": "Entry",
                "session": {
                  "@id": "cl6ncyn48000016d74y8ygck5"
                },
                "start": {
                  "@value": "2022-08-10T08:00:00.000Z",
                  "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
                },
                "vf:provider": {
                  "@id": "http://timeld-edge.fly.dev/gsvarovsky"
                },
                "@id": "cl6ncyn48000016d74y8ygck5/1"
              },
              {
                "@type": "Session",
                "start": {
                  "@value": "2022-08-10T08:37:16.372Z",
                  "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
                },
                "@id": "cl6ncyn48000016d74y8ygck5"
              }
            ],
            "@ticks": 21,
            "@delete": []
          },
          "type": "timesheet",
          "gateway": "timeld-edge.fly.dev",
          "account": "gsvarovsky",
          "tags": [
            "_logz_http_bulk_json_8070"
          ]
        },
        "fields": {
          "@timestamp": [
            "2022-08-10T08:38:04.244Z"
          ]
        },
        "sort": [
          1660120684244
        ]
      },
      {
        "_index": "logzioCustomerIndex220810_v2",
        "_type": "doc",
        "_id": "brXmhoIB3Nf0eLGWuadr.account-131907",
        "_version": 1,
        "_score": null,
        "_source": {
          "@timestamp": "2022-08-10T08:37:13.757Z",
          "name": "m-ld-security",
          "update": {
            "@principal": {
              "@id": "http://timeld-edge.fly.dev/"
            },
            "@insert": [
              {
                "@type": "Account",
                "@id": "http://timeld-edge.fly.dev/gsvarovsky",
                "key": {
                  "@id": ".m4b4iQ"
                }
              },
              {
                "public": {
                  "@value": "MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvedJMx1ZsGWNaxZALB05sjikKUXqNJ5ZktYRroRjS4n8Hvll39oigx016qcd+3C2usy8HKZunkmTQh7TNMgXcnBuBOwCxvwcvw7AJwmXL5xcqEqIlnBwP1f0igcm7oklbjst+oeaewayL9PAOzKtpOfPEcvXsLP0dzp7uK1TUxwIDAQAB",
                  "@type": "http://www.w3.org/2001/XMLSchema#base64Binary"
                },
                "@type": "UserKey",
                "@id": ".m4b4iQ"
              }
            ],
            "@ticks": 20,
            "@delete": []
          },
          "type": "timesheet",
          "gateway": "timeld-edge.fly.dev",
          "account": "gsvarovsky",
          "tags": [
            "_logz_http_bulk_json_8070"
          ]
        },
        "fields": {
          "@timestamp": [
            "2022-08-10T08:37:13.757Z"
          ]
        },
        "sort": [
          1660120633757
        ]
      }
    ]
  },
  "aggregations": {
    "2": {
      "buckets": [
        {
          "key_as_string": "2022-08-10T00:00:00.000+01:00",
          "key": 1660086000000,
          "doc_count": 2
        },
        {
          "key_as_string": "2022-08-11T00:00:00.000+01:00",
          "key": 1660172400000,
          "doc_count": 3
        }
      ]
    }
  },
  "status": 200
}
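For anyone post-processing a query result like the one above, the following sketch pulls the audit entries out of `/hits/hits[*]/_source` and summarises them in chronological order. The function names are hypothetical, and the sample object is a trimmed copy of two of the entries above.

```javascript
// Extract the audit entries from an Elasticsearch query result, oldest first.
function auditEntries(result) {
  return result.hits.hits
    .map(hit => hit._source)
    .sort((a, b) => Date.parse(a['@timestamp']) - Date.parse(b['@timestamp']));
}

// Reduce an entry to who did what, and when.
function summarise(entry) {
  return {
    at: entry['@timestamp'],
    principal: entry.update['@principal']['@id'],
    ticks: entry.update['@ticks'],
    inserted: entry.update['@insert'].map(subject => subject['@type'])
  };
}

// Trimmed copy of two hits from the result above.
const result = {
  hits: {
    hits: [
      { _source: {
        '@timestamp': '2022-08-11T07:08:11.581Z',
        update: {
          '@principal': { '@id': 'http://timeld-edge.fly.dev/gsvarovsky' },
          '@insert': [{ '@type': 'Entry', activity: 'PR finalisation', duration: 60 }],
          '@ticks': 24,
          '@delete': []
        }
      } },
      { _source: {
        '@timestamp': '2022-08-11T07:07:32.402Z',
        update: {
          '@principal': { '@id': 'http://timeld-edge.fly.dev/' },
          '@insert': [{ '@type': 'Account' }, { '@type': 'UserKey' }],
          '@ticks': 23,
          '@delete': []
        }
      } }
    ]
  }
};

const summaries = auditEntries(result).map(summarise);
```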

Here is the live filtered view in the Logz.io user interface (this may be empty if I have not done anything on the project in the last two days!):

Project timesheet audit entries

analysis

The following is a preliminary list of production-system concerns highlighted during the construction of this prototype; these will be folded into the next steps, and related to other research findings in the project write-up.

audit entry signatures

An app typically has no justification for re-verifying the signatures provided from its local clone via the API, because the clone is a component of the app – if there is malware in the clone implementation, the app itself may just as well be malware. Further, as noted in the design, if the clone/app is deployed as a service in a trusted environment (as it is in this prototype, and in our threat-modelled systems), then it must necessarily also be trusted to produce accurate logs.

In some systems, it may be required to provide externally-verifiable signatures. This is not just a case of exposing the signatures themselves (which would be trivial to do even in this prototype, since they are available to the API); there must also be an externally-verifiable binding of user identities to public keys. This is the problem that Public Key Infrastructure (PKI) solves, and it is more recently addressed in self-sovereign identity systems like Decentralized Identifiers (DIDs). Our solution is compatible with these by simple virtue of exposing the binary signatures, with minimal requirements on what they contain – simply, that they are verifiable.

timestamp trust

Similarly, if the online service is trusted to produce accurate logs then it is trusted not to lie about the timestamps of those logs. This makes the use of trusted timestamping solutions like RFC3161 timestamp servers redundant if the timestamps are requested by the already-trusted service components. However, we may be more interested in whether the users are faithfully representing the time at which they made an edit. An example scenario might be a user who is offline at the time they made an edit – the server-generated timestamp would then mark the time when they came back online and the clones re-synchronised. Let's assume that we insist the original edit time is the one recorded.

For an online user, RFC3161 servers (or alternatives such as blockchain-based timestamps) are a known solution; and the externally-verifiable timestamp could be incorporated into the signature in the normal way. First of all though, our user is offline. Furthermore, in a real-time synchronised system like one using m-ld, service latency or request throttling may preclude signing every message.

The ideal solution may be to have trusted timestamps be generated on some sensible schedule while the user is online, so that offline or high-frequency timestamps can at least be trusted to be within some verifiable bounds.
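One way to picture these "verifiable bounds" is sketched here. It assumes each user's messages form an ordered, signed sequence in which some messages carry a trusted timestamp; the `claimed` and `trusted` fields and the function names are hypothetical. A trusted timestamp on a message shows the message existed by that time, so it bounds that message and every earlier one from above, and bounds every later one from below.

```javascript
// `sequence` is one user's ordered messages. `claimed` is the user-asserted
// edit time; `trusted` is present only where a trusted timestamp was obtained.
function timeBounds(sequence, index) {
  let notBefore = null;
  let notAfter = null;
  // Lower bound: the latest trusted timestamp on a strictly earlier message.
  for (let i = index - 1; i >= 0 && notBefore === null; i--) {
    if (sequence[i].trusted != null) notBefore = sequence[i].trusted;
  }
  // Upper bound: the earliest trusted timestamp on this or a later message.
  for (let i = index; i < sequence.length && notAfter === null; i++) {
    if (sequence[i].trusted != null) notAfter = sequence[i].trusted;
  }
  return { notBefore, notAfter };
}

// A claimed time is plausible if it falls within the known bounds.
function isPlausible(sequence, index) {
  const { notBefore, notAfter } = timeBounds(sequence, index);
  const claimed = Date.parse(sequence[index].claimed);
  return (notBefore === null || claimed >= Date.parse(notBefore)) &&
    (notAfter === null || claimed <= Date.parse(notAfter));
}

const sequence = [
  { claimed: '2022-08-10T08:00:00.000Z', trusted: '2022-08-10T08:00:05.000Z' },
  { claimed: '2022-08-10T09:30:00.000Z' }, // made offline, no trusted timestamp
  { claimed: '2022-08-10T12:00:00.000Z', trusted: '2022-08-10T12:00:03.000Z' }
];
const backdated = [
  sequence[0],
  { claimed: '2022-08-09T09:30:00.000Z' }, // claims a time before the prior anchor
  sequence[2]
];
```

The offline edit in `sequence` is accepted because its claimed time lies between the two anchors; the one in `backdated` is not.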

authority over principals

Once a user on a device has obtained write access to a timesheet domain, they are able to edit any data in that domain – including the records of other users and their public keys. This is a vulnerability in our prototype (although not as bad as it sounds, because only the owners of a timesheet have write access in the first place). However, we already know the solution – to add write access controls on the data, per the Integrity design. We will do so in the Verification milestone, in the context of testing the security of our overall solution (see next steps).

key revocation

As mentioned, timeld manages its active users and devices primarily by allocating them Ably keys. This means that addition or revocation of credentials involves three distributed data locations: user clones, the gateway, and Ably. It will be critical in production apps to be able to reliably revoke credentials. In the security project we have a way to coordinate changes to credentials between clones (thus, between user devices and the gateway) by use of agreements; in the Verification milestone we will implement this. The existence of the third location, Ably, would complicate this if timeld were fully decentralised, because Ably isn't running m-ld; for our purposes, though, it will suffice to centralise the revocation of credentials on the gateway.

audit log integrity

As noted in the design, audit log integrity strictly requires that only one clone pushes log entries to the log system – not just at a time, but ever, because switching from one clone to another can skip or re-order operations (although in practice this should be a tiny risk, especially for an application with a low frequency of updates like timeld).

At present timeld's gateway is deployed with a single server; however it's important for us to demonstrate that we can scale out to multiple servers. Doing so would require that only one of those servers ever pushes audit logs. In the deployment platform it is straightforward to do this, by manually telling one region that it is in charge of audit logging – the platform will automatically restart a deployment if it fails, and m-ld's normal rev-up process will kick in to recover missing operations, ensuring continuity of the audit logs. An alternative approach which may be appropriate in other apps is to add a leader election process among the candidate auditing clones.
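The manual designation could look something like the following sketch. It assumes the platform exposes each server's region in an environment variable (Fly.io documents `FLY_REGION`) and that exactly one server runs per region; `AUDIT_REGION` is a hypothetical setting invented for this example, not an existing timeld option.

```javascript
// Decide whether this server is the one in charge of audit logging.
// AUDIT_REGION is a hypothetical setting naming the single auditing region.
function isAuditLogger(env) {
  return Boolean(env.AUDIT_REGION) && env.FLY_REGION === env.AUDIT_REGION;
}

// Usage: only the designated server would attach the audit logger.
const inLondon = isAuditLogger({ FLY_REGION: 'lhr', AUDIT_REGION: 'lhr' });
const inParis = isAuditLogger({ FLY_REGION: 'cdg', AUDIT_REGION: 'lhr' });
const unset = isAuditLogger({ FLY_REGION: 'lhr' });
```

Leaving the setting unset disables audit logging everywhere, which fails safe with respect to duplicate or interleaved logs, at the cost of producing no log at all.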

guaranteed log delivery

The integrity of the audit log does not rest solely on the reliability of the auditing clone (the gateway); it also requires that the audit system itself is available. Logz.io's NodeJS integration library incorporates a simple, practical resilience approach, involving batching and retries. However, this is not resilient to a full process or node crash of the gateway, as the in-memory batches will be lost.

Solving this fully requires a way to persist the outbound batches, for example using a local database and an outbox queue. This is a common problem with well-characterised solutions.
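A minimal sketch of the outbox idea follows, using a local append-only file in place of a database. The `FileOutbox` class is hypothetical and assumes a single process with one flush running at a time; it gives at-least-once delivery, so the audit system would need to tolerate an occasional duplicate.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

class FileOutbox {
  constructor(file) {
    this.file = file;
  }

  // Persist the entry before any attempt to send it.
  enqueue(entry) {
    fs.appendFileSync(this.file, JSON.stringify(entry) + '\n');
  }

  // Entries written but not yet acknowledged, in order.
  pending() {
    if (!fs.existsSync(this.file)) return [];
    return fs.readFileSync(this.file, 'utf8')
      .split('\n').filter(Boolean).map(line => JSON.parse(line));
  }

  // Send in order, stopping at the first failure; keep what was not sent.
  async flush(send) {
    let sent = 0;
    for (const entry of this.pending()) {
      try {
        await send(entry);
      } catch {
        break;
      }
      sent++;
    }
    // Re-read so that entries enqueued during the sends are not lost.
    const remaining = this.pending().slice(sent);
    fs.writeFileSync(this.file, remaining.map(e => JSON.stringify(e) + '\n').join(''));
    return sent;
  }
}

async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'outbox-'));
  const file = path.join(dir, 'audit.jsonl');
  const before = new FileOutbox(file);
  before.enqueue({ '@ticks': 1 });
  before.enqueue({ '@ticks': 2 });
  // Simulate a restart: a new instance sees the persisted entries.
  const after = new FileOutbox(file);
  const delivered = [];
  // A sender that fails on the second entry, as if the service went away.
  const flaky = async entry => {
    if (entry['@ticks'] === 2) throw new Error('service unavailable');
    delivered.push(entry);
  };
  const sentFirst = await after.flush(flaky);
  const sentSecond = await after.flush(async entry => { delivered.push(entry); });
  return { sentFirst, sentSecond, delivered, remaining: after.pending().length };
}
```

Running `demo()` shows the two persisted entries surviving the simulated restart and being delivered in order across two flush attempts.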

next steps

This prototype has provided a first step towards verification of the security solutions proposed in this project, by integrating them into a live app. The following milestones will further pursue Verification of security properties from two angles: formal analysis and testing. The latter will expand on the timeld integration to include write access controls, and add system tests.


mcalligator commented Aug 9, 2022

Great bit of work; very comprehensive, covering a lot of bases. Having reviewed, I have 3 observations and 1 question:

  1. Observation (and suggestion): mention of which branch of timeld to install, with reference to the actual npm package, might be too much detail for this PR. Instead, I'd suggest updating the timeld documentation, and including just a reference to that.
  2. Observation (also with suggestion): the new method writePrincipalToTimesheet might be better named addPrincipalToTimesheet.
  3. I appreciate you using data in the plural :-)
  4. In the authority over principals section, a worrying potential vulnerability is mentioned, with the mitigation that only timesheet owners have write access to timesheets. But how robust is the protection of timesheets from having their ownership changed?


gsvarovsky commented Aug 10, 2022

Thanks for comments @Angus-McAllister

  1. Observation (and suggestion): mention of which branch of timeld to install, with reference to the actual npm package, might be too much detail for this PR. Instead, I'd suggest updating the timeld documentation, and including just a reference to that.

Sorry, I wasn't clear about this PR's purpose. It's not meant to be merged yet (it's a draft), as I intend to get back to working on main for the timesheets project, without this additional complexity. For now this PR is the documentation of this branch, so, I'd prefer to keep the detail. We'll update the main docs thoroughly when we do the merge, in the verification milestone.

  4. In the authority over principals section, a worrying potential vulnerability is mentioned, with the mitigation that only timesheet owners have write access to timesheets. But how robust is the protection of timesheets from having their ownership changed?

The vulnerability definitely exists in this prototype. Timesheet owners having exclusive access just affects the risk likelihood.

Timesheet ownership is centrally managed in timeld, and requires access to the gateway's private domain and the Ably control API. This aspect of security is therefore implemented in a conventional way. Modulo bugs, it should be comparable to other cloud apps – but note of course that timeld should not be treated as a production app (yet!).


@mcalligator mcalligator left a comment


Scope of review restricted to narrative describing work conducted; see previous comment and responses for details.

# Conflicts:
#	architecture/security/accounts.class.puml
#	architecture/security/img/register-cli.seq.svg
#	architecture/security/register-cli.seq.puml
#	package-lock.json
#	package.json
#	packages/cli/lib/AdminSession.mjs
#	packages/cli/lib/DomainConfigurator.mjs
#	packages/cli/lib/GatewayClient.mjs
#	packages/cli/package.json
#	packages/cli/test/GatewayClient.test.mjs
#	packages/common/index.mjs
#	packages/common/lib/AccountOwnedId.mjs
#	packages/common/lib/clone.mjs
#	packages/common/lib/util.mjs
#	packages/common/package.json
#	packages/gateway/README.md
#	packages/gateway/deploy.sh
#	packages/gateway/lib/Account.mjs
#	packages/gateway/lib/Authorization.mjs
#	packages/gateway/lib/Gateway.mjs
#	packages/gateway/package.json
#	packages/gateway/rest/index.mjs
#	packages/gateway/secrets.mjs
#	packages/gateway/server.mjs
#	packages/gateway/test/Account.test.mjs
#	packages/gateway/test/Gateway.test.mjs
#	packages/gateway/test/rest.test.mjs
#	packages/mite/package.json
#	publish.sh
- Upgrade to m-ld@edge
- Secret keys no longer available to server
- Authorisation JWTs use asymmetric crypto
- Key generation utility genkey.mjs
AuditLogger converted to generic HTTP log shipping
@gsvarovsky gsvarovsky mentioned this pull request Nov 25, 2022
Removed redundant mite package
NPM update
- Use version bundle from npm
- Update instructions for root keys
- Include docker network creation

gsvarovsky commented Jan 10, 2023

@mcalligator please review the last commit, c92ff5a

  • I changed the build shell scripts to use the version bundle from npm, to save having to package up the local repo. I added a version build argument to support choosing the version.
  • I updated the instructions for the root keys, adding a new markdown file for the key details as this is shared between the docker and flyio instructions.
  • I included a docker network creation section in the docker build instructions.

@gsvarovsky gsvarovsky marked this pull request as ready for review January 10, 2023 09:15

@mcalligator mcalligator left a comment


Looks good! As per review call, let's exclude .run/timeld-gateway.run.xml from the commit.

# Conflicts:
#	package-lock.json
#	package.json
#	packages/caldav/package.json
#	packages/cli/package.json
#	packages/common/package.json
#	packages/gateway/package.json
#	packages/mite/package.json
#	packages/prejournal/package.json
#	packages/tiki/package.json
@gsvarovsky gsvarovsky merged commit 1bae731 into main Jan 11, 2023
@gsvarovsky gsvarovsky deleted the edge branch January 11, 2023 12:41