Documentation Rewrite #982

Merged 72 commits on Sep 25, 2019

Commits

a2de08c
docs: create structure of docs overhaul
rfratto Sep 6, 2019
45bab6f
docs: add design docs back in
rfratto Sep 6, 2019
00619ef
docs: add community documentation
rfratto Sep 6, 2019
645762a
docs: add LogQL docs
rfratto Sep 6, 2019
7690e1d
docs: port existing operations documentation
rfratto Sep 9, 2019
d8ae8e8
docs: add new placeholder file for promtail configuration docs
rfratto Sep 9, 2019
0b3dff8
docs: add TOC for operations/storage
rfratto Sep 9, 2019
29c1df3
docs: add Loki API documentation
rfratto Sep 9, 2019
3a04846
docs: port troubleshooting document
rfratto Sep 9, 2019
ae76437
docs: add docker-driver documentation
rfratto Sep 10, 2019
3a3bebe
docs: link to configuration from main docker-driver document
rfratto Sep 10, 2019
dd4f217
docs: update API for new paths
rfratto Sep 10, 2019
67f104b
docs: fix broken links in api.md and remove json marker from examples
rfratto Sep 10, 2019
7720827
docs: incorporate api changes from #1009
rfratto Sep 11, 2019
3e76241
docs: port promtail documentation
rfratto Sep 12, 2019
01c4b72
docs: add TOC to promtail configuration reference
rfratto Sep 12, 2019
0e5a408
docs: fix promtail spelling errors
rfratto Sep 12, 2019
dfa8b6b
docs: add loki configuration reference
rfratto Sep 12, 2019
8b8d7cc
docs: add TOC to configuration
rfratto Sep 12, 2019
f0c9cf1
docs: add loki configuration example
rfratto Sep 13, 2019
938a679
docs: add Loki overview with brief explanation about each component
rfratto Sep 13, 2019
2a08cfa
docs: add comparisons document
rfratto Sep 13, 2019
0e0f526
docs: add info on table manager and update storage/README.md
rfratto Sep 13, 2019
564dcbd
docs: add getting started
rfratto Sep 16, 2019
0821f38
docs: incorporate config yaml changes from #755
rfratto Sep 16, 2019
398f4e1
docs: fix typo in releases url for promtail
rfratto Sep 17, 2019
d4f2e98
docs: add installation instructions
rfratto Sep 17, 2019
dd979d6
docs: add more configuration examples
rfratto Sep 17, 2019
de50e45
docs: add information on fluentd client
rfratto Sep 17, 2019
9cf01a4
docs: PR review feedback
rfratto Sep 17, 2019
7291b2c
docs: add architecture document
rfratto Sep 18, 2019
276bdae
docs: add missing information from old docs
rfratto Sep 18, 2019
f23fd99
`localy` typo
rfratto Sep 18, 2019
c12c929
docs: s/ran/run/g
rfratto Sep 18, 2019
0e6f789
Typo
pstibrany Sep 19, 2019
fe09d98
Typo
pstibrany Sep 19, 2019
e0015b1
Tyop
pstibrany Sep 19, 2019
6961bd9
Typo
pstibrany Sep 19, 2019
1c2d1c3
docs: fixed typo
pstibrany Sep 19, 2019
133796a
docs: PR feedback
rfratto Sep 19, 2019
c7b6b76
docs: @cyriltovena PR feedback
rfratto Sep 19, 2019
00357fe
docs: add more details to promtail url config option
rfratto Sep 20, 2019
b9c45bd
docs: expand promtail's pipelines document with extra detail
rfratto Sep 20, 2019
3fe341f
docs: fixed some spelling
pstibrany Sep 20, 2019
7a447f9
docs: remove reference to Stage interface in pipelines.md
rfratto Sep 20, 2019
012dc06
docs: clarify promtail configuration and scraping
rfratto Sep 20, 2019
5b9226f
docs: attempt #2 at explaining promtail's usage of machine hostname
rfratto Sep 20, 2019
5c0f523
docs: spelling fixes
pstibrany Sep 20, 2019
4034a3e
docs: add reference to promtail custom metrics and fix silly typo
rfratto Sep 20, 2019
bd4b360
docs: cognizant -> aware
rfratto Sep 20, 2019
dc893fa
docs: typo
pstibrany Sep 20, 2019
3bbaf35
docs: typos
pstibrany Sep 20, 2019
9d76dc1
docs: add which components expose which API endpoints in microservice…
rfratto Sep 20, 2019
e946c19
docs: change ksonnet installation to tanka
rfratto Sep 20, 2019
84d97dd
docs: address most @pracucci feedback
rfratto Sep 23, 2019
d3384bb
docs: fix all spelling errors so reviewers don't have to keep finding…
rfratto Sep 23, 2019
d0026d5
docs: incorporate changes to API endpoints made in #1022
rfratto Sep 23, 2019
48f75ef
docs: add missing loki metrics
rfratto Sep 23, 2019
87139c2
docs: add missing promtail metrics
rfratto Sep 23, 2019
fce9154
docs: @pstribrany feedback
rfratto Sep 24, 2019
a733358
docs: more @pracucci feedback
rfratto Sep 24, 2019
723dfde
docs: move metrics into a table
rfratto Sep 24, 2019
43168b9
docs: update push path references to /loki/api/v1/push
rfratto Sep 24, 2019
bb0eb6e
docs: add detail to further explain limitations of monolithic mode
rfratto Sep 24, 2019
b00c91d
docs: add alternative names to modes_of_operation diagram
rfratto Sep 24, 2019
7cc7bfd
docs: add log ordering requirement
rfratto Sep 24, 2019
d00243d
docs: add procedure for updating docs with latest version
rfratto Sep 24, 2019
65254ac
docs: separate out stages documentation into one document per stage
rfratto Sep 24, 2019
5852035
docs: list supported stores in storage documentation
rfratto Sep 25, 2019
13086c5
docs: add info on duplicate log lines in pipelines
rfratto Sep 25, 2019
5cc963e
docs: add line_format as key feature to fluentd
rfratto Sep 25, 2019
9c432f3
docs: hopefully final commit :)
rfratto Sep 25, 2019
docs: add architecture document
rfratto committed Sep 25, 2019
commit 7291b2cd7531b5ddafc8b6d8c96b7bb00b4da43b
224 changes: 224 additions & 0 deletions docs/architecture.md
@@ -1 +1,225 @@
# Loki's Architecture

This document will expand on the information detailed in the [Loki
Overview](overview/README.md).

## Multi Tenancy

All data, both in memory and in long-term storage, is partitioned by a tenant
ID, pulled from the `X-Scope-OrgID` header in the request when Loki is run in
multi-tenant mode. When Loki is **not** in multi-tenant mode, the header
is ignored and the tenant ID is set to "fake", which will appear in the index
and in stored chunks.
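
As an illustration, here is a minimal sketch of how a request's tenant might be
resolved from the `X-Scope-OrgID` header. The `tenantID` helper is hypothetical
and not Loki's actual implementation; only the header name and the "fake"
fallback come from the text above.

```go
package main

import (
	"fmt"
	"net/http"
)

// tenantID is a hypothetical helper: in multi-tenant mode the tenant comes
// from the X-Scope-OrgID header; otherwise it is hard-coded to "fake".
func tenantID(r *http.Request, multiTenant bool) string {
	if !multiTenant {
		return "fake"
	}
	return r.Header.Get("X-Scope-OrgID")
}

func main() {
	req, _ := http.NewRequest("POST", "http://localhost:3100/loki/api/v1/push", nil)
	req.Header.Set("X-Scope-OrgID", "team-a")
	fmt.Println(tenantID(req, true))  // "team-a"
	fmt.Println(tenantID(req, false)) // "fake"
}
```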

## Modes of Operation

![modes_diagram](modes_of_operation.png)

Loki has a set of components (defined below in [Components](#components)) which
are internally referred to as modules. Each component spawns a gRPC server for
internal traffic and an HTTP/1 server for external API requests. All components
come with an HTTP/1 server, but most only expose readiness and health endpoints.

While each component can be run in a separate process, Loki also supports running
all components in a single process. Running all components in a single process is
referred to as "single process" mode, while running each component in its own
process is the "horizontally scalable" mode.

When Loki runs in single process mode, individual components continue to
communicate to one another over gRPC using the gRPC listen port of the overall
process.

Single process mode is ideally suited for local development, small workloads,
and evaluation purposes. Single-process mode can be scaled out to multiple
processes with the following limitations:

1. Local index and local storage cannot currently be used as they are not safe
for concurrent access by multiple processes.
2. Individual components cannot be scaled independently, so it is not possible
to have more read components than write components.

## Components

### Distributor

The **distributor** service is responsible for handling incoming streams from
clients. It's the first stop in the write path for log data. Once the
distributor receives a set of streams, they are split into batches and sent to
multiple [ingesters](#ingester) in parallel.

#### Hashing

Distributors use consistent hashing in conjunction with a configurable
replication factor to determine which instances of the ingester service should
receive log data for a given stream.

A stream is a set of logs associated with a tenant and a unique labelset. The
stream is hashed using both the tenant ID and the labelset, and then the hash is
used to find the ingesters to send the stream to.

A hash ring stored in [Consul](https://www.consul.io) is used to achieve
consistent hashing; all [ingesters](#ingester) register themselves into the hash
ring with a set of tokens they own. Each token is a random unsigned 64-bit
number. Along with their set of tokens, ingesters register their state into the
hash ring. Ingesters in the JOINING and ACTIVE states may receive write requests,
while ingesters in the ACTIVE and LEAVING states may receive read requests. When
doing a hash lookup, distributors only use tokens for ingesters that are in the
appropriate state for the request.

To do the hash lookup, distributors find the smallest appropriate token whose
value is larger than the hash of the stream. When the replication factor is
larger than 1, the next succeeding tokens (clockwise in the ring) that belong to
different ingesters will also be included in the result.

The effect of this hash setup is that each token that an ingester owns is
responsible for a range of hashes. If there are three tokens with values 0, 25,
and 50, then a hash of 3 would be given to the ingester that owns the token 25;
the ingester owning token 25 is responsible for the hash range of 1-25.
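
The token lookup described above can be sketched as follows. This is a
simplified illustration only, not Loki's ring code: tokens are kept sorted, and
a stream hash maps to the ingester owning the first token at or above the hash,
wrapping around the ring when necessary.

```go
package main

import (
	"fmt"
	"sort"
)

type token struct {
	value    uint64
	ingester string
}

// lookup returns the ingester responsible for the given stream hash.
// The ring slice must be sorted by token value.
func lookup(ring []token, hash uint64) string {
	i := sort.Search(len(ring), func(i int) bool { return ring[i].value >= hash })
	if i == len(ring) {
		i = 0 // wrap around the ring
	}
	return ring[i].ingester
}

func main() {
	ring := []token{{0, "ingester-1"}, {25, "ingester-2"}, {50, "ingester-3"}}
	fmt.Println(lookup(ring, 3)) // ingester-2, which owns token 25
}
```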

#### Quorum consistency

Since all distributors share access to the same hash ring, write requests can be
sent to any distributor.

To ensure consistent query results, Loki uses
[Dynamo-style](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)
quorum consistency on reads and writes. This means that the distributor will wait
for a positive response from at least one half plus one of the ingesters it sent
the sample to before responding to the user.
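
As a quick illustration of the "one half plus one" rule (a sketch, not Loki's
code): with a replication factor of N, a write is acknowledged once floor(N/2)
plus one ingesters respond successfully.

```go
package main

import "fmt"

// quorum returns the minimum number of successful ingester responses the
// distributor waits for: one half of the replication factor, plus one.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

func main() {
	for _, rf := range []int{1, 3, 5} {
		fmt.Printf("replication factor %d -> quorum of %d\n", rf, quorum(rf))
	}
}
```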

### Ingester

The **ingester** service is responsible for writing log data to long-term
storage backends (DynamoDB, S3, Cassandra, etc.) on the write path and returning
log data for in-memory queries on the read path.

Ingesters contain a _lifecycler_ which manages the lifecycle of an ingester in
the hash ring. Each ingester has a state of either PENDING, JOINING, ACTIVE,
LEAVING, or UNHEALTHY:

1. PENDING is an Ingester's state when it is waiting for a handoff from another
ingester that is LEAVING.

2. JOINING is an Ingester's state when it is currently inserting its tokens into
the ring and initializing itself. It may receive write requests for tokens it
owns.

3. ACTIVE is an Ingester's state when it is fully initialized. It may receive
both write and read requests for tokens it owns.

4. LEAVING is an Ingester's state when it is shutting down. It may receive read
requests for data it still has in memory.

5. UNHEALTHY is an Ingester's state when it has failed to heartbeat to Consul.
UNHEALTHY is set by the distributor when it periodically checks the ring.
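
The states above could be modelled as a simple enumeration; the following Go
sketch is purely illustrative and does not mirror Loki's internal types.

```go
package main

import "fmt"

// IngesterState is an illustrative enumeration of the lifecycle states
// described above.
type IngesterState int

const (
	PENDING   IngesterState = iota // waiting for a handoff from a LEAVING ingester
	JOINING                        // inserting tokens into the ring; may receive writes
	ACTIVE                         // fully initialized; receives reads and writes
	LEAVING                        // shutting down; may still serve reads from memory
	UNHEALTHY                      // failed to heartbeat to Consul
)

// canWrite reports whether an ingester in this state accepts write requests.
func (s IngesterState) canWrite() bool { return s == JOINING || s == ACTIVE }

// canRead reports whether an ingester in this state accepts read requests.
func (s IngesterState) canRead() bool { return s == ACTIVE || s == LEAVING }

func main() {
	fmt.Println(ACTIVE.canWrite(), LEAVING.canWrite()) // true false
}
```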

Each log stream that an ingester receives is built up into a set of many "chunks"
in memory and flushed to the backing storage backend at a configurable interval.

A new chunk is created in memory when one of the following occurs:

1. The current chunk has reached capacity (a configurable value).
2. Too much time has passed without the current chunk being updated.
3. A flush occurs.

If an ingester process crashes or exits abruptly, all the data that has not yet
been flushed will be lost. Loki is usually configured to keep multiple replicas
(usually 3) of each log to mitigate this risk.

When a flush occurs to a backend store, the chunk is hashed based on its tenant,
labels, and contents. This means that multiple ingesters with the same copy of
data will not write the same data to the backing store twice, but if any write
fails to one of the replicas, multiple differing chunk objects will be created
in the backing store. See [Querier](#querier) for how data is deduplicated.
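
To illustrate why identical replicas deduplicate at flush time, here is a
sketch (not Loki's actual chunk format) in which the object key is derived from
the tenant, the labelset, and the chunk contents, so two ingesters holding the
same chunk compute the same key.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkKey derives an object-store key from the chunk's tenant, labels, and
// contents; identical chunks therefore map to the same object.
func chunkKey(tenant, labels string, contents []byte) string {
	h := sha256.New()
	h.Write([]byte(tenant))
	h.Write([]byte(labels))
	h.Write(contents)
	return fmt.Sprintf("%s/%x", tenant, h.Sum(nil))
}

func main() {
	a := chunkKey("team-a", `{app="api"}`, []byte("line 1\nline 2\n"))
	b := chunkKey("team-a", `{app="api"}`, []byte("line 1\nline 2\n"))
	fmt.Println(a == b) // true: identical replicas flush to the same object
}
```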

#### Handoff

By default, when an ingester is shutting down and tries to leave the hash ring,
it will wait to see if a new ingester tries to enter before flushing and will
try to initiate a handoff. The handoff will transfer all of the tokens and
in-memory chunks owned by the leaving ingester to the new ingester.

This process is used to avoid flushing all chunks when shutting down, which is a
slow process.

### Querier

The **querier** service handles the actual [LogQL](./logql.md) evaluation of
logs stored in long-term storage.

Queriers query all ingesters for in-memory data before falling back to
running the same query against the backend store. Queriers build an iterator
over the data from the appropriate sources and perform deduplication: if two log
lines have the exact same nanosecond timestamp and the exact same contents, the
duplicates will not be returned in the query result.
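
The deduplication rule can be illustrated with a small sketch (not Loki's
iterator implementation): entries are only dropped when both the nanosecond
timestamp and the line contents match exactly.

```go
package main

import (
	"fmt"
	"time"
)

type entry struct {
	ts   time.Time
	line string
}

// dedupe removes adjacent exact duplicates from a timestamp-sorted slice:
// same nanosecond timestamp and same contents.
func dedupe(sorted []entry) []entry {
	var out []entry
	for _, e := range sorted {
		n := len(out)
		if n > 0 && out[n-1].ts.Equal(e.ts) && out[n-1].line == e.line {
			continue // exact duplicate from another ingester or the store
		}
		out = append(out, e)
	}
	return out
}

func main() {
	t := time.Unix(0, 1569369600000000000)
	in := []entry{{t, "hello"}, {t, "hello"}, {t, "world"}}
	fmt.Println(len(dedupe(in))) // 2
}
```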

## Chunk Store

The **chunk store** is Loki's long-term data store, designed to support
interactive querying and sustained writing without the need for background
maintenance tasks. It consists of:

* An index for the chunks. This index can be backed by
[DynamoDB from Amazon Web Services](https://aws.amazon.com/dynamodb),
[Bigtable from Google Cloud Platform](https://cloud.google.com/bigtable), or
[Apache Cassandra](https://cassandra.apache.org).
* A key-value (KV) store for the chunk data itself, which can be DynamoDB,
Bigtable, Cassandra again, or an object store such as
[Amazon S3](https://aws.amazon.com/s3).

> Unlike the other core components of Loki, the chunk store is not a separate
> service, job, or process, but rather a library embedded in the two services
> that need to access Loki data: the [ingester](#ingester) and [querier](#querier).

The chunk store relies on a unified interface to the
"[NoSQL](https://en.wikipedia.org/wiki/NoSQL)" stores (DynamoDB, Bigtable, and
Cassandra) that can be used to back the chunk store index. This interface
assumes that the index is a collection of entries keyed by:

* A **hash key**. This is required for *all* reads and writes.
* A **range key**. This is required for writes and can be omitted for reads,
which can be queried by prefix or range.

The interface works somewhat differently across the supported databases:

* DynamoDB supports range and hash keys natively. Index entries are thus
modelled directly as DynamoDB entries, with the hash key as the distribution
key and the range as the range key.
* For Bigtable and Cassandra, index entries are modelled as individual column
values. The hash key becomes the row key and the range key becomes the column
key.
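
A rough sketch of the entry shape this interface implies is shown below. The
field names and sample values are illustrative, not Loki's types: every entry
carries a hash key, and writes also carry a range key that reads may match by
prefix or range.

```go
package main

import (
	"fmt"
	"strings"
)

// IndexEntry is an illustrative index record: the hash key is required for
// every read and write, the range key only for writes.
type IndexEntry struct {
	HashKey  string // DynamoDB distribution key / Bigtable or Cassandra row key
	RangeKey string // DynamoDB range key / Bigtable or Cassandra column key
	Value    []byte
}

// queryByPrefix shows a read that supplies only a hash key and an optional
// range-key prefix.
func queryByPrefix(entries []IndexEntry, hashKey, rangePrefix string) []IndexEntry {
	var out []IndexEntry
	for _, e := range entries {
		if e.HashKey == hashKey && strings.HasPrefix(e.RangeKey, rangePrefix) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Key values below are made up for the example.
	entries := []IndexEntry{
		{HashKey: "tenant:day:label", RangeKey: "chunk-1"},
		{HashKey: "tenant:day:label", RangeKey: "chunk-2"},
	}
	fmt.Println(len(queryByPrefix(entries, "tenant:day:label", "chunk-"))) // 2
}
```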

A set of schemas are used to map the matchers and label sets used on reads and
writes to the chunk store into appropriate operations on the index. Schemas have
been added as Loki has evolved, mainly in an attempt to better load balance
writes and improve query performance.

> The current schema recommendation is the **v10 schema**.

## Read Path

To summarize, the read path does the following:

1. The querier receives an HTTP/1 request for data.
2. The querier passes the query to all ingesters for in-memory data, with a
lazily-loaded fallback to the backing store.
3. The ingesters receive the read request and return data matching the query, if
any.
4. The querier lazily loads data from the backing store and runs the query
against it if no ingesters returned data.
5. The querier iterates over all received data and deduplicates, returning a
final set of data over the HTTP/1 connection.

## Write Path

![chunk_diagram](chunks_diagram.png)

To summarize, the write path does the following:

1. The distributor receives an HTTP/1 request to store data for streams.
2. Each stream is hashed using the hash ring.
3. The distributor sends each stream to the appropriate ingesters and their
replicas (based on the configured replication factor).
4. Each ingester will create a chunk or append to an existing chunk for the
stream's data. A chunk is unique per tenant and per labelset.
5. The distributor responds with a success code over the HTTP/1 connection.
Binary file added docs/chunks_diagram.png
Binary file added docs/modes_of_operation.png
6 changes: 4 additions & 2 deletions docs/overview/README.md
@@ -7,6 +7,9 @@ Unlike other logging systems, Loki is built around the idea of only indexing
labels for logs and leaving the original log message unindexed. This means
that Loki is cheaper to operate and can be orders of magnitude more efficient.

For a more detailed version of this same document, please read
[Architecture](../architecture.md).

## Multi Tenancy

Loki supports multi-tenancy so that data between tenants is completely
@@ -113,8 +116,7 @@ maintainence tasks. It consists of:

> Unlike the other core components of Loki, the chunk store is not a separate
> service, job, or process, but rather a library embedded in the three services
> that need to access Loki data: the [ingester](#ingester),
> [querier](#querier), and [ruler](#ruler).
> that need to access Loki data: the [ingester](#ingester) and [querier](#querier).

The chunk store relies on a unified interface to the
"[NoSQL](https://en.wikipedia.org/wiki/NoSQL)" stores (DynamoDB, Bigtable, and