Update jiva.md
Replaced new jiva concept with old one

Signed-off-by: ranjithwingrider <ranjith.raveendran@mayadata.io>
ranjithwingrider committed Jul 19, 2019
1 parent ebf8c52 commit 60e7968
Showing 1 changed file with 18 additions and 8 deletions: docs/jiva.md
@@ -10,33 +10,43 @@ sidebar_label: Jiva

### Jiva

Each Jiva Volume comprises a Controller (or Target) and a set of Replicas. Both the Controller and Replica functionalities are provided by the same binary and hence the same [docker image](https://hub.docker.com/r/openebs/jiva/). Jiva simulates a block device which is exposed via an iSCSI target implementation (gotgt, part of the Controller). This block device is discovered and mounted remotely on the host where the application pod is running. The Jiva Controller replicates the incoming IOs to its Replicas in parallel. Each Replica, in turn, writes these IOs to a sparse file.
Jiva has a single container image for both the controller and the replica; the Docker image is available at https://hub.docker.com/r/openebs/jiva/. Each Jiva Volume comprises a Jiva Controller (or Target) and a set of Jiva Replicas. The Jiva Controller synchronously replicates the incoming IO to the Jiva Replicas. Each replica uses a Linux sparse file as its foundation and supports thin provisioning and snapshotting of storage volumes.
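
Both descriptions above hinge on the same fan-out: the controller acknowledges a write only after every replica has persisted it. Below is a minimal sketch of that pattern in Go; the `Replica` and `Controller` types and their `WriteAt` methods are illustrative stand-ins, not Jiva's actual API.

```go
package jiva

import (
	"errors"
	"sync"
)

// Replica stands in for one backend that persists blocks to a sparse file.
type Replica interface {
	WriteAt(p []byte, off int64) error
}

// Controller fans each incoming IO out to every replica and waits for all
// acks, which is what keeps the replicas synchronously in step.
type Controller struct {
	replicas []Replica
}

func (c *Controller) WriteAt(p []byte, off int64) error {
	var wg sync.WaitGroup
	errs := make([]error, len(c.replicas))
	for i, r := range c.replicas {
		wg.Add(1)
		go func(i int, r Replica) {
			defer wg.Done()
			errs[i] = r.WriteAt(p, off) // replicas are written in parallel
		}(i, r)
	}
	wg.Wait() // acknowledge the initiator only after every replica acks
	return errors.Join(errs...)
}
```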

![Jiva storage engine of OpenEBS](/docs/assets/jiva.png)

#### Jiva Sparse File Layout

The following content is adapted, with some architectural changes, from Rancher's LongHorn [documentation](https://rancher.com/microservices-block-storage/).
The following content is directly taken from Rancher's LongHorn [announcement documentation](https://rancher.com/microservices-block-storage/).

**Replica Operations of Jiva**

------

Jiva replicas are built using Linux sparse files, which support thin provisioning. Jiva does not maintain additional metadata to indicate which blocks are used. The block size is 4K. When a replica gets added to the controller, it creates an auto-generated snapshot (differencing disk). As the number of snapshots grows, the differencing disk chain could get quite long. To improve read performance, Jiva therefore maintains a read index table that records which differencing disk holds valid data for each 4K block. In the following figure, the volume has eight blocks. The read index table has eight entries and is filled up lazily as read operations take place. A write operation writes the data to the latest file (the head), deletes (via fallocate) the corresponding block from the older snapshots (differencing disks), and updates the entry in the table, which now points to the live data.
Jiva replicas are built using Linux sparse files, which support thin provisioning. Jiva does not maintain additional metadata to indicate which blocks are used. The block size is 4K. When you take a snapshot, you create a differencing disk. As the number of snapshots grows, the differencing disk chain could get quite long. To improve read performance, Jiva therefore maintains a read index that records which differencing disk holds valid data for each 4K block. In the following figure, the volume has eight blocks. The read index has eight entries and is filled up lazily as read operations take place. A write operation resets the read index, causing it to point to the live data. ![Longhorn read index](http://cdn.rancher.com/wp-content/uploads/2017/04/14095610/Longhorn-blog-3.png)

![Longhorn read index](/docs/assets/Longhorn-blog-new.png)
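
Both versions of the paragraph describe the same structure: a per-block index mapping each 4K block to the disk in the snapshot chain that holds its live data. Here is a compact sketch assuming a hypothetical `volume` type with a one-byte-per-block index; the lazy fill on a read miss, extent tracking, and fallocate punching are omitted.

```go
package jiva

import "os"

const blockSize = 4096 // the 4K block size used throughout this section

type volume struct {
	disks     []*os.File // disks[0] is the oldest snapshot; the last is the live head
	readIndex []byte     // one entry per 4K block: index of the disk with valid data
}

// readBlock consults the index so a read touches exactly one disk rather
// than walking the whole differencing-disk chain.
func (v *volume) readBlock(n int64) ([]byte, error) {
	buf := make([]byte, blockSize)
	_, err := v.disks[v.readIndex[n]].ReadAt(buf, n*blockSize)
	return buf, err
}

// writeBlock always lands on the head and repoints the index at it, which
// is the "points to the live data" step both paragraphs describe.
func (v *volume) writeBlock(n int64, buf []byte) error {
	head := len(v.disks) - 1
	if _, err := v.disks[head].WriteAt(buf, n*blockSize); err != nil {
		return err
	}
	v.readIndex[n] = byte(head)
	return nil
}
```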

The read index table is kept in memory and consumes two bytes for each 4K block. A maximum of 512 auto-generated snapshots can be created for each volume. The read index table consumes a certain amount of memory for each replica; a 1TB volume, for example, consumes 512MB of in-memory space.

The read index is kept in memory and consumes one byte for each 4K block. The byte-sized read index means you can take as many as 254 snapshots for each volume. The read index consumes a certain amount of memory for each replica; a 1TB volume, for example, consumes 256MB of in-memory read index. We will potentially consider placing the read index in memory-mapped files in the future.
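
The figures quoted in both versions fall straight out of the block count; the snippet below is plain arithmetic, not Jiva code.

```go
package main

import "fmt"

func main() {
	const volumeSize = 1 << 40       // 1 TiB volume
	const blockSize = 4 << 10        // 4K blocks
	blocks := volumeSize / blockSize // 268,435,456 blocks

	fmt.Println(blocks/(1<<20), "MiB at one byte per block")    // 256 MiB
	fmt.Println(2*blocks/(1<<20), "MiB at two bytes per block") // 512 MiB
}
```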

**Replica Rebuild**

------

The Jiva volume controller is responsible for initiating and coordinating the process of syncing the replicas. Once a replica comes up, it tries to get added to the controller, and the controller marks it as WO (write-only). The replica then initiates the rebuilding process from the other healthy replicas. After the sync is completed, the volume controller sets the new replica to RW (read-write) mode.
When the controller detects failures in one of its replicas, it marks the replica as being in an error state. The Longhorn/Jiva volume manager is responsible for initiating and coordinating the process of rebuilding the faulty replica as follows (a sketch of this sequence appears after the list):

- The Longhorn/Jiva volume manager creates a blank replica and calls the controller to add the blank replica into its replica set.
- To add the blank replica, the controller performs the following operations:
  - Pauses all read and write operations.
  - Adds the blank replica in WO (write-only) mode.
  - Takes a snapshot of all existing replicas, which will now have a blank differencing disk at its head.
  - Unpauses all read and write operations. Only write operations will be dispatched to the newly added replica.
  - Starts a background process to sync all but the most recent differencing disk from a good replica to the blank replica.
  - After the sync completes, all replicas now have consistent data, and the volume manager sets the new replica to RW (read-write) mode.
- The Longhorn/Jiva volume manager calls the controller to remove the faulty replica from its replica set.
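
A condensed sketch of the sequence above, with hypothetical method names (`pauseIO`, `snapshotAll`, `syncSnapshots`, and so on) standing in for the controller's real operations:

```go
package jiva

type mode string

const (
	wo mode = "WO" // write-only: receives new writes while rebuilding
	rw mode = "RW" // read-write: fully synced
)

type replica struct{ mode mode }

type controller struct{ replicas []*replica }

// Stubs for the operations named in the list; real Jiva does much more.
func (c *controller) pauseIO()                  {}
func (c *controller) resumeIO()                 {}
func (c *controller) snapshotAll()              {}
func (c *controller) syncSnapshots(to *replica) {}

// addBlankReplica follows the list step by step.
func (c *controller) addBlankReplica(blank *replica) {
	c.pauseIO()     // pause all read and write operations
	blank.mode = wo // add the blank replica in WO mode
	c.replicas = append(c.replicas, blank)
	c.snapshotAll() // every replica now has a blank differencing disk at its head
	c.resumeIO()    // writes are now dispatched to the new replica as well
	go func() {
		c.syncSnapshots(blank) // background-sync the older differencing disks
		blank.mode = rw        // consistent data: promote to read-write
	}()
}
```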

It is not very efficient to rebuild replicas from scratch. We can improve rebuild performance by trying to reuse the sparse files left from the faulty replica.

When the controller detects failures in one of its replicas, it marks the replica as being in an error state and the rebuilding process is triggered.

**Note:** If REPLICATION_FACTOR is still met even after a replica is marked faulty, the controller will continue to serve read/write IOs. Otherwise, it will wait until REPLICATION_FACTOR ((n/2)+1, where n is the number of replicas) is satisfied.
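
For concreteness, the threshold in the note works out as follows; this is just the stated formula, not Jiva's actual code.

```go
package main

import "fmt"

// requiredReplicas applies the note's rule: (n/2)+1, where n is the
// number of replicas configured for the volume.
func requiredReplicas(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("n=%d replicas -> %d must be healthy to serve IO\n", n, requiredReplicas(n))
	}
	// With n=3, IO continues as long as 2 replicas are healthy, so a
	// single faulty replica does not block reads and writes.
}
```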

<br>
