Rather than NFS, Rook offers nearly turnkey storage options.

Multiple storage engines are offered. We are interested in Ceph, as it can provide block storage to pods via PersistentVolumeClaims.

You need 3 or more hosts with SSDs, all part of the kube cluster. Ideally you'll want these hosts to carry a label and a taint, so that the Ceph resources only run on the storage nodes and the storage nodes only run Ceph.

In this example, I have a 3 node cluster, all of which are master control plane nodes. I'll be adding further compute nodes to the cluster later.

This is based on the official quickstart guide, with notes on what to change to work for my situation.

If you have used these disks for a Ceph cluster previously, follow this cleanup guide first.

The cleanup guide can be roughly summarised as: on every host in the kube cluster, rm -rf /var/lib/rook, and on every block device that has hosted Ceph data before, zero it entirely, either with dd or, for SSDs, blkdiscard.
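As a rough sketch of that cleanup (assuming /dev/sdb is the disk that previously held Ceph data; adjust per host):

# on every kube host that ran Rook previously
rm -rf /var/lib/rook

# on every block device that previously held Ceph data
dd if=/dev/zero of=/dev/sdb bs=1M status=progress   # zero the whole device, or:
blkdiscard /dev/sdb                                  # much faster on SSDs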

git clone https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph

Start by spinning up the operator. If you don't have any non-master nodes in your cluster, make the following changes to the operator.yaml file and then apply.


# for the deployment at the bottom of operator.yaml, set spec.template.spec.tolerations to:
# [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule", "operator": "Exists"}]
#
# Search the file for `CSI_PROVISIONER_TOLERATIONS` and uncomment, set it like this:
#  CSI_PROVISIONER_TOLERATIONS: |
#    [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule", "operator": "Exists"}]
# Do the same for CSI_PLUGIN_TOLERATIONS
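
In YAML form, the Deployment change from the first comment looks roughly like this (a sketch; the surrounding fields stay as they are):

# rook-ceph-operator Deployment, at the bottom of operator.yaml
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule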

kubectl create -f crds.yaml -f common.yaml -f operator.yaml

Once the operator pod is running successfully, you can customise the cluster.yaml file and apply. cluster.yaml contains a single CephCluster CR.
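To confirm the operator is up before moving on, something like this should show the pod Running (the label selector is an assumption based on the standard Rook deployment):

kubectl -n rook-ceph get pods -l app=rook-ceph-operator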

# spec.placement:
#    all:
#       tolerations: [{"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule", "operator": "Exists"}]

# storage.useAllNodes: false
# storage.useAllDevices: false

# storage.nodes:
# - name: kubemaster1
#   devices: [ {"name": "sdb" }]
# (there's lots of ways to configure which storage devices to use, this is the simplest)
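Put together, the relevant parts of the CephCluster spec end up looking roughly like this (node names beyond kubemaster1 and the device name are assumptions for a 3-node setup; substitute your own):

spec:
  placement:
    all:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
    - name: kubemaster1
      devices:
      - name: sdb
    - name: kubemaster2
      devices:
      - name: sdb
    - name: kubemaster3
      devices:
      - name: sdb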

kubectl create -f cluster.yaml

It can take some minutes for each of the successive pods to pull and execute. It may look like it's doing nothing for a while.

You should eventually see multiple pods each of rook-ceph-mon, rook-ceph-mgr, and rook-ceph-osd. Note that the rook-ceph-osd-prepare pods are a separate, short-lived set, not the OSDs themselves. If the osd pods are not present but the osd-prepare ones are, check the logs on the osd-prepare pods.
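Something like the following is handy for watching progress and pulling those logs (the label selector is an assumption based on the labels Rook applies to the prepare jobs):

kubectl -n rook-ceph get pods --watch
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --all-containers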

When it is finished, the ceph status command, run via the toolbox (kubectl apply -f toolbox.yaml), should show 3 mon daemons, a mgr, and 3 osds. The avail figure should roughly match the total size of the block devices you allocated.
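Concretely:

kubectl apply -f toolbox.yaml
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status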

Ceph is now healthy.

Create the CephFilesystem resource. Modify the tolerations as appropriate.

kubectl apply -f filesystem.yaml
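
For an all-control-plane cluster like this one, the tolerations for the MDS pods go under the metadataServer placement in filesystem.yaml; roughly (a sketch, not the full CR):

spec:
  metadataServer:
    activeCount: 1
    placement:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule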

Blindly apply the storage class. It depends on the filesystem and the cephcluster resources already provisioned.

kubectl apply -f storage-ceph-storageclass.yaml
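
For reference, a CephFS StorageClass of the kind that file contains looks roughly like this (names are the Rook quickstart defaults, assuming the example filesystem.yaml's filesystem name of myfs):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  pool: myfs-data0
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete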

And hey presto, you should be able to provision storage:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: wordpress
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pv
# NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS      REASON   AGE
# pvc-4a6ec218-6b3f-432b-b760-0fcdd0285ed3   20Gi       RWO            Delete           Bound    default/mysql-pv-claim   rook-ceph-block            57s
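
A pod can then mount the claim like any other volume; a minimal sketch (image and mount path are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mysql-pv-claim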

Problems I ran into and how to debug them...

The first bunch of problems I had were because I hadn't set up my network correctly: only the monitor pods were running. I had neglected to allow IP forwarding on all of my nodes, so Docker's default iptables FORWARD policy of DROP was being applied.
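Enabling forwarding is a one-liner per node (persist it under /etc/sysctl.d/ so it survives reboots):

sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ip-forward.conf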

I had re-created the cluster a few times. After deleting a cluster, you need to delete /var/lib/rook from each node (this path is set in spec.dataDirHostPath in cluster.yaml).

The status of the ceph cluster can be checked with:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status

It should show 3 mon daemons, at least 1 mgr, and 3 OSDs. I had 0 OSDs. This was because the disks I had assigned still had LVM configuration from a previous Ceph cluster, and Ceph was cowardly refusing to use them.
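A sketch of clearing that leftover state (assuming the disk is /dev/sdb; double-check the device name before wiping anything):

# remove the volume group and physical volume left behind by the old cluster
vgremove --force $(pvs --noheadings -o vg_name /dev/sdb)
pvremove --force /dev/sdb
# then wipe remaining signatures and discard the device
wipefs --all /dev/sdb
blkdiscard /dev/sdb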

Example healthy output:

  cluster:
    id:     4de1433d-02e3-4c3a-843e-889d2deb8dc4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: a(active, since 64s)
    osd: 3 osds: 3 up (since 43s), 3 in (since 83s)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   15 MiB used, 335 GiB / 335 GiB avail
    pgs:     1 active+clean