We're happy to announce the release of Lokomotive v0.9.0 (Indian Pacific).
- Update Kubernetes to v1.21.4 (#1567).
- Update `etcd` to v3.4.16 (#1493).
- Update `calico` to v3.19.1 (#1521).
- Replace Packet CCM with Cloud Provider Equinix Metal (#1545).
- Update `external-dns` to v0.8.0 (#1499).
- Update `cert-manager` to v1.4.0 (#1501).
- Update `dex` to v2.28.1 (#1503).
- Update `velero` to v1.6.0 (#1505).
- Update `prometheus-operator` charts to v0.48.1 (#1506).
- Update `openebs-operator` to v2.10.0 (#1509).
- Update `node-problem-detector` to v0.8.8 (#1507).
- Update `rook` to v1.6.5 (#1495).
- Update `contour` to v1.16.0 (#1508).
- Update `linkerd` to v2.10.2 (#1522).
- Update `cluster-autoscaler` to v1.21.0 (#1512).
- Update `metallb` to v0.9.6 (#1555).
- Update Terraform providers to their latest versions (#1523).
- equinixmetal: Rename documentation, code and configuration from `Packet` to `Equinix Metal` (#1545).
- baremetal: Users can now configure node-specific labels (#1405).
- rook-ceph: Add new parameter `resources` for resource requests and limits (#1483).
- baremetal: Add new parameter `wipe_additional_disks`, which allows wiping any additional disks attached to the machine (#1486).
- baremetal: Automated (re-)provisioning of worker nodes (#1502).
- Add new parameter `enable_node_local_dns` to enable node-local-dns support for clusters (#1524).
- Add parameter `tolerations` for prometheus-operator and its components (#1540).
- Define `MaxHistory` to clean up old Helm releases (#1549).
- Add `cpu_manager_policy` flag to workers in Lokomotive clusters on Equinix Metal and AWS (#1406).
- cli: Allow skipping the control plane updates if the cluster is not successfully configured, using the flag `--skip-control-plane-update` (#1482).
- Use the new label and taint syntax for `rook-ceph` (#1474).
- Add information about the restic parameter `require_volume_annotation` (#1539).
- Rename `Packet` to `Equinix Metal` (#1537).
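As a sketch, some of the new configuration knobs from this release might appear in a cluster configuration as follows. The field placement is an assumption for illustration, not taken from the reference docs; consult the configuration reference for your platform:

```hcl
cluster "equinixmetal" {
  # ...

  # New in v0.9.0 (#1524): run a node-local DNS cache.
  enable_node_local_dns = true

  worker_pool "pool-1" {
    # ...

    # New in v0.9.0 (#1406), Equinix Metal and AWS only.
    cpu_manager_policy = "static"
  }
}
```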
- baremetal: Fix certificate rotation (#1478).
- baremetal: Configure and persist kernel args (#1489).
- Equinix Metal ARM: Use HTTP instead of HTTPS for the `iPXE` URL, as HTTPS is unreliable with iPXE (#1498).
- terraform: Fix ignored `ConditionPathExists` by moving it from the `[Service]` section to the `[Unit]` section (#1518).
- cli: Honor the `--upgrade-kubelets` option (#1516).
- Fix pre-update health check potentially rolling back to an older release of a control plane component (#1515 & #1549).
- cli: Upgrade kubelets by default. Starting with v0.9.0, the default value of the `--upgrade-kubelets` flag is changed from `false` to `true` (#1517).
- baremetal: Let `installer.service` retry on failure (#1490).
- baremetal: Change the hostname from `<cluster_name>-worker-<count_index>` to `controller_names<count_index>` for controllers and `worker_names<count_index>` for workers when `set_standard_hostname` is true (#1488).
- pkg/terraform: Increase the default parallelism (#1481).
- cert-rotation: Print the journal on error when restarting `etcd` (#1500).
- Restart containers from the systemd unit only, not from the Docker daemon. This fixes possible race conditions while rotating certificates (#1511).
- Go module updates and cleanups (#1556).
A Lokomotive cluster deployed on Equinix Metal needs a cluster configuration change from `packet` to `equinixmetal`:
# old
cluster "packet" {
...
...
}
# new
cluster "equinixmetal" {
...
...
}
The variable `k8s_domain_name` now takes only the domain name instead of `<cluster_name>.<k8s_domain_name>`.
Example:
# old
k8s_domain_name = "mercury.k8s.localdomain"
# new
k8s_domain_name = "k8s.localdomain"
Alertmanager and operator are now configured as a block.
# old
alertmanager_retention = "360h"
alertmanager_external_url = "https://api.example.com/alertmanager"
alertmanager_config = file("alertmanager-config.yaml")
alertmanager_node_selector = {
"kubernetes.io/hostname" = "worker3"
}
# new
alertmanager {
retention = "360h"
external_url = "https://api.example.com/alertmanager"
config = file("alertmanager-config.yaml")
node_selector = {
"kubernetes.io/hostname" = "worker3"
}
}
# old
prometheus_operator_node_selector = {
"kubernetes.io/hostname" = "worker3"
}
# new
operator {
node_selector = {
"kubernetes.io/hostname" = "worker3"
}
}
The baremetal platform now supports user data changes and reprovisioning of worker nodes based on those changes.
From Lokomotive v0.9.0 onwards, additional files are created in the cluster assets directory. The filename is the MAC address of the machine and the contents are the domain name.
The following upgrade paths are supported:
In such a scenario, the only thing that needs to be done is the above-mentioned change to `k8s_domain_name`.
By default, user data changes are ignored.
In such a scenario, Lokomotive reboots the worker nodes and applies the user data changes. To bring about such a change:
- Make user data changes (if any).
- Set `ignore_worker_changes = false`.
In such a scenario, Lokomotive forces reinstallation of worker nodes via PXE and applies the user data changes. This requires a meaningful `pxe_commands` value configured for automation.
To bring about such a change:
- Make user data changes (if any).
- Remove the file with the worker node's MAC address from the cluster assets directory.
- Set `ignore_worker_changes = false` in the cluster configuration.
- Set `pxe_commands` to an appropriate value.
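Put together, a forced-reprovisioning configuration might contain something like the following sketch. The IPMI command is a hypothetical placeholder for whatever triggers a PXE boot of your machines; the host and credentials shown are not real:

```hcl
# Sketch only: both values are illustrative placeholders.
ignore_worker_changes = false
pxe_commands          = "ipmitool -H bmc.example.com -U admin -P secret chassis bootdev pxe && ipmitool -H bmc.example.com -U admin -P secret power cycle"
```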
NOTE: Reprovisioning will reinstall the operating system. If you have any stateful workloads running, this step will result in data loss. Lokomotive does not taint or drain the worker nodes before reprovisioning; it's recommended to do so manually before initiating reprovisioning of the worker nodes.
NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than `v0.8.0`, update to `v0.8.0` first and only then proceed with the update to `v0.9.0`.
Execute the following steps in your cluster configuration directory:
- Download and install the lokoctl binary by following the v0.9.0 installation guide and verify the version:

  lokoctl version
  v0.9.0

- Backup the Terraform state file:

  cd $assets_dir/terraform
  terraform state pull > backup.state

- Update the Terraform provider from `packethost/packet` to `equinix/metal`:

  terraform state replace-provider packethost/packet equinix/metal

- Pull the latest state file (required only if using the S3 backend):

  terraform state pull > terraform.tfstate

- Replace all references of `packet_` with `metal_` in the state file:

  sed -i 's/packet_/metal_/g' terraform.tfstate

- Change the module name from `module.packet` to `module.equinixmetal` in the state file:

  sed -i 's/module.packet/module.equinixmetal/g' terraform.tfstate

- Push the Terraform state (required only if using the S3 backend):

  terraform state push -force terraform.tfstate

- Replace `packet` with `equinixmetal` in the cluster configuration file. Execute this step in the cluster directory:

  # old
  cluster "packet" {
    ...
  }

  # new
  cluster "equinixmetal" {
    ...
  }

- Uninstall Packet CCM, as it is replaced with Cloud Provider Equinix Metal:

  helm uninstall packet-ccm --namespace kube-system

- Upgrade to Lokomotive v0.9.0:

  lokoctl cluster apply --skip-components --skip-pre-update-health-check

  NOTE: Do not forget the `--skip-pre-update-health-check` flag.

- Create new files in the assets directory for each controller and worker node. The file name should be the MAC address of the node and the contents of the file should be the domain name (i.e. `controller_domains` and `worker_domains`):

  # for each controller and worker node
  echo <DOMAIN_NAME> > $assets_dir/cluster-assets/<MAC_ADDRESS>

- Change the value of `k8s_domain_name` to only include the domain name. Example:

  # old
  k8s_domain_name = "mercury.example.com"

  # new
  k8s_domain_name = "example.com"

- Add a `pxe_commands` entry, which lokoctl uses to automate PXE (re)provisioning. For existing clusters you can use `pxe_commands = "true"` to have no PXE automation (`true` is the no-op shell command); reprovisioning through PXE won't be supported for such a cluster.

- Follow the steps mentioned in this section as per the desired upgrade path. Make the necessary configuration changes as mentioned. Finally execute:

  lokoctl cluster apply --skip-components
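Before pushing the rewritten state, the two `sed` rewrites from the steps above can be sanity-checked against a sample resource address. This is purely illustrative; the real target is your `terraform.tfstate`:

```shell
# Apply both rewrites from the steps above to one sample state line.
sample='module.packet.packet_ssh_key.key'
echo "$sample" | sed 's/packet_/metal_/g; s/module\.packet/module.equinixmetal/g'
# prints: module.equinixmetal.metal_ssh_key.key
```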
Execute:
lokoctl cluster apply --skip-components
On all platforms except AKS, do the following:
- Download the release bundle:
curl -LO https://github.com/kinvolk/lokomotive/archive/v0.9.0.tar.gz
tar -xvzf v0.9.0.tar.gz
- Run the update script:
./lokomotive-0.9.0/scripts/update/0.8.0-0.9.0/update.sh
Update installed Lokomotive components:
lokoctl components apply
NOTE: Updating the MetalLB and Contour components will incur some downtime. Please plan the component updates accordingly.
We're happy to announce the release of Lokomotive v0.8.0 (Hogwarts Express).
- Update AKS to `1.18.17` (#1466).
- Update `prometheus-operator` to `0.46.0` (#1440).
- Update `contour` to `v1.13.1` (#1450).
- Update `calico` to `v3.18.1` (#1453).
- Update Terraform providers to their latest versions (#1451).
- Add a certificate rotation command: `lokoctl cluster certificate rotate` (#1435).
- Add a `reclaim_policy` field to the components `rook-ceph`, `openebs-storage-class` and `aws-ebs-csi-driver`. Change the default behaviour of the default storage class from `Delete` to `Retain` (#1369).
- Remove the `webhook` field from the `cert-manager` component (#1413).
This is an optional step and only applies if you use any of these storage components: `rook-ceph`, `openebs-storage-class` or `aws-ebs-csi-driver`.
If you are relying on the default behaviour of the PersistentVolumes having a reclaim policy of `Delete`, then please add the following field explicitly now:
reclaim_policy = "Delete"
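For instance, in the `rook-ceph` component block this could look like the following sketch; the exact placement of the field within the block is assumed here, so check the component's configuration reference:

```hcl
component "rook-ceph" {
  # ...existing settings...

  # Keep the pre-v0.8.0 behaviour of deleting volumes when the PVC is
  # deleted; omit this field to get the new default of "Retain".
  reclaim_policy = "Delete"
}
```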
NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than `v0.7.0`, update to `v0.7.0` first and only then proceed with the update to `v0.8.0`.
Execute the following steps in your cluster configuration directory:
- Download and install the lokoctl binary by following the v0.8.0 installation guide and verify the version using `lokoctl version`:

  v0.8.0
- Download the release bundle:
curl -LO https://github.com/kinvolk/lokomotive/archive/v0.8.0.tar.gz
tar -xvzf v0.8.0.tar.gz
- On all platforms except AKS, update Calico CRDs:
kubectl apply -f ./lokomotive-0.8.0/assets/charts/control-plane/calico/crds/
- Update the control-plane:
lokoctl cluster apply -v
NOTE: If the update process gets interrupted, rerun the above command.
NOTE: If your cluster is running self-hosted kubelets, append `--upgrade-kubelets` to the above command.
NOTE: The command updates the cluster as well as any Lokomotive components applied to it. Append `--skip-components` to the above command to avoid updating the components. Components can then be updated individually using `lokoctl component apply`.
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
Manually update the CRDs before updating the `contour` component:
kubectl apply -f https://raw.githubusercontent.com/projectcontour/contour/release-1.13/examples/contour/01-crds.yaml
Update the component:
lokoctl component apply contour
Manually update the CRDs before updating the `prometheus-operator` component:
kubectl apply -f ./lokomotive-0.8.0/assets/charts/components/prometheus-operator/crds/
Update the component:
lokoctl component apply prometheus-operator
We're happy to announce the release of Lokomotive v0.7.0 (Ghan).
- Update Kubernetes to v1.20.4 (#1410).
- Add component `node-problem-detector` (#1384).
- AWS EBS CSI Driver: Add `node_affinity` and `tolerations` (#1393).
- EM: Add worker pool specific `facility` attribute (#1359).
- Use FLUO to update nodes (#1295).
- How to add a worker pool in a different facility (#1361).
- Refactor AWS quickstart guide (#1273).
- EM: Add `Restart=on-failure` and `RestartSec=5s` for the metadata service (#1362).
- Fix wrong etcd settings and clean up leftovers from the etcd move from rkt to a Docker based daemon (#1382).
- contour: Fix hostPort regression (#1342).
- baremetal: Remove the `enable_tls_bootstrap` attribute (#1380).
- Remove a deprecated cert-manager namespace label `certmanager.k8s.io/disable-validation=true` (#1372).
- AWS EBS CSI Driver: Change the StorageClass' default ReclaimPolicy to `Retain` (#1393).
- Include a "v" in version strings when releasing (#1417).
Delete the `enable_tls_bootstrap` parameter from your cluster configuration, since it has been removed in this release.
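That is, remove a line like the one below from your cluster block (sketch; the surrounding block is omitted):

```hcl
# Delete this line from the cluster configuration:
enable_tls_bootstrap = true
```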
NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than `v0.6.1`, update to `v0.6.1` first and only then proceed with the update to `v0.7.0`.
Execute the following steps in your cluster configuration directory:
- Download and install the lokoctl binary by following the v0.7.0 installation guide and verify the version using `lokoctl version`:

  v0.7.0
- Update the control plane:
lokoctl cluster apply -v
NOTE: If the update process gets interrupted, rerun the above command.
NOTE: If your cluster is running self-hosted kubelets, append `--upgrade-kubelets` to the above command.
NOTE: The command updates the cluster as well as any Lokomotive components applied to it. Append `--skip-components` to the above command to avoid updating the components. Components can then be updated individually using `lokoctl component apply`.
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
On all platforms except AKS, do the following:
- Download the release bundle:
curl -LO https://github.com/kinvolk/lokomotive/archive/v0.7.0.tar.gz
tar -xvzf v0.7.0.tar.gz
- Run the update script:
./lokomotive-0.7.0/scripts/update/0.6.1-0.7.0/update.sh
This is a patch release which includes mainly bug fixes.
NOTE: Please read the updating guidelines here.
- Velero: Add tolerations to Restic plugin (#1348).
- Velero: Add e2e tests (#1353).
- Update all Go dependencies (#1358).
- Update the Packet (Equinix Metal) Terraform provider to 3.2.1, which fixes the provisioning failures of `n2.xlarge.x86` machines (#1349).
- Prefix `ETCD_` for standard etcd environment variables only (#1308).
- Update the Restic TolerationSeconds type to integer and add conditional checks (#1365).
- Add missing `provider` parameter (#1354).
- Update the RELEASING document to add steps to update the documentation website entry (#1326).
- Improvements to the Lokomotive release process documentation (#1341).
NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than `v0.6.0`, update to `v0.6.0` first and only then proceed with the update to `v0.6.1`.
Please perform the following manual steps in your cluster configuration directory.
- Download and install the lokoctl binary by following the v0.6.1 installation guide and verify the version:

  lokoctl version
  v0.6.1

- Update the control plane:

  lokoctl cluster apply --skip-components -v
NOTE: If the update process gets interrupted, rerun the above command.
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
- Download the release bundle.
curl -LO https://github.com/kinvolk/lokomotive/archive/v0.6.1.tar.gz
tar -xvzf v0.6.1.tar.gz
- Run the update script:
./lokomotive-0.6.1/scripts/update/0.6.0-0.6.1/update.sh
We're happy to announce the release of Lokomotive v0.6.0 (Flying Scotsman).
This release includes several new features, many component updates, and a new platform - Tinkerbell.
- Update Kubernetes to v1.19.4 and AKS to v1.18.10 (#1189).
- Update `external-dns` to v0.7.4 (#1115).
- Update `metrics-server` to v2.11.2 (#1116).
- Update `cluster-autoscaler` to v1.1.0 (#1137).
- Update `rook` to v1.4.6 (#1117).
- Update `velero` to v1.5.2 (#1131).
- Update `openebs-operator` to v2.2.0 (#1095).
- Update `contour` to v1.10.0 (#1170).
- Update `experimental-linkerd` to stable-2.9.0 (#1123).
- Update `web-ui` to v0.1.3 (#1237).
- Update `prometheus-operator` to v0.43.2 (#1162).
- Update Calico to v3.17.0 (#1251).
- Update `aws-ebs-csi-driver` to v0.7.0 (#1135).
- Update `etcd` to 3.4.14 (#1309).
- Update Terraform providers to their latest versions (#1133).
- Add support for Tinkerbell platform (#392).
- Add new worker pools when TLS bootstrap is enabled without remaining stuck in the installation phase (#1181).
- contour: Consistently apply node affinity and tolerations to all scheduled workloads (#1161).
- Don't run control plane components as DaemonSets on single control plane node clusters (#1193).
- Add Packet CCM to the Packet platform (#1155).
- contour: Parameterize the Envoy scraping interval (#1229).
- Expose the `--conntrack-max-per-core` kube-proxy flag (#1187).
- Add `require_volume_annotation` for the restic plugin (#1132).
- Print the bootkube journal if cluster bootstrap fails (#1166). This makes cluster bootstrap problems easier to debug.
- aws-ebs-csi-driver: Add dynamic provisioning, resizing and snapshot options (#1277). The user can now enable or disable provisioning, resizing and snapshotting in the AWS EBS driver.
- calico-host-protection: Add a custom locked-down PSP configuration (#1274).
- Pull control plane images from Quay to avoid hitting Docker Hub pulling limits (#1226).
- Bootkube now waits for all control plane charts to converge before exiting, which should make the bootstrapping process more stable (#1085).
- Remove deprecated CoreOS mentions from AWS (#1245) and bare metal (#1246).
- Improve hardware reservations validation rules on Equinix Metal (#1186).
Removed the undocumented `cluster.os_name` parameter, since Lokomotive supports Flatcar Container Linux only.
The `cluster.os_channel` parameter got simplified by removing the `flatcar-` prefix.

# old
os_channel = "flatcar-stable"

# new
os_channel = "stable"
Velero now requires an explicit `provider` field to select the provider.
Example:
component "velero" {
provider = "openebs"
openebs {
...
}
}
Due to a change in the upstream Helm chart, updating the Prometheus Operator component incurs downtime. We do this before updating the cluster so no visibility is lost while the cluster update is happening.
- Patch the `PersistentVolume` created/used by the `prometheus-operator` component to the `Retain` claim policy:
kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')
kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')
NOTE: To execute the above commands, the user must have cluster-wide permissions.
- Uninstall the `prometheus-operator` release, delete the existing `PersistentVolumeClaim`s, and verify that the `PersistentVolume`s become `Released`:
lokoctl component delete prometheus-operator
kubectl delete pvc data-prometheus-prometheus-operator-prometheus-0 -n monitoring
kubectl delete pvc data-alertmanager-prometheus-operator-alertmanager-0 -n monitoring
- Remove the current `spec.claimRef` values to change the PVs' status from `Released` to `Available`:
kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')
kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')
NOTE: To execute the above commands, the user must have cluster-wide permissions.
- Make sure that the prometheus-operator's `storage_class` and `prometheus.storage_size` are unchanged during the upgrade process.

- Proceed to a fresh `prometheus-operator` component installation. The new release should then re-attach your previously released PV with its content:
lokoctl component apply prometheus-operator
NOTE: The etcd dashboard will only start showing data after the cluster is updated.
- Delete the old kubelet service.
kubectl -n kube-system delete svc prometheus-operator-kubelet
- If monitoring was enabled for the `rook`, `contour` or `metallb` components, make sure you update them as well after the cluster is updated.
NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than `v0.5.0`, update to `v0.5.0` first and only then proceed with the update to `v0.6.0`.
Please perform the following manual steps in your cluster configuration directory.
- Download the release bundle.
curl -LO https://github.com/kinvolk/lokomotive/archive/v0.6.0.tar.gz
tar -xvzf v0.6.0.tar.gz
- Install the Packet CCM.
If you are running Lokomotive on Equinix Metal (formerly Packet), then install Packet CCM. Export your Packet cluster's project ID and API Key.
export PACKET_AUTH_TOKEN=""
export PACKET_PROJECT_ID=""
echo "apiKey: $PACKET_AUTH_TOKEN
projectID: $PACKET_PROJECT_ID" > /tmp/ccm-values.yaml
helm install packet-ccm --namespace kube-system --values=/tmp/ccm-values.yaml ./lokomotive-0.6.0/assets/charts/control-plane/packet-ccm/
- Update node config.
On Equinix Metal (formerly Packet), this script shipped with the release tarball will add permanent MetalLB labels and kubelet config to use CCM.
NOTE: Please edit this script to disable updating certain nodes. Modify the `update_other_nodes` function as required.
UPDATE_BOOTSTRAP_COMPONENTS=false
./lokomotive-0.6.0/scripts/update/0.5-0.6/update.sh $UPDATE_BOOTSTRAP_COMPONENTS
- If you're using the self-hosted kubelet, apply the `--cloud-provider` flag to it.
NOTE: If you're unsure, you can run the command anyway; it's harmless if you're not using the self-hosted kubelet.
kubectl -n kube-system get ds kubelet -o yaml | \
sed '/client-ca-file.*/a \ \ \ \ \ \ \ \ \ \ --cloud-provider=external \\' | \
kubectl apply -f -
- Export assets directory.
export ASSETS_DIR="assets"
- Remove BGP sessions from Terraform state.
If you are running Lokomotive on Equinix Metal (formerly Packet), then run the following commands:
cd $ASSETS_DIR/terraform
terraform state rm $(terraform state list | grep packet_bgp_session.bgp)
cd -
- Remove old asset files.
rm -rf $ASSETS_DIR/cluster-assets
rm -rf $ASSETS_DIR/terraform-modules
- Update control plane.
lokoctl cluster apply --skip-components -v
NOTE: If the update process gets interrupted, rerun the above command.
NOTE: If you are running the self-hosted kubelet, append the `--upgrade-kubelets` flag to the above command.
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
- Update the bootstrap components: kubelet and etcd.
This script shipped with the release tarball will update all the nodes to run the latest kubelet and etcd.
NOTE: Please edit this script to disable updating certain nodes. Modify the `update_other_nodes` function as required.
UPDATE_BOOTSTRAP_COMPONENTS=true
./lokomotive-0.6.0/scripts/update/0.5-0.6/update.sh $UPDATE_BOOTSTRAP_COMPONENTS
- If you're using the self-hosted kubelet, reload its config.
NOTE: If you're unsure, you can run the command anyway; it's harmless if you're not using the self-hosted kubelet.
kubectl -n kube-system rollout restart ds kubelet
We've added log rotation to the Docker daemon running on cluster nodes. However, this only takes effect in new nodes. For this to apply to existing cluster nodes, you need to manually configure each node.
- Drain the node.

  This step ensures that you don't see any abrupt changes. Any workloads running on this node are evicted and scheduled to other nodes. The node is marked as unschedulable after running this command.

  kubectl drain --ignore-daemonsets <node name>

- SSH into the node and become root with `sudo -s`.

- Create the Docker config file:

  echo '
  {
    "live-restore": true,
    "log-opts": {
      "max-size": "100m",
      "max-file": "3"
    }
  }
  ' | tee /etc/docker/daemon.json

- Restart the Docker daemon.

  NOTE: This will restart all the containers on the node, including the kubelet. This step cannot be part of the automatic update script because restarting the Docker daemon would also kill the update script pod.

  systemctl restart docker

- Make the node schedulable again:

  kubectl uncordon <node name>
Manually update the CRDs before updating the `contour` component:
kubectl apply -f https://raw.githubusercontent.com/kinvolk/lokomotive/v0.6.0/assets/charts/components/contour/crds/01-crds.yaml
Update the component:
lokoctl component apply contour
Manually update the CRDs before updating the `velero` component:
kubectl apply -f ./lokomotive-0.6.0/assets/charts/components/velero/crds/
Update the component:
lokoctl component apply velero
Follow the OpenEBS update guide.
Follow the Rook Ceph update guide.
Other components are safe to update by running the following command:
lokoctl component apply <component name>
We're happy to announce the release of Lokomotive v0.5.0 (Eurostar).
This release packs new features, bug fixes, code optimizations, platform updates and security hardening.
- Update Kubernetes to `1.18.8` (#1071).
- Expose CNI MTU on the baremetal platform (#977).
- Component web-ui (#981), (#1100) from headlamp.
- Component inspektor-gadget (#1076) from inspektor-gadget.
- Update Velero component for Packet (OpenEBS and restic plugin support) (#881).
- istio-operator: Update to 1.7.3 (#1086).
- prometheus-operator: Update grafana, kube-state-metrics and node_exporter (#963).
- cert-manager: Update to 1.0.3 (#1114).
- Update to Terraform 0.13 (#824).
- Support in-cluster pod traffic encryption (#911).
- AWS, Packet, Baremetal: use Docker instead of rkt for host containers (#946).
- Change labels and taints format from string to structured (#1042).
- prometheus-operator: Add external_url (#964).
- Concepts: add document for admission webhook (#943).
- Coding style guide (#953).
- MetalLB: Clarify address_pools knob (#996).
- How to guide on backing up and restoring rook-ceph volumes with Velero (#1048).
- bootkube: feed output using local rather than local_file content (#1021).
- Dex: fix pod reload on config change (#1040).
- MetalLB: Add missing autodiscovery labels (#990).
- Gangway: add a ServiceAccount (#1104).
- If there is more than one component installed in a single namespace, `lokoctl` will now refuse to remove the namespace when running `lokoctl component --delete` with the `--delete-namespace` flag (#1093).
- Fix error capitalization (#979).
- pkg/terraform: unexport functions not used outside of package (#984).
- pkg/components: remove unused List() function (#982).
- docs/rook-ceph-storage: Use correct apply command (#1026).
- pkg/assets/assets_generate: Fix copyright (#1020).
- Cleanup Terraform providers before Terraform 0.13 upgrades (#860).
- kubelet e2e: Enable the disruptive test (#1012).
- .golangci.yml: Re-enable linters (#1029).
- Fix scripts/find-updates.sh (#1034), (#1068), (#1080).
- pkg/terraform: improvements (#1027).
- cli/cmd: cleanups part 1 (#1013).
- test/components/kubernetes: remove kubelet pod when testing node labels (#1052).
- Remove usage of template_file (#1046).
- test: de-duplicate value timeout and retryInterval (#1049).
- Packet: Read BGP peer address from metadata service (#1010).
- pkg/assets: cleanup exported API (#936).
- Cobra updated to v1.1.1 (#1082), (#1091).
- cli/cmd: cleanups part 2 (#1015).
- Add github actions (#1074).
- Makefile: use latest Go when building in Docker (#1083).
- cli/cmd: cleanups part 3 (#1018).
- Add new CI config for Packet based FLUO testing (#1110).
There have been some minor changes to the configurations of worker nodes.
The data type of `labels` and `taints` has been changed from `string` to `map(string)` for the AWS and Packet platforms.

# old
labels = "testing=true"
taints = "nodeType=storage:NoSchedule"

# new
labels = {
  "testing" = "true"
}
taints = {
  "nodeType" = "storage:NoSchedule"
}
This release also changes the default `cluster.oidc.client_id` value from `gangway` to `clusterauth`.
This setting must match `gangway.client_id` and `dex.static_client.id`.
If you use the default settings for oidc, you'll need to either add `client_id = "gangway"` or change the `static_client.id` and `client_id` parameters for dex and gangway, respectively, to `clusterauth`.

# old
packet {
  oidc {
    client_id = "gangway"
  }
}

# new
packet {
  oidc {
    client_id = "clusterauth"
  }
}
Ensure your cluster is in a healthy state by running `lokoctl cluster apply` using the `v0.4.1` version.
Updating multiple versions at a time is not supported, so if your cluster is older, update to `v0.4.1` first and only then proceed with the update to `v0.5.0`.
Due to the Terraform and Kubernetes updates to v0.13+ and v1.19.3 respectively, some manual steps need to be performed when updating. In your cluster configuration directory, follow these steps:
- Update the local Terraform binary to version v0.13.X. You can follow this guide to do that.

- Starting from your cluster directory, export your platform name and the assets directory name used in your platform configuration. They will be used in the next steps:

  export PLATFORM="packet" && export ASSETS_DIR="assets"
- Remove old asset files:
rm -f $ASSETS_DIR/terraform-modules/$PLATFORM/flatcar-linux/kubernetes/require.tf \
$ASSETS_DIR/terraform-modules/$PLATFORM/flatcar-linux/kubernetes/workers/require.tf \
$ASSETS_DIR/terraform-modules/dns/route53/require.tf
- Go to the
terraform
directory:
cd $ASSETS_DIR/terraform
- Replace the old providers:
terraform state replace-provider -auto-approve registry.terraform.io/-/ct registry.terraform.io/poseidon/ct && \
terraform state replace-provider -auto-approve registry.terraform.io/-/template registry.terraform.io/hashicorp/template
- Return to the original directory and use the kubeconfig generated by Lokomotive:
cd - && export KUBECONFIG=$ASSETS_DIR/cluster-assets/auth/kubeconfig
`FelixConfiguration` has been moved to the calico charts. To avoid firewall interruption, label and annotate it so that it can be managed by Helm while updating:
kubectl label FelixConfiguration default app.kubernetes.io/managed-by=Helm --overwrite=true && \
kubectl annotate FelixConfiguration default meta.helm.sh/release-name=calico --overwrite=true && \
kubectl annotate FelixConfiguration default meta.helm.sh/release-namespace=kube-system --overwrite=true
Finally, run the following:
lokoctl cluster apply --skip-components -v
NOTE: On clusters with a single controller node, you need to delete the old `kube-apiserver` ReplicaSet during the cluster update.
When lokoctl prints that `kube-apiserver` is being updated, run the following command:
kubectl delete rs -n kube-system $(kubectl get rs -n kube-system -l k8s-app=kube-apiserver --no-headers=true --sort-by=metadata.creationTimestamp | tac | tail -n +2 | awk '{print $1}') || true
NOTE: When this command is executed, the update process gets interrupted. Re-run `lokoctl cluster apply --skip-components -v` to proceed.
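To see which ReplicaSet is the newest while the update runs, you can watch them in a second terminal. This read-only helper lists the ReplicaSets oldest-first; the delete one-liner above removes every ReplicaSet except the newest one:

```shell
# Watch kube-apiserver ReplicaSets sorted oldest-first; the bottom
# (newest) entry is the one the update keeps.
kubectl get rs -n kube-system -l k8s-app=kube-apiserver \
  --sort-by=metadata.creationTimestamp -w
```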
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
- Manually update etcd by following the steps mentioned in the doc here.
- Manually update the kubelet running on the nodes by following the steps mentioned in the doc here.
Run the following command:
until lokoctl component render-manifest cert-manager | kubectl apply -f -; do sleep 1; done
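The retry loop is needed because the rendered manifests include both the cert-manager CRDs and resources that use them; the apply only fully succeeds once the API server has registered the new types. If you prefer an explicit wait, you can block until the CRDs are established. The CRD names below are assumptions based on recent cert-manager releases:

```shell
# Wait until the core cert-manager CRDs are registered and established.
for crd in certificates.cert-manager.io issuers.cert-manager.io clusterissuers.cert-manager.io; do
  kubectl wait --for=condition=established "crd/${crd}" --timeout=120s
done
```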
Now it is safe to update:
lokoctl component apply cert-manager
Due to a bug, the valid seccomp profiles in the `prometheus-operator-admission` PodSecurityPolicy don't get updated automatically. Delete the PSP `prometheus-operator-admission` so that it gets recreated with the right seccomp profiles:
kubectl delete psp prometheus-operator-admission
Now it is safe to update:
lokoctl component apply prometheus-operator
Other components are safe to update by running the following command:
lokoctl component apply <component name>
This is a patch release which includes mainly bug fixes.
NOTE: Please read the upgrading guidelines here.
- Dex: convert to Helm chart and update to v2.25.0 (#962).
- feat: add severity labels to MetalLB alerts (#925).
- Override memory limits of rook operator to 512Mi (#938).
- Fix envoy grafana dashboard errors (#969).
- MetalLB: Fix regressions of tolerations and nodeSelectors (#927).
- Fix controlplane components update order (#937).
- component/metallb: Fix controller tolerations (#931).
- Increased the node-ready and cluster-ping timeouts (#952).
- Fix output of `convertNodeSelector` in rook (#945).
- httpbin: convert to Helm chart (#965).
- FLUO: convert to Helm chart (#935).
- Makefile: Don't build before linting and add new target `lint-bin` (#901).
We're happy to announce the release of Lokomotive v0.4.0 (Darjeeling Himalayan).
This release packs new features, bug fixes, code optimizations, better user interface, latest versions of components, security hardening and much more.
- Update Kubernetes version to `1.17.9` (#849).
- AWS: Add support for custom taints and labels (#832).
- Update etcd to `v3.4.13` (#838).
- Update Calico to `v3.15.2` (#841).
- Update Grafana to `7.1.4` and chart version `5.5.5` (#842).
- Update Velero chart to `1.4.2` (#830).
- Update ExternalDNS chart to `3.3.0` (#845).
- Update Amazon Elastic Block Store (EBS) CSI driver to `v0.6.0` (#856).
- Update Cluster Autoscaler to `v2` version `1.0.2` (#859).
- Update cert-manager to `v0.16.1` (#847).
- Update OpenEBS to `v1.12.0` (#781).
- Update MetalLB to `v0.1.0-789-g85b7a46a` (#885).
- Update Rook to `v1.4.2` (#879).
- Use new bootkube image at version `v0.14.0-helm-7047a87` (#775), later updated to `v0.14.0-helm-ec64535` as a part of (#704).
- Update Prometheus operator to `0.41.0` and chart version `9.3.0` (#757).
- Update Contour to `v1.7.0` (#771).
- Update all Terraform providers to latest versions (#835).
- Add autocomplete for bash and zsh in lokoctl (#880). Run the following command to start using auto-completion for lokoctl:
source <(lokoctl completion bash)
- Add `kubeconfig` fallback to Terraform state (#701).
- Add label `lokomotive.kinvolk.io/name: <namespace_name>` to all namespaces (#646).
- Add admission webhook to Lokomotive, which disables automounting of the `default` service account token (#704).
- [Breaking Change] Kubelet now joins the cluster using TLS bootstrapping; add the flag `enable_tls_bootstrap = false` to disable it (#618).
- Add `csi_plugin_node_selector` and `csi_plugin_toleration` for rook-ceph's CSI plugin (#892).
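The autocompletion command shown in the features list above only lasts for the current shell session. To load it automatically in new sessions, append it to your shell's rc file; a sketch for bash and zsh (the `lokoctl completion zsh` subcommand mirrors the bash one):

```shell
# Load lokoctl completion in every new bash session.
echo 'source <(lokoctl completion bash)' >> ~/.bashrc
# For zsh users, the equivalent line goes into ~/.zshrc.
echo 'source <(lokoctl completion zsh)' >> ~/.zshrc
```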
- Setting up third party OAuth for Grafana (#542).
- Upgrading bootstrap kubelet (#592).
- Upgrading etcd (#802).
- How to add custom monitoring resources? (#554).
- Kubernetes storage with Rook Ceph on Packet cloud (#494).
- aws: Add check for multiple worker pools with same LB ports (#889).
- packet: ignore changes to plan and user_data on controller nodes (#907).
- Introduce platform.PostApplyHook interface and implement it for AKS cluster (#886).
- aws-ebs-csi-driver: add NetworkPolicy allowing access to metadata (#865).
- pkg/components/cluster-autoscaler: fix checking device uniqueness (#768).
- Replace use of github.com/pkg/errors.Wrapf with fmt (#831, #877).
- Refactor assets handling (#807).
- cli/cmd: improve --kubeconfig-file flag help message formatting (#818).
- Use host's /etc/hosts entries for bootkube (#409).
- Refactor Terraform executor (#794).
- Pass kubeconfig content around rather than a file path (#631).
- Update the `ct` Terraform provider to `v0.6.1`; find the install instructions here.
In this release we introduced TLS bootstrapping and enabled it by default. To avoid cluster recreation, disable it by adding the following attribute to the `cluster` block:
cluster "packet" {
  enable_tls_bootstrap = false
  ...
}
Go to your cluster's directory and run the following command:
lokoctl cluster apply --skip-components -v
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
- Manually upgrade etcd by following the steps mentioned in the doc here.
- Manually upgrade the kubelet running on the nodes by following the steps mentioned in the doc here.
The latest version of MetalLB changes the labels of the ingress nodes. Label all the nodes that have `asn` set with the new labels:
kubectl label $(kubectl get nodes -o name -l metallb.universe.tf/my-asn) \
metallb.lokomotive.io/my-asn=65000 metallb.lokomotive.io/peer-asn=65530
Find the peer address of each node and assign it the new label:
for node in $(kubectl get nodes -o name -l metallb.universe.tf/peer-address); do
peer_ip=$(kubectl get $node -o jsonpath='{.metadata.labels.metallb\.universe\.tf/peer-address}')
kubectl label $node metallb.lokomotive.io/peer-address=$peer_ip
done
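Before applying the component, you can verify the relabeling with a read-only listing; nodes that carried the old `metallb.universe.tf` labels should now show values in the new columns:

```shell
# Show the new MetalLB labels as columns; an empty column means the
# node still needs to be relabeled.
kubectl get nodes \
  -L metallb.lokomotive.io/my-asn \
  -L metallb.lokomotive.io/peer-asn \
  -L metallb.lokomotive.io/peer-address
```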
Now it is safe to update:
lokoctl component apply metallb
These steps are curated from the upgrade doc provided by rook: https://rook.io/docs/rook/master/ceph-upgrade.html.
- Keep note of the CSI images:
kubectl --namespace rook get pod -o \
  jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}' \
  -l 'app in (csi-rbdplugin,csi-rbdplugin-provisioner,csi-cephfsplugin,csi-cephfsplugin-provisioner)' | \
  sort | uniq
- Ensure autoscale is on.
In the toolbox pod, ensure that the output of the command `ceph osd pool autoscale-status | grep replicapool` says `on` (in the last column) and not `warn`. If it says `warn`, run the command `ceph osd pool set replicapool pg_autoscale_mode on` to set it to `on`. This is to ensure we are not facing rook/rook#5608. Read more about the toolbox pod here: https://github.com/kinvolk/lokomotive/blob/v0.4.0/docs/how-to-guides/rook-ceph-storage.md#enable-and-access-toolbox.
NOTE: If you see the error `[errno 5] RADOS I/O error (error connecting to the cluster)` in the toolbox pod, pin the toolbox pod image to a specific version using this command: `kubectl -n rook set image deploy rook-ceph-tools rook-ceph-tools=rook/ceph:v1.3.2`.
- Ceph status.
Run the following in the toolbox pod:
watch ceph status
Ensure that the output says that health is `HEALTH_OK`. Check that everything looks fine as explained here: https://rook.io/docs/rook/master/ceph-upgrade.html#status-output.
- Pods in the rook namespace.
Watch the pod status in the `rook` namespace in another terminal window. Just running this will be enough:
watch kubectl -n rook get pods -o wide
- Watch for the Rook version update.
Run the following command to keep an eye on the Rook version update as it rolls out to all the components:
watch --exec kubectl -n rook get deployments -l rook_cluster=rook -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
You should see that `rook-version` slowly changes to `v1.4.2`.
- Watch for the Ceph version update.
Run the following command to keep an eye on the Ceph version update as the new pods come up:
watch --exec kubectl -n rook get deployments -l rook_cluster=rook -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
You should see that `ceph-version` slowly changes to `15`.
- Keep an eye on the events in the rook namespace:
kubectl -n rook get events -w
- Ceph dashboard.
Keep it open in one window, but sometimes it is more hassle than help: it keeps reloading and logs you out automatically. See this on how to access the dashboard: https://github.com/kinvolk/lokomotive/blob/v0.4.0/docs/how-to-guides/rook-ceph-storage.md#access-the-ceph-dashboard.
- Grafana dashboards.
Keep an eye on the Grafana dashboards, but the data here will always be old; the most reliable state of the system comes from the watch running inside the toolbox pod.
- Run the updates:
kubectl apply -f https://raw.githubusercontent.com/kinvolk/lokomotive/v0.4.0/assets/charts/components/rook/templates/resources.yaml
lokoctl component apply rook rook-ceph
- Verify that the CSI images are updated:
kubectl --namespace rook get pod -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}' -l 'app in (csi-rbdplugin,csi-rbdplugin-provisioner,csi-cephfsplugin,csi-cephfsplugin-provisioner)' | sort | uniq
- Final checks.
Once everything is up to date, run the following commands in the toolbox pod:
ceph status
ceph osd status
ceph df
rados df
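Several of the steps above run `ceph` commands from inside the toolbox pod. A quick way to get a shell there, assuming the toolbox deployment is named `rook-ceph-tools` in the `rook` namespace as in the linked how-to guide:

```shell
# Open an interactive shell in the Rook toolbox pod.
kubectl -n rook exec -it deploy/rook-ceph-tools -- bash
```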
OpenEBS control plane components and data plane components work independently. Even after the OpenEBS Control Plane components have been upgraded to 1.12.0, the Storage Pools and Volumes (both jiva and cStor) will continue to work with older versions.
Upgrade functionality is still under active development. It is highly recommended to schedule a downtime for the application using the OpenEBS PV while performing this upgrade. Also, make sure you have taken a backup of the data before starting the below upgrade procedure. - Openebs documentation
Upgrade the components by running the following command:
lokoctl component apply openebs-operator openebs-storage-class
- Extract the SPC name using the following command and replace it in the subsequent YAML file:
$ kubectl get spc
NAME AGE
cstor-pool-openebs-replica1 24h
The Job spec for upgrading cStor pools is:
# This is an example YAML for upgrading cstor SPC.
# Some of the values below need to be changed to
# match your openebs installation. The fields are
# indicated with VERIFY
---
apiVersion: batch/v1
kind: Job
metadata:
# VERIFY that you have provided a unique name for this upgrade job.
# The name can be any valid K8s string for name. This example uses
# the following convention: cstor-spc-<flattened-from-to-versions>
name: cstor-spc-11101120
# VERIFY the value of namespace is same as the namespace where openebs components
# are installed. You can verify using the command:
# `kubectl get pods -n <openebs-namespace> -l openebs.io/component-name=maya-apiserver`
# The above command should return status of the openebs-apiserver.
namespace: openebs
spec:
backoffLimit: 4
template:
spec:
# VERIFY the value of serviceAccountName is pointing to service account
# created within openebs namespace. Use the non-default account.
# by running `kubectl get sa -n <openebs-namespace>`
serviceAccountName: openebs-operator
containers:
- name: upgrade
args:
- "cstor-spc"
# --from-version is the current version of the pool
- "--from-version=1.11.0"
# --to-version is the desired upgrade version
- "--to-version=1.12.0"
# Bulk upgrade is supported from 1.9
# To make use of it, please provide the list of SPCs
# as mentioned below
- "cstor-pool-openebs-replica1"
# For upgrades older than 1.9.0, use
# '--spc-name=<spc_name> format as
# below commented line
# - "--spc-name=cstor-sparse-pool"
#Following are optional parameters
#Log Level
- "--v=4"
#DO NOT CHANGE BELOW PARAMETERS
env:
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
tty: true
# the image version should be same as the --to-version mentioned above
# in the args of the Job
image: quay.io/openebs/m-upgrade:1.12.0
imagePullPolicy: Always
restartPolicy: OnFailure
Apply the Job manifest using `kubectl`. Check the logs of the pod started by the Job:
$ kubectl logs -n openebs cstor-spc-1001120-dc7kx
..
..
..
I0903 12:25:00.397066 1 spc_upgrade.go:102] Upgrade Successful for spc cstor-pool-openebs-replica1
I0903 12:25:00.397091 1 cstor_spc.go:120] Successfully upgraded storagePoolClaim{cstor-pool-openebs-replica1} from 1.11.0 to 1.12.0
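Before moving on to the volumes, you can optionally confirm that the pool claim reports the new version. This assumes the SPC records upgrade state under `versionDetails`, as the cStor volumes do:

```shell
# Print each StoragePoolClaim with its currently recorded version.
kubectl get spc -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.versionDetails.status.current}{"\n"}{end}'
```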
Extract the `cstor` volume names using the following command and replace them in the subsequent YAML file:
$ kubectl get cstorvolumes -A
NAMESPACE NAME STATUS AGE CAPACITY
openebs pvc-3415af20-db82-42cf-99e0-5d0f2809c657 Healthy 72m 50Gi
openebs pvc-c3d0b587-5da9-457b-9d0e-23331ade7f3d Healthy 77m 50Gi
openebs pvc-e115f3f9-1666-4680-a932-d05bfd049087 Healthy 77m 100Gi
Create a Kubernetes Job spec for upgrading the cstor volume. An example spec is as follows:
# This is an example YAML for upgrading cstor volume.
# Some of the values below need to be changed to
# match your openebs installation. The fields are
# indicated with VERIFY
---
apiVersion: batch/v1
kind: Job
metadata:
# VERIFY that you have provided a unique name for this upgrade job.
# The name can be any valid K8s string for name. This example uses
# the following convention: cstor-vol-<flattened-from-to-versions>
name: cstor-vol-11101120
# VERIFY the value of namespace is same as the namespace where openebs components
# are installed. You can verify using the command:
# `kubectl get pods -n <openebs-namespace> -l openebs.io/component-name=maya-apiserver`
# The above command should return the status of the openebs-apiserver.
namespace: openebs
spec:
backoffLimit: 4
template:
spec:
# VERIFY the value of serviceAccountName is pointing to service account
# created within openebs namespace. Use the non-default account.
# by running `kubectl get sa -n <openebs-namespace>`
serviceAccountName: openebs-operator
containers:
- name: upgrade
args:
- "cstor-volume"
# --from-version is the current version of the volume
- "--from-version=1.11.0"
# --to-version is the desired upgrade version
- "--to-version=1.12.0"
# Bulk upgrade is supported from 1.9
# To make use of it, please provide the list of cstor volumes
# as mentioned below
- "pvc-3415af20-db82-42cf-99e0-5d0f2809c657"
- "pvc-c3d0b587-5da9-457b-9d0e-23331ade7f3d"
- "pvc-e115f3f9-1666-4680-a932-d05bfd049087"
# For upgrades older than 1.9.0, use
# '--pv-name=<pv_name> format as
# below commented line
# - "--pv-name=pvc-c630f6d5-afd2-11e9-8e79-42010a800065"
#Following are optional parameters
#Log Level
- "--v=4"
#DO NOT CHANGE BELOW PARAMETERS
env:
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
tty: true
# the image version should be same as the --to-version mentioned above
# in the args of the job
image: quay.io/openebs/m-upgrade:1.12.0
imagePullPolicy: Always
restartPolicy: OnFailure
---
Apply the Job manifest using `kubectl`. Check the logs of the pod started by the Job:
$ kubectl logs -n openebs cstor-vol-1001120-8b2h9
..
..
..
I0903 12:41:41.984635 1 cstor_volume_upgrade.go:609] Upgrade Successful for cstor volume pvc-e115f3f9-1666-4680-a932-d05bfd049087
I0903 12:41:41.994013 1 cstor_volume.go:119] Successfully upgraded cstorVolume{pvc-e115f3f9-1666-4680-a932-d05bfd049087} from 1.11.0 to 1.12.0
Verify that all the volumes are updated to the latest version by running the following command:
$ kubectl get cstorvolume -A -o jsonpath='{.items[*].versionDetails.status.current}'
1.12.0 1.12.0 1.12.0
Other components are safe to upgrade by running the following command:
lokoctl component apply <component name>
We're happy to announce the release of Lokomotive v0.3.0 (Coast Starlight).
This release packs new features and bugfixes. Some of the highlights are:
- Kubernetes 1.18.6
- For Lokomotive clusters running on top of AKS, Kubernetes 1.16.10 is installed.
- Component updates
- Update default machine type from `t1.small.x86` to `c3.small.x86`, since `t1.small.x86` is EOL and no longer available in new Packet projects (#612).
WARNING: If you haven't explicitly defined the controller_type and/or worker_pool.node_type configuration options, upgrading to this release will replace your controller and/or worker nodes with c3.small.x86 machines, thereby losing all your cluster data. To avoid this, set these configuration options to the desired values.
Make sure that the attributes below are explicitly defined in your cluster configuration. This only applies to machine type `t1.small.x86`:
cluster "packet" {
  .
  .
  controller_type = "t1.small.x86"
  .
  .
  worker_pool "pool-name" {
    .
    node_type = "t1.small.x86"
    .
  }
}
- Update Kubernetes version to 1.16.10 (#712).
- prometheus-operator: Organize Prometheus related attributes under a `prometheus` block in the configuration (#710).
- Use `prometheus.ingress.host` to expose Prometheus instead of `prometheus_external_url` (#710).
- contour: Remove `ingress_hosts` from contour configuration (#635).
- Add `enable_toolbox` attribute to the rook-ceph component (#649). This allows managing and configuring Ceph using the toolbox pod.
- Add Prometheus feature `external_labels` for federated clusters to the Prometheus operator component. This helps to identify metrics queried from different clusters (#710).
- Add `Type` column to the Attribute reference table in configuration references (#651).
- Update contour configuration reference for usage with AWS (#674).
- Add documentation related to the usage of `clc_snippets` for Packet and AWS (#657).
- How-to guide for setting up monitoring on Lokomotive (#480).
- Add `codespell` section in development documentation (#700).
- Include a demo GIF in the README (#636).
- Remove contour ingress workaround (due to an upstream issue) for ExternalDNS (#635).
- Do not show Helm release values in Terraform output (#627).
- Remove Terraform provider aliases from platforms code (#617).
- Following flatcar/Flatcar#123, Flatcar 2513.1.0 for ARM contains the dig binary, so the workaround is no longer needed (#703).
- Improve error message for `wait-for-dns` output (#735).
- Add `codespell` to enable spell check on all PRs (#661).
There have been some minor changes to the configurations of the following components:
- contour
- prometheus-operator
Please make sure the new configuration structure is in place before the upgrade.
The optional `ingress_hosts` attribute is now removed.
old:
component "contour" {
  .
  .
  ingress_hosts = ["*.example.lokomotive-k8s.net"]
}
new:
component "contour" {
  .
  .
}
- Prometheus specific attributes are now under a `prometheus` block.
- A new optional `prometheus.ingress` sub-block is introduced to expose Prometheus over ingress.
- Attribute `external_url` is removed and is now configured under `prometheus.ingress.host`. Remove the URL scheme (e.g. `https://`) and URI (e.g. `/prometheus`) when configuring. URI is no longer supported and the protocol is always HTTPS.
old:
component "prometheus-operator" {
  .
  .
  prometheus_metrics_retention = "14d"
  prometheus_external_url = "https://prometheus.example.lokomotive-k8s.net"
  prometheus_storage_size = "50GiB"
  prometheus_node_selector = {
    "kubernetes.io/hostname" = "worker3"
  }
  .
  .
}
new:
component "prometheus-operator" {
  .
  .
  prometheus {
    metrics_retention = "14d"
    storage_size = "50GiB"
    node_selector = {
      "kubernetes.io/hostname" = "worker3"
    }
    ingress {
      host = "prometheus.example.lokomotive-k8s.net"
    }
    .
    .
  }
  .
  .
}
Check out the new syntax in the Prometheus Operator configuration reference for details.
Go to your cluster's directory and run the following command.
lokoctl cluster apply
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
OpenEBS control plane components and data plane components work independently. Even after the OpenEBS Control Plane components have been upgraded to 1.11.0, the Storage Pools and Volumes (both jiva and cStor) will continue to work with older versions.
Upgrade functionality is still under active development. It is highly recommended to schedule a downtime for the application using the OpenEBS PV while performing this upgrade. Also, make sure you have taken a backup of the data before starting the below upgrade procedure. - Openebs documentation
- Extract the SPC name using `kubectl get spc`:
NAME AGE
cstor-pool-openebs-replica1 24h
The Job spec for upgrading cStor pools is:
#This is an example YAML for upgrading cstor SPC.
#Some of the values below needs to be changed to
#match your openebs installation. The fields are
#indicated with VERIFY
---
apiVersion: batch/v1
kind: Job
metadata:
#VERIFY that you have provided a unique name for this upgrade job.
#The name can be any valid K8s string for name. This example uses
#the following convention: cstor-spc-<flattened-from-to-versions>
name: cstor-spc-1001120
#VERIFY the value of namespace is same as the namespace where openebs components
# are installed. You can verify using the command:
# `kubectl get pods -n <openebs-namespace> -l openebs.io/component-name=maya-apiserver`
# The above command should return status of the openebs-apiserver.
namespace: openebs
spec:
backoffLimit: 4
template:
spec:
#VERIFY the value of serviceAccountName is pointing to service account
# created within openebs namespace. Use the non-default account.
# by running `kubectl get sa -n <openebs-namespace>`
serviceAccountName: openebs-operator
containers:
- name: upgrade
args:
- "cstor-spc"
# --from-version is the current version of the pool
- "--from-version=1.10.0"
# --to-version is the desired upgrade version
- "--to-version=1.11.0"
# Bulk upgrade is supported from 1.9
# To make use of it, please provide the list of SPCs
# as mentioned below
- "cstor-pool-name"
# For upgrades older than 1.9.0, use
# '--spc-name=<spc_name> format as
# below commented line
# - "--spc-name=cstor-sparse-pool"
#Following are optional parameters
#Log Level
- "--v=4"
#DO NOT CHANGE BELOW PARAMETERS
env:
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
tty: true
# the image version should be same as the --to-version mentioned above
# in the args of the job
image: quay.io/openebs/m-upgrade:1.11.0
imagePullPolicy: Always
restartPolicy: OnFailure
Apply the Job manifest using `kubectl`. Check the logs of the pod started by the Job:
$ kubectl logs -n openebs cstor-spc-1001120-dc7kx
..
..
..
I0728 15:15:41.321450 1 spc_upgrade.go:102] Upgrade Successful for spc cstor-pool-openebs-replica1
I0728 15:15:41.321473 1 cstor_spc.go:120] Successfully upgraded storagePoolClaim{cstor-pool-openebs-replica1} from 1.10.0 to 1.11.0
Extract the PV names using `kubectl get pv`:
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-b69260c4-5cc1-4461-b762-851fa53629d9 50Gi RWO Delete Bound monitoring/data-alertmanager-prometheus-operator-alertmanager-0 openebs-replica1 24h
pvc-da29e4fe-1841-4da9-a8f6-4e3c92943cbb 50Gi RWO Delete Bound monitoring/data-prometheus-prometheus-operator-prometheus-0 openebs-replica1 24h
Create a Kubernetes Job spec for upgrading the cstor volume. An example spec is as follows:
#This is an example YAML for upgrading cstor volume.
#Some of the values below needs to be changed to
#match your openebs installation. The fields are
#indicated with VERIFY
---
apiVersion: batch/v1
kind: Job
metadata:
#VERIFY that you have provided a unique name for this upgrade job.
#The name can be any valid K8s string for name. This example uses
#the following convention: cstor-vol-<flattened-from-to-versions>
name: cstor-vol-1001120
#VERIFY the value of namespace is same as the namespace where openebs components
# are installed. You can verify using the command:
# `kubectl get pods -n <openebs-namespace> -l openebs.io/component-name=maya-apiserver`
# The above command should return status of the openebs-apiserver.
namespace: openebs
spec:
backoffLimit: 4
template:
spec:
#VERIFY the value of serviceAccountName is pointing to service account
# created within openebs namespace. Use the non-default account.
# by running `kubectl get sa -n <openebs-namespace>`
serviceAccountName: openebs-operator
containers:
- name: upgrade
args:
- "cstor-volume"
# --from-version is the current version of the volume
- "--from-version=1.10.0"
# --to-version is the desired upgrade version
- "--to-version=1.11.0"
# Bulk upgrade is supported from 1.9
# To make use of it, please provide the list of PVs
# as mentioned below
- "pvc-b69260c4-5cc1-4461-b762-851fa53629d9"
- "pvc-da29e4fe-1841-4da9-a8f6-4e3c92943cbb"
# For upgrades older than 1.9.0, use
# '--pv-name=<pv_name> format as
# below commented line
# - "--pv-name=pvc-c630f6d5-afd2-11e9-8e79-42010a800065"
#Following are optional parameters
#Log Level
- "--v=4"
#DO NOT CHANGE BELOW PARAMETERS
env:
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
tty: true
# the image version should be same as the --to-version mentioned above
# in the args of the job
image: quay.io/openebs/m-upgrade:1.11.0
imagePullPolicy: Always
restartPolicy: OnFailure
---
Apply the Job manifest using `kubectl`. Check the logs of the pod started by the Job:
$ kubectl logs -n openebs cstor-vol-1001120-8b2h9
..
..
..
I0728 15:19:48.496031 1 cstor_volume_upgrade.go:609] Upgrade Successful for cstor volume pvc-da29e4fe-1841-4da9-a8f6-4e3c92943cbb
I0728 15:19:48.502876 1 cstor_volume.go:119] Successfully upgraded cstorVolume{pvc-da29e4fe-1841-4da9-a8f6-4e3c92943cbb} from 1.10.0 to 1.11.0
This is a patch release to fix AKS platform deployments.
- Updated Kubernetes version on AKS platform to 1.16.9 (#626). This fixes deploying AKS clusters, as the previously used version is not available anymore.
- Updated `golang.org/x/text` dependency to v0.3.3 (#648) to address CVE-2020-14040.
- Fixes example configuration for AKS platform (#626). Contour component configuration syntax changed and those files had not been updated.
- Bootkube Docker images are now pulled using the Docker protocol, as quay.io plans to deprecate pulling images using ACI (#656).
- The AKS platform is now being tested for every pull request and `master` branch changes in the CI.
- Added script for finding available component updates in upstream repositories (#375).
We're happy to announce Lokomotive v0.2.0 (Bernina Express).
This release includes a ton of new features, changes and bugfixes. Here are some highlights:
- Kubernetes v1.18.3.
- Many component updates.
- AKS platform support.
- Cloudflare DNS support.
- Monitoring dashboards fixes.
- Dynamic provisioning of Persistent Volumes on AWS.
- Security improvements.
Check the full list of changes for more details.
- The Calico component has a new CRD that needs to be applied manually:
kubectl apply -f https://raw.githubusercontent.com/kinvolk/lokomotive/v0.2.0/assets/lokomotive-kubernetes/bootkube/resources/charts/calico/crds/kubecontrollersconfigurations.yaml
- Some component objects changed `apiVersion`, so they need to be labeled and annotated manually to be able to upgrade them.
  - Dex:
kubectl -n dex label ingress dex app.kubernetes.io/managed-by=Helm
kubectl -n dex annotate ingress dex meta.helm.sh/release-name=dex
kubectl -n dex annotate ingress dex meta.helm.sh/release-namespace=dex
  - Gangway:
kubectl -n gangway label ingress gangway app.kubernetes.io/managed-by=Helm
kubectl -n gangway annotate ingress gangway meta.helm.sh/release-name=gangway
kubectl -n gangway annotate ingress gangway meta.helm.sh/release-namespace=gangway
  - Metrics Server:
kubectl -n kube-system label rolebinding metrics-server-auth-reader app.kubernetes.io/managed-by=Helm
kubectl -n kube-system annotate rolebinding metrics-server-auth-reader meta.helm.sh/release-namespace=kube-system
kubectl -n kube-system annotate rolebinding metrics-server-auth-reader meta.helm.sh/release-name=metrics-server
  - httpbin:
kubectl -n httpbin label ingress httpbin app.kubernetes.io/managed-by=Helm
kubectl -n httpbin annotate ingress httpbin meta.helm.sh/release-namespace=httpbin
kubectl -n httpbin annotate ingress httpbin meta.helm.sh/release-name=httpbin
- You need to remove an asset we've updated from your assets directory:
rm $ASSETS_DIRECTORY/lokomotive-kubernetes/aws/flatcar-linux/kubernetes/workers.tf
Before upgrading, make sure your lokocfg configuration follows the new v0.2.0 syntax. Here we describe the changes.
The DNS configuration syntax for the Packet platform has been simplified.
Here's an example for the Route 53 provider.
Old:
dns {
zone = "<DNS_ZONE>"
provider {
route53 {
zone_id = "<ZONE_ID>"
}
}
}
New:
dns {
zone = "<DNS_ZONE>"
provider = "route53"
}
Check out the new syntax in the Packet configuration reference for details.
The `owner_id` field is now required.
There is a specific block for Grafana now.
Here's an example of the changed syntax.
Old:
component "prometheus-operator" {
namespace = "<NAMESPACE>"
grafana_admin_password = "<GRAFANA_PASSWORD>"
etcd_endpoints = ["<ETCD_IP>"]
}
New:
component "prometheus-operator" {
namespace = "<NAMESPACE>"
grafana {
admin_password = "<GRAFANA_PASSWORD>"
}
# etcd_endpoints is not needed anymore
}
Check out the new syntax in the Prometheus Operator configuration reference for details.
Go to your cluster's directory and run the following command.
lokoctl cluster apply
The update process typically takes about 10 minutes.
After the update, running `lokoctl health` should result in output similar to the following:
Node Ready Reason Message
lokomotive-controller-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status
lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status
Name Status Message Error
etcd-0 True {"health":"true"}
If you have the cert-manager component installed, you will get an error on the first update and need to do a second one. Run the following to upgrade your components again.
lokoctl component apply
- Update Kubernetes to v1.18.3 (#459).
- openebs: update to 1.10.0 (#528).
- dex: update to v2.24.0 (#525).
- contour: update to v1.5.0 (#524).
- cert-manager: update to v0.15.1 (#522).
- calico: update to v3.14.1 (#415).
- metrics-server: update to 0.3.6 (#343).
- external-dns: update to 2.21.2 (#340).
- rook: update to v1.3.1 (#300).
- etcd: Update to v3.4.9 (#521).
- Add AKS platform support (#219).
- Handle OS interrupts in lokoctl to fix leaking terraform process (#483).
- Fix self-hosted Kubelet on bare metal platform (#436). It wasn't working.
- grafana: remove cluster label in kubelet dashboard (#474). This fixes missing information in the Kubelet Grafana dashboard.
- Rook Ceph: Fix dashboard templating (#476). Some graphs were not showing information.
- pod-checkpointer: update the pod-checkpointer image (#498). Fixes communication between the pod checkpointer and the Kubelet.
- Fix AWS worker pool handling (#367). Remove invisible worker pool of size 0 and fix NLB listener wiring to fix ingress.
- Fix rendering of `ingress_hosts` in the Contour component (#417). Fixes using a wildcard subdomain as ingress for Contour.
- kube-apiserver: fix TLS handshake errors on Packet (#297). Removes a harmless error message.
- calico-host-protection: fix node name of HostEndpoint objects (#201). Fixes GlobalNetworkPolcies for nodes.
- aws: add the AWS EBS CSI driver (#423). This allows dynamic provisioning of Persistent Volumes on AWS.
- grafana: provide root_url in grafana.ini conf (#547). So Grafana exposes its URL and not localhost.
- packet: add Cloudflare DNS support (#422).
- Monitor etcd by default (#493). It wasn't being monitored before.
- Add variable `grafana_ingress_host` to expose Grafana (#468). Allows exposing Grafana through Ingress.
- Add ability to provide OIDC configuration (#182). Allows configuring the API server to use OIDC for authentication; previously this was a manual operation.
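As an illustrative sketch only (not verbatim from this release), an OIDC block in a cluster configuration could look like the following; the attribute names (`issuer_url`, `client_id`, `username_claim`, `groups_claim`) and example values are assumptions that should be checked against the cluster configuration reference for your platform and version.

```hcl
cluster "packet" {
  # ... other cluster settings ...

  # Hypothetical example values; verify attribute names against the
  # cluster configuration reference before using.
  oidc {
    issuer_url     = "https://dex.example.com"
    client_id      = "gangway"
    username_claim = "email"
    groups_claim   = "groups"
  }
}
```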
- Parameterise ClusterIssuer for Dex, Gangway, HTTPBin (#482). Allows using a different cluster issuer.
- grafana: enable piechart plugin for the Prometheus Operator chart (#469). Pie chart graphs weren't showing.
- Add a knob to disable self hosted kubelet (#425).
- rook-ceph: add StorageClass config (#402). This allows setting up rook-ceph as the default storage class.
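A hedged sketch of what enabling rook-ceph as the default storage class might look like; the block and parameter names below are assumptions to be verified against the rook-ceph component's configuration reference.

```hcl
component "rook-ceph" {
  # Assumed parameter names; check the rook-ceph configuration reference.
  storage_class {
    enable  = true
    default = true  # make this the cluster's default StorageClass
  }
}
```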
- Add monitoring config and variable to rook component (#405). This allows monitoring rook.
- packet: add support for hardware reservations (#299).
- Add support for `lokoctl component delete` (#268).
- bootkube: add calico-kube-controllers (#283).
- metallb: add AlertManager rules (#140).
- Label service-monitors so that they are discovered by Prometheus (#200). This makes sure all components are discovered by Prometheus.
- external-dns: expose owner_id (#207). Otherwise several clusters in the same DNS Zone will interact badly with each other.
- contour: add Alertmanager rules (#193).
- contour: add nodeAffinity and tolerations (#386). This allows using ingress in a subset of cluster nodes.
- prometheus-operator: add storage class & size options (#387).
- grafana: add secret_env variable (#541). This allows users to provide arbitrary key values pairs that will be exposed as environment variables inside the Grafana pod.
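For example, a `secret_env` map in the `grafana` block might be written as follows; this is a sketch, and the exact nesting and key names are assumptions to be confirmed in the prometheus-operator configuration reference.

```hcl
component "prometheus-operator" {
  grafana {
    # Each key becomes an environment variable inside the Grafana pod,
    # backed by a Kubernetes Secret (names here are illustrative).
    secret_env = {
      "LDAP_PASSWORD" = "changeme"
    }
  }
}
```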
- rook-ceph: allow volume resizing (#640). This enables the PVs created by the storage class to be resized on the fly.
- Block access to metadata servers for all components by default (#464). Most components don't need it and it is a security risk.
- packet: disable syncing allowed SSH keys on nodes (#471). This prevents nodes from being accessible via all SSH keys authorized for the Packet user and project.
- packet: tighten up node bootstrap iptables rules (#202). So nodes are better protected during bootstrap.
- PSP: Rename `restricted` to `zz-minimal` (#293). So PSPs apply in the right order.
- kubelet: don't automount the service account token (#306). The Kubelet doesn't need it; its API server credentials are mounted using a HostPath.
- prometheus-operator: add seccomp annotations to kube-state-metrics (#288). This reduces the attack surface by blocking unneeded syscalls.
- prometheus operator: add seccomp annotations to PSP (#294). So Prometheus Operator pods have seccomp enabled.
- Binding improvements (#194). This makes the `kubelet`, `kube-proxy` and `calico-node` processes listen on the host's internal IP.
- Add `--confirm` flag to delete a component without asking for confirmation (#568).
- Add error message for missing `ipxe_script_url` (#540).
- Show logs when Terraform fails in `lokoctl cluster apply/destroy` (#323).
- cli/cmd: rename the `--kubeconfig` flag to `--kubeconfig-file` (#602). This is because cobra/viper treat the `KUBECONFIG` environment variable and the `--kubeconfig` flag as the same, which can cause surprising behavior.
- docs: make Packet quickstart quick (#332).
- docs: document Route 53 and S3+DynamoDB permissions (#561).
- docs/quickstart/aws: Fix flatcar-linux-update-operator link (#552).
- docs: Add detailed contributing guidelines (#404).
- docs: Add instructions to run conformance tests (#236).
- docs/quickstarts: add reference to usage docs and PSP note (#233).
- docs: clarify values for ssh_pubkeys (#230).
- docs/quickstarts: fix kubeconfig path (#229).
- docs/prometheus-operator: clarify alertmanager config indentation (#199).
- quickstart-docs: Add ssh-agent instructions (#325).
- docs: provide alternate way of declaring alertmanager config (#570).
- examples: make Flatcar channels explicit (#565).
- docs/aws: document TLS handshake errors in kube-apiserver (#599).
- Update terraform-provider-ct to v0.5.0 and mention it in the docs (#281).
- Update broken links (#569).
- Fix example configs and typos (#535).
- docs/httpbin: Fix table (#510).
- Add missing bracket in Prometheus Operator docs (#490).
- docs: Update the component deleting steps (#481).
- Fix broken bare metal config link (#473).
- Remove period (.) from flag descriptions (#574).
- Several fixes to make updates from v0.1.0 smooth (#638, #639, #642).
- baremetal quickstart: Add double quotes (#633).
- pkg/components/util: improvements (#605).
- New internal package for helper functions (#588).
- Remove vars from assets that were unused by tmpl file (#620).
- keys: iago's key is kinvolk.io, not gmail 🤓 (#616).
Initial release.
- Kubernetes v1.18.0
- Running on Flatcar Container Linux
- Fully self-hosted, including the kubelet
- Single or multi-master
- Calico networking
- On-cluster etcd with TLS, RBAC-enabled, PSP-enabled, network policies
- In-place upgrades, including experimental kubelet upgrades
- Supported on:
- Initial Lokomotive Components: