
This is adapted from: https://cluster-api.sigs.k8s.io/tasks/upgrading-clusters.html

...

It’s recommended to upgrade your clusters regularly. Small, frequent upgrades avoid having to keep tooling compatible across large jumps in Kubernetes version.

OS Patches

The operating system and associated packages can be updated independently, e.g. to apply security patches to the host OS.

The Ubuntu image is stripped down, so the number of installed packages (and therefore of potential vulnerabilities) is significantly lower. The Cloud Team will announce when a CVE applies to the CAPI Ubuntu images.
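To confirm which OS image and kernel each node is running, for example after a patched image has rolled out, you can query the node status. This is a minimal sketch. The function name show_node_os and the KUBECTL override variable are illustrative assumptions. Run it against the target cluster's kubeconfig.

```shell
#!/usr/bin/env bash
# Sketch: list the OS image, kernel and kubelet version reported by every node.
# KUBECTL is an assumed override (defaults to kubectl) so the command can be
# printed instead of executed, e.g. KUBECTL=echo show_node_os
show_node_os() {
  local kubectl="${KUBECTL:-kubectl}"
  "$kubectl" get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion,KUBELET:.status.nodeInfo.kubeletVersion'
}
```

You can obtain a kubeconfig for the target cluster with clusterctl get kubeconfig <cluster_name> -n clusters.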

Skip to https://stfc.atlassian.net/wiki/spaces/CLOUDKB/pages/285704256/Cluster+API+Upgrade#Without-minor-version-upgrade.

Note

Containers are managed by Kubernetes, not by the host image. If a container (e.g. gpu-operator) has a known CVE, you need to upgrade that deployment (e.g. via Helm or your configuration management tool).

Multiple Version Upgrades

...

To upgrade across more than one minor version, first follow the https://stfc.atlassian.net/wiki/spaces/CLOUDKB/pages/285704256/Cluster+API+Upgrade#Upgrade-Clusterctl-and-CAPI-components section, then follow https://stfc.atlassian.net/wiki/spaces/CLOUDKB/pages/285704256/Cluster+API+Upgrade#Image-and-Version-Upgrades for each hop.

Overview

This process assumes the administrator is doing a full upgrade of all components. Components can also be upgraded independently, provided the infrastructure layer supports the planned Kubernetes version: https://cluster-api.sigs.k8s.io/reference/versions

Infrastructure

Components which interact with OpenStack infrastructure

  • OpenStackCluster and Addons charts

    • These describe our OpenStack cloud resources (i.e. they allow Cluster API and Cluster API Provider OpenStack to create VMs) and fulfil the “contract” requirements of the Cluster CRD

  • Clusterctl and cluster.x-k8s.io

    • These are the generic CAPI components; they expect an infrastructure provider (e.g. OpenStack) to implement a contract that “adapts” them to each cloud

Kubernetes

Kubernetes components, excluding those that handle OpenStack resources. These are generic, so the upstream CAPI documentation applies.

  • Kubernetes Version

    • This is set on the KubeadmControlPlane resource and can be found with kubectl describe kubeadmcontrolplane -A

      • The current value is shown under spec.version

    • Can be set to at most one minor version above the current version (n+1)

    • Can be set to any patch release of the same minor version

    • The upper bound is set by the CAPI image you are running

  • CAPI Image

    • These are pre-built Ubuntu images with kubeadm, containerd, etc. pre-installed

    • Generated by the Cloud Team to ensure that they meet the combined UKRI and STFC Cloud security policies; see Terms of Service

    • OS and package patches can be upgraded independently of the Kubernetes version

      • e.g. a K8s cluster set to v5.10.2 on a CAPI image running v5.10.6 is allowed

      • However, a K8s cluster set to v5.10.8 on a CAPI image running v5.10.6 is not
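The version rules above can be checked before editing any values. This is a sketch under stated assumptions. The helper names are hypothetical. It relies on GNU sort -V for version ordering. It treats a minor-version downgrade as unsupported, which the rules above do not cover explicitly.

```shell
#!/usr/bin/env bash
# Sketch: check a proposed kubernetesVersion against the rules listed above.
# Versions may be given with or without a leading "v", e.g. v1.30.4.

# True if version $1 <= version $2 (uses GNU sort's version ordering).
version_le() {
  [ "$(printf '%s\n%s\n' "${1#v}" "${2#v}" | sort -V | head -n1)" = "${1#v}" ]
}

# Print the minor component of MAJOR.MINOR.PATCH.
minor_of() {
  local rest="${1#v}"
  rest="${rest#*.}"
  echo "${rest%%.*}"
}

# Usage: check_k8s_target <current> <target> <image_k8s_version>
# Prints "ok" or the first rule that is broken; returns non-zero on failure.
check_k8s_target() {
  local current="${1#v}" target="${2#v}" image="${3#v}"
  local cur_minor tgt_minor
  cur_minor="$(minor_of "$current")"
  tgt_minor="$(minor_of "$target")"
  if [ "${current%%.*}" != "${target%%.*}" ]; then
    echo "major version change not covered by these rules"; return 1
  fi
  # Assumption: moving to an older minor version is treated as unsupported.
  if [ "$tgt_minor" -lt "$cur_minor" ]; then
    echo "minor version downgrade not supported"; return 1
  fi
  if [ "$tgt_minor" -gt $((cur_minor + 1)) ]; then
    echo "can only move up one minor version (n+1)"; return 1
  fi
  if ! version_le "$target" "$image"; then
    echo "target is newer than the CAPI image"; return 1
  fi
  echo "ok"
}

# Optional helper: read spec.version from a KubeadmControlPlane, e.g.
# current_k8s_version <cluster_name>-control-plane clusters
# KUBECTL is an assumed override variable (defaults to kubectl).
current_k8s_version() {
  "${KUBECTL:-kubectl}" get kcp "$1" -n "${2:-clusters}" -o jsonpath='{.spec.version}'
}
```

With the document's example, check_k8s_target v5.10.2 v5.10.8 v5.10.6 reports that the target is newer than the CAPI image.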

Infrastructure Upgrades

Upgrading OpenStackCluster Charts

Info

This is required to add any annotations needed by the latest cluster.x-k8s.io/vxyz CRDs, which clusterctl upgrades in the subsequent step.

Update the helm Cluster API charts:

Code Block
helm repo update capi
helm repo update capi-addons

helm upgrade cluster-api-addon-provider capi-addons/cluster-api-addon-provider -n clusters --wait
cd <folder_with_values>
  • Check that the latest Helm chart applies cleanly without changing the Kubernetes version:

Code Block
helm upgrade <cluster_name> capi/openstack-cluster -f values.yaml -f clouds.yaml -f user-values.yaml -f flavors.yaml -n clusters
  • Update user-values.yaml, either by running git pull to fetch the latest image from the Cloud Team or by manually editing the machineImage and kubernetesVersion fields

  • Re-run the helm upgrade to upgrade the cluster version:

Code Block
helm upgrade <cluster_name> capi/openstack-cluster --install -f values.yaml -f clouds.yaml -f user-values.yaml -f flavors.yaml -n clusters
  • Monitor the upgrade using clusterctl describe cluster <cluster_name> -n clusters
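You can script the manual edit of user-values.yaml. The helper below is hypothetical. It assumes kubernetesVersion and machineImage are top-level keys that already exist in the file. Check your file's layout before relying on it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): point user-values.yaml at a new image and
# Kubernetes version. Assumes both keys are top-level (start at column 0)
# and already exist in the file; nested or missing keys are not handled.
set_cluster_image() {
  local file="$1" version="$2" image="$3" tmp
  tmp="$(mktemp)" || return 1
  sed -e "s|^kubernetesVersion:.*|kubernetesVersion: \"${version}\"|" \
      -e "s|^machineImage:.*|machineImage: \"${image}\"|" \
      "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back rather than mv so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, run set_cluster_image user-values.yaml <kubernetes_version> <image_name>, then re-run the helm upgrade command above.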

Upgrade Clusterctl and CAPI components

Info

We need to upgrade clusterctl so that it is aware of the latest CAPI

...

and CAPO components. These handle the infrastructure integration.

Download the latest clusterctl release that supports your cluster version.

...

Code Block
helm list -n clusters # print management cluster name
clusterctl upgrade plan -n clusters <name>
  • Check that the proposed upgrade is valid, then run the command printed by clusterctl
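The clusterctl binary can be fetched from the upstream GitHub releases page. The sketch below assumes a Linux amd64 host by default. The version is a placeholder you must choose yourself; pick the release that supports your cluster version (see https://cluster-api.sigs.k8s.io/reference/versions).

```shell
#!/usr/bin/env bash
# Sketch: build the download URL for a clusterctl release and fetch it.
# <version> is a placeholder, e.g. "vX.Y.Z"; OS and arch default to linux/amd64.
clusterctl_url() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  echo "https://github.com/kubernetes-sigs/cluster-api/releases/download/${version}/clusterctl-${os}-${arch}"
}

# Download the binary to ./clusterctl (or a given path) and make it executable.
install_clusterctl() {
  local version="$1" dest="${2:-./clusterctl}"
  curl -fL "$(clusterctl_url "$version")" -o "$dest" && chmod +x "$dest"
}
```

After installing, run ./clusterctl version to confirm that it is the expected release before you run clusterctl upgrade plan.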

...

Kubernetes

...

Update the helm Cluster API charts:

Code Block
helm repo update capi
helm repo update capi-addons

helm upgrade cluster-api-addon-provider capi-addons/cluster-api-addon-provider -n clusters --install --wait
cd <folder_with_values>

...

Image and Version Upgrades

This section assumes a production cluster, so components are upgraded in separate steps.

For development / low risk clusters both steps can be combined into a single roll-out.

You’ll need to upgrade the VM image and Kubernetes version to the latest available patch release before doing any minor version upgrade.

For example, if you’re on 1.100.12, first upgrade to the latest 1.100.x. This ensures that any bug fixes which could otherwise prevent later upgrades are applied.
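To find the latest patch release for a given minor version, you can filter the image list by version. This is a sketch. It assumes, hypothetically, that each image name embeds a MAJOR.MINOR.PATCH Kubernetes version (for example "...-kube-v1.30.4"). It also relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Sketch: read image names on stdin and print the newest MAJOR.MINOR.PATCH
# found for the requested minor version (e.g. "1.100" or "v1.100").
latest_patch() {
  local minor="${1#v}"
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' \
    | awk -F. -v want="$minor" '($1 "." $2) == want' \
    | sort -V | tail -n1
}
```

For example, openstack image list -f value -c Name | latest_patch 1.100 prints the highest 1.100.x version among the available images.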

For patch and minor upgrades:

  • Look up the latest image build for the target Kubernetes version in the Images section of the web interface

  • Edit the kubernetesVersion in user-values.yaml to match the image name

  • Edit the machineImage in user-values.yaml to use the latest patch release

Code Block
helm upgrade <cluster_name> capi/openstack-cluster --install -f values.yaml -f clouds.yaml -f user-values.yaml -f flavors.yaml -n clusters

...

  • Wait for the rollout of new infra to complete

    • The rollout can be monitored with kubectl get kcp -A and kubectl get md -A

    • Machine details can be found in kubectl get machines -A and kubectl get openstackmachines -A

  • Repeat for each minor version step

    • Only one minor version can be upgraded at a time, e.g. 1.100.12 to 1.101.4, then to 1.102.6
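Because only one minor version can be upgraded at a time, it helps to list the hops before starting. This is a minimal sketch. The function name is hypothetical, and it assumes the first number in the version (the "1" in 1.100.12) does not change.

```shell
#!/usr/bin/env bash
# Sketch: list the minor versions to step through, one at a time, from the
# current version to the target. Assumes the leading version number is fixed.
plan_minor_hops() {
  local current="${1#v}" target="${2#v}" major minor target_minor
  major="${current%%.*}"
  minor="${current#*.}";       minor="${minor%%.*}"
  target_minor="${target#*.}"; target_minor="${target_minor%%.*}"
  while [ "$minor" -le "$target_minor" ]; do
    # One hop per line: the current minor first (for the latest-patch step),
    # then every following minor up to and including the target.
    echo "${major}.${minor}"
    minor=$((minor + 1))
  done
}
```

For example, plan_minor_hops 1.100.12 1.102.6 prints 1.100, 1.101 and 1.102 on separate lines. For each hop, use the newest image for that minor version and wait for the rollout to finish before starting the next hop.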

Troubleshooting

On the management cluster

  • Check that the Machine and OpenStackMachine resources match the VMs in the web interface

    • kubectl get machines -A and kubectl get openstackmachines -A

    • Check a control plane node’s status with kubectl describe machine <name> -n clusters

  • If nothing is happening or the process appears stuck, check the controller logs

    • OpenStack logs: kubectl logs deploy/capo-controller-manager -n capo-system -f

    • CAPI logs: kubectl logs deploy/capi-controller-manager -n capi-system -f

  • Check the control plane status:

    • kubectl describe kcp/<name>-control-plane -n clusters

    • Check for events on the management cluster: kubectl get events -n clusters --watch
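The management cluster checks above can be gathered into a single snapshot, for example to share with the Cloud Team. This is a sketch. It uses --tail instead of -f so that it returns. The function name and the KUBECTL override variable are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: capture a one-off snapshot of the management cluster checks.
# KUBECTL is an assumed override (defaults to kubectl); KUBECTL=echo does a dry run.
collect_mgmt_diagnostics() {
  local ns="${1:-clusters}" kubectl="${KUBECTL:-kubectl}"
  "$kubectl" get machines -A
  "$kubectl" get openstackmachines -A
  "$kubectl" get kcp -A
  "$kubectl" logs deploy/capo-controller-manager -n capo-system --tail=100
  "$kubectl" logs deploy/capi-controller-manager -n capi-system --tail=100
  "$kubectl" get events -n "$ns" --sort-by=.lastTimestamp
}
```

For example, collect_mgmt_diagnostics clusters > capi-diagnostics.txt 2>&1 writes everything to one file.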

On the target cluster

  • Check that you can access the cluster with kubectl

    • If you cannot, this may indicate an OpenStack networking configuration problem

    • Check that the load balancers and networks exist; if they do not, check the CAPO logs on the management cluster

  • Check that etcd is healthy with kubectl get pods -n kube-system:

    • If the etcd pods are failing to start, run kubectl describe pod/etcd-<name> -n kube-system

    • If they are running, check their logs with kubectl logs pod/etcd-<name> -n kube-system

    • If etcd is unhealthy, contact the Cloud Team to assist with recovery

  • Check that the kube-apiserver pod is starting on each control plane machine

    • If they are failing to start, run kubectl describe pod/kube-apiserver-<name> -n kube-system

    • If they are running, check their logs with kubectl logs pod/kube-apiserver-<name> -n kube-system

    • If the API server is failing to start or is unhealthy, contact the Cloud Team to assist with recovery
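On the target cluster, kubeadm labels its static control plane pods with component=<name>, so you can select etcd and kube-apiserver by label. The API server's /readyz?verbose endpoint also reports each readiness check, including etcd. This is a sketch. The function name and the KUBECTL override variable are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: check the control plane pods and API server readiness on the
# target cluster. KUBECTL is an assumed override (defaults to kubectl).
check_control_plane() {
  local kubectl="${KUBECTL:-kubectl}" component
  for component in etcd kube-apiserver; do
    echo "== ${component} =="
    "$kubectl" get pods -n kube-system -l "component=${component}" -o wide
  done
  echo "== API server readiness =="
  # /readyz?verbose lists each readiness check with [+] or [-].
  "$kubectl" get --raw='/readyz?verbose'
}
```

If any check reports [-], use the describe and logs commands above on the affected pod.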