
Upgrading an EKS Cluster

Introduction

EKS clusters can be upgraded directly from the AWS console, which upgrades the master (control plane). Below is some information about how to go about making those changes to your environment and some considerations to keep in mind before doing so.

Prerequisites

Before updating the control plane, check the version of your Kubernetes cluster and worker nodes. If your worker node version is older than your current Kubernetes version, you must update your worker nodes first and only then proceed with updating the control plane. Check the Kubernetes control plane and worker node versions with the commands kubectl version --short and kubectl get nodes; the latter lists the node(s) and their versions.

Since the Pod Security Policy (PSP) admission controller is enabled by default in Kubernetes 1.13 and later, make sure that a proper pod security policy is in place before updating the Kubernetes version on the control plane. Check the default security policy with kubectl get psp eks.privileged. If you get a server error, you must install a PSP.

Make sure to read and understand the following AWS documentation on best practices: Planning Kubernetes Upgrades with Amazon EKS, and Amazon EKS - Updating a Cluster. Please also note that there may be documented changes from Kubernetes itself to be aware of. For example, the following Kubernetes documentation outlines a potential issue for those upgrading to v1.21 while using HashiCorp Vault: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-issuer-discovery

Note: When the master goes down for the upgrade, deployments, services, etc. continue to work as expected. However, anything that requires the Kubernetes API stops working. This means kubectl stops working, applications that use the Kubernetes API to get information about the cluster stop working, and, in general, you cannot make any changes to the cluster while it is being upgraded.
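The pre-upgrade checks described above can be run together from a terminal; a minimal sketch, assuming kubectl is already configured against the cluster you intend to upgrade:

# Compare the control plane version with the worker node (kubelet) versions
kubectl version --short
kubectl get nodes

# Verify the default EKS pod security policy exists (Kubernetes 1.13+)
kubectl get psp eks.privileged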

Instructions

Upgrade Master

This can be achieved directly from the AWS console, under the EKS section. There is an Upgrade cluster version button that can be used to upgrade the master. If you are using Terraform scripts to manage your cluster, the version can be specified in the aws_eks_cluster resource:

resource "aws_eks_cluster" "aws-eks" {
name = "${var.cluster-name}"
role_arn = "${aws_iam_role.aws-eks-cluster.arn}"

version = "${var.cluster-master-version}"

vpc_config {
security_group_ids = ["${aws_security_group.aws-eks-cluster.id}"]
subnet_ids = ["${aws_subnet.aws-eks-subnet-1.id}", "${aws_subnet.aws-eks-subnet-2.id}"]
}

depends_on = [
"aws_iam_role_policy_attachment.aws-eks-cluster-AmazonEKSClusterPolicy",
"aws_iam_role_policy_attachment.aws-eks-cluster-AmazonEKSServicePolicy",
"aws_iam_role.aws-eks-cluster"
]
}

Effect: Doing this will only upgrade the master, but EKS worker nodes will not be affected and will continue running normally.
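If you prefer to trigger the same control plane upgrade from the command line rather than the console or Terraform, a minimal sketch using the AWS CLI is shown below; the cluster name and target version are placeholders for your own values:

# Start the control plane upgrade to the target Kubernetes version
aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.21

# Confirm the control plane version once the update has completed
aws eks describe-cluster --name my-cluster --query "cluster.version" --output text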

Upgrade EKS Worker Nodes

In this step you change the AMI used for the worker nodes in the EKS launch configuration, and then double the number of nodes in the cluster in their Auto Scaling Group configuration. For example, if the original cluster had 3 nodes, you increase that to 6 nodes. If you are using Terraform scripts, the AMI setting is available under image_id of the aws_launch_configuration resource:

resource "aws_launch_configuration" "aws-eks" {
associate_public_ip_address = false
iam_instance_profile = "${aws_iam_instance_profile.aws-eks-node.name}"
image_id = "${data.aws_ami.eks-worker.id}"
instance_type = "${var.ec2-instance-type}"
name_prefix = "${var.cluster-name}-node"
security_groups = ["${aws_security_group.aws-eks-node.id}"]
user_data_base64 = "${base64encode(local.aws-eks-node-userdata)}"

lifecycle {
create_before_destroy = true
}

root_block_device {
volume_size = "50"
}
}
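Whatever populates image_id (here the data.aws_ami.eks-worker data source) must resolve to the EKS-optimized AMI that matches the new Kubernetes version. One way to look up the recommended AMI ID is the public SSM parameter that AWS publishes for each version; a minimal sketch, assuming Amazon Linux 2 worker nodes and 1.21 as the target version:

# Look up the recommended EKS-optimized Amazon Linux 2 AMI for the target version
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.21/amazon-linux-2/recommended/image_id \
  --query "Parameter.Value" --output text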

The Auto Scaling Group size is available under the max_size and desired_capacity arguments of the aws_autoscaling_group resource:

resource "aws_autoscaling_group" "aws-eks" {
desired_capacity = "${var.desired-ec2-instances}"
launch_configuration = "${aws_launch_configuration.aws-eks.id}"
max_size = "${var.max-ec2-instances}"
min_size = "${var.min-ec2-instances}"
name = "${var.cluster-name}-asg"
vpc_zone_identifier = ["${aws_subnet.aws-eks-subnet-1.id}", "${aws_subnet.aws-eks-subnet-2.id}"]

tag {
key = "Name"
value = "${var.cluster-name}"
propagate_at_launch = true
}

tag {
key = "kubernetes.io/cluster/${var.cluster-name}"
value = "owned"
propagate_at_launch = true
}

depends_on = ["aws_eks_cluster.aws-eks"]

}

Effect: Cluster size will be doubled, new worker nodes will be created with the new version, and old worker nodes will still be running and available with the old version.  
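If the Auto Scaling Group is not managed through Terraform, the same doubling can be done with the AWS CLI; a minimal sketch with placeholder group name and sizes, followed by a check that the new nodes have registered with the cluster:

# Double the Auto Scaling Group (example: 3 nodes originally, scaled to 6)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-cluster-asg \
  --desired-capacity 6 --max-size 6

# The new nodes should appear with the new kubelet version and reach the Ready state
kubectl get nodes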

Scale Down Auto Scaling Group

After the new nodes are correctly registered and available in Kubernetes, it is time to decrease the Auto Scaling Group back to its original size.

Effect: All pods on the old nodes will be automatically moved to the new nodes by Kubernetes. After the Spinnaker pods are restarted, you should be able to continue using Spinnaker as usual.
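A minimal sketch of this step from the command line, assuming the same placeholder Auto Scaling Group name and an original size of 3 nodes; the pod check afterwards confirms that everything was rescheduled onto the new nodes:

# Scale the Auto Scaling Group back down to its original size
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-cluster-asg \
  --desired-capacity 3

# Verify that all pods (including Spinnaker's) are Running on the new nodes
kubectl get pods --all-namespaces -o wide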