Scaling Kubernetes clusters with Rancher and Terraform

In this article we will walk through creating the infrastructure on Bluvalt OpenStack that is needed for a fully provisioned Kubernetes cluster, using Terraform and Rancher2. We will also integrate the cluster with cloud-provider-openstack and cinder-csi-plugin.

Getting started with Infrastructure

  • Clone the repository terraform-rancher2 into a folder.
  • Go into the openstack folder using cd openstack/
  • Modify the variables in terraform.tfvars to match your cloud environment:
    for the RUH2 cloud, use openstack_auth_url = "https://api-ruh2-vdc.bluvalt.com/identity/v3" and openstack_domain = "ruh2";
    for the JED1 cloud, use openstack_auth_url = "https://api-jed1-vdc.bluvalt.com/identity/v3" and openstack_domain = "jed1".
    It is important to uncomment the variables openstack_project, openstack_username and openstack_password, or to export them as environment variables with the TF_VAR_ prefix, for example:
export TF_VAR_openstack_username=myusername
export TF_VAR_openstack_password=mypassword
export TF_VAR_openstack_project=myproject
  • Other variables, such as rancher_node_image_id, external_network and the flavors, can be obtained with openstack-cli by invoking:
## image list .. pick an ubuntu image
openstack image list
## network name
openstack network list --external
## flavors
openstack flavor list
  • The RKE configuration can be adjusted and customized in rancher2.tf; see the provider documentation at rancher_cluster. NOTE: It is important to keep the kubelet extra_args for the external cloud provider, because they are required for the integration with cloud-provider-openstack.

  • Run terraform init to initialize a working directory containing Terraform configuration files.

  • To create the environment, run terraform apply --auto-approve and wait for the output once all resources have been created:

Apply complete! Resources: 25 added, 0 changed, 0 destroyed.

Outputs:

rancher_url = [
  "https://xx.xx.xx.xx/",
]
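
When the credentials are supplied through the environment rather than terraform.tfvars, a small pre-flight check before terraform apply can catch a forgotten export. The following is only a sketch: the function name require_tf_vars is hypothetical, and the check does not apply if the variables are uncommented in terraform.tfvars instead.

```shell
# Pre-flight check: succeed only if all three credential variables are set.
# require_tf_vars is a hypothetical helper, not part of the repository.
require_tf_vars() {
  missing=0
  for name in TF_VAR_openstack_username TF_VAR_openstack_password TF_VAR_openstack_project; do
    # Read the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be chained in front of the apply step, for example require_tf_vars && terraform apply --auto-approve.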

At this point, use the rancher_url from the output above to log in to the Rancher instance with the username admin and the password defined in rancher_admin_password. Wait for all Kubernetes nodes to be discovered, registered, and active.
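
The wait can also be done from a terminal instead of watching the Rancher UI. This sketch assumes kubectl is already configured for the new cluster (for example with a kubeconfig downloaded from Rancher); the helper name wait_for_nodes and the default timeout are illustrative choices.

```shell
# Block until every node reports the Ready condition, or give up after the timeout.
# $1: optional timeout (defaults to 600s, an arbitrary choice for this sketch).
wait_for_nodes() {
  kubectl wait --for=condition=Ready nodes --all --timeout="${1:-600s}"
}
```

Calling wait_for_nodes 15m would wait up to fifteen minutes before returning a non-zero exit status.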

Integration with cloud-provider-openstack

As you may notice, all the nodes have the taint node.cloudprovider.kubernetes.io/uninitialized. Passing the --cloud-provider=external flag to the kubelet makes it wait for the cloud provider to perform the initialization. The taint marks the node as needing a second initialization from an external controller before work can be scheduled on it.
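
One way to see the taint on each node is to print the taint keys next to the node names. This is a minimal sketch, assuming kubectl is configured for the cluster; the helper name list_node_taints is hypothetical.

```shell
# Print each node together with the keys of its taints.
# Before the cloud controller manager runs, the TAINTS column should include
# node.cloudprovider.kubernetes.io/uninitialized.
list_node_taints() {
  kubectl get nodes -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

Running the same command again after the integration steps is a quick way to confirm that the taint has been removed.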

  • Edit the file manifests/cloud-config with the access information to your openstack environment.
  • Create a secret containing the cloud configuration in the kube-system namespace
kubectl create secret -n kube-system generic cloud-config --from-file=manifests/cloud-config
  • Create the RBAC resources and the openstack-cloud-controller-manager daemonset, then wait until all the pods in the kube-system namespace are up and running.
kubectl apply -f manifests/cloud-controller-manager-roles.yaml
kubectl apply -f manifests/cloud-controller-manager-role-bindings.yaml
kubectl apply -f manifests/openstack-cloud-controller-manager-ds.yaml
  • Create cinder-csi-plugin, which is a set of cluster roles, cluster role bindings, statefulsets, and a storageClass used to communicate with OpenStack (Cinder).
kubectl apply -f manifests/cinder-csi-plugin.yaml
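
The steps above can be checked from the command line with something like the following. It is a sketch under assumptions: the daemonset name is taken from the manifest file name and may differ in your copy of the manifests, and verify_cloud_integration is a hypothetical helper.

```shell
# Wait for the cloud controller manager rollout, then list the kube-system pods
# and the storage classes created by cinder-csi-plugin.
verify_cloud_integration() {
  # Assumption: the daemonset is named openstack-cloud-controller-manager.
  kubectl -n kube-system rollout status daemonset/openstack-cloud-controller-manager --timeout=300s &&
    kubectl -n kube-system get pods &&
    kubectl get storageclass
}
```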

At this point, openstack-cloud-controller-manager and cinder-csi-plugin have been deployed, and they are able to obtain information from OpenStack such as external IP addresses and zone info:

$ kubectl get nodes -o wide

NAME            STATUS   ROLES               AGE     VERSION   INTERNAL-IP     EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
demo-master-1   Ready    controlplane,etcd   5h      v1.17.5   192.168.201.6   xx.xx.xx.xx      Ubuntu 18.04.2 LTS   4.15.0-45-generic   docker://19.3.9
demo-worker-1   Ready    worker              4h57m   v1.17.5   192.168.201.4   xx.xx.xx.xx      Ubuntu 18.04.2 LTS   4.15.0-45-generic   docker://19.3.9
demo-worker-2   Ready    worker              4h56m   v1.17.5   192.168.201.5   xx.xx.xx.xx      Ubuntu 18.04.2 LTS   4.15.0-45-generic   docker://19.3.9

[Image: cluster overview]

Also, as shown in the Nodes tab, all nodes are active and labeled with their OpenStack zones.

[Image: node details]
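
The zone labels can also be read with kubectl. Depending on the Kubernetes version, the zone is stored under topology.kubernetes.io/zone or the older failure-domain.beta.kubernetes.io/zone, so this sketch shows both columns; the helper name show_node_zones is hypothetical.

```shell
# List the nodes with one extra column per zone label.
# A column stays empty if the cluster does not set that label.
show_node_zones() {
  kubectl get nodes \
    -L topology.kubernetes.io/zone \
    -L failure-domain.beta.kubernetes.io/zone
}
```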

Scalability

With IaC (infrastructure-as-code), scaling becomes easy: any desired state can be reached with little time and effort. All you have to do is change the node counts count_master or count_worker_nodes and run terraform apply again. For example, let’s increase count_worker_nodes by 1. A few minutes later, after refreshing the state and applying the updates:


Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Outputs:

rancher_url = [
  "https://xx.xx.xx.xx",
]

After a couple of minutes, the new node is registered:

$ kubectl get nodes -o wide
NAME            STATUS   ROLES               AGE    VERSION   INTERNAL-IP     EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
demo-master-1   Ready    controlplane,etcd   28h    v1.17.5   192.168.201.6   xx.xx.xx.xx      Ubuntu 18.04.2 LTS   4.15.0-45-generic   docker://19.3.9
demo-worker-1   Ready    worker              28h    v1.17.5   192.168.201.4   xx.xx.xx.xx      Ubuntu 18.04.2 LTS   4.15.0-45-generic   docker://19.3.9
demo-worker-2   Ready    worker              28h    v1.17.5   192.168.201.5   xx.xx.xx.xx      Ubuntu 18.04.2 LTS   4.15.0-45-generic   docker://19.3.9
demo-worker-3   Ready    worker              2m2s   v1.17.5   192.168.201.7   xx.xx.xx.xx      Ubuntu 18.04.2 LTS   4.15.0-45-generic   docker://19.3.9
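
The same scale-up can be tried without editing terraform.tfvars by passing the count on the command line, since a -var option takes precedence over the value in the file. This is a sketch: the helper name scale_workers is hypothetical, and it assumes count_worker_nodes is declared as a plain number variable.

```shell
# Apply the configuration with an explicit worker count.
# $1: desired total number of worker nodes, e.g. 3
scale_workers() {
  terraform apply --auto-approve -var "count_worker_nodes=$1"
}
```

The override is not written back to terraform.tfvars, so a later plain terraform apply would return to the count stored in the file. Editing the file remains the better choice for a permanent change.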

NOTE: The cluster can be scaled down by decreasing the number of nodes in terraform.tfvars. The node is deleted, and cloud-provider-openstack detects the deletion and removes the node from the cluster.
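
Before scaling down, it may be worth draining the node that is about to be removed so that its pods are rescheduled gracefully. The sketch below assumes that the node being removed is the highest-numbered worker (demo-worker-3 in the listing above); check the terraform plan output to confirm which instance will actually be destroyed. The helper name drain_node is hypothetical.

```shell
# Evict the pods from a node and mark it unschedulable.
# $1: node name, e.g. demo-worker-3
drain_node() {
  # Daemonset-managed pods cannot be evicted, so they are skipped.
  kubectl drain "$1" --ignore-daemonsets
}
```

After drain_node demo-worker-3 completes, decrease count_worker_nodes and run terraform apply as described in the note.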

Cleaning up

To clean up all resources created by this Terraform configuration, just run terraform destroy.
