Kubernetes with vSphere
Here I will describe how to install Kubernetes manually and use vSphere storage for dynamic provisioning of volumes. A nice bonus is that as soon as the claim is removed, the VMDK can be removed from the datastore as well, so no ghost storage is left behind.
Warning: my blog removes all backslashes at the end of a line, so double-check the line continuations in the listings below and add them back yourself if they are missing... Sorry.
vSphere
Machines that are to be used by Kubernetes with automatic provisioning need a plugin and a flag to be set. The plugin must be installed on the machine running the vSphere node, and the flag has to be set at the virtual-machine level: edit the VM's settings, go to "VM Options -> Advanced -> Edit Configuration...", click "Add Row" and add "disk.EnableUUID" with the value "TRUE". The machine needs to be powered off to be able to change this value. See /dev/disk/by-uuid/ to check the UUIDs.
Or use the govc tool
export GOVC_URL=<IP/URL>
export GOVC_USERNAME=<vCenter User>
export GOVC_PASSWORD=<vCenter Password>
export GOVC_INSECURE=1
govc vm.change -e="disk.enableUUID=1" -vm=<VMNAME>
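To check that the flag really took effect, govc can print the VM's ExtraConfig (assuming the -e option of vm.info, which shows the extra configuration):
govc vm.info -e <VMNAME> | grep -i disk.enableuuid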
etcd
etcd is used by flannel to store its configuration and the subnets. etcd must run natively on a couple of machines in the cluster; a minimum of three is reasonable, but it also functions with just one instance. To view and manipulate what is stored in etcd, the etcdctl program is very useful.
yum install etcd
systemctl enable etcd
service etcd start
The Kubernetes API server stores its configuration in etcd using API version 3. This means that etcdctl ls does not show anything. Instead use ETCDCTL_API=3 etcdctl get "" --prefix=true --keys-only to see the keys in the store.
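flannel v0.7, on the other hand, still uses the etcd v2 API, so its keys show up with the classic commands. A quick check, assuming the /ah.online/network prefix configured in the flannel manifest further below:
etcdctl ls --recursive /ah.online/network
etcdctl get /ah.online/network/config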
docker
Most parts of the Kubernetes cluster run within Docker.
yum install docker-engine-1.13.1
systemctl enable docker
service docker start
Warning: kubelet version 1.6.4 has problems with newer Docker versions because of the version numbering. The numbering changed from a notation like 1.13 to 17.03, and the parsing of the version number in kubelet does not support this.
/etc/systemd/system/docker.service
[Service]
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS
/etc/systemd/system/docker.service.d/http_proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.xxx.xx:8080/"
Environment="HTTPS_PROXY=http://proxy.xxx.xx:8080/"
Environment="NO_PROXY=localhost,127.0.0.1,<list of instances in cluster>"
/etc/systemd/system/docker.service.d/network.conf
[Service]
Environment="DOCKER_NETWORK_OPTIONS=--bip=10.233.126.1/24 --iptables=true"
The --bip address must be an address in the flannel subnet on this host, so it has to be adjusted after the flannel network has received its lease.
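A small sketch of how this could be automated, assuming flannel has already written its lease to /run/flannel/subnet.env (it exports FLANNEL_SUBNET and FLANNEL_MTU there):
#!/bin/bash
# Regenerate the docker network drop-in from the current flannel lease
. /run/flannel/subnet.env
cat > /etc/systemd/system/docker.service.d/network.conf <<EOF
[Service]
Environment="DOCKER_NETWORK_OPTIONS=--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} --iptables=true"
EOF
systemctl daemon-reload
systemctl restart docker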
hyperkube
hyperkube is an all-in-one binary that is used to start all the fundamental parts of the Kubernetes cluster. For example, "hyperkube kubelet ..." starts the kubelet. There is no need to install it natively because we start it in a Docker container.
There are some useful tools in the git repo https://github.com/kubernetes/kubernetes.git, for example cluster/saltbase/salt/generate-cert/make-ca-cert.sh.
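That script should generate the ca.crt, server.cert and server.key referenced in the manifests below. Roughly like this, where the first argument is the IP to put in the certificate and the second a comma-separated list of extra SANs; 10.233.0.1 is assumed here to be the first IP of the service-cluster-ip-range (check the script itself for the exact arguments and environment variables it expects):
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
CERT_DIR=/etc/kubernetes/ssl CERT_GROUP=root \
bash cluster/saltbase/salt/generate-cert/make-ca-cert.sh \
<physical ip address of instance> \
IP:10.233.0.1,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc.cluster.tst.local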
kubelet
kubelet is the first thing that is started to bootstrap the Kubernetes cluster. I start it as a service. During startup, all Kubernetes manifests in the manifest folder are started.
On the master the following manifests need to be present:
- apiserver.manifest
- controller-manager.manifest
- flannel.manifest
- proxy.manifest
- scheduler.manifest
On the node only the following should be started:
- flannel.manifest
- proxy.manifest
/usr/local/bin/kubelet
#!/bin/bash
/usr/bin/docker run \
  --net=host \
  --pid=host \
  --privileged \
  --name=kubelet \
  --restart=on-failure:5 \
  --memory=512M \
  --cpu-shares=100 \
  -v /dev:/dev:rw \
  -v /etc/cni:/etc/cni:ro \
  -v /opt/cni:/opt/cni:ro \
  -v /etc/ssl:/etc/ssl:ro \
  -v /etc/resolv.conf:/etc/resolv.conf \
  -v /etc/pki/tls:/etc/pki/tls:ro \
  -v /etc/pki/ca-trust:/etc/pki/ca-trust:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker:/var/lib/docker:rw \
  -v /var/log:/var/log:rw \
  -v /var/lib/kubelet:/var/lib/kubelet:shared \
  -v /var/lib/cni:/var/lib/cni:shared \
  -v /var/run:/var/run:rw \
  -v /etc/kubernetes:/etc/kubernetes:ro \
  -v /etc/os-release:/etc/os-release:ro \
  quay.io/coreos/hyperkube:v1.6.6_coreos.1 \
  ./hyperkube kubelet \
  "$@"
/etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Wants=docker.socket
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
ExecStart=/usr/local/bin/kubelet \
  $KUBE_LOGTOSTDERR \
  $KUBE_LOG_LEVEL \
  $KUBELET_API_SERVER \
  $KUBELET_ADDRESS \
  $KUBELET_PORT \
  $KUBELET_HOSTNAME \
  $KUBE_ALLOW_PRIV \
  $KUBELET_ARGS \
  $DOCKER_SOCKET \
  $KUBELET_NETWORK_PLUGIN \
  $KUBELET_CLOUDPROVIDER
Restart=always
RestartSec=10s
ExecStartPre=-/usr/bin/docker rm -f kubelet
ExecReload=/usr/bin/docker restart kubelet
[Install]
WantedBy=multi-user.target
/etc/kubernetes/kubelet.env
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=5"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=<physical ip address of instance>"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=<short name of host>"
KUBELET_ARGS="--pod-manifest-path=/etc/kubernetes/manifests \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 \
--kube-reserved cpu=100m,memory=512M \
--node-status-update-frequency=10s \
--enable-cri=False --cgroups-per-qos=False --enforce-node-allocatable='' \
--cluster-dns=10.233.0.3 --cluster-domain=cluster.tst.local --resolv-conf=/etc/resolv.conf --kubeconfig=/etc/kubernetes/node-kubeconfig.yaml --require-kubeconfig --node-labels=node-role.kubernetes.io/node=true \
"
# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER="--cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf"
/etc/kubernetes/node-kubeconfig.yaml
apiVersion: v1
clusters:
- name: testcluster
  cluster:
    insecure-skip-tls-verify: true
    server: http://<physical ip address of instance>:8080
contexts:
- name: testcluster_context
  context:
    cluster: testcluster
    user: kubelet
current-context: testcluster_context
kind: Config
preferences: {}
users: []
/etc/kubernetes/vsphere.conf
[Global]
user="<vSphere user>"
password="<vSphere password>"
server="<vsphere server>"
port="443"
insecure-flag="1"
# Datacenter in which VMs are located
datacenter="<Datacenter>"
# Datastore in which vmdks are stored
datastore="<datastore>"
# WorkingDir is path where VMs can be found
working-dir="linux"
# VMName is the VM name of virtual machine. Combining the WorkingDir and VMName can form a unique InstanceID. When vm-name is set, no username/password is required on worker nodes.
#vm-name="<name>"
[Disk]
scsicontrollertype=pvscsi
The manual states that working-dir and vm-name are valid, but this only seems to be true for very recent releases of kubelet. As soon as kubelet starts, it reads the vsphere.conf file and checks whether vm-name is set. If it is not, it reads the UUID of the VM from /sys/class/dmi/id/product_serial and searches for that UUID through the REST API of vSphere.
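To compare the two sides, check the serial the kubelet reads on the node against what vCenter reports for the VM (govc vm.info should print the UUID):
cat /sys/class/dmi/id/product_serial
cat /sys/class/dmi/id/product_uuid
govc vm.info <VMNAME>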
Manifests
/etc/kubernetes/manifests/flannel.manifest
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: flannel
  name: flannel
spec:
  volumes:
  - name: "subnetenv"
    hostPath:
      path: "/run/flannel"
  - name: "etckube"
    hostPath:
      path: "/etc/kubernetes/"
  containers:
  - name: "flannel-server-helper"
    image: "gcr.io/google_containers/flannel-server-helper:0.1"
    args:
    - "--network-config=/etc/kubernetes/flannel-network.json"
    - "--etcd-prefix=/ah.online/network"
    - "--etcd-server=http://<ip address of master>:2379"
    volumeMounts:
    - name: "etckube"
      mountPath: "/etc/kubernetes"
    imagePullPolicy: "Always"
  - image: quay.io/coreos/flannel:v0.7.1-amd64
    name: flannel
    command:
    - /opt/bin/flanneld
    - -etcd-endpoints
    - http://<ip address of master>:2379
    - -etcd-prefix
    - /ah.online/network
    - -public-ip
    - <physical ip address of instance>
    - -v=2
    volumeMounts:
    - name: "subnetenv"
      mountPath: "/run/flannel"
    securityContext:
      privileged: true
  hostNetwork: true
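The flannel-server-helper pushes /etc/kubernetes/flannel-network.json into etcd; that file is not shown above. A minimal sketch, assuming the 10.233.64.0/18 range that kube-proxy uses as --cluster-cidr and a vxlan backend (which matches the flannel.1 MTU of 1450 seen in the troubleshooting section):
/etc/kubernetes/flannel-network.json
{
  "Network": "10.233.64.0/18",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan"
  }
}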
/etc/kubernetes/manifests/apiserver.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - apiserver
    - --advertise-address=<physical ip address of instance>
    - --etcd-servers=http://<physical ip address of instance>:2379
    - --etcd-quorum-read=true
    - --insecure-bind-address=<physical ip address of instance>
    - --apiserver-count=2
    - --admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota
    - --service-cluster-ip-range=10.233.0.0/18
    - --service-node-port-range=30000-32767
    - --tls-cert-file=/etc/kubernetes/ssl/server.cert
    - --tls-private-key-file=/etc/kubernetes/ssl/server.key
    - --client-ca-file=/etc/kubernetes/ssl/ca.crt
    - --token-auth-file=/etc/kubernetes/tokens/known_tokens.csv
    - --basic-auth-file=/etc/kubernetes/users/known_users.csv
    - --secure-port=6443
    - --insecure-port=8080
    - --storage-backend=etcd3
    - --cloud-provider=vsphere
    - --cloud-config=/etc/kubernetes/vsphere.conf
    - --v=5
    - --allow-privileged=true
    - --anonymous-auth=true
    - 2>&1 >> /var/log/kube-apiserver.log
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: etckube
      readOnly: true
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /var/log/
      name: varlog
  volumes:
  - hostPath:
      path: /etc/kubernetes
    name: etckube
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /var/log/
    name: varlog
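The apiserver also expects the token and basic-auth files referenced above. Both are simple CSV files with one entry per line; the values below are placeholders only:
/etc/kubernetes/tokens/known_tokens.csv
0123456789abcdef,kubelet,kubelet,"system:masters"
/etc/kubernetes/users/known_users.csv
changeme,admin,admin,"system:masters"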
/etc/kubernetes/manifests/controller-manager.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-controller-manager
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - controller-manager
    - --master=http://<ip address of master>:8080
    - --leader-elect=true
    - --service-account-private-key-file=/etc/kubernetes/ssl/server.key
    - --root-ca-file=/etc/kubernetes/ssl/ca.crt
    - --enable-hostpath-provisioner=false
    - --cloud-provider=vsphere
    - --cloud-config=/etc/kubernetes/vsphere.conf
    - --v=5
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10252
      initialDelaySeconds: 30
      timeoutSeconds: 10
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: etc-kube
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes
    name: etc-kube
/etc/kubernetes/manifests/proxy.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - proxy
    - --v=2
    - --master=http://<ip address of master>:8080
    - --bind-address=<physical ip address of instance>
    - --cluster-cidr=10.233.64.0/18
    - --proxy-mode=iptables
    securityContext:
      privileged: true
/etc/kubernetes/manifests/scheduler.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --leader-elect=true
    - --master=http://<ip address of master>:8080
    - --v=2
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 30
      timeoutSeconds: 10
Dashboard
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubernetes-dashboard
  template:
    metadata:
      labels:
        app: kubernetes-dashboard
      # Comment the following annotation if Dashboard must not be deployed on master
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
        imagePullPolicy: Always
        ports:
        - containerPort: 9090
          protocol: TCP
        args:
        - --apiserver-host=http://<ip address of master>:8080
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 9090
  selector:
    app: kubernetes-dashboard
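The Service is of type NodePort, so the dashboard becomes reachable on a high port on every node. To find out which port was assigned:
kubectl --server=http://<ip address of master>:8080 -n kube-system get svc kubernetes-dashboard
Then browse to http://<ip address of any node>:<nodeport>.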
Examples
Storage class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
Link to existing vmdk
apiVersion: v1
kind: Pod
metadata:
  name: pod0001
spec:
  containers:
  - name: pod0001
    image: busybox
    command:
    - sleep
    - "3600"
    volumeMounts:
    - mountPath: /data
      name: pod-volume
  volumes:
  - name: pod-volume
    vsphereVolume:
      volumePath: "[<datastore>] kubevols/MyVolume.vmdk"
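The VMDK referenced in volumePath must already exist in the kubevols folder on the datastore. With govc it could be created along these lines (flag names may differ between govc versions):
govc datastore.disk.create -ds <datastore> -size 2G kubevols/MyVolume.vmdk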
Dynamic provisioning
The created VMDK gets a name containing the PVC volume id from Kubernetes and is placed under kubevols on the datastore.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc0002
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: fast
---
apiVersion: v1
kind: Pod
metadata:
  name: pvcpod2
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - sleep
    - "3600"
    volumeMounts:
    - name: test-volume
      mountPath: /test-vmdk
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: pvc0002
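Whether the claim got bound and where the disk ended up can be checked with kubectl and govc (the generated pvc-... vmdk should show up under kubevols):
kubectl get pvc pvc0002
kubectl get pv
govc datastore.ls -ds <datastore> kubevols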
Troubleshooting
flannel
The ip address of the docker network must be in the subnet of the flannel network on that same host.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:50:56:89:13:3f brd ff:ff:ff:ff:ff:ff
inet xx.xx.xx.xx/24 brd 141.93.123.255 scope global ens192
valid_lft forever preferred_lft forever
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 02:42:7b:7f:0e:02 brd ff:ff:ff:ff:ff:ff
inet 10.233.126.1/24 scope global docker0
valid_lft forever preferred_lft forever
6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
link/ether 3e:92:7c:6e:b5:b5 brd ff:ff:ff:ff:ff:ff
inet 10.233.126.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
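A quick way to compare the flannel lease with the docker bridge:
cat /run/flannel/subnet.env
ip -4 addr show docker0
ip -4 addr show flannel.1
If docker0 is outside FLANNEL_SUBNET, regenerate the docker network drop-in as described in the docker section and restart docker.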
cleanup
When experimenting it can happen that some settings no longer match. Cleaning up the system can then help:
service kubelet stop
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
service docker stop
ip link del docker0
ip link del flannel.1
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
service docker start
service kubelet start
Errors
- Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"pod", UID:"850c43f8-a41b-11e7-b6fa-0050568e6653", APIVersion:"v1", ResourceVersion:"262066", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Failed to attach volume "pvc-84ed0126-a41b-11e7-b6fa-0050568e6653" on node "blah1234" with: vm 'blah1234' not found
Use govc to find the node and note the path to it. The path looks like "vm/linux/fld-rue-local/blah1234", which means that "linux/fld-rue-local" must be set as "working-dir" in vsphere.conf.
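For example (govc find lists managed objects; -type m limits the search to virtual machines):
govc find / -type m -name blah1234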