Kubernetes with vSphere
Here I will describe how to install Kubernetes manually and use vSphere storage for dynamic provisioning of volumes. A nice bonus is that as soon as the claim is removed, the VMDK can be removed from the datastore as well, so no ghost storage is left behind.
Warning: my blog removes all backslashes at the end of a line, so double-check the line continuations in the listings below and add them back yourself if they are missing... Sorry.
vSphere
Machines that are to be used by Kubernetes with automatic provisioning need a plugin and a flag to be set. The plugin must be installed on the machine running the vSphere node, and the flag has to be set at the virtual-machine level: edit the VM's settings, go to "VM Options -> Advanced -> Edit Configuration...", click "Add Row" and add "disk.EnableUUID" with the value "TRUE". The machine needs to be powered off to be able to change this value. See /dev/disk/by-uuid/ to check the UUIDs.
Or use the govc tool
export GOVC_URL=<IP/URL>
export GOVC_USERNAME=<vCenter User>
export GOVC_PASSWORD=<vCenter Password>
export GOVC_INSECURE=1
govc vm.change -e="disk.enableUUID=1" -vm=<VMNAME>
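To check that the flag really took effect, govc can print the VM's ExtraConfig (assuming the -e option of vm.info, which shows the extra configuration):
govc vm.info -e <VMNAME> | grep -i disk.enableuuid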
etcd
etcd is used by flannel to store its configuration and the subnets. etcd must run natively on a couple of machines in the cluster; a minimum of three is reasonable, but it also functions with just one instance. To view and manipulate what is stored in etcd, the etcdctl program is very useful.
yum install etcd
systemctl enable etcd
service etcd start
The Kubernetes API server stores its configuration in etcd using API version 3. This means that etcdctl ls does not show anything. Instead use ETCDCTL_API=3 etcdctl get "" --prefix=true --keys-only to see the keys in the store.
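flannel v0.7, on the other hand, still uses the etcd v2 API, so its keys show up with the classic commands. A quick check, assuming the /ah.online/network prefix configured in the flannel manifest further below:
etcdctl ls --recursive /ah.online/network
etcdctl get /ah.online/network/config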
docker
Most parts of the Kubernetes cluster run within Docker.
yum install docker-engine-1.13.1
systemctl enable docker
service docker start
Warning: kubelet version 1.6.4 has problems with newer Docker versions because of the version numbering. The numbering changed from a notation like 1.13 to 17.03, and the parsing of the version number in kubelet does not support this.
/etc/systemd/system/docker.service
[Service]
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS
/etc/systemd/system/docker.service.d/http_proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.xxx.xx:8080/"
Environment="HTTPS_PROXY=http://proxy.xxx.xx:8080/"
Environment="NO_PROXY=localhost,127.0.0.1,<list of instances in cluster>"
/etc/systemd/system/docker.service.d/network.conf
[Service]
Environment="DOCKER_NETWORK_OPTIONS=--bip=10.233.126.1/24 --iptables=true"
The --bip address must be an address in the flannel subnet on this host, so it has to be adjusted after the flannel network has received its lease.
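A small sketch of how this could be automated, assuming flannel has already written its lease to /run/flannel/subnet.env (it exports FLANNEL_SUBNET and FLANNEL_MTU there):
#!/bin/bash
# Regenerate the docker network drop-in from the current flannel lease
. /run/flannel/subnet.env
cat > /etc/systemd/system/docker.service.d/network.conf <<EOF
[Service]
Environment="DOCKER_NETWORK_OPTIONS=--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} --iptables=true"
EOF
systemctl daemon-reload
systemctl restart docker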
hyperkube
hyperkube is an all-in-one binary that is used to start all the fundamental parts of the Kubernetes cluster. For example, "hyperkube kubelet ..." starts the kubelet. There is no need to install it natively because we start it in a Docker container.
There are some useful tools in the git repo https://github.com/kubernetes/kubernetes.git, for example cluster/saltbase/salt/generate-cert/make-ca-cert.sh.
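That script should generate the ca.crt, server.cert and server.key referenced in the manifests below. Roughly like this, where the first argument is the IP to put in the certificate and the second a comma-separated list of extra SANs; 10.233.0.1 is assumed here to be the first IP of the service-cluster-ip-range (check the script itself for the exact arguments and environment variables it expects):
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
CERT_DIR=/etc/kubernetes/ssl CERT_GROUP=root \
bash cluster/saltbase/salt/generate-cert/make-ca-cert.sh \
<physical ip address of instance> \
IP:10.233.0.1,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc.cluster.tst.local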
kubelet
kubelet is the first thing that is started to bootstrap the Kubernetes cluster. I start it as a service. During startup, all Kubernetes manifests in the manifest folder are started.
On the master the following manifests need to be present:
- apiserver.manifest
- controller-manager.manifest
- flannel.manifest
- proxy.manifest
- scheduler.manifest
On the node only the following should be started:
- flannel.manifest
- proxy.manifest
/usr/local/bin/kubelet
#!/bin/bash
/usr/bin/docker run \
  --net=host \
  --pid=host \
  --privileged \
  --name=kubelet \
  --restart=on-failure:5 \
  --memory=512M \
  --cpu-shares=100 \
  -v /dev:/dev:rw \
  -v /etc/cni:/etc/cni:ro \
  -v /opt/cni:/opt/cni:ro \
  -v /etc/ssl:/etc/ssl:ro \
  -v /etc/resolv.conf:/etc/resolv.conf \
  -v /etc/pki/tls:/etc/pki/tls:ro \
  -v /etc/pki/ca-trust:/etc/pki/ca-trust:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker:/var/lib/docker:rw \
  -v /var/log:/var/log:rw \
  -v /var/lib/kubelet:/var/lib/kubelet:shared \
  -v /var/lib/cni:/var/lib/cni:shared \
  -v /var/run:/var/run:rw \
  -v /etc/kubernetes:/etc/kubernetes:ro \
  -v /etc/os-release:/etc/os-release:ro \
  quay.io/coreos/hyperkube:v1.6.6_coreos.1 \
  ./hyperkube kubelet \
  "$@"
/etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Wants=docker.socket
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
ExecStart=/usr/local/bin/kubelet \
  $KUBE_LOGTOSTDERR \
  $KUBE_LOG_LEVEL \
  $KUBELET_API_SERVER \
  $KUBELET_ADDRESS \
  $KUBELET_PORT \
  $KUBELET_HOSTNAME \
  $KUBE_ALLOW_PRIV \
  $KUBELET_ARGS \
  $DOCKER_SOCKET \
  $KUBELET_NETWORK_PLUGIN \
  $KUBELET_CLOUDPROVIDER
Restart=always
RestartSec=10s
ExecStartPre=-/usr/bin/docker rm -f kubelet
ExecReload=/usr/bin/docker restart kubelet
[Install]
WantedBy=multi-user.target
/etc/kubernetes/kubelet.env
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=5"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=<physical ip address of instance>"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=<short name of host>"
KUBELET_ARGS="--pod-manifest-path=/etc/kubernetes/manifests \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 \
--kube-reserved cpu=100m,memory=512M \
--node-status-update-frequency=10s \
--enable-cri=False --cgroups-per-qos=False --enforce-node-allocatable='' \
--cluster-dns=10.233.0.3 --cluster-domain=cluster.tst.local --resolv-conf=/etc/resolv.conf --kubeconfig=/etc/kubernetes/node-kubeconfig.yaml --require-kubeconfig --node-labels=node-role.kubernetes.io/node=true \
"
# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER="--cloud-provider=vsphere --cloud-config=/etc/kubernetes/vsphere.conf"
/etc/kubernetes/node-kubeconfig.yaml
apiVersion: v1
clusters:
- name: testcluster
  cluster:
    insecure-skip-tls-verify: true
    server: http://<physical ip address of instance>:8080
contexts:
- name: testcluster_context
  context:
    cluster: testcluster
    user: kubelet
current-context: testcluster_context
kind: Config
preferences: {}
users: []
/etc/kubernetes/vsphere.conf
[Global]
user="<vSphere user>"
password="<vSphere password>"
server="<vsphere server>"
port="443"
insecure-flag="1"
# Datacenter in which VMs are located
datacenter="<Datacenter>"
# Datastore in which vmdks are stored
datastore="<datastore>"
# WorkingDir is path where VMs can be found
working-dir="linux"
# VMName is the VM name of virtual machine. Combining the WorkingDir and VMName can form a unique InstanceID. When vm-name is set, no username/password is required on worker nodes.
#vm-name="<name>"
[Disk]
scsicontrollertype=pvscsi
The manual states that working-dir and vm-name are valid, but this only seems to be true for very recent releases of kubelet. As soon as kubelet starts, it reads the vsphere.conf file and checks whether vm-name is set. If it is not, it reads the UUID of the VM from /sys/class/dmi/id/product_serial and searches for that UUID through the REST API of vSphere.
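To compare the two sides, check the serial the kubelet reads on the node against what vCenter reports for the VM (govc vm.info should print the UUID):
cat /sys/class/dmi/id/product_serial
cat /sys/class/dmi/id/product_uuid
govc vm.info <VMNAME>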
Manifests
/etc/kubernetes/manifests/flannel.manifest
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: flannel
  name: flannel
spec:
  volumes:
  - name: "subnetenv"
    hostPath:
      path: "/run/flannel"
  - name: "etckube"
    hostPath:
      path: "/etc/kubernetes/"
  containers:
  - name: "flannel-server-helper"
    image: "gcr.io/google_containers/flannel-server-helper:0.1"
    args:
    - "--network-config=/etc/kubernetes/flannel-network.json"
    - "--etcd-prefix=/ah.online/network"
    - "--etcd-server=http://<ip address of master>:2379"
    volumeMounts:
    - name: "etckube"
      mountPath: "/etc/kubernetes"
    imagePullPolicy: "Always"
  - image: quay.io/coreos/flannel:v0.7.1-amd64
    name: flannel
    command:
    - /opt/bin/flanneld
    - -etcd-endpoints
    - http://<ip address of master>:2379
    - -etcd-prefix
    - /ah.online/network
    - -public-ip
    - <physical ip address of instance>
    - -v=2
    volumeMounts:
    - name: "subnetenv"
      mountPath: "/run/flannel"
    securityContext:
      privileged: true
  hostNetwork: true
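The flannel-server-helper pushes /etc/kubernetes/flannel-network.json into etcd; that file is not shown above. A minimal sketch, assuming the 10.233.64.0/18 range that kube-proxy uses as --cluster-cidr and a vxlan backend (which matches the flannel.1 MTU of 1450 seen in the troubleshooting section):
/etc/kubernetes/flannel-network.json
{
  "Network": "10.233.64.0/18",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan"
  }
}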
/etc/kubernetes/manifests/apiserver.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - apiserver
    - --advertise-address=<physical ip address of instance>
    - --etcd-servers=http://<physical ip address of instance>:2379
    - --etcd-quorum-read=true
    - --insecure-bind-address=<physical ip address of instance>
    - --apiserver-count=2
    - --admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota
    - --service-cluster-ip-range=10.233.0.0/18
    - --service-node-port-range=30000-32767
    - --tls-cert-file=/etc/kubernetes/ssl/server.cert
    - --tls-private-key-file=/etc/kubernetes/ssl/server.key
    - --client-ca-file=/etc/kubernetes/ssl/ca.crt
    - --token-auth-file=/etc/kubernetes/tokens/known_tokens.csv
    - --basic-auth-file=/etc/kubernetes/users/known_users.csv
    - --secure-port=6443
    - --insecure-port=8080
    - --storage-backend=etcd3
    - --cloud-provider=vsphere
    - --cloud-config=/etc/kubernetes/vsphere.conf
    - --v=5
    - --allow-privileged=true
    - --anonymous-auth=true
    - 2>&1 >> /var/log/kube-apiserver.log
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: etckube
      readOnly: true
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /var/log/
      name: varlog
  volumes:
  - hostPath:
      path: /etc/kubernetes
    name: etckube
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /var/log/
    name: varlog
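The apiserver also expects the token and basic-auth files referenced above. Both are simple CSV files with one entry per line; the values below are placeholders only:
/etc/kubernetes/tokens/known_tokens.csv
0123456789abcdef,kubelet,kubelet,"system:masters"
/etc/kubernetes/users/known_users.csv
changeme,admin,admin,"system:masters"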
/etc/kubernetes/manifests/controller-manager.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-controller-manager
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - controller-manager
    - --master=http://<ip address of master>:8080
    - --leader-elect=true
    - --service-account-private-key-file=/etc/kubernetes/ssl/server.key
    - --root-ca-file=/etc/kubernetes/ssl/ca.crt
    - --enable-hostpath-provisioner=false
    - --cloud-provider=vsphere
    - --cloud-config=/etc/kubernetes/vsphere.conf
    - --v=5
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10252
      initialDelaySeconds: 30
      timeoutSeconds: 10
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: etc-kube
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes
    name: etc-kube
/etc/kubernetes/manifests/proxy.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - proxy
    - --v=2
    - --master=http://<ip address of master>:8080
    - --bind-address=<physical ip address of instance>
    - --cluster-cidr=10.233.64.0/18
    - --proxy-mode=iptables
    securityContext:
      privileged: true
/etc/kubernetes/manifests/scheduler.manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.6.4_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --leader-elect=true
    - --master=http://<ip address of master>:8080
    - --v=2
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 30
      timeoutSeconds: 10
Dashboard
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubernetes-dashboard
  template:
    metadata:
      labels:
        app: kubernetes-dashboard
      # Comment the following annotation if Dashboard must not be deployed on master
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
        imagePullPolicy: Always
        ports:
        - containerPort: 9090
          protocol: TCP
        args:
        - --apiserver-host=http://<ip address of master>:8080
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 9090
  selector:
    app: kubernetes-dashboard
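The Service is of type NodePort, so the dashboard becomes reachable on a high port on every node. To find out which port was assigned:
kubectl --server=http://<ip address of master>:8080 -n kube-system get svc kubernetes-dashboard
Then browse to http://<ip address of any node>:<nodeport>.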
Examples
Storage class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
Link to existing vmdk
apiVersion: v1
kind: Pod
metadata:
  name: pod0001
spec:
  containers:
  - name: pod0001
    image: busybox
    command:
    - sleep
    - "3600"
    volumeMounts:
    - mountPath: /data
      name: pod-volume
  volumes:
  - name: pod-volume
    vsphereVolume:
      volumePath: "[<datastore>] kubevols/MyVolume.vmdk"
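The VMDK referenced in volumePath must already exist in the kubevols folder on the datastore. With govc it could be created along these lines (flag names may differ between govc versions):
govc datastore.disk.create -ds <datastore> -size 2G kubevols/MyVolume.vmdk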
Dynamic provisioning
The created VMDK gets a name containing the PVC volume id from Kubernetes and is placed under kubevols on the datastore.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc0002
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: fast
---
apiVersion: v1
kind: Pod
metadata:
  name: pvcpod2
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - sleep
    - "3600"
    volumeMounts:
    - name: test-volume
      mountPath: /test-vmdk
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: pvc0002
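Whether the claim got bound and where the disk ended up can be checked with kubectl and govc (the generated pvc-... vmdk should show up under kubevols):
kubectl get pvc pvc0002
kubectl get pv
govc datastore.ls -ds <datastore> kubevols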
Troubleshooting
flannel
The ip address of the docker network must be in the subnet of the flannel network on that same host.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:50:56:89:13:3f brd ff:ff:ff:ff:ff:ff
inet xx.xx.xx.xx/24 brd 141.93.123.255 scope global ens192
valid_lft forever preferred_lft forever
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 02:42:7b:7f:0e:02 brd ff:ff:ff:ff:ff:ff
inet 10.233.126.1/24 scope global docker0
valid_lft forever preferred_lft forever
6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
link/ether 3e:92:7c:6e:b5:b5 brd ff:ff:ff:ff:ff:ff
inet 10.233.126.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
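A quick way to compare the flannel lease with the docker bridge:
cat /run/flannel/subnet.env
ip -4 addr show docker0
ip -4 addr show flannel.1
If docker0 is outside FLANNEL_SUBNET, regenerate the docker network drop-in as described in the docker section and restart docker.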
cleanup
When experimenting it can happen that some settings no longer match. Cleaning up the system can then help:
service kubelet stop
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
service docker stop
ip link del docker0
ip link del flannel.1
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
service docker start
service kubelet start
Errors
- Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"pod", UID:"850c43f8-a41b-11e7-b6fa-0050568e6653", APIVersion:"v1", ResourceVersion:"262066", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Failed to attach volume "pvc-84ed0126-a41b-11e7-b6fa-0050568e6653" on node "blah1234" with: vm 'blah1234' not found
Use govc to find the node and note the path to it. The path looks like "vm/linux/fld-rue-local/blah1234", which means that "linux/fld-rue-local" must be set as "working-dir" in vsphere.conf.
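For example (govc find lists managed objects; -type m limits the search to virtual machines):
govc find / -type m -name blah1234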