Kubernetes Persistent Storage Hell

We've recently started working on a rather complex application with my team at Red Hat. We all agreed it would be best to use containers, Kubernetes and Vagrant to make our development (and testing) environment easy to set up (and to be cool, obviously).

Our application consists of multiple components; the ones that matter for this post are MongoDB and something we can call a worker. The reason for MongoDB is clear - we are working with JSON documents and need to store them somewhere. A worker takes data, does some work on it and writes the result to the DB. There are multiple types of workers and they need to share some data. We also need to be able to scale (that's why we use containers!), which again requires shared storage. We want both storages to be backed by a local path (especially for the Vagrant use case).

Sounds easy, right? It's not. Here are the config objects involved:

kubernetes/worker-volume.yaml
kubernetes/worker-claim.yaml
kubernetes/mongo-volume.yaml
kubernetes/mongo-claim.yaml

The way you work with volumes in Kubernetes is that you define a PersistentVolume object stating capacity, access mode and host path (still talking about local storage). Then you define a PersistentVolumeClaim with access mode and capacity. Kubernetes then automagically maps the two together - i.e. it matches a claim to a more or less random volume that provides the correct mode and enough capacity.
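
For reference, a minimal volume/claim pair along these lines might look roughly like the sketch below (the volume name and the 1Gi size are illustrative, not our exact files; the claim name and host path are the ones that appear later in this post):

# kubernetes/mongo-volume.yaml (illustrative sketch)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-volume
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/media/mongo-data"

# kubernetes/mongo-claim.yaml (illustrative sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi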

You might be able to see the problem now, but if not, here it is: if you have 2 volumes and 2 claims (as we have), there is no way to be sure which claim will get which volume. You might not care when you first start your app, because the directories you provided for the volumes will probably be empty. But what if you restart the app? Or the Vagrant box (and thus the app)? You cannot be sure which volume will be assigned to which claim.

This leads to an inconsistent state where your MongoDB claim can end up bound to the worker's volume and vice versa.
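
You can see which volume each claim actually got by listing the volumes and the claims they are bound to:

[vagrant@centos7-adb ~]$ kubectl get pv

If, after a restart, the volume backing the MongoDB directory shows up in the CLAIM column as bound to the worker's claim (or vice versa), you've hit exactly this.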

I've found a related issue and a pull request on GitHub, https://github.com/.../issues/14908 and https://github.com/.../pull/17056, which, once implemented and merged, should fix this. But is there a workaround in the meantime?

Hell yeah! And it's pretty simple. Instead of defining a PersistentVolumeClaim object and using the persistentVolumeClaim key in a replication controller, you can use hostPath directly in the RC. This is what the patch looked like:

diff --git a/kubernetes/mongodb-controller.yaml b/kubernetes/mongodb-controller.yaml
index ffdd5f3..9d7bbe2 100644
--- a/kubernetes/mongodb-controller.yaml
+++ b/kubernetes/mongodb-controller.yaml
@@ -23,5 +23,5 @@ spec:
        mountPath: /data/db
  volumes:
    - name: mongo-persistent-storage
-     persistentVolumeClaim:
-       claimName: myclaim-1
+     hostPath:
+       path: "/media/mongo-data"
diff --git a/kubernetes/worker-controller.yaml b/kubernetes/worker-controller.yaml
index 51181df..f62df47 100644
--- a/kubernetes/worker-controller.yaml
+++ b/kubernetes/worker-controller.yaml
@@ -44,5 +44,6 @@ spec:
        mountPath: /data
  volumes:
    - name: worker-persistent-storage
-     persistentVolumeClaim:
-       claimName: myclaim-2
+     hostPath:
+       path: "/media/worker-data"

The important bits of the Kubernetes config then look like this:

...
   volumeMounts:
     - name: mongo-persistent-storage
       mountPath: /data/db
 volumes:
   - name: mongo-persistent-storage
     hostPath:
       path: "/media/mongo-data"
...
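
With that in place, recreating the controllers (roughly like below, depending on how you drive your setup) always lands the MongoDB and worker data in the same host directories, no matter how many times the Vagrant box is restarted:

[vagrant@centos7-adb ~]$ kubectl create -f kubernetes/mongodb-controller.yaml
[vagrant@centos7-adb ~]$ kubectl create -f kubernetes/worker-controller.yaml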

Mapping service ports to nodes in Kubernetes

Kubernetes is a great project and a cool/hot technology. Although it made me hate JSON (and YAML), I still enjoy exploring the possibilities it brings to application deployment.

It's also the base for an even more awesome project called OpenShift (*cough* shameless plug included *cough*).

Anyway, I ran into a problem where I needed to expose a port (or ports) of my application to the outside world (i.e. from the Vagrant box to my host) and I struggled to find the solution quickly.

Normally, when you are on the machine where Kubernetes runs, you would do something like this:

[vagrant@centos7-adb ~]$ kubectl get services | grep flower
flower-service component=flower app=taskQueue,component=flower 10.254.126.210 5555/TCP

In other words, I just listed all running services and grepped for flower. I can take the IP and port from there and use curl to get the content provided by that service. This uses the Kubernetes virtual network to get to the endpoint.
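
For example, still from inside the VM, something along these lines should work (assuming the Flower dashboard answers plain HTTP on that port):

[vagrant@centos7-adb ~]$ curl http://10.254.126.210:5555/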

I can also do this

[vagrant@centos7-adb ~]$ kubectl get endpoints | grep flower
flower-service 172.17.0.7:5555

which gets me directly to the container IP and port.

But this all happens in my Vagrant box (as you can see from the CLI prompt). This setup is good for places like Google Cloud or AWS where you get load balancing and port forwarding for free. But what if I just want to access my app on the VM IP address?

Well, you take your Kubernetes service config/artefact/JSON/YAML and modify it a bit. By default, Kubernetes services are set to the "ClusterIP" type, where they are accessible only in the ways shown above. You'll want to change the type to "NodePort".

This will "use a cluster IP, but also expose the service on a port on each node of the cluster (the same port on each node)" according to docs.

apiVersion: v1
kind: Service
metadata:
  labels:
    component: flower
  name: flower-service
spec:
  type: NodePort
  ports:
    - port: 5555
      nodePort: 31000
  selector:
    app: taskQueue
    component: flower

By default, type NodePort will give you a random port in the range 30000-32767. You can also pick a specific port from this range (as you can see above). Well, that's it. You only need to know the IP of the machine and the chosen (or assigned) port.

[vagrant@centos7-adb vagrant]$ kubectl describe service flower-service | grep "^NodePort"
NodePort: <unnamed> 31000/TCP
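
From the host, something like this should then get you to the app (the address is whatever IP your Vagrant box is reachable on; 31000 is the nodePort chosen above):

$ curl http://<vagrant-box-ip>:31000/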

This is particularly useful when you are developing (with a VM, as in the use case described above), or if you have a testing instance in the cloud (where load balancers are not available) and want to expose the app easily without having to fiddle with too many other pieces.