Kubernetes Persistent Storage Hell

We've started to work on a rather complex application recently with my team at Red Hat. We all agreed it'll be best to use containers, Kubernetes and Vagrant to make our development (and testing) environment easy to setup (and to be cool, obviously).

Our application consists of multiple components where those important for the post are MongoDB and something we can call worker. The reason for MongoDB is clear - we are working with JSONs and need to store them somewhere. Worker takes data, does some works on them and writes to DB. There are multiple types of workers and they need to share some data. We also need to be able to scale (That's why we use containers!) which also requires shared storage. We want both storages to be local path (for Vagrant use case especially).

Sounds easy, right? But it's not. Here is the config objects situation:

kubernetes/worker-volume.yaml
kubernetes/worker-claim.yaml
kubernetes/mongo-volume.yaml
kubernetes/mongo-claim.yaml

The way you work with volumes i Kubernetes is that you define a PersistentVolume object stating capacity, access mode and host path (still talking about local storage). Then you define PersistentVolumeClaim with access mode and capacity. Kubernetes then automagically map these two - i.e. randomly match claim and volume where volume provides correct mode and enough capacity.

You might be able to see the problem now, but if not, here it is: If you have 2 volumes and 2 claims (as we have) there is no way you can be sure which claim will get which volume. You might not care when you first start your app, because the directories you provided for volumes will be probably empty. But what if you restart the app? Or the Vagrant box (and thus the app)? You cannot be sure which volume will be assigned to which claim.

This leads to an inconsistent state where your MongoDB storage can be assigned to your worker storage and vice versa.

I've found 2 related issues on github https://github.com/.../issues/14908 and https://github.com/.../pull/17056 which, if implemented and solved, should fix it. But is there a workaround?

Hell yeah! And it's pretty simple. Instead of defining PersistentVolumeClaim object and using persistentVolumeClaim key in a replication controller, you can use hostPath directly in the RC. This is how the patch looked like:

diff --git a/kubernetes/mongodb-controller.yaml b/kubernetes/mongodb-controller.yaml
index ffdd5f3..9d7bbe2 100644
--- a/kubernetes/mongodb-controller.yaml
+++ b/kubernetes/mongodb-controller.yaml
@@ -23,5 +23,5 @@ spec:
 mountPath: /data/db
 volumes:
 - name: mongo-persistent-storage
- persistentVolumeClaim:
- claimName: myclaim-1
+ hostPath:
+ path: "/media/mongo-data"
diff --git a/kubernetes/worker-controller.yaml b/kubernetes/worker-controller.yaml
index 51181df..f62df47 100644
--- a/kubernetes/worker-controller.yaml
+++ b/kubernetes/worker-controller.yaml
@@ -44,5 +44,6 @@ spec:
 mountPath: /data
 volumes:
 - name: worker-persistent-storage
- persistentVolumeClaim:
- claimName: myclaim-2
+ hostPath:
+ path: "/media/worker-data"

The important bits of Kubernetes config then looks like:

...
   volumeMounts:
     - name: mongo-persistent-storage
       mountPath: /data/db
 volumes:
   - name: mongo-persistent-storage
     hostPath:
       path: "/media/mongo-data"
...

3 thoughts on “Kubernetes Persistent Storage Hell

  1. gangefors

    Wouldn’t there be an issue when your pods end up on different nodes in your cluster?
    A hostPath is a folder on your node where the pod lives so you need to have the same mount shared on all your nodes in the cluster to make this work. Because when a node goes down or a pod fails you don’t know on which node the pod will be started.

    Reply
    1. vpavlin Post author

      Yup, that’s true. I only cared about a single host instance here. But all the same applies to multi-host. Just replace hostPath with NFS for example and you use volumes and claims. You end up in the same mess as on single host with hostPath.

      Reply
  2. Humble

    We use GlusterFS for persistent data store. You have it in form of Gluster Volume plugin in Kubernetes. GlusterFS pods can also run in the k8s cluster and serve a volume. Thought of mentioning it if it helps.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *