Tag Archives: PersitentVolume

Kubernetes Persistent Storage Hell

We've started to work on a rather complex application recently with my team at Red Hat. We all agreed it'll be best to use containers, Kubernetes and Vagrant to make our development (and testing) environment easy to setup (and to be cool, obviously).

Our application consists of multiple components where those important for the post are MongoDB and something we can call worker. The reason for MongoDB is clear - we are working with JSONs and need to store them somewhere. Worker takes data, does some works on them and writes to DB. There are multiple types of workers and they need to share some data. We also need to be able to scale (That's why we use containers!) which also requires shared storage. We want both storages to be local path (for Vagrant use case especially).

Sounds easy, right? But it's not. Here is the config objects situation:


The way you work with volumes i Kubernetes is that you define a PersistentVolume object stating capacity, access mode and host path (still talking about local storage). Then you define PersistentVolumeClaim with access mode and capacity. Kubernetes then automagically map these two - i.e. randomly match claim and volume where volume provides correct mode and enough capacity.

You might be able to see the problem now, but if not, here it is: If you have 2 volumes and 2 claims (as we have) there is no way you can be sure which claim will get which volume. You might not care when you first start your app, because the directories you provided for volumes will be probably empty. But what if you restart the app? Or the Vagrant box (and thus the app)? You cannot be sure which volume will be assigned to which claim.

This leads to an inconsistent state where your MongoDB storage can be assigned to your worker storage and vice versa.

I've found 2 related issues on github https://github.com/.../issues/14908 and https://github.com/.../pull/17056 which, if implemented and solved, should fix it. But is there a workaround?

Hell yeah! And it's pretty simple. Instead of defining PersistentVolumeClaim object and using persistentVolumeClaim key in a replication controller, you can use hostPath directly in the RC. This is how the patch looked like:

diff --git a/kubernetes/mongodb-controller.yaml b/kubernetes/mongodb-controller.yaml
index ffdd5f3..9d7bbe2 100644
--- a/kubernetes/mongodb-controller.yaml
+++ b/kubernetes/mongodb-controller.yaml
@@ -23,5 +23,5 @@ spec:
 mountPath: /data/db
 - name: mongo-persistent-storage
- persistentVolumeClaim:
- claimName: myclaim-1
+ hostPath:
+ path: "/media/mongo-data"
diff --git a/kubernetes/worker-controller.yaml b/kubernetes/worker-controller.yaml
index 51181df..f62df47 100644
--- a/kubernetes/worker-controller.yaml
+++ b/kubernetes/worker-controller.yaml
@@ -44,5 +44,6 @@ spec:
 mountPath: /data
 - name: worker-persistent-storage
- persistentVolumeClaim:
- claimName: myclaim-2
+ hostPath:
+ path: "/media/worker-data"

The important bits of Kubernetes config then looks like:

     - name: mongo-persistent-storage
       mountPath: /data/db
   - name: mongo-persistent-storage
       path: "/media/mongo-data"