Tag Archives: Kubernetes

Running a WordPress blog on OpenShift Origin

I used to run my blog(s) on OpenShift Online (v2), which was, for many reasons, a terrible use case for that platform, and my OpenShift colleagues hated that people did this so often. This is one of the reasons why OpenShift Online (v3) enforces some sleep time for running services - to prevent you from running blogs and generally "production" services on the free service. You can still do it, if you don't mind some downtime.

For a blog, though, downtime is generally a bad thing, so I started to look around for a different place to put it. Obviously, I did not want to pay, and I especially did not want to pay "per instance", as I am also running a blog for my mum. My thought was to use some cheap webhosting, but that would mean I would have to set things up the old way - mode 1, without containers and all the orchestration goodness.

I started to look for a cheap cloud provider where I could run a small server for a couple of dollars a month - I first went to DigitalOcean as I have some free credits there, then I found Linode and Alibaba, which were cheaper for the same or slightly more powerful VMs. The plan was to deploy OpenShift Origin, use Grant's tutorial to migrate the WordPress instance and never care about it again.

I created a VM, started to play around with OpenShift there and then I thought: "Maybe I could use an old laptop and just run OpenShift in my closet". The laptop has 4 GB of RAM and 2 CPU cores, so it's 2-4 times more powerful than the VM in the cloud. It does not have an SSD, so the disk operations are not the best, but it works, and with the $5-10 I save on the cloud VM, I can buy an SSD later if I need to.

So I updated the old netbook to the latest Fedora, downloaded the OpenShift Origin client binary from GitHub and started the single-node cluster. By following the above-mentioned migration tutorial I got my WordPress instance running in a few minutes, and fixing the A record in DNS gave me my nice URL as well. So far, so good.

I did not have much time when I set it all up late last year, so I just migrated my mum's blog as well and let it run. It worked like a charm and I thought that as soon as I got a bit of time, I would simply fix the things that were not set up properly - like backups or what happens when the laptop restarts.

As life happens, I never got back to figure out the backup and restart stories, which resulted in a major cluster fu...fail. We are redoing electrical wiring in our building, which means there are some expected outages. The laptop generally survived them fine, until one outage took almost a whole day. The laptop turned off and with it both blogs.

I thought it would be fine - I would just boot it up and run the OpenShift cluster again. Easy, right? Basically, yes, it is, if you read the docs carefully and keep your etcd data out of the Origin container, because if you don't, once the container dies, etcd (i.e. the database behind OpenShift and Kubernetes) is gone. OK, well, not a great thing to happen, but I still had all my other data from the blogs, so I just needed to run the containers again with the same host volumes.

Sadly, that turned out to be problematic as well, because the MySQL container does not really like to be pointed to an old volume from a new deployment (I did not investigate why, although it might be interesting for the MySQL OpenShift deployment developers). So the database failed. I copied the database volume to a new directory and started to experiment on it - the idea was that if I could dump it, I could just run a clean deployment and then import the dump.

This approach seemed to be getting me closer to the goal until the point where I deleted the original data volume by mistake, which left me with only the already corrupted copy. Yay. So close! At that point, there was not much else to do but start over and restore from the 6-month-old backups I made during the migration. Not great, but lessons learnt.

Well, let's start from scratch and do things right now! First, run OpenShift Origin with a proper set of parameters. Those will include things like "run from existing configuration", "store etcd on disk", "select specific path for volumes"...

/bin/oc cluster up \
    --public-hostname=$INTERNAL_IP.nip.io \
    --routing-suffix=$EXTERNAL_IP.nip.io \
    --use-existing-config \
    --host-data-dir=$HOST_DIR \
    --host-config-dir=$HOST_DIR/openshift.local.config \
    --host-pv-dir=$HOST_DIR/openshift.local.pv \
    --host-volumes-dir=$HOST_DIR/openshift.local.volumes

Next, make sure the cluster gets started on (re)boot, so that it comes back up if there is another electricity issue. For that I used a unit from Tobias Brunner's post. Now, let's reboot and... it works!

The next thing is backup. It does not make much sense to store the backup on the same drive, and I did not really want to upload everything somewhere to the cloud, so I just plugged in a USB flash drive I got at some conference and created a systemd mount unit based on James Oguya's post. For the actual backup, I decided to simply rsync the directory where I store all the OpenShift configuration, etcd data, PVs, etc. to the flash drive. I am not entirely sure it is the best solution, or even a solution which will allow me to restore things easily and fully, but I hope it will... and I will test it at some point ;). For the rsync backup I created a systemd timer and a oneshot unit which the timer runs once a day. An ArchWiki article helped with this step.
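To make the backup step a bit more concrete, here is a minimal Python sketch of the kind of script the oneshot unit could call. It is not the script used for this setup: the function names and both paths are made up for illustration, and it assumes the flash drive is mounted at a known mount point. It mirrors the data directory with rsync and refuses to run when the drive is not mounted, so a missing drive does not silently fill the laptop's own disk.

```python
import os
import subprocess
from pathlib import Path


def build_rsync_command(source: Path, destination: Path) -> list[str]:
    """Build an rsync invocation that mirrors `source` into `destination`."""
    return [
        "rsync",
        "--archive",   # keep permissions, ownership, timestamps and symlinks
        "--delete",    # drop files from the backup that are gone from the source
        f"{source}/",  # trailing slash: copy the directory contents, not the directory
        str(destination),
    ]


def run_backup(source: Path, mount_point: Path) -> None:
    """Run the backup, but only if something is mounted at `mount_point`."""
    if not os.path.ismount(mount_point):
        raise RuntimeError(f"nothing is mounted at {mount_point}, skipping backup")
    subprocess.run(build_rsync_command(source, mount_point), check=True)


# Placeholder paths (assumptions, not the ones from this setup).
# Only the command is printed here; run_backup() would actually execute it.
print(" ".join(build_rsync_command(Path("/srv/openshift"), Path("/mnt/backup"))))
```

Copying the files of a running etcd this way may not give a consistent snapshot, which is one more reason to actually test a restore.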

The last thing is to not only start, but also enable all those services, so that things get into the right state after a reboot. I rebooted a couple of times and things seemed to work fine after that - it takes a couple of minutes to boot, start the cluster and deploy all the containers, so there is some downtime during a reboot, but I can live with that.

Wish me luck so that I don't lose the data again, and share your stories about what you run at home and how :).

Kubernetes Persistent Storage Hell

My team at Red Hat and I recently started to work on a rather complex application. We all agreed it would be best to use containers, Kubernetes and Vagrant to make our development (and testing) environment easy to set up (and to be cool, obviously).

Our application consists of multiple components, of which the ones important for this post are MongoDB and something we can call a worker. The reason for MongoDB is clear - we are working with JSON documents and need to store them somewhere. A worker takes data, does some work on it and writes to the DB. There are multiple types of workers and they need to share some data. We also need to be able to scale (that's why we use containers!), which also requires shared storage. We want both storage areas to be backed by a local path (for the Vagrant use case especially).

Sounds easy, right? But it's not. Here are the config objects involved:

kubernetes/worker-volume.yaml
kubernetes/worker-claim.yaml
kubernetes/mongo-volume.yaml
kubernetes/mongo-claim.yaml

The way you work with volumes in Kubernetes is that you define a PersistentVolume object stating capacity, access mode and host path (still talking about local storage). Then you define a PersistentVolumeClaim with access mode and capacity. Kubernetes then automagically maps these two - i.e. it binds a claim to any volume that provides the correct mode and enough capacity, and you have no say in which one.

You might be able to see the problem now, but if not, here it is: if you have 2 volumes and 2 claims (as we have), there is no way you can be sure which claim will get which volume. You might not care when you first start your app, because the directories you provided for the volumes will probably be empty. But what if you restart the app? Or the Vagrant box (and thus the app)? You cannot be sure which volume will be assigned to which claim.

This leads to an inconsistent state where your MongoDB claim can end up bound to the worker's volume and vice versa.
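The ambiguity can be illustrated with a small toy model in Python. This is not how Kubernetes implements binding; it only checks the two criteria mentioned above (access mode and capacity), and it simplifies the access mode check to plain equality. The volume names, sizes and access modes are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    capacity_gi: int
    access_mode: str
    host_path: str


@dataclass(frozen=True)
class Claim:
    name: str
    request_gi: int
    access_mode: str


def candidates(claim: Claim, volumes: list[Volume]) -> list[str]:
    """Return the names of all volumes that could satisfy the claim."""
    return [
        volume.name
        for volume in volumes
        if volume.access_mode == claim.access_mode
        and volume.capacity_gi >= claim.request_gi
    ]


# Hypothetical values: two volumes of the same size and mode, two matching claims.
volumes = [
    Volume("mongo-volume", 5, "ReadWriteMany", "/media/mongo-data"),
    Volume("worker-volume", 5, "ReadWriteMany", "/media/worker-data"),
]
claims = [
    Claim("myclaim-1", 5, "ReadWriteMany"),
    Claim("myclaim-2", 5, "ReadWriteMany"),
]

for claim in claims:
    print(claim.name, "could bind to", candidates(claim, volumes))
```

Both claims list both volumes as candidates, so nothing in the objects themselves ties the MongoDB claim to the MongoDB directory.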

I've found 2 related items on GitHub, an issue (https://github.com/.../issues/14908) and a pull request (https://github.com/.../pull/17056), which, if resolved and merged, should fix it. But is there a workaround?

Hell yeah! And it's pretty simple. Instead of defining a PersistentVolumeClaim object and using the persistentVolumeClaim key in a replication controller, you can use hostPath directly in the RC. This is what the patch looked like:

diff --git a/kubernetes/mongodb-controller.yaml b/kubernetes/mongodb-controller.yaml
index ffdd5f3..9d7bbe2 100644
--- a/kubernetes/mongodb-controller.yaml
+++ b/kubernetes/mongodb-controller.yaml
@@ -23,5 +23,5 @@ spec:
        mountPath: /data/db
  volumes:
    - name: mongo-persistent-storage
-     persistentVolumeClaim:
-       claimName: myclaim-1
+     hostPath:
+       path: "/media/mongo-data"
diff --git a/kubernetes/worker-controller.yaml b/kubernetes/worker-controller.yaml
index 51181df..f62df47 100644
--- a/kubernetes/worker-controller.yaml
+++ b/kubernetes/worker-controller.yaml
@@ -44,5 +44,6 @@ spec:
        mountPath: /data
  volumes:
    - name: worker-persistent-storage
-     persistentVolumeClaim:
-       claimName: myclaim-2
+     hostPath:
+       path: "/media/worker-data"

The important bits of the Kubernetes config then look like this:

...
   volumeMounts:
     - name: mongo-persistent-storage
       mountPath: /data/db
 volumes:
   - name: mongo-persistent-storage
     hostPath:
       path: "/media/mongo-data"
...

Mapping service ports to nodes in Kubernetes

Kubernetes is a great project and a cool/hot technology. Although it made me hate JSON (and YAML), I still enjoy exploring the possibilities it brings to application deployment.

It's also the base for an even more awesome project called OpenShift (*cough* shameless plug included *cough*).

Anyway, I ran into a problem where I needed to expose port(s) of my application to the outer world (i.e. from Vagrant box to my host) and I struggled to find the solution quickly.

Normally, when you are on the machine where Kubernetes runs, you will do something like this:

[vagrant@centos7-adb ~]$ kubectl get services | grep flower
flower-service component=flower app=taskQueue,component=flower 10.254.126.210 5555/TCP

In other words, I just listed all running services and grepped for flower. I can now take the IP and port from there and use curl to get the content provided by that service. This uses the Kubernetes virtual network to get to the endpoint.

I can also do this:

[vagrant@centos7-adb ~]$ kubectl get endpoints | grep flower
flower-service 172.17.0.7:5555

which gets me directly to the container IP and port.

But this all happens in my Vagrant box (as you can see from the CLI prompt). This setup is good for places like Google Cloud or AWS where you get load balancing and port forwarding for free. But what if I just want to access my app on the VM IP address?

Well, you take your Kubernetes service config/artefact/JSON/YAML and modify it a bit. By default, Kubernetes services are set to "ClusterIP" mode, where they are accessible only in the ways shown above. You'll want to change the type to "NodePort".

This will "use a cluster IP, but also expose the service on a port on each node of the cluster (the same port on each node)" according to the docs.

apiVersion: v1
kind: Service
metadata:
  labels:
    component: flower
  name: flower-service
spec:
  type: NodePort
  ports:
    - port: 5555
      nodePort: 31000
  selector:
    app: taskQueue
    component: flower

By default, type NodePort will give you a random port in the range 30000-32767. You can also pick a specific port from this range (as you can see above). Well, that's it. You only need to know the IP of the machine and the given/specified port.

[vagrant@centos7-adb vagrant]$ kubectl describe service flower-service | grep "^NodePort"
NodePort: <unnamed> 31000/TCP
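As a small illustration of "machine IP plus node port", here is a hedged Python sketch that checks a port against the default NodePort range and builds the address to open from the host. The helper name, the IP address and the use of plain HTTP are assumptions made for the example, not something taken from the setup above.

```python
NODE_PORT_MIN = 30000
NODE_PORT_MAX = 32767


def node_port_url(node_ip: str, node_port: int, scheme: str = "http") -> str:
    """Build the URL of a NodePort service, rejecting ports outside the default range."""
    if not NODE_PORT_MIN <= node_port <= NODE_PORT_MAX:
        raise ValueError(
            f"nodePort {node_port} is outside {NODE_PORT_MIN}-{NODE_PORT_MAX}"
        )
    return f"{scheme}://{node_ip}:{node_port}/"


# 192.168.33.10 is a made-up address for the Vagrant box.
print(node_port_url("192.168.33.10", 31000))
```

Passing the service port 5555 instead of the node port raises a ValueError, which is the typical mix-up: 5555 is only reachable inside the cluster network, while 31000 is what the VM itself listens on.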

This is particularly useful when you are developing (with a VM, as in the use case described above), or if you have some testing instance in the cloud (where load balancers are not available) and want to expose the app easily without having to fiddle with too many other pieces.