Share Host’s Mount Namespace with Docker Containers

This is a follow-up to my previous post about using a Super Privileged Container (SPC) to mount a remote filesystem on the host. That approach drew criticism for hacking into the mount helpers.

I made a Docker patch so that the Docker daemon doesn't isolate the host's mount namespace from containers. Containers are thus able to see and update the host's mount namespace. The feature is turned on through a Docker client option, --hostns=true.

A running example looks like the following.

First, start a container with --hostns=true:

[code language="bash"]
# docker run --privileged --net=host --hostns=true -v /:/host -i -t centos bash
[/code]

In another terminal, once the container is up and you have a bash shell, look at the container's mount namespace:

[code language="bash"]
# pid=`ps -ef |grep docker |grep -v run|grep -v grep|awk '{print $2}'`; b=`ps -ef |grep bash|grep ${pid}|awk '{print $2}'`; cat /proc/${b}/mountinfo
[/code]
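A possibly simpler alternative is to ask Docker itself for the container's PID rather than grepping through ps; a quick sketch, assuming the centos container is the most recently started one:

[code language="bash"]
# Alternative: let Docker report the container's init PID.
# "docker ps -lq" picks the most recently created container.
cid=$(docker ps -lq)
pid=$(docker inspect --format '{{.State.Pid}}' ${cid})
cat /proc/${pid}/mountinfo
[/code]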

In the output, I spotted the following line, indicating that the container and the host share the same mount namespace.

[code language="text"]
313 261 253:1 / /host rw,relatime shared:1 - ext4 /dev/mapper/fedora--server_host-root rw,data=ordered
[/code]
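For another quick sanity check from the host side, findmnt can print the propagation flag of the root mount (the PROPAGATION column needs a reasonably recent util-linux); "shared" means mount and umount events propagate between peers of the same peer group:

[code language="bash"]
# Show the propagation mode of the host's root mount.
findmnt -o TARGET,PROPAGATION /
[/code]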

Then, in the container's shell, install the glusterfs-fuse package and mount a remote GlusterFS volume:

[code language="bash"]
# yum install glusterfs-fuse attr -y
# mount -t glusterfs gluster_server:kube_vol /host/shared
[/code]
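Inside the container, a quick df confirms the volume actually landed under /host/shared:

[code language="bash"]
# Verify the GlusterFS mount from within the container.
df -hT /host/shared
[/code]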

Go back to the host terminal and check whether the host can see the GlusterFS volume:

[code language="bash"]
# findmnt |grep glusterfs |tail -1
└─/shared gluster_server:kube_vol fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072
[/code]
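And since this really is the host's mount namespace, the host can tear the mount down itself once it is done with it; the GlusterFS mount should disappear for the container as well:

[code language="bash"]
# Unmount on the host; with the shared namespace, the container
# loses sight of the GlusterFS volume too.
umount /shared
findmnt | grep glusterfs
[/code]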

So far so good!

Manage Ceph RBD Device without rbd

There are lots of examples of using the rbd(8) command to manage an RBD device, but it is far less publicized that we can do the same by dealing with sysfs.

Some instructions can be found here. A more detailed explanation of these parameters comes from the rbd kernel documentation. Our hero Sebastian bravely showed his usage. I also ventured to validate it on my Fedora 21 box, using my local Ceph container and a pool called kube, which contains an image called foo:

[code language="bash"]
# echo "127.0.0.1 name=admin,secret=AQCw/W1VCOQFCRAAbRxkhg3TuCXRS42ols3hqQ== kube foo" > /sys/bus/rbd/add
# ls /dev/rbd/kube/foo -l
lrwxrwxrwx 1 root root 10 Jun 5 13:31 /dev/rbd/kube/foo -> ../../rbd2
[/code]
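Tearing the device down also goes through sysfs; per the rbd kernel documentation, you write the device id (the N in rbdN, 2 in the example above) to the remove file:

[code language="bash"]
# List the rbd devices currently mapped by the kernel, then unmap device id 2.
ls /sys/bus/rbd/devices
echo 2 > /sys/bus/rbd/remove
[/code]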

A Tale of Two Virtualizations

In my previous post on Intel's Clear Linux project, I had a few questions about how Intel got KVM to start fast enough to match containers. Basically, Clear Linux aims to make hypervisors first-class citizens in the container world.

Today I looked into another, similar technology called hyper. hyper positions itself as a hypervisor-agnostic, high-performing, and secure alternative to Docker and KVM. Love it or hate it, hyper is able to run both hypervisors and containers in its own environment.

The architecture, as far as I could tell from peeking at the source, mirrors Docker's: a CLI client interacts with a hyper daemon through REST. The daemon, by invoking the QEMU and Docker engines, creates, destroys, and deletes either VMs or containers. hyper understands Docker images (it uses the Docker daemon API), QEMU (it directly execs QEMU commands with a well-tuned configuration), and Pods (which look similar to Kubernetes Pods, except for the QEMU provisioning).

hyper comes with hyperstart, a replacement for init(1) that aims at fast startup. To use hyperstart, you have to bake an initrd.
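Baking an initrd is the usual cpio-and-gzip exercise; here is a generic sketch (hyper ships its own build scripts, which may well differ), assuming hyperstart and its dependencies are staged under a hypothetical ./rootfs directory with hyperstart installed as /init:

[code language="bash"]
# Generic initrd recipe: pack a staged root filesystem tree into a
# compressed cpio archive that the kernel can unpack at boot.
# ./rootfs is a hypothetical directory with hyperstart as /init.
cd rootfs
find . | cpio -o -H newc | gzip -9 > ../hyper-initrd.img
[/code]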

With these two similar initiatives converging hypervisors and containers, I am now daydreaming about a near future when we no longer have to make trade-offs between VMs and containers within a single framework (KVM or Docker).