Docker starter kit part 3: Commits, Volumes and Data Management

In the first part of this blog series on Docker, you had the opportunity to follow a Docker basic tutorial and to have an overview of the lifecycle of a Docker container. In the second part you then studied what a Docker Image and a Docker Container are made up. This third post of the series will dive into how you can manage your data with Docker.

UnionFS and persistence

Thin Layer Lifecycle

If you remember correctly what we saw until now, an image is the union of multiple read only layers. When we create a container, Dockers adds a new read/write thin layer. Everything that happens during the life of the container occurs in this thin layer. When the container is destroyed this thin layer is destroyed with it and everything that happened during the life of the container is lost.

Docker commit

Let me reassure you right now, there is a way to avoid losing everything: the docker commit command. Basically what this command does is commit the thin layer into a new image, a bit like you would commit your staged files into your git repository. To illustrate this, we will use the debian:jessie image we already encountered. We will run it starting an interactive bash session:

docker run --name demo -it debian:jessie bash

We will get back to this on a later post, for the moment, let’s just say to simplify that we told Docker to attach to the container so that we can run commands interactively. You should now have something like this:

root@<container_id>:/#

You are ready to interact directly with your container. Let’s create a simple dummy file to simulate data:

cd home
touch hello

Now, if you remember correctly, we just have created a file named hello in the thin layer of the container. First let’s check that if we stop and restart the container the file is still here. We will exit the container:

exit

The container should be stopped (you can check with docker ps -a), let’s restart it and attach to it:

docker start demo
docker exec -it demo bash

We are now attached to the container, let’s make sure our file is still here:

ls /home/

You should see hello. Now, let’s exit the container again (exit) and instanciate a new container from debian:jessie:

docker run --name demo2 -it debian:jessie bash

Let’s check the content of the home directory of this container:

ls /home/

It’s empty as we expected. This means that if we destroy the demo container, we lose our hello file. Fortunately, the docker commit command is here to help us. Let’s exit the demo2 container again. Then we can commit our demo container:

docker commit demo dummyimage

That’s it, we have an image ready and we can create new containers which will contain our hello file. I’ll let you make sure any container created from our dummyimage is containing the hello file. You can also check the history of the image to see that we indeed commited a new layer:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
387fe6f1f01a        15 hours ago        bash                                            71 B
73e72bf822ca        4 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/bash"]            0 B
<missing>           4 weeks ago         /bin/sh -c #(nop) ADD file:41ea5187c501168...   123 MB

The `docker commit` command is useful to commit changes you do not want to lose.
You can also leverage on it to move a custom container to another machine for instance.
However, for maintenability purpose, it is recommended to build your images using a Dockerfile as you saw in the first part of this series.
We will get back to the building of image using Dockerfile in more details later on.

Volumes

At this point, you might starting to understand that the UnionFS is not really what we wish for data persistence. With it, data are bound to a container lifecycle. You can quickly lose some data by just destroying a container. Data sharing between containers is not easily possible either. Now, volume comes into play.

Definition

A volume is a directory that bypasses the UnionFS. A volume decouples the data from the lifecycle of the container.

Among its properties are:

A volume is initialized at container creation.
A volume can be shared and reused between different containers.
If you delete a container, its volumes are not deleted (except if you explicitely ask for their deletion).

Data Volumes

The following assumes you are using at least Docker v1.9.
This is the first release that is using named volumes.

Named volumes

Creation and Mounting

There are two ways of creating named volumes:

either explicitely with the docker volume create command.
or implicitely when creating a new container.

So both of the following will do the same:

## explicit method
docker volume create --name my_data
docker run -it --name my_container -v my_data:/data debian:jessie bash
## implicit method
docker run -it --name my_container -v my_data:/data debian:jessie bash

I’ll let you run one of the two methods (not both!). You should be attached to a container. We will then write a file in our volume:

touch /data/hello
exit

Then, let’s mount a new container. This time, the my_data volume will be mounted under /home:

docker run -it --name my_container2 -v my_data:/home debian:jessie bash

In the container, let’s check that our volume is correctly mounted under /home:

ls /home
exit

Mounting volumes used by other containers

You can mount volumes used by other containers. For instance, the following would mount a /data and a /home directory, both pointing to the my_data volume:

docker run -it --name my_container3 --volumes-from my_container --volumes-from my_container2 debian:jessie bash

In the container, we can check that both /data and /home contain the hello file:

ls /home
ls /data
exit

Be careful with what you do when manipulating volumes.
It may be not a good idea to mount the same volume on two different mount points in the same container!
Same thing with concurrent modifications: if you are mounting the same volume in different containers, you most probably want to check for concurrent modifications.

Mounting on an existing directory

If a container contains data in the mount point, then this data is copied onto the new volume.

Find the volume(s) used by a container

You can use the docker inspect --format "{{ .Mounts }}" command to locate the volume(s) used by a container. Running this command on my_container3, we have:

[{ my_data /var/lib/docker/volumes/my_data/_data /data local  true }
 { my_data /var/lib/docker/volumes/my_data/_data /home local  true }
]

Removing a volume

One thing important to know, is that removing a container won’t remove its volumes (even if they are not used by any other container). This is coherent with the fact that volumes should lived independently from containers. So if we do the following:

docker rm my_container
docker rm my_container2
docker rm my_container3
docker volume ls | grep my_data

We see that we still have a our my_data volume. To finally delete the volume:

docker volume rm my_data

Read only mode

Until now, we did not specify any option when mounting our volume. It was mounted as read/write. We could chose to mount it in read only mode:

docker run -it --name my_container -v my_data:/data:ro debian:jessie bash

In the container:

root@325a440f674b:/# touch /data/hello
touch: cannot touch '/data/hello': Read-only file system

Host directory mounting

Declare the mount

Docker also allows to mount a directory of your host. You can mount a host directory by specifying a full path. Create a file called hello, containing the string hello world on your host. Then issue:

docker run -it --name host_mounting -v $PWD:/data debian:jessie bash

Carreful, if you are running under Windows or OsX, you need to let Docker access the shared directory by going into Docker options under the Shared Drives category.
Under Windows, you'll also need to specify the path manually as the PWD variable does not exist.

Once you booted up your container, we will install vim and then open the hello file:

apt-get update && apt-get install vim -y
cd /data
vim hello

You should see the hello world string. Now edit the file and add something to it. For those not familiar with vim, you want to type i to enter edition mode, then type something, then type esc and finally :wq. Now, open the file on a text editor in your host to see the changes.

Read only mode

It works like volumes, check the previous section.

Find the volume(s) used by a container

It works like volumes, check the previous section.

Mounting on an existing directory

If a container contains data in the mount point, then this data is kept, and the host directory content overlays the data in the container mount point.

Mounting a file

Host mount also works with files. The previous example could have been writen:

docker run -it --name host_mounting -v $PWD/hello:/data/hello debian:jessie bash

About Data Containers

I made a choice to present you only named volumes and to avoid data containers. Here is why: when you use data container, Docker really creates unnamed containers behind your back. If you destroy your data container, the volume it was using become dangling and its management is not easy.

Here is an example that will show you a bit the usage of data containers and the limitation I was mentionning.

We will first run interactively a container with an unnamed volume:

docker run -it --name unnamed_volume -v /data debian:jessie bash

And in the container:

touch /data/hello
exit

Let’s inspect the container for mounted volumes:

docker inspect --format "{{ .Mounts }}" unnamed_volume

You end up with:

[{volume 6faa8a43f80bb550b11aa8f3f6093273782fbf5a5674fd5067ba7c7c27942314 /var/lib/docker/volumes/6faa8a43f80bb550b11aa8f3f6093273782fbf5a5674fd5067ba7c7c27942314/_data /data local  true }]

So you really have a volume stored. Now, usually, what you would do to be able to reuse this data container from another container is to use the --volumes-from command:

docker run -it --name unnamed_volume2 --volumes-from unnamed_volume debian:jessie bash

Here unnamed_volume2 shares the same unnamed volume with unnamed_volume. This volume is mounted under /data. Let’s check this:

ls /data
exit

You should see our dummy hello file. So far so good. Why do I prefer avoiding this technic might you ask. Well, let’s now delete both our containers after having taken care of remembering our unnamed volume id:

docker inspect --format "{{ .Mounts }}" unnamed_volume
docker rm unnamed_volume
docker rm unnamed_volume2

And let’s look for our unnamed volume:

docker volume ls | grep <volume_id>

It’s still there! Of course, you could still mount it by its id, to check what it contains, but it’s far from perfect from a maintenance point of view.

You could also have issued a `docker rm -v` command to delete any anonymous volumes.

I’m not saying that unnamed volumes and data containers might not be useful sometimes, but most of the case you will probably end up using named containers. So, unless you are stuck with a pre 1.9 version of Docker, I strongly suggest you consider named volumes first. In any case, you might want to review the Manage data in containers page of the official documentation.

That’s it for this post, next one will show you how you could back up and restore your data, how you can extend volumes with drivers, and a practical usage of the mounting of a host directory.