[DOCKER-SWARM] Introduction & Workshop: build a scalable application environment

In a previous article, we introduced Docker and the core principles of its architecture and operation, which helped us build and deploy a containerized application on a single host. Docker also provides a way to group several Docker hosts into a cluster called a Swarm, a simpler alternative to the popular Kubernetes orchestrator.

In the first part of this article, we will cover the core principles and mechanisms of Docker Swarm. In the second part, we will walk through a step-by-step workshop showing how to set up a Swarm with image and volume sharing to run a basic application. Feel free to jump straight to the workshop!

Docker Swarm is a clustering and scheduling tool for Docker containers. It abstracts the resources of several hosts and lets Docker orchestrate them, for example by choosing which host runs each container in order to balance the load.

Figure 1: Difference when running Docker in Swarm mode

Swarm architecture

A Swarm cluster is composed of nodes (hosts running the Docker engine in Swarm mode) that can be either managers or workers. Manager nodes handle cluster management: maintaining the cluster state, scheduling services, and so on. Every Swarm needs at least one manager node. Worker nodes, on the other hand, exist to run containers and do not take part in Swarm management. By default, a manager node is also a worker, but it can be configured to run exclusively as a manager. Docker Swarm elects one of the managers as the Swarm's Leader. The leader logs every action performed in the cluster and decides whether state modifications are accepted. Any manager can receive commands such as scheduling a service, but state-altering operations must be validated by the leader: every state change request goes through the Leader node.

Figure 2: Docker Swarm architecture scheme

Because this state is shared and kept consistent across managers, any manager node can take the lead if the current leader fails.
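To check which role each node currently holds, and to change it, Docker provides docker node ls, docker node promote and docker node demote, all run from a manager. The sketch below wraps them in hypothetical helpers (list_roles, set_role); node names are placeholders, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: show each node with its manager status
# ("Leader", "Reachable", "Unreachable", or empty for a worker).
list_roles() {
  run docker node ls --format '{{.Hostname}} {{.ManagerStatus}}'
}

# Hypothetical helper: set_role NODE manager|worker
set_role() {
  case "$2" in
    manager) run docker node promote "$1" ;;
    worker)  run docker node demote "$1" ;;
    *) echo "usage: set_role NODE manager|worker" >&2; return 1 ;;
  esac
}
```

For instance, set_role node-3 manager would turn a worker into an additional manager taking part in the leader election.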

Deploy applications: Docker Stack

A stack is a collection of services that make up an application. The application is defined in a YAML Stack file, very similar to a docker-compose file. Stacks let us deploy, scale and orchestrate services together as a single application bundle. Stack files use the same format as Compose files, so if you are comfortable with docker-compose, Docker Stacks should feel familiar.
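For illustration, here is a minimal sketch of a Stack file that sets the number of replicas through the deploy key, written from a shell heredoc. The service names (web, php), the images and the write_stack_file helper are assumptions made for this example, not files from the workshop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_stack_file PATH writes a two-service stack definition.
# The "deploy" section is read by "docker stack deploy" to configure Swarm services.
write_stack_file() {
  cat > "$1" <<'EOF'
version: '3'
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
    deploy:
      replicas: 2            # Swarm keeps two tasks of this service running
  php:
    image: php:7.4-fpm
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
EOF
}
```

Running write_stack_file docker-compose.yml followed by docker stack deploy --compose-file docker-compose.yml demoapp would create services named demoapp_web and demoapp_php.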

You can deploy your stack using:

$: docker stack deploy --compose-file docker-compose.yml [stack-name]

Check services from a specific stack:

$: docker stack services [stack-name]
u3avs16m32fe demoapp_php replicated 1/1
zhrywqivgz4f demoapp_web replicated 1/1

Take down a deployed stack:

$: docker stack rm [stack-name]

Service scaling: replicas

The Docker engine provides a convenient way to make services redundant: replicas. When deploying a stack, you can specify the number of replicas to maintain, and Docker will run that many instances of your service simultaneously in your Swarm. Each incoming request is handled by one replica of your service. If a replica fails, Docker starts a new one to maintain the number of replicas requested by the user. This great feature ensures service continuity even when containers fail!

Figure 3: Replica’s recovery mechanism

You can define the number of replicas for a service at any moment with the command:

docker service scale [service-name]=[number-of-replicas]

This simple instruction lets you scale your application services up and down. For instance, let's scale the php service of a demoapp application:

$: docker service scale demoapp_php=3
demoapp_php scaled to 3
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3: running [==================================================>]
verify: Service converged

Docker tells us that the scaling succeeded, and we can see that the number of replicas grew to three:

$: docker service list
u3avs16m32fe demoapp_php replicated 3/3

We will go through the whole process in the second part of the article, so do not worry if it seems a bit out of context. This is just a simple example to demonstrate how Docker Swarm works.

Internal load balancing

We now have a better idea of what Docker Swarm is. Manager nodes can schedule tasks on worker nodes, scale services up and down and maintain the Swarm state. Docker also uses load balancing to spread the workload roughly evenly across nodes.

The Docker Swarm load balancer runs on every node, workers included. It can forward a request to any container running a task of the target service, on any host of the cluster. This works thanks to the internal DNS: the Docker engine can address a service by its name, so no node needs to know the IP address of every running replica.

In the previous section, we created a few replicas of the same service. The Docker engine tries to spread these containers over as many nodes as possible, which helps balance the load.
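You can observe this internal DNS from inside a container attached to the stack's network: the service name resolves to a single virtual IP that load-balances across replicas, while the special name tasks.[service-name] resolves to the addresses of the individual replicas. A sketch, assuming the container image ships getent (Debian-based images such as nginx:latest do) and using a hypothetical resolve_service helper; run it on a node that hosts the container, and set DRY_RUN=1 to only print the commands.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: resolve_service CONTAINER SERVICE
resolve_service() {
  echo "virtual IP of $2:"
  run docker exec "$1" getent hosts "$2"
  echo "replicas of $2:"
  # ahostsv4 lists every address (each may appear once per socket type).
  run docker exec "$1" getent ahostsv4 "tasks.$2"
}
```

For example: resolve_service "$(docker ps -q -f name=demoapp_web | head -n1)" demoapp_php.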

Routing mesh

A nice feature of Docker Swarm is the routing mesh. It enables each node in the Swarm to accept connections on published ports for any service in the Swarm even if the current node doesn’t run any related task. The mesh then routes the request to a node running the requested service.

Figure 4: Routing mesh scheme – both hosts can receive a request but only host A can process it

A service whose containers run on a single node is now reachable through every node of the Swarm!

Swarm network mode

As mentioned in a previous article, running Docker in Swarm mode relies on an overlay network to connect all nodes. When a Swarm is created, the engine actually creates two networks:

The ingress network: the overlay network Docker Swarm uses to expose services to the external network and to carry routing mesh traffic between Swarm nodes.

The docker_gwbridge network: a bridge network that connects containers to the host they are running on. It behaves like docker0 does on a single node.

NETWORK ID     NAME              DRIVER    SCOPE
ywo5s9pwljct   ingress           overlay   swarm
2a1224edbf38   docker_gwbridge   bridge    local

On top of that, by default every stack deployed on the Swarm gets its own overlay network: docker stack deploy creates a network named [stack-name]_default for services that do not declare one.
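To list the overlay networks on a manager, including per-stack ones, or to create your own overlay network that standalone containers can also join, the standard docker network commands apply. The helper names and the network name demo-overlay are assumptions for this sketch; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: list only overlay networks (ingress, per-stack networks...).
list_overlays() { run docker network ls --filter driver=overlay; }

# Hypothetical helper: create_overlay NAME
# --attachable lets containers started with plain "docker run" join the network too.
create_overlay() { run docker network create --driver overlay --attachable "$1"; }
```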

Host firewall rules

In order to run Docker in Swarm mode and connect multiple hosts, the following ports must be open to traffic on each node:

  • 2377/tcp: for cluster management communications
  • 7946/tcp & 7946/udp: for communication among nodes and container network discovery
  • 4789/udp: for overlay network traffic – ingress network
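On hosts that use ufw as their firewall (an assumption: it is not installed by default on Debian), opening these ports could look like the sketch below. open_swarm_ports is a hypothetical helper, and DRY_RUN=1 prints the rules instead of applying them.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Ports required by Swarm mode, as listed above.
SWARM_PORTS="2377/tcp 7946/tcp 7946/udp 4789/udp"

# Hypothetical helper: allow each Swarm port through ufw (needs root).
open_swarm_ports() {
  for port in $SWARM_PORTS; do
    run ufw allow "$port"
  done
}
```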

Image sharing in Swarm

Images are stored locally on each host. This is not an issue with a single node, since images never need to be fetched over the network. In a Swarm, however, containers are likely to be started on a different host than the one used to build the image. Images must therefore be reachable over the network by the other nodes. This is the case for images pulled from Docker Hub, but not for images you build yourself from a Dockerfile, which only exist on the host that built them.

Figure 5: Highlighting the fact that images are stored locally on a host

Why do I still need the images once my services are running? As mentioned before, a task can fail. Docker then has to start another container to replace the failed one, possibly on another node. That node needs the image referenced in the service definition, and the operation fails if the image is not within reach.

To share images across the Swarm, you can use a remote registry: either your Docker Hub account or a registry service running in the Swarm.
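For the second option, Docker's documentation shows how to run the registry:2 image as a Swarm service and push images tagged with its address. A minimal sketch, assuming port 5000 and a hypothetical image name myapp:1.0; the helper names are also hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Address of the registry service, published on every node through the routing mesh.
REGISTRY=127.0.0.1:5000

# Hypothetical helper: start a registry as a Swarm service (run on a manager).
start_registry() {
  run docker service create --name registry \
    --publish published=5000,target=5000 registry:2
}

# Hypothetical helper: push_image IMAGE retags a local image for the registry and pushes it.
push_image() {
  run docker tag "$1" "$REGISTRY/$1"
  run docker push "$REGISTRY/$1"
}
```

Thanks to the routing mesh, 127.0.0.1:5000 reaches the registry from every node, so a stack file could reference image: 127.0.0.1:5000/myapp:1.0. Keep in mind that, without extra configuration, the registry stores images in a local volume on whichever node runs it.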

Figure 6: Fetching images from a registry, image is accessible to every container

Volume Storage in a Swarm

Out of the box, Docker only supports the “local” driver for volume storage. This means that in a Swarm, Docker creates a new volume every time your task runs on a new node, so you end up with several versions of your volume existing at the same time in your Swarm. Likewise, if a container mounts a folder from the host filesystem, it mounts a different folder on every host it runs on: the same logical volume becomes several distinct volumes.

Figure 7: Here, both containers are mounting the same logical volume on distinct hosts

To share a single, consistent volume across the Swarm, we need a shared file system accessible from every host. That way, you ensure volume consistency across the whole Swarm.

Figure 8: Here, the mounted volume is unique and accessible through a shared file system

This way, a single volume is shared by multiple containers running on multiple hosts.

Maintain the cluster state, Docker’s Raft consensus

First things first, here is a definition of consensus:

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires coordinating processes to reach consensus, or agree on some data value that is needed during computation. Example applications of consensus include agreeing on what transactions to commit to a database in which order, state machine replication, and atomic broadcasts […]

From Wikipedia

The Raft Consensus is the algorithm Docker Swarm uses to make its manager nodes agree on the shared cluster state. Raft is not specific to Docker: it is found in many other systems that require consensus across multiple nodes. In a Swarm, the algorithm ensures the following properties:

  • There is at most one leader in the Swarm, and a new leader is elected if the current leader fails
  • The fault tolerance of a Swarm with N manager nodes is (N-1)/2 managers, rounded down
  • The number of votes required to validate a change to the cluster state is (N/2)+1, with N/2 rounded down; it is referred to as the quorum

Leader election: the leader periodically sends messages, called heartbeats, to the other nodes to let them know that it is still leading. If a follower (as opposed to the leader) does not receive a heartbeat from the leader within a timeout, it becomes a candidate and tries to get elected by collecting the votes of a majority of nodes.

Only manager nodes take part in the Raft Consensus. Although there is no limit to the number of manager nodes, keep in mind that every manager takes part in the quorum that validates cluster state changes, so each additional manager slows down writes. An odd number of managers is recommended: going from an odd number to the next even number raises the quorum without making the cluster any more resilient.
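The two formulas above can be checked with a few lines of shell arithmetic, where integer division rounds down. This sketch prints the quorum and fault tolerance for 1 to 5 managers; the function names are illustrative only.

```shell
#!/usr/bin/env bash
# Raft figures for a Swarm with N managers ($(( )) division rounds down).
quorum()          { echo $(( $1 / 2 + 1 )); }
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
  printf 'managers=%d quorum=%d tolerance=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

The output shows, for instance, that 3 and 4 managers both tolerate exactly one failure, while 2 managers tolerate none.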

WORKSHOP: Build a scalable environment with Docker Swarm

Now that we have some background knowledge about Swarm clusters, let's build a simple one! Our objective is to produce a scalable environment. As we have seen earlier in the article, we need to address two things: image sharing and volume sharing. We want to be sure that every container gets exactly the same inputs as the others.

About image sharing: we will use images either from the official Docker registry or from my personal Docker Hub repository. That way, every container relies on images from remote registries rather than local ones. Note that, even though we don't need it for this workshop, every image built from a Dockerfile should be pushed to a single registry, which can be either remote or attached directly to the Swarm as a registry service.

About volume sharing: we will set up a very simple shared file system using GlusterFS to provide a shared volume for all our containers. This ensures configuration consistency across the whole cluster. Don't worry, you don't need any prior knowledge of GlusterFS.

For the setup, we have three freshly installed Debian 10 VMs at our disposal, and this is what we will be doing with these resources:

Fig 9: Workshop target, Swarm cluster with 3 hosts and a shared volume using GlusterFS

Each node will host Docker running in Swarm mode and a GlusterFS server in cluster mode. Our Swarm will have 2 manager nodes, and all 3 nodes will act as workers. As we have seen in the Raft Consensus part, 2 managers give a quorum of 2 and a fault tolerance of (2-1)/2 rounded down, that is 0: losing either manager would block any change to the cluster state. A third manager would bring the fault tolerance to 1, which is what you would want beyond a demonstration. Finally, we will set up a very basic Laravel/Lumen PHP application and play around with Docker Swarm's features.

Step 1: install Docker on each host

The installation process is covered in detail in our Docker introduction workshop; have a look if you have any questions! Here is the full procedure, which must be repeated on each instance:

ssh node-1
apt-get update
apt-get -y install apt-transport-https ca-certificates curl gnupg2 software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
apt-get update
apt-get -y install docker-ce

Once the installation has completed, run the following command to check the Docker service status:

systemctl status docker

Step 2: Initialize the Swarm

The node chosen to initialize the Swarm is, by default, a manager node. Because we want to build a multi-node Swarm, we have to use the advertise address option and specify an IP address of the host that the other nodes can reach.

docker swarm init --advertise-addr [HOST REACHABLE IP ADDRESS]

The node that initializes the Swarm becomes a manager and the “leader”, as opposed to the “worker” nodes that will join the Swarm later on. By default, the leader node is also a worker, meaning it can run tasks. Options exist if you want it to act as a manager only.
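One such option is to set the node's availability to drain: the scheduler then stops placing tasks on it and moves its existing tasks to other nodes, while it keeps its manager role. A sketch using the workshop's node naming; the helper names are hypothetical and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: manager_only NODE stops scheduling tasks on NODE.
manager_only() { run docker node update --availability drain "$1"; }

# Hypothetical helper: back_to_work NODE lets the scheduler use NODE again.
back_to_work() { run docker node update --availability active "$1"; }
```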

Once the Swarm is initialized, we add nodes to it. In our example, the configuration is as follows:

  • node-1: manager (leader) node + worker node
  • node-2: manager node + worker node
  • node-3: worker node

Step 3: Add other nodes to the Swarm

Nodes join the Swarm using a token provided by a manager node already in the Swarm. A manager can issue two types of token: one to invite another manager node into the Swarm, and another to invite worker nodes. To view each of your tokens, run the following commands:

$: docker swarm join-token manager
To add a manager to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-00b7ey0kfaki3aon2iccul71vsk3p2ae1xj1npqfx7f1jacky-8g3afymj4vcyarvm2fb84g2hx [host address]:2377
$: docker swarm join-token worker
To add a worker to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-00b7ey0kfaki3aon2iccul71vsk3p2ae1xj1npqfx7f1jacky-8g3afymj4vcyarvm2fb84g2hx [host address]:2377

To follow our setup, we run the first printed command from node-2 and the second one from node-3. They respectively print:

This node joined a swarm as a manager.
This node joined a swarm as a worker.
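If you prefer to script this step from node-1, docker swarm join-token -q prints only the token. A sketch assuming SSH access as root to the hostnames node-2 and node-3, with a placeholder manager address (192.0.2.10 is a documentation IP); join_node is a hypothetical helper, and DRY_RUN=1 prints the remote command with a dummy token instead of running anything.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Placeholder: reachable address of node-1 and the Swarm management port.
MANAGER_ADDR="${MANAGER_ADDR:-192.0.2.10:2377}"

# Hypothetical helper: join_node HOST manager|worker
# Fetches the matching token on this manager, then runs the join on HOST.
join_node() {
  local token
  if [ -n "${DRY_RUN:-}" ]; then
    token="<token>"
  else
    token=$(docker swarm join-token -q "$2") || return 1
  fi
  run ssh "root@$1" docker swarm join --token "$token" "$MANAGER_ADDR"
}
```

With the workshop layout, that would be join_node node-2 manager followed by join_node node-3 worker.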

We then make sure that everything is in order by running:

$: docker node list
Fig N: We see our three nodes: two managers, one of which is the leader, and the worker-only node.

Before we begin deploying our application stack, we must address an issue described earlier in this article: volume sharing! We are in a distributed environment, which means that volumes must be accessible to every host without creating local copies that could leave the applications in inconsistent states across the Swarm. In the next few parts, we will create a distributed file system using GlusterFS, in which we will store our Docker volumes. You can choose any other solution to share your volumes; this one happens to be convenient for our example.

Step 4: Volume Sharing with GlusterFS, a clustered file system

Quick introduction to GlusterFS

Disclaimer: The main objective of this article is to understand and use Docker Swarm. We will use GlusterFS for our example but won't cover its inner mechanisms in detail. If you intend to deploy a solution in production, please refer to the official GlusterFS documentation. For instance, we will be creating Gluster volumes directly on the root partition, which, while permitted, is not recommended at all.

GlusterFS is a system that aggregates disk storage resources from multiple servers into a single file system. Each node of the cluster holds directories called bricks, which shared volumes use to store files. When creating a volume, you can specify a combination of nodes and bricks that will host the volume data.

Fig N: Illustration of the GlusterFS mechanism when creating a volume in a cluster

Gluster volumes can store data in several ways, each addressing a specific use case. For instance, you can configure volumes as:

  • Distributed Volume: This is the default option. The volume data is distributed across the bricks, meaning that each brick holds part of the volume data. This is used to scale the file system, as adding nodes increases your storage capacity. Keep in mind that, without data replication, a brick failure can lead to data loss.
  • Replicated Volume: In replicated mode, the volume data is copied to each brick, which prevents data loss when a brick fails.
  • Distributed Replicated Volume: Here, files are distributed across several sets of bricks, and each set replicates its data. It combines the two previous options and is used for more demanding systems.

Back to our workshop – create a shared file system across the Swarm

Now that we understand how GlusterFS volumes work, we can carry on with the configuration of our Swarm. The general idea is to create a volume accessible across the cluster and to make our Swarm containers mount this volume. This way, changes made by an application container are replicated to every other container using the same application volume.

With this in mind, there are actually two possible approaches. The first is to build a volume on a single brick on one node of the cluster. The other nodes would only be GlusterFS clients and would mount the volume on their host file system. In this scenario, the Docker containers running on each Swarm node could read and write the shared volume.

Fig 10 : First scenario, a single node hosting the shared volume

While functional, this setup makes node #1 critical for the Swarm: if it is lost, Docker can no longer start new containers, because the volumes used by the apps are no longer accessible.

In order to build a more resilient system, we can decide to host multiple GlusterFS servers in the Swarm and create a Replicated GlusterFS Volume that will dispatch duplicated data all over the cluster.

Fig 11 : Second scenario, all Swarm nodes hold a copy of the shared volume data

A particularity of this setup is that each host holds both roles, Swarm node and GlusterFS node. A good practice would be to separate the two services and run a distinct file system cluster. We don't need such an architecture here, as our single shared volume only stores application code and container configuration files, which usually live in a Git repository anyway.

Alright, let's start the configuration. First, we install the GlusterFS server on each node with the following commands:

$: add-apt-repository ppa:gluster/glusterfs-7
$: apt-get update
$: apt-get -y install glusterfs-server
$: systemctl start glusterd

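Check that the installation is complete and that the service is running with systemctl status glusterd. Note that ppa: repositories are meant for Ubuntu; on Debian 10 the add-apt-repository line is likely to fail, and the glusterfs-server package can be installed from the default Debian repositories instead.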

To ease our setup, we register the IP of each Gluster node in the /etc/hosts file of every node, as follows:

# In file /etc/hosts for each Gluster node :
[node-1-ip]        gluster-1
[node-2-ip]        gluster-2
[node-3-ip]        gluster-3

Make sure name resolution works: try to ping the other Gluster nodes! Now that the servers are up and running, we must form our Gluster pool. One server invites the other Gluster servers to join the cluster, and the invited nodes do not need to confirm anything to become part of it. Don't worry, a remote host cannot become part of your cluster without being invited by a member of the cluster. This part is actually pretty straightforward:

# Run from our node-1, which is referred to as gluster-1, to invite the other Gluster hosts
$: gluster peer probe gluster-2
peer probe: success.
$: gluster peer probe gluster-3
peer probe: success.

Your gluster pool is now visible from every node in the cluster!

# Run from our node-3
$: gluster pool list
UUID                                    Hostname        State
419efffa-1a5b-4c5a-b65b-dc223a0502d2    gluster-1       Connected
86c18bf3-2623-44bc-8658-e205fcf093d1    gluster-2       Connected
d9443ff2-4a74-428c-b8ac-b3e6598f8057    localhost       Connected

We can now create our first replicated volume! [Again, this is done on the root partition for demonstration purposes only.] The first step is to create a brick on each Gluster node, which the system will use to store a copy of the volume data:

# From gluster-1
$: mkdir -p /glusterfs/swarm-brick-1
# From gluster-2
$: mkdir -p /glusterfs/swarm-brick-2
# From gluster-3
$: mkdir -p /glusterfs/swarm-brick-3

Once that's done, let's create a volume replicated 3 times, once on each Gluster node:

$: gluster volume create swarm_volume replica 3 transport tcp \
gluster-1:/glusterfs/swarm-brick-1 \
gluster-2:/glusterfs/swarm-brick-2 \
gluster-3:/glusterfs/swarm-brick-3
volume create: swarm_volume: success: please start the volume to access data

Thanks to the lines added in the /etc/hosts file, we can refer to our Gluster nodes using custom Hostnames instead of their respective IP addresses.

⚠ Note: if you are creating the volume on the root partition, you will get an error message asking you to add the “force” option at the end of the command line.

As prompted, you can now start your replicated volume :

$: gluster volume start swarm_volume
volume start: swarm_volume: success

If you check the volume status with gluster volume status swarm_volume, you should see something like :

Fig 12: The volume status output shows the underlying bricks on each node.

Finally, we mount the volume on a host directory. We will be using a folder in /var/www/, but that is up to you! We execute the following commands on each node :

$: mkdir -p /var/www/swarm-data
$: mount.glusterfs localhost:/swarm_volume /var/www/swarm-data
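The mount above does not survive a reboot. A common way to make it permanent is an /etc/fstab entry of type glusterfs with the _netdev option, so the mount waits for the network. A sketch with hypothetical helpers; because the server here is localhost, glusterd must be running before the mount is attempted at boot, so check the behaviour with a reboot before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fstab_line SERVER VOLUME MOUNTPOINT prints an fstab entry.
fstab_line() {
  printf '%s:/%s %s glusterfs defaults,_netdev 0 0\n' "$1" "$2" "$3"
}

# Hypothetical helper: append the workshop entry once (needs root).
add_fstab_entry() {
  local line
  line=$(fstab_line localhost swarm_volume /var/www/swarm-data)
  # Skip it if an identical line is already present.
  grep -qxF "$line" /etc/fstab || echo "$line" >> /etc/fstab
}
```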

Each node will access its own copy of the volume data. Let’s try it out!

# From node 1
$: cd /var/www/swarm-data && echo "Hello World from node-1!" > test.txt
# From node 2
$: cat /var/www/swarm-data/test.txt
Hello World from node-1!

Every change to the volume data is replicated across our Swarm!

Step 5: Project configuration

It is time to deploy a very simple application in our Swarm! Our stack launches two services: a custom php-fpm image that we will use to initialize a Laravel/Lumen project, and an nginx service that handles the web server part. To do so, we create a folder called demo in our swarm-data folder and drop the following files into it:

# Create the folder that will be used in containers to initialize the application
# (-p also creates the parent demo folder)
$: mkdir -p /var/www/swarm-data/demo/app
# file : /var/www/swarm-data/demo/site.conf
server {
    index index.php index.html;
    server_name project;
    error_log /var/log/nginx/error.log;
    access_log /var/log/nginx/access.log;
    root /var/www/app/public;

    # Handles routing
    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    # Forward PHP scripts to the php-fpm service ("app") on port 9000
    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass app:9000;
        fastcgi_index index.php;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param PATH_INFO $fastcgi_path_info;
    }

    # Example rule to disable access to files beginning with ".ht"
    location ~ /\.ht {
        deny all;
    }
}
# file : /var/www/swarm-data/demo/docker-compose.app.yml
version: '3'

services:
    nginx:
        image: nginx:latest
        ports:
            - "8080:80"
        volumes:
            - ./site.conf:/etc/nginx/conf.d/default.conf
            - ./app:/var/www/app

    app:
        image: pathiout/lumen8-composer:latest
        ports:
            - "9000:9000"
        volumes:
            - ./app:/var/www/app

These files can be added from any node of our Swarm, since each host's /var/www/swarm-data folder is mounted on the same volume! If you are not comfortable with this configuration, check out the examples given in the previous article about Docker basics.

About pathiout/lumen8-composer: I'm using my own Docker Hub repository. This image is public and provides the environment needed to create a new Laravel/Lumen project and run composer install.

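It is about time we deployed our stack! As we don't have any project yet, we will create one from a container based on our app image; this setup phase will populate the shared volume used by the other containers. Run the following command from /var/www/swarm-data/demo, where the stack file lives: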

docker stack deploy --compose-file docker-compose.app.yml demoapp

We can check the application status with

docker stack services demoapp

This will output two services running for our demoapp :

  • the app service that holds the Docker Hub image for php-fpm and composer
  • the nginx service, which handles the web part of our application

You should see something like this:

Check the replicas column and make sure that it says 1/1. If you have something like 0/1 you’ve probably run into an error.

How to monitor errors on docker stack deploy?

A quick solution is to have a look at journalctl:

  • Open a second SSH connection
  • Run journalctl -f -n10
  • From the first prompt, try to deploy your stack again and check the journalctl logs
  • (If your stack is already deployed, even with zero running containers, remove it first with docker stack rm demoapp)
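Docker can also report the failure reason itself: docker service ps --no-trunc shows the full error of each task, and docker service logs gathers container output from every node. A sketch wrapping both in a hypothetical why_failing helper; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: why_failing SERVICE
# Shows the task history with untruncated errors, then the last 50 log lines.
why_failing() {
  run docker service ps --no-trunc \
    --format '{{.Name}} {{.Node}} {{.CurrentState}} {{.Error}}' "$1"
  run docker service logs --tail 50 "$1"
}
```

For our stack, that would be why_failing demoapp_app or why_failing demoapp_nginx.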

Our stack is now deployed; we can get more information with docker stack ps demoapp

We can see that we have two containers, one deployed on node-1 and the other on node-2. At this point, we should be able to reach our Swarm from any host IP on port 8080, thanks to the routing mesh, and get an Nginx error, since there is no application code yet!

Fig 13 : All Swarm nodes return the same content served from the two containers on node 1 and 2.

Good, so now we know that our Swarm is working and that the deployed stack is accessible! Let’s build a small Laravel/Lumen project!

First, let's start a temporary container to create the project with composer. We bind-mount the app folder into this container to populate the GlusterFS volume; run the command from /var/www/swarm-data/demo, since it relies on $(pwd).

$: docker run \
    --mount type=bind,source="$(pwd)"/app,target=/var/www/app \
    -it pathiout/lumen8-composer:latest bash

Your prompt should change from the host's hostname to the container ID. You can now proceed with the installation:

$: composer create-project --prefer-dist laravel/lumen app

Once the installation completes, refresh your browser tabs to see the result:

Step 6: Scaling

So far, we have a stack composed of two services with one replica each. We are now going to scale the php service from one container to three. Docker should place one container on each host!

# Command is docker service scale [service name]=[number of replicas]
$: docker service scale demoapp_app=3
demoapp_app scaled to 3
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged

The operation is complete: running docker service list shows that we now have three replicas of the demoapp_app service!

We can see how Docker dispatched the three containers of the service by running docker service ps demoapp_app. As expected, we find one container per Swarm node!

We can run some tests: let's update the content returned by our Laravel/Lumen project so that it also prints the hostname. Make the change from any node of your Swarm in the /var/www/swarm-data/demo/app/routes/web.php file. Then head back to your browser and refresh the page a couple of times. You'll see the content change from one request to the next: this is Docker load balancing and the routing mesh at work! Each request is handled by one of the replicas, seamlessly for the end user.
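You can also watch the load balancing from a terminal instead of a browser. A sketch that sends several requests to the same node; the URL is a placeholder, hit is a hypothetical helper, it assumes the page prints the container hostname as described above, and DRY_RUN=1 prints the curl commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: hit URL COUNT sends COUNT requests and prints each body.
hit() {
  local i
  for i in $(seq 1 "$2"); do
    run curl -s "$1"
    echo   # newline between responses
  done
}
```

Piping the result through sort | uniq -c, as in hit http://[node-ip]:8080/ 9 | sort | uniq -c, gives a rough count of the responses served by each replica.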

Fig 14 : We can see the hostname change at refresh, meaning it is handled by a different service replica each time.

Conclusion and next steps

The workshop is now complete: we have a solid base for our scalable application environment. You can adjust the number of managers and workers in the Swarm, or add VMs to the cluster. If you wish to go further, you can consider the following:

  • Add a proxy / load balancer as the entry point of your cluster: You now have multiple entry points to your Swarm. You might want to add a load-balancing / proxy layer to group incoming requests and distribute the load over your different Swarm nodes, using software like HAProxy, Nginx, Traefik.io or others! In addition, you will be able to configure network rules to isolate your cluster, if not already done.
  • Consider storing user uploads on a remote service: We have created a shared volume for our Swarm to handle configuration and application files, but you must also think about user uploads. I would suggest not storing user-uploaded content in GlusterFS as shown in this tutorial, but rather on a third-party service outside the Swarm. It could be a cloud service like Amazon S3 or your own file system implementation. It can actually be the GlusterFS cluster, if you have installed it on a dedicated infrastructure!
  • What about the database? There is plenty of literature on the web arguing whether or not to dockerize a database (the software only). Let's not discuss that here! I will just give this piece of advice: use a bind mount for your database and do not rely on Docker volumes. The GlusterFS cluster we created is not critical for our application, in the sense that it doesn't hold critical data: everything stored on the shared volume could be versioned in Git. This is not the case for your database storage; you want to be able to retrieve your data whenever you need it. Just remember that losing your application code is like running out of fuel mid-air, while losing your data is an instant crash. What about scaling? If you are looking for a clustered database, many out-of-the-box solutions deal with scaling before you even get to a dockerized platform; keep this in mind when building your architecture!
