Taming the Docker Swarm - Part 1

Over the past few months I have been playing around with Docker Swarm mode. I wanted to see if I could get various services up and running within my home network and whether this would translate through to real world scenarios using multiple cloud providers.

I am happy to say that, by bringing in some different software (such as Traefik), changing which services are responsible for what, and updating my local domain settings, I have come up with a practical and usable solution for running multiple services in Docker Swarm.

This post discusses the foundation for getting all of this up and running, plus deploying one service that provides a web-based portal for managing the cluster. I will write a follow-up post showing how to deploy other services, such as Elasticsearch, into the cluster.

Overview

At Chef I get to work with a myriad of different technologies and as a result I need to be able to spin things up quickly and reliably. In the past I have used virtual machines, but these can take time to provision and get running, by which time I have lost focus on what I needed to do.

The main driver for setting up a local Docker Swarm cluster was my work on the VSTS Chef Extension, which required a private agent to build. At first I created a VM to run the agent; then I ran a Docker container on a single node, which was much easier but did not scale well. So I started to look at Docker Swarm mode.

I had some criteria that I needed to satisfy:

  • Run Docker Swarm mode on 3 nodes
  • Simple configuration for load balancing multiple replicas
  • Enable SSL on some services
  • Able to proxy sites for specific URLs

I will go through each of these criteria in turn; by the end I had a working Docker Swarm cluster that I could deploy any service to.

In order to get this working I created a new Chef cookbook called docker-swarmm. This provides some resources that can be used by other cookbooks to create a cluster and deploy services to it. It is in my own repo at the moment as it is still heavily under development, so use it at your own risk.

Hardware Configuration

On my home office network I have 3 Ubuntu machines that I wanted to take part in the swarm cluster. I also have a Windows machine running Docker for Windows that I am experimenting with to see how well it can integrate with the swarm.

As can be seen from the diagram above, there is a Docker overlay network which allows all of the services to connect to each other.

Machine    Role
Node 1     Master
Node 2     Worker
Node 3     Worker
Node 4     Worker

In the current version of Docker, 17.06.0-ce, it is not possible for Windows Docker nodes to join as a master, but they can join as worker nodes. Most of the commands here refer to running Docker on Ubuntu; specific callouts will be made when they need to be run on the Windows machine.

All of the machines have had Docker installed on them using the Chef cookbook docker, which includes a resource called docker_service to download, install and configure Docker to run as a local service.

Do not confuse the docker_service resource with the docker service command for Docker Swarm.

Then, using the docker_swarm_master resource from my docker-swarmm cookbook, “Node 1” was configured as the master server.
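
Under the hood this boils down to the standard swarm initialisation; done by hand it is roughly equivalent to running the following on “Node 1”, using its IP address as the advertise address:

$> docker swarm init --advertise-addr 192.168.36.10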

As I have already mentioned, the docker-swarmm cookbook is still very alpha and as such does not yet have the ability to join worker nodes to the cluster. Unfortunately this means that, for the moment, some manual configuration is required.

Add nodes to the cluster

On the manager node, “Node 1”, run the following to get the join token:

$> docker swarm join-token worker

This will produce a string similar to the following:

To add a worker to this swarm, run the following command:

   docker swarm join \
   --token SWMTKN-1-5gf0lkjfpojoergpikjmiojergojjhg765kjnd \
   192.168.36.10:2377

Run this command on the other nodes to create a docker swarm cluster. Once done, on the master node run the following:

$> docker node ls

This will show the status of the cluster, which will be similar to the following:

ID                           HOSTNAME        STATUS  AVAILABILITY  MANAGER STATUS
2n8zh3iyw0qah76sjksngrymq    node-3          Ready   Active
bwlqfmqn7u8rt0dssgyj83vxb    node-2          Ready   Active
qijulxfs0e5w4ubcfiky5qhbf *  node-1          Ready   Active        Leader

Load Balancer

At first I tried to use Nginx to perform the proxy and load-balancing tasks in the cluster, and whilst I was getting there using tools such as nginx-proxy, it was not as dynamic as I wanted it to be. I then stumbled upon Traefik, which has to run on a master node and uses labels set on the services to determine how traffic should be routed to them. As an added benefit it supports Let’s Encrypt out of the box to get valid SSL certificates for your services.

This does not make Nginx redundant; indeed I am using Nginx within the cluster to provide basic authentication for my internal Docker registry, with Traefik doing the SSL termination for me.

Please refer to Using Let’s Encrypt for Internal Servers for information about how I configured my local DNS so that I was able to get Let’s Encrypt to issue valid SSL certificates for my internal machines.

Configuration

Traefik uses a TOML file for its configuration. The following shows an example of the one I am using:

logLevel = "DEBUG"

defaultEntryPoints = ["http", "https"]

[entryPoints]
   [entryPoints.http]
   address = ":80"
      [entryPoints.http.redirect]
      entryPoint = "https"
   [entryPoints.https]
   address = ":443"
      [entryPoints.https.tls]


# Enable ACME (Let's Encrypt) automated SSL
[acme]
email = "russell.seymour@turtlesystems.co.uk"
storage = "/etc/traefik/acme.json"
dnsProvider = "route53"
entryPoint = "https"
onDemand = true
OnHostRule = true

# Allow access to the Web UI
[web]
address = ":8080"

# Configure how Traefik connects to Docker
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "traefik"
watch = true
exposedbydefault = false
swarmmode = true

This sample is OK to use internally, but do not use it on an Internet-facing machine as there is no authentication or encryption for the web UI.

This configuration file should be written to the host so that it can be mounted into the service. This means that Chef could be used to write out the configuration file based on attributes, for example. For the sake of this post the file is saved to /data/dockerswarm/rproxy/etc/traefik.toml.

You may notice that this is not a usual path within Linux, and you would be correct. However I wanted persistent storage, so I mount an NFS share at /data/dockerswarm.
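
For completeness, the share is mounted on each node before anything else is stood up. A minimal sketch, assuming an NFS server called nas exporting /export/dockerswarm (both names are placeholders for your own environment):

$> sudo mkdir -p /data/dockerswarm
$> sudo mount -t nfs nas:/export/dockerswarm /data/dockerswarm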

  • logLevel - the level at which logging should be set. DEBUG gives a lot of information, so turn it down when everything is working as desired
  • defaultEntryPoints - the entry points Traefik will listen on; in this case ports 80 and 443
  • the [entryPoints] section tells Traefik to redirect any traffic on port 80 to port 443 and that we want to use TLS on the HTTPS entry point; a quick way to verify this is shown below
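
Once Traefik is running, the redirect is easy to verify from any machine on the network. A quick sanity check (node1 standing in for whichever node you point at):

$> curl -sI http://node1/ | head -n 3

This should return a redirect response (301 or 302) with a Location header pointing at the https:// version of the URL.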

Let’s Encrypt

Notice that there are no declarations of SSL certificates or keys; this is because Let’s Encrypt will be providing them for the cluster. This is all set in the [acme] section of the configuration file.

  • email - the email account to use with Let’s Encrypt when obtaining a certificate
  • storage - where to store the certificates when they are generated; this path is from within the service (container)
  • dnsProvider - the DNS provider to use to verify that you have control of the domain
  • entryPoint - the entry point to apply the ACME challenge and certificates to
  • onDemand - request a certificate from Let’s Encrypt during the first TLS handshake for a hostname that does not yet have a certificate
  • OnHostRule - enable certificate generation from frontend Host rules; this requests a certificate from Let’s Encrypt for each frontend with a Host rule

This configuration requires the use of the DNS challenge for Let’s Encrypt. Please refer to the post Using Let’s Encrypt for Internal Servers for more information.
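
Note that with the mounts used later in this post, the acme.json storage file lives on the host at /data/dockerswarm/rproxy/etc/acme.json. Traefik is fussy about the permissions on this file, so it is worth creating it up front with access restricted to the owner:

$> touch /data/dockerswarm/rproxy/etc/acme.json
$> chmod 600 /data/dockerswarm/rproxy/etc/acme.json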

Web UI

Traefik provides a web UI to show what has been deployed and which services are being load balanced by the software.

  • address - the address that the service should listen on; in this example it means listen on all IPs on port 8080

There are many more options that can be specified here to provide SSL and basic authentication. Unfortunately this part of the software does not support the Let’s Encrypt method for getting a certificate.
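
The UI also answers simple HTTP requests on the same port, which is handy for a quick health check from the command line (node1 again being a placeholder for the manager node):

$> curl -s http://node1:8080/health

If everything is running this should return a JSON document of uptime and request statistics.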

Docker Configuration

Traefik supports many different orchestration systems so it has to be told which one it will be using. The [docker] section tells it to use Docker.

  • endpoint - the Docker engine endpoint. It is possible to use a TCP endpoint if required, but the Docker engine must be configured to listen on TCP. As Traefik runs on the master node anyway this is probably moot
  • domain - the domain to be used when the services are created
  • watch - enable Traefik to watch for changes to Docker, e.g. new services
  • exposedbydefault - whether to publish all services in the cluster by default; it is disabled here so that labels on the services tell Traefik which ones to expose
  • swarmmode - tell Traefik that Docker Swarm mode is being used

For more information on the settings that can be used please look at the Traefik Documentation.

Run the Load Balancer

Docker Swarm mode provides service discovery for services within the swarm cluster, which means that the name of a service can be used as its DNS name. However this only works when the services are connected to the same overlay network, so a new one should be created, for example:

$> docker network create --driver overlay traefik-net

Other options, such as creating an encrypted network or setting the network range can be specified. Please refer to Docker Create Network for more information.
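
For example, had I wanted an encrypted network with an explicit address range, the create command above could have been replaced with something like the following (the subnet is purely illustrative):

$> docker network create --driver overlay \
                         --opt encrypted \
                         --subnet 10.10.0.0/24 \
                         traefik-net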

Now, finally, run the service:

$> docker service create --name reverse_proxy \
                         --network traefik-net \
                         --mount "type=bind,src=/data/dockerswarm/rproxy/etc,dst=/etc/traefik" \
                         --mount "type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock" \
                         --constraint "node.role == manager" \
                         --publish "80:80" \
                         --publish "443:443" \
                         --publish "8080:8080" \
                         --env "AWS_ACCESS_KEY=xxxxxxx" \
                         --env "AWS_SECRET_ACCESS_KEY=xxxxxxx" \
                         traefik

This rather long-winded command will create one Traefik replica on the manager node, in this case “Node 1”. It has to run on the manager node so that it sees any new services that are launched. The mounts are there to give the service access to the configuration file and the Docker socket so that changes are seen.
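
A quick way to confirm that the placement constraint has been honoured is to ask where the task has been scheduled:

$> docker service ps reverse_proxy

The NODE column should show node-1, the manager.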

The two environment variables for AWS are required as these are the credentials for your AWS Route 53 service, used when creating new SSL certificates for new services.

Once this is up and running you should be able to access the Web UI on port 8080 on the host, which at this point should just have a Docker tab and no entries.

Creating Services

Creating services is no more complicated now that the load balancer is in place; it just requires some labels if you wish the service to be exposed by Traefik. Traefik’s ability to automatically expose new services was disabled in the configuration file, which is useful as not all services can go through an HTTP load balancer; think OpenLDAP, for example.

I like to be able to monitor the services that are running in the cluster, and to this end the first service to be deployed is Portainer. This application runs as a service itself and connects to the Docker endpoint on the manager node. If the Docker engine has been enabled with TCP connectivity on the worker nodes, it can connect to these remotely as well; a sketch of enabling this is shown below.
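
Enabling TCP connectivity is outside the scope of this post, but for reference here is a sketch of one way to do it on Ubuntu with systemd. This listener is unencrypted, so only do this on a trusted network; the empty ExecStart clears the packaged command before re-declaring it:

$> sudo mkdir -p /etc/systemd/system/docker.service.d
$> cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
EOF
$> sudo systemctl daemon-reload && sudo systemctl restart docker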

Portainer

As with Traefik, Portainer needs to be able to connect to the manager node’s Docker endpoint, so this will be a bind mount on the creation of the service.

$> docker service create --name portainer \
                         --network traefik-net \
                         --constraint "node.role == manager" \
                         --mount "type=bing,src=/var/run/docker.sock,dst=/var/run/docker.sock" \
                         --label "traefik.enable=true" \
                         --label "traefik.port=9000" \
                         --label "traefik.docker.network=traefik-net" \
                         --label "traefik.frontend.rule=Host:node1;PathPrefixStrip:/portainer" \
                         --label "traefik.backend=portainer" \
                         portainer/portainer \
                         --admin-password P@ssw0rd!

There is a lot going on in this command. Portainer, by default, runs on port 9000, but that port is not being published here. This is because Traefik will be proxying the requests on port 443, so the URL will be https://node1/portainer/.

The numerous labels on the service are how Traefik can dynamically configure itself when new services are added.

  • traefik.enable - whether or not Traefik should expose this service
  • traefik.port - the port on the service that Traefik should direct traffic to
  • traefik.docker.network - the network that Traefik should use to connect to the service
  • traefik.frontend.rule - the rule that will trigger routing of a request to this service
  • traefik.backend - the name of the generated backend for this service

With these labels and the SSL configuration in place, Traefik should obtain a certificate for this service.
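
One way to check that a real certificate has been issued, rather than the self-signed default that Traefik falls back to when ACME fails, is to inspect what is presented on port 443:

$> echo | openssl s_client -connect node1:443 -servername node1 2>/dev/null | \
   openssl x509 -noout -issuer -dates

The issuer should be Let’s Encrypt.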

The following screenshot shows the Portainer dashboard for my local cluster. It has many more services running on it, some of which I will cover in the next part of this post.

Summary

After working on this off and on for the past few weeks I now have a stable cluster that I can easily deploy services to. Chef is managing the nodes and I have a wrapper cookbook that deploys services for me. It has really made things much easier to test as I know that I can spin up and destroy software easily.

There are things that I have added, such as persistent storage, which work but are possibly not as robust as they should be. I will cover this in the next part of this post.
