In this tutorial, we will demonstrate how to use DigitalOcean API to horizontally scale your server setup. To do this, we will use DOProxy, a relatively simple Ruby script that, once configured, provides a command line interface to scale your HTTP application server tier up or down.
DOProxy, which was written specifically for this tutorial, provides a simple way for creating and deleting application server droplets, using the DigitalOcean API, and automatically managing them behind an HAProxy load balancer. This basic scaling model allows your users to access your application through the HAProxy server, which will forward them to the load-balanced, backend application servers.
DOProxy performs three primary functions:
Note: The primary purpose of this tutorial is to teach the minimally required concepts necessary to programmatically scale your DigitalOcean server architecture through the API. You should not run DOProxy, in its current form, in a production environment; it was not designed with resiliency in mind, and it performs just enough error checking to get by. With that said, if you are curious about learning about horizontal scaling through the API, it’s a great way to get started.
This tutorial touches on a variety of technologies that you might want to read up on before proceeding, including:
Because DOProxy is written in Ruby, knowledge of Ruby is a plus but not necessary; we will provide pseudocode to explain the gist of the DOProxy code. Also, we will use the official DigitalOcean Ruby wrapper, DropletKit, which enables us to easily make API calls in our Ruby code.
Before we get into the details of DOProxy works, we will install and use it on a server. Let’s install DOProxy on an Ubuntu 14.04 droplet now.
First, create an Ubuntu 14.04 droplet in the NYC3 region (you may use any region that supports private networking and userdata if you configure the region
variable in the doproxy.yml
file after installing DOProxy). This droplet will run the HAProxy load balancer, and the DOProxy scaling script, so choose a size that you think will be adequate for your desired scale potential. Because this tutorial is a basic demonstration of scaling, which won’t receive any real traffic, the 1GB size is probably adequate.
We will refer to this droplet as the DOProxy server.
Next, log in and follow the Installation and Configuration (including doproxy config and Userdata) sections in the DOProxy GitHub repository to install DOProxy on this server. Use the sample doproxy.yml
and user-data.yml
files by copying them, as noted in the directions. Be sure to replace the token
and ssh_key_ids
values in the DOproxy configuration file, or the script will not work.
Now that you have DOProxy and HAProxy installed on your server, let’s try and scale our environment.
Log in to your DOProxy server as root, and change to the directory where you cloned DOProxy, if you haven’t done so already.
Now run DOProxy without any arguments:
ruby doproxy.rb
This should print out the available commands like so:
Commands:
doproxy.rb print # Print backend droplets in inventory file
doproxy.rb create # Create a new backend droplet and reload
doproxy.rb delete <LINE_NUMBER> # Delete a droplet and reload
doproxy.rb reload # Generate HAProxy config and reload HAProxy
doproxy.rb generate # Generate HAProxy config based on inventory
Currently, DOProxy hasn’t created any droplets. Let’s create some to get our HTTP service online, and scale up.
Run the create command to create the first droplet that is managed by DOProxy:
ruby doproxy.rb create
This will take some time before returning to the prompt (because the script creates a new droplet via the API and waits for it to boot up). We’ll talk about how the API call is made, later, when we go through the DOProxy code.
When it is complete, you should see a success message that contains the droplet ID, like so:
Success: 4202645 created and added to backend.
If you visit your DOProxy server’s public IP address in a web browser. You should see a page that lists your new droplet’s hostname, id, and public IP address.
We’ll use DOProxy to create two more droplets, for a total of three. Feel free to create more if you want:
ruby doproxy.rb create
ruby doproxy.rb create
Now visit your DOProxy server’s public IP address in a web browser again. If you refresh the page, you will notice that the information on the page will change—it will cycle through the droplets that you created. This is because they are all being load balanced by HAProxy—each droplet is added to the load balancer configuration when it is created.
If you happen to look in the DigitalOcean Control Panel, you will notice that these new droplets will be listed there (along with the rest of your droplets):
Let’s take a closer look at the droplets that were created by looking at DOProxy’s inventory.
DOProxy provides a print command, that will print out all of the droplets that are part of its inventory:
ruby doproxy.rb print
You should see output that looks something like this:
0) auto-nginx-0 (pvt ip: 10.132.224.168, status: active, id: 4202645)
1) auto-nginx-1 (pvt ip: 10.132.228.224, status: active, id: 4205587)
2) auto-nginx-2 (pvt ip: 10.132.252.42, status: active, id: 4205675)
In the example output, we see information about the three droplets that we created, such as their hostnames, status, and droplet IDs. The hostnames and IDs should match what you saw when you accessed the HAProxy load balancer (via DOProxy’s public IP address).
As you may have noticed, DOProxy only printed information about droplets that it created. This is because it maintains an inventory of the droplets it creates.
Check out the contents of the inventory
file now:
cat inventory
You should see the ID of each droplet, one per line. Each time a droplet is created, its ID is stored in this inventory file.
As you may have guessed, DOProxy’s print
command iterates through the droplet IDs in the inventory file and performs an API call to retrieve droplet information about each one.
It should be noted that storing your server inventory in a single file is not the best solution—it can easily be corrupted or deleted—but it demonstrates a simple implementation that works. A distributed key value store, such as etcd, would be a better solution. You would also want to save more than just the droplet ID in the inventory (so you don’t have to make API calls every time you want to look at certain droplet information).
DOProxy also has a delete command that lets you delete droplets in your inventory. The delete command requires that you provide the line number of the droplet to delete (as displayed by the print
command).
Before running this command you will probably want to print your inventory:
ruby doproxy.rb print
So, for example, if you want to delete the third droplet, you would supply 2
as the line number:
ruby doprorxy.rb delete 2
After a moment, you’ll see the confirmation message:
Success: 4205675 deleted and removed from backend.
The delete command deletes the droplet via the API, removes it from the HAProxy configuration, and deletes it from the inventory. Feel free to verify that the droplet was deleted by using the DOProxy print command or by checking the DigitalOcean control panel. You will also notice that it is no longer part of the load balancer.
The last piece of DOProxy that we haven’t discussed is how HAProxy is configured.
When you run the create
or delete
DOProxy command, the information of each droplet in the inventory is retrieved, and some of the information is used to create an HAProxy configuration file. In particular, the droplet ID and private IP address is used to add each droplet as a backend server.
Look at the last few lines of the generated haproxy.cfg
file like this:
tail haproxy.cfg
You should see something like this:
frontend www-http
bind 104.236.236.43:80
reqadd X-Forwarded-Proto:\ http
default_backend www-backend
backend www-backend
server www-4202645 10.132.224.168:80 check # id:4202645, hostname:auto-nginx-0
server www-4205587 10.132.228.224:80 check # id:4205587, hostname:auto-nginx-1
The frontend
section should contain the public IP address of your DOProxy server, and the backend
section should contain lines that refer to each of the droplets that were created.
Note: At this point, you may want to delete the rest of the droplets that were created with DOProxy (ruby doproxy.rb delete 0
until all of the servers are gone).
Now that you’ve seen DOProxy’s scaling in action, let’s take a closer look at the code.
In this section, we will look at the pertinent files and lines of code that make DOProxy work. Seeing how DOProxy was implemented should give you some ideas of how you can use the API to manage and automate your own server infrastructure.
Since you cloned the repository to your server, you can look at the files there, or you can look at the files at the DOProxy repository (https://github.com/thisismitch/doproxy).
Important files:
Let’s dive into the configuration files first.
The important lines in the DOProxy configuration file, doproxy.yml
, are the following:
token: 878a490235d53e34b44369b8e78
ssh_key_ids: # DigitalOcean ID for your SSH Key
- 163420
...
droplet_options:
hostname_prefix: auto-nginx
region: nyc3
size: 1gb
image: ubuntu-14-04-x64
The token
is where you can configure your read and write API token.
The other lines specify the options that will be used when DOProxy creates a new droplet. For example, it will install the specified SSH key (by ID or fingerprint), and it will prefix the hostnames with “auto-nginx”.
More information about valid droplet options, check out the DigitalOcean API documentation.
The userdata file, user-data.yml
, is a file that will be executed by cloud-init on each new droplet, when it is created. This means that you can supply a cloud-config file or a script to install your application software on each new droplet.
The sample userdata file contains a simple bash script that installs Nginx on an Ubuntu server, and replaces its default configuration file with the droplet hostname, ID, and public IP address:
#!/bin/bash
apt-get -y update
apt-get -y install nginx
export DROPLET_ID=$(curl http://169.254.169.254/metadata/v1/id)
export HOSTNAME=$(curl -s http://169.254.169.254/metadata/v1/hostname)
export PUBLIC_IPV4=$(curl -s http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/address)
echo Droplet: $HOSTNAME, ID: $DROPLET_ID, IP Address: $PUBLIC_IPV4 > /usr/share/nginx/html/index.html
The droplet information (hostname, ID, and IP address) are retrieved through the DigitalOcean Metadata service—that’s what those curl
commands are doing.
Obviously, you would want to do something more useful than this, like install and configure your application. You can use this to automate the integration of your droplets into your overall infrastructure, by doing things like automatically installing SSH keys and connecting to your configuration management or monitoring tools.
To read more about userdata, cloud-config, and metadata, check out these links:
The HAProxy configuration template, haproxy.cfg.erb
, contains most of the load balancer configuration, with some Ruby code that will be replaced with backend droplet information.
We’ll just look at the Ruby section that generates the backend configuration:
backend www-backend
<% @droplets.each_with_index do |droplet, index| %>
server www-<%= droplet.id %> <%= droplet.private_ip %>:80 check # id:<%= droplet.id %>, hostname:<%= droplet.name -%>
<% end %>
This code iterates through each of the droplets in the inventory, and adds a new HAProxy backend for each one (based on the private IP address).
For example, a line like this will be produced for each droplet:
server www-4202645 10.132.224.168:80 check # id:4202645, hostname:auto-nginx-0
Whenever a droplet is created or deleted, DOProxy generates a new HAProxy configuration file—the haproxy.cfg
file that you looked at earlier.
The DOProxy Ruby script, doproxy.rb
, consists mainly of a DOProxy class that contains the methods that perform the droplet creation and deletion, inventory management, and HAProxy configuration generation.
If you understand Ruby, check out the file on GitHub: https://github.com/thisismitch/doproxy/blob/master/doproxy.rb.
If you don’t understand Ruby, here is some simplified pseudocode that explains each method. It may be useful to reference this against the actual Ruby code, to help you understand what is happening.
Executed every time DOProxy runs, unless no arguments are specified.
doproxy.yml
configuration file (get API token, and droplet options). 2ified.Retrieves information for each droplet in the inventory file. It must be executed before any of the following methods are executed.
When the “doproxy.rb print” command is used, prints droplet information to the screen. It relies on get_inventory
.
get_inventory
)When the “doproxy.rb create” command is used, creates a new droplet and adds it to the inventory file, then calls reload_haproxy
to generate HAProxy configuration and reload the load balancer.
reload_haproxy
to generate HAProxy configuration and reload the load balancerWhen the “doproxy.rb delete” command is used, deletes the specified droplet and deletes its ID from the inventory file, then calls reload_haproxy
to generate HAProxy configuration and reload the load balancer.
reload_haproxy
to generate HAProxy configuration and reload the load balancerThis is a supporting method that creates new HAProxy configuration files based on the droplets in the inventory.
haproxy.cfg.erb
haproxy.cfg
file to diskThis is a supporting method that copies the HAProxy configuration file into the proper location, and reloads HAProxy. This relies on generate_haproxy_cfg
.
haproxy.cfg
to the location where HAProxy will read on reloadThat’s all of the important code that makes DOProxy work. The last thing we will discuss is DropletKit, the API wrapper that we used in DOProxy.
DOProxy uses the DropletKit gem, the official DigitalOcean API v2 Ruby wrapper, to make DigitalOcean API calls. DropletKit allows us to easily write Ruby programs that do things like:
This tutorial focused on these particular API endpoints, but keep in mind that there are many other endpoints that can help facilitate programmatic management your DigitalOcean server infrastructure.
Now that you’ve seen how a simple script can help scale a server environment, by leveraging the DigitalOcean API, cloud-config, and metadata, hopefully you can apply these concepts to scale your own server setup. Although DOProxy isn’t production ready, it should give you some ideas for implementing your own scaling solution.
Remember that the scaling setup described here, with DOProxy, is great but it could be greatly improved by using it in conjunction with a monitoring system. This would allow you to automatically scale your application server tier up and down, depending on certain conditions, such as server resource utilization.
Have any questions or comments? Feel free to post them below!
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
Great!
Interesting tutorial! Great work!
This looks great! :) But a couple of months back I decided to go with Nginx over HAProxy, although HAProxy has far greater load balancing capabilities than Nginx, the lack of great support for multiple domains served over SPDY, let to my decision about just to rely on Nginx’s rather limited load balancing instead.
Does anyone know if HAProxy supports HTTP/2 just yet, or maybe know some articles to poke me in the right direction ?
Also is it still only possible to eliminate the Single Point of Failure of the DOProxy Server, in this case, through DNS balancing ?
This comment has been deleted
Thanks
Nice one! Few months a go I had to develop the same thing.
Hi, thanks a lot for your article.
I like the way you put together several pieces of technologies to automate scaling, although I think there is one essential step missing in this tutorial and the autoscaling/downscaling concept cannot be achieved without it: feedback.
Auto/Down scaling should be triggered from an alerting and monitoring system that once the architecture reaches a certain upper/lower threshold (regarding memory, load, etc.) it will notify the scaling system to do his job. Moreover in the current setup you describe is actually difficult to build a heterogeneous scaling system based on droplet of different sizes and types (even spanning across regions).
Interested to hear from you regarding those thoughts. Cheers, Tiziano
Hello, @manicas ! Great article but if I understand right, it’s for manual scaling, isn’t it? How can I do real auto scaling with these tools? Example: I have 1 project and wan’t to autoscale it on the high loads. In this way, I need autocreate and autoremove droplets on the go. How can I do this?
Congratulations on the article. Has anyone ever seen something about autoscaling in Digitalocean with NGINX? I want to use Autoscaling with Load balancer in NGINX. Thanks!
Awesome article