The cloud-init program, available on recent distributions (only Ubuntu 14.04 and CentOS 7 at the time of this writing), can consume and execute data from the user-data field of the DigitalOcean metadata service. This process behaves differently depending on the format of the information it finds. One of the most popular formats for scripts within user-data is the cloud-config file format.
Cloud-config files are special scripts designed to be run by the cloud-init process. These are generally used for initial configuration on the very first boot of a server. In this guide, we will be discussing the format and usage of cloud-config files.
The cloud-config format implements a declarative syntax for many common configuration items, making it easy to accomplish many tasks. It also allows you to specify arbitrary commands for anything that falls outside of the predefined declarative capabilities.
This “best of both worlds” approach lets the file act like a configuration file for common tasks, while maintaining the flexibility of a script for more complex functionality.
The file is written using the YAML data serialization format. The YAML format was created to be easy to understand for humans and easy to parse for programs.
YAML files are generally fairly intuitive to understand when reading them, but it is good to know the actual rules that govern them.
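Some important rules for YAML files are:
- Indentation with spaces (not tabs) indicates structure. An item that is indented further than the line above it is a sub-item of that line.
- List members are identified by a leading dash (-).
- Associative array entries are written as a key, a colon (:), a space, and then the value.
- Multi-line blocks of text are indented under their key. To keep the block’s line breaks exactly as written, put a pipe character (|) after the key.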
Let’s take these rules and analyze an example cloud-config file, paying attention only to the formatting:
#cloud-config
users:
  - name: demo
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf0q4PyG0doiBQYV7OlOxbRjle026hJPBWD+eKHWuVXIpAiQlSElEBqQn0pOqNJZ3IBCvSLnrdZTUph4czNC4885AArS9NkyM7lK27Oo8RV888jWc8hsx4CD2uNfkuHL+NI5xPB/QT3Um2Zi7GRkIwIgNPN5uqUtXvjgA+i1CS0Ku4ld8vndXvr504jV9BMQoZrXEST3YlriOb8Wf7hYqphVMpF3b+8df96Pxsj0+iZqayS9wFcL8ITPApHi0yVwS8TjxEtI3FDpCbf7Y/DmTGOv49+AWBkFhS2ZwwGTX65L61PDlTSAzL+rPFmHaQBHnsli8U9N6E4XHDEOjbSMRX user@example.com
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcthLR0qW6y1eWtlmgUE/DveL4XCaqK6PQlWzi445v6vgh7emU4R5DmAsz+plWooJL40dDLCwBt9kEcO/vYzKY9DdHnX8dveMTJNU/OJAaoB1fV6ePvTOdQ6F3SlF2uq77xYTOqBiWjqF+KMDeB+dQ+eGyhuI/z/aROFP6pdkRyEikO9YkVMPyomHKFob+ZKPI4t7TwUi7x1rZB1GsKgRoFkkYu7gvGak3jEWazsZEeRxCgHgAV7TDm05VAWCrnX/+RzsQ/1DecwSzsP06DGFWZYjxzthhGTvH/W5+KFyMvyA+tZV4i1XM+CIv/Ma/xahwqzQkIaKUwsldPPu00jRN user@desktop
runcmd:
  - touch /test.txt
By looking at this file, we can learn a number of important things.
First, each cloud-config file must begin with #cloud-config alone on the very first line. This signals to the cloud-init program that this should be interpreted as a cloud-config file. If this were a regular script file, the first line would indicate the interpreter that should be used to execute the file.
The file above has two top-level directives, users and runcmd. These both serve as keys. The values of these keys consist of all of the indented lines after the keys.
In the case of the users key, the value is a single list item. We know this because the next level of indentation is a dash (-), which specifies a list item, and because there is only one dash at this indentation level. For the users directive, this means that we are only defining a single user.
The list item itself contains an associative array with more key-value pairs. These are sibling elements because they all exist at the same level of indentation. Each of the user attributes is contained within the single list item we described above.
Some things to note are that the strings you see do not require quoting and that there are no unnecessary brackets to define associations. The interpreter can determine the data type fairly easily and the indentation indicates the relationship of items, both for humans and programs.
By now, you should have a working knowledge of the YAML format and feel comfortable working with information using the rules we discussed above.
We can now begin exploring some of the most common directives for cloud-config.
To define new users on the system, you can use the users directive that we saw in the example file above.
The general format of user definitions is:
#cloud-config
users:
  - first_user_parameter
    first_user_parameter
  - second_user_parameter
    second_user_parameter
    second_user_parameter
    second_user_parameter
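Each new user should begin with a dash. Each user defines parameters in key-value pairs. The available keys include:
- name: The account’s username.
- groups: Any supplementary groups for the user.
- shell: The shell that should be set for the user. If you do not set this, the very basic sh shell will be used.
- sudo: The sudo rule to apply to the user, as in the example above.
- passwd: A hashed password for the account.
- ssh-authorized-keys: A list of complete SSH public keys that should be added to this user’s authorized_keys file in their .ssh directory.
- homedir: Used to override the default /home/<username>, which is otherwise created and set.
- no-create-home: Set to “True” to avoid creating a /home/<username> directory for the user.
Other than some basic information, like the name key, you only need to define the areas where you are deviating from the default or supplying needed data.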
One thing that is important for users to realize is that the passwd field should not be used in production systems unless you have a mechanism for immediately changing the given value. As with all information submitted as user-data, the hash will remain accessible to any user on the system for the entire life of the server. On modern hardware, hashes of weak passwords can be cracked quickly. Exposing even the hash is a serious security risk that should not be taken on any machines that are not disposable.
For an example user definition, we can use part of the example cloud-config we saw above:
#cloud-config
users:
  - name: demo
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf0q4PyG0doiBQYV7OlOxbRjle026hJPBWD+eKHWuVXIpAiQlSElEBqQn0pOqNJZ3IBCvSLnrdZTUph4czNC4885AArS9NkyM7lK27Oo8RV888jWc8hsx4CD2uNfkuHL+NI5xPB/QT3Um2Zi7GRkIwIgNPN5uqUtXvjgA+i1CS0Ku4ld8vndXvr504jV9BMQoZrXEST3YlriOb8Wf7hYqphVMpF3b+8df96Pxsj0+iZqayS9wFcL8ITPApHi0yVwS8TjxEtI3FDpCbf7Y/DmTGOv49+AWBkFhS2ZwwGTX65L61PDlTSAzL+rPFmHaQBHnsli8U9N6E4XHDEOjbSMRX user@example.com
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcthLR0qW6y1eWtlmgUE/DveL4XCaqK6PQlWzi445v6vgh7emU4R5DmAsz+plWooJL40dDLCwBt9kEcO/vYzKY9DdHnX8dveMTJNU/OJAaoB1fV6ePvTOdQ6F3SlF2uq77xYTOqBiWjqF+KMDeB+dQ+eGyhuI/z/aROFP6pdkRyEikO9YkVMPyomHKFob+ZKPI4t7TwUi7x1rZB1GsKgRoFkkYu7gvGak3jEWazsZEeRxCgHgAV7TDm05VAWCrnX/+RzsQ/1DecwSzsP06DGFWZYjxzthhGTvH/W5+KFyMvyA+tZV4i1XM+CIv/Ma/xahwqzQkIaKUwsldPPu00jRN user@desktop
To define groups, you should use the groups directive. This directive is relatively simple in that it just takes a list of groups you would like to create.
An optional extension to this is to create a sub-list for any of the groups you are making. This new list will define the users that should be placed in this group:
#cloud-config
groups:
  - group1
  - group2: [user1, user2]
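As a rough sketch of how the groups directive can work together with users, the fragment below creates one group on its own and then adds a new user to it as a supplementary group. The names deploy and demo are placeholders chosen for illustration, not values this guide prescribes.

```yaml
#cloud-config
groups:
  - deploy                 # hypothetical group, created with no members
users:
  - name: demo             # hypothetical user
    groups: sudo, deploy   # supplementary groups, separated by commas
    shell: /bin/bash
```

Listing the group under groups makes its creation explicit instead of relying on it already existing on the image.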
For user accounts that already exist (the root account is the most pertinent), a password can be supplied by using the chpasswd directive.
Note: This directive should only be used in debugging situations, because, once again, the value will be available to every user on the system for the duration of the server’s life. This is even more relevant in this section because passwords submitted with this directive must be given in plain text.
The basic syntax looks like this:
#cloud-config
chpasswd:
  list: |
    user1:password1
    user2:password2
    user3:password3
  expire: False
The directive contains two associative array keys. The list key will contain a block that lists the account names and the associated passwords that you would like to assign. The expire key is a boolean that determines whether the password must be changed at first boot or not. This defaults to “True”.
One thing to note is that you can set a password to “RANDOM” or “R”, which will generate a random password and write it to /var/log/cloud-init-output.log. Keep in mind that this file is accessible to any user on the system, so it is not any more secure.
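For illustration, a fragment using the random-password option described above might look like the following. It assumes an account named user1 already exists on the image; that name is a placeholder.

```yaml
#cloud-config
chpasswd:
  list: |
    user1:RANDOM    # a password is generated and written to the cloud-init output log
  expire: True      # require the password to be changed afterwards
```

Because the generated password ends up in a log that other users can read, pairing it with expire set to True at least forces it to be replaced.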
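In order to write files to the disk, you should use the write_files directive.
Each file that should be written is represented by a list item under the directive. These list items will be associative arrays that define the properties of each file.
The only required keys in this array are path, which defines where to write the file, and content, which contains the data you would like the file to contain.
The available keys for configuring a write_files item are:
- path: The absolute path of the file to write.
- content: The data to place in the file.
- owner: The user and group that should own the file, in user:group form.
- permissions: The file mode, given as an octal string such as '0644'.
- encoding: How content is encoded. Plain text is assumed by default; values such as b64 or gzip mark base64-encoded or gzip-compressed content.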
For example, we could write a file to /test.txt with the contents:
Here is a line.
Another line is here.
The portion of the cloud-config that would accomplish this would look like this:
#cloud-config
write_files:
  - path: /test.txt
    content: |
      Here is a line.
      Another line is here.
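A slightly fuller sketch can also set ownership and mode using the optional owner and permissions keys of cloud-init’s write_files module. The path and file contents here are made up for illustration.

```yaml
#cloud-config
write_files:
  - path: /root/notes.txt      # hypothetical path
    owner: root:root           # user:group that should own the file
    permissions: '0600'        # quoted so YAML keeps it as a string
    content: |
      Provisioned by cloud-init.
      Only root can read this file.
```

Quoting the mode avoids any ambiguity about whether a leading zero makes YAML read the value as a number.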
To manage packages, there are a few related settings and directives to keep in mind.
To update the apt database on Debian-based distributions, you should set the package_update directive to “true”. This is synonymous with calling apt-get update from the command line.
The default value is actually “true”, so you only need to worry about this directive if you wish to disable it:
#cloud-config
package_update: false
If you wish to upgrade all of the packages on your server after it boots up for the first time, you can set the package_upgrade directive. This is akin to an apt-get upgrade executed manually.
This is set to “false” by default, so make sure you set this to “true” if you want the functionality:
#cloud-config
package_upgrade: true
To install additional packages, you can simply list the package names using the “packages” directive. Each list item should represent a package. Unlike the two commands above, this directive will function with either yum or apt managed distros.
These items can take one of two forms. The first is simply a string with the name of the package. The second form is a list with two items. The first item of this new list is the package name, and the second item is the version number:
#cloud-config
packages:
  - package_1
  - package_2
  - [package_3, version_num]
The “packages” directive will set package_update to true, overriding any previous setting.
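Putting the three package settings together, a first-boot configuration could be sketched as follows. The package names git and curl are only examples of packages that are commonly available through both apt and yum.

```yaml
#cloud-config
package_update: true     # refresh the package index first
package_upgrade: true    # then upgrade what is already installed
packages:
  - git                  # example package names, not requirements
  - curl
```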
You can manage SSH keys in the users directive, but you can also specify them in a dedicated ssh_authorized_keys section. These will be added to the first defined user’s authorized_keys file.
This takes the same general format of the key specification within the users directive:
#cloud-config
ssh_authorized_keys:
  - ssh_key_1
  - ssh_key_2
You can also generate the SSH server’s private keys ahead of time and place them on the filesystem. This can be useful if you want to give your clients the information about this server beforehand, allowing them to trust the server as soon as it comes online.
To do this, we can use the ssh_keys directive. This can take the key pairs for RSA, DSA, or ECDSA keys using the rsa_private, rsa_public, dsa_private, dsa_public, ecdsa_private, and ecdsa_public sub-items.
Since formatting and line breaks are important for private keys, make sure to use a block with a pipe character when specifying these. Also, you must include the begin key and end key lines for your keys to be valid.
#cloud-config
ssh_keys:
  rsa_private: |
    -----BEGIN RSA PRIVATE KEY-----
    your_rsa_private_key
    -----END RSA PRIVATE KEY-----
  rsa_public: your_rsa_public_key
If your infrastructure relies on keys signed by an internal certificate authority, you can set up your new machines to trust your CA cert by injecting the certificate information. For this, we use the ca-certs directive.
This directive has two sub-items. The first is remove-defaults, which, when set to true, will remove all of the normal certificate trust information included by default. This is usually not needed and can lead to some issues if you don’t know what you are doing, so use with caution.
The second item is trusted, which is a list in which each item contains a trusted CA certificate:
#cloud-config
ca-certs:
  remove-defaults: true
  trusted:
    - |
      -----BEGIN CERTIFICATE-----
      your_CA_cert
      -----END CERTIFICATE-----
If you have configured your own DNS servers that you wish to use, you can manage your server’s resolv.conf file by using the resolv_conf directive. This currently only works for RHEL-based distributions.
Under the resolv_conf directive, you can manage your settings with the nameservers, searchdomains, domain, and options items.
The nameservers directive should take a list of the IP addresses of your name servers. The searchdomains directive takes a list of domains and subdomains to search in when a user specifies a host but not a domain.
The domain sets the domain that should be used for any unresolvable requests, and options contains a set of options that can be defined in the resolv.conf file.
If you are using the resolv_conf directive, you must ensure that the manage_resolv_conf directive is also set to true. Not doing so will cause your settings to be ignored:
#cloud-config
manage_resolv_conf: true
resolv_conf:
  nameservers:
    - 'first_nameserver'
    - 'second_nameserver'
  searchdomains:
    - first.domain.com
    - second.domain.com
  domain: domain.com
  options:
    option1: value1
    option2: value2
    option3: value3
If none of the managed actions that cloud-config provides works for what you want to do, you can also run arbitrary commands. You can do this with the runcmd directive.
This directive takes a list of items to execute. These items can be specified in two different ways, which will affect how they are handled.
If the list item is a simple string, the entire item will be passed to the sh shell process to run.
The other option is to pass a list, each item of which will be executed in a similar way to how execve processes commands. The first item will be interpreted as the command or script to run, and the following items will be passed as arguments for that command.
Most users can use either of these formats, but the flexibility enables you to choose the best option if you have special requirements. Any output will be written to standard out and to the /var/log/cloud-init-output.log file:
#cloud-config
runcmd:
  - [ sed, -i, -e, 's/here/there/g', some_file]
  - echo "modified some_file"
  - [cat, some_file]
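To make the difference between the two forms more concrete, here is a small sketch. The file paths are invented for the example. The point it tries to show is that shell syntax such as redirection is interpreted in the string form, while elements of the list form are treated as literal arguments.

```yaml
#cloud-config
runcmd:
  # String form: handed to sh, so the redirection is interpreted.
  - echo "first boot complete" > /root/first-boot.txt
  # List form: each element is one literal argument, spaces included.
  - [ touch, "/root/file with spaces.txt" ]
  # To use shell features from the list form, call the shell explicitly.
  - [ sh, -c, "date >> /root/first-boot.txt" ]
```

The list form is the safer choice when an argument contains spaces or characters the shell would otherwise expand.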
In some cases, you’ll want to shut down or reboot your server after executing the other items. You can do this by setting up the power_state directive.
This directive has four sub-items that can be set. These are delay, timeout, message, and mode.
The delay specifies how far in the future the restart or shutdown should occur. By default, this will be “now”, meaning the procedure will begin immediately. To add a delay, specify the number of minutes that should pass using the +<num_of_mins> format.
The timeout parameter takes a unit-less value that represents the number of seconds to wait for cloud-init to complete before initiating the delay countdown.
The message field allows you to specify a message that will be sent to all users of the system. The mode specifies the type of power event to initiate. This can be “poweroff” to shut down the server, “reboot” to restart the server, or “halt” to let the system decide which is the best action (usually shutdown):
#cloud-config
power_state:
  timeout: 120
  delay: "+5"
  message: Rebooting in five minutes. Please save your work.
  mode: reboot
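As a contrasting sketch, a server that should simply switch itself off once its first-boot work is done could use the other mode with no delay. The message text is arbitrary.

```yaml
#cloud-config
power_state:
  mode: poweroff     # shut down instead of restarting
  delay: "now"       # the default: begin immediately
  message: Initial configuration finished. Powering off.
```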
The above examples represent some of the more common configuration items available when running a cloud-config file. There are additional capabilities that we did not cover in this guide. These include configuration management setup, configuring additional repositories, and even registering with an outside URL when the server is initialized.
You can find out more about some of these options by checking the /usr/share/doc/cloud-init/examples directory. For a practical guide to help you get familiar with cloud-config files, you can follow our tutorial on how to use cloud-config to complete basic server configuration here.
The config sample under “Run Arbitrary Commands for More Control” uses “run_cmd” while stating that the command is “runcmd” and is used as “runcmd” in other sections of the tutorial.
I believe it should read as “runcmd” in the example as well.
#cloud-config
ssh_keys:
  rsa_private: |
    -----BEGIN RSA PRIVATE KEY-----
    your_rsa_private_key
    -----END RSA PRIVATE KEY-----
  rsa_public: your_rsa_public_key
wouldn’t this leave the private key exposed?
As a note, it appears that at least for yum-based distributions, any failed dependency for any package in the ‘packages’ section will result in failure to install any of the packages.
It would be nice if it would just fail to install those with unmet dependencies and log those to the cloud-init.log file.
By looking at runcmd code (/usr/lib/python2.6/site-packages/cloudinit/config/cc_runcmd.py) I noticed there’s no ‘frequency’ specified in comparison to other. Beside the only thing that scripts does is to save the scripts given as shells script under /var/lib/cloud/instance/scripts/runcmd .
So if I verbosely specify the modules, I MUST set the frequency.
and the scripts created by this particular can be run by ‘scripts-user’, so I need to specify i.e.
to make runcmd scripts working
Is it proper way to do so ? The documentation (at least current 0.7.7) lacks proper explanation of runcmd and user-scripts and how they can be utilized
I also don’t understand difference between all the mode, once (first time the instance boot), instance(???), always ( that I understand), ???boot(that one exist? it doesn’t seem to work…
And how do we “hash” the password?
Great stuff!
Before anything, one must know i am fully aware of api calls and “#!/bin/bash” script capabilities of the ‘User Data’ box on the droplet creation page.
Wondering still… any of you folks tried this method and get something to work? Am i missing something here? Steps taken just to test…
I have done piles of tests, that one being the most basic in my opinion. I believe i have followed what seems to me like pretty basic guidelines but darnit, can’t get any to work (writefile, sshkeys, users…).
Works fine using something like;
Basic droplet created for tests;
I have not tried creating a drop in/on another region/server (darn french-canadian me!). Agreed it would have saved me some writing time if it was only that but still, looking for your input on the subject.
Thanks for helping
Is there a way to get user input when the server starts. Lets say I am testing several cloud images and I don’t want to have to edit the cloud-config every time. can I prompt for user input when the server comes up?
I noticed in the section about packages that it says “The “packages” directive will set apt_update to true” and not package_update.
Thought I’d give cloud-init a whirl and after two failed attempts I realized my editor was using TAB instead of SPACES. Would be nice to just add this tidbit spelled out in the tutorial to minimize undue frustration.