mrezk/cookbook-elasticsearch

 
 

Description

This Chef cookbook installs and configures the Elasticsearch search engine on a Linux-compatible operating system.

It requires a working Java installation on the target node; add your preferred java cookbook to the node run_list.

The cookbook downloads the Elasticsearch tarball (via the ark provider), unpacks it, and moves it to the directory specified in the node configuration (/usr/local/elasticsearch by default).

It installs a service which enables you to start, stop, restart and check the status of the elasticsearch process.

If you include the elasticsearch::monit recipe, it will create a configuration file for Monit, which will check whether elasticsearch is running, is reachable by HTTP, and whether the cluster is in the "green" state. (This assumes you have included a compatible "monit" cookbook in your run list first.)

If you include the elasticsearch::aws recipe, the AWS Cloud Plugin will be installed on the node, allowing you to use the Amazon AWS-related features (node auto-discovery, etc). Set your AWS credentials either in the "elasticsearch/aws" data bag, or directly in the role/node configuration. Instead of using AWS access tokens, you can create the instance with an IAM role.

If you include the elasticsearch::data and elasticsearch::ebs recipes, an EBS volume will be automatically created, formatted and mounted so you can use it as a local gateway for Elasticsearch. When the EBS configuration contains a snapshot_id value, the volume will be created with data from the corresponding snapshot. See the attributes/data file for more information.

If you include the elasticsearch::proxy recipe, it will configure the Nginx server as a reverse proxy for Elasticsearch, so you may access it remotely with HTTP authentication. Set the credentials either in an "elasticsearch/users" data bag, or directly in the role/node configuration.

If you include the elasticsearch::search_discovery recipe, it will configure the cluster to use Chef search for discovering Elasticsearch nodes. This allows the cluster to operate without multicast, without AWS, and without having to manually manage nodes.
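
As a minimal sketch of how one of these optional recipes ends up in a run list, the following writes a node configuration which adds the Monit checks. The recipe[monit] entry and the node-monit.json file name are assumptions for illustration only; substitute whichever compatible Monit cookbook you actually use, and keep it ahead of elasticsearch::monit in the list.

```shell
# Write a node configuration which enables the Monit integration.
# NOTE: "recipe[monit]" and the file name "node-monit.json" are assumptions;
# this cookbook only requires that *some* compatible monit cookbook runs first.
cat > node-monit.json <<'JSON'
{
  "name": "elasticsearch-cookbook-test",
  "run_list": [
    "recipe[java]",
    "recipe[monit]",
    "recipe[elasticsearch]",
    "recipe[elasticsearch::monit]"
  ]
}
JSON
```

The other optional recipes (elasticsearch::aws, elasticsearch::proxy, and so on) would be appended to the run list in the same way.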

Usage

Chef Solo

Configure your node in a node.json file, then upload that file, this cookbook, any dependent cookbooks, and all data bags, roles, and related files to the server, and run chef-solo.

A basic node configuration can look like this:

echo '{
  "name": "elasticsearch-cookbook-test",
  "run_list": [
    "recipe[java]",
    "recipe[elasticsearch]"
  ],

  "java": {
    "install_flavor": "openjdk",
    "jdk_version": "7"
  },

  "elasticsearch": {
    "cluster" : { "name" : "elasticsearch_test_chef" },
    "bootstrap.mlockall" : false
  }
}
' > node.json

Let's upload it to our server (assuming Ubuntu on Amazon EC2):

scp -o User=ubuntu \
    -o IdentityFile=/path/to/your/key.pem \
    -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null \
    node.json ec2-12-45-67-89.compute-1.amazonaws.com:

Let's download the cookbook on the target system:

ssh -i /path/to/your/key.pem ec2-12-45-67-89.compute-1.amazonaws.com \
  "curl -# -L -k -o cookbook-elasticsearch-master.tar.gz https://github.com/elasticsearch/cookbook-elasticsearch/archive/master.tar.gz"

Finally, let's install the latest Chef, install the dependent cookbooks, and run chef-solo:

ssh -t -i /path/to/your/key.pem ec2-12-45-67-89.compute-1.amazonaws.com <<END
  sudo apt-get update
  sudo apt-get install build-essential curl git vim -y
  curl -# -L http://www.opscode.com/chef/install.sh | sudo bash -s --
  sudo mkdir -p /etc/chef/; sudo mkdir -p /var/chef/cookbooks/elasticsearch
  sudo tar --strip 1 -C /var/chef/cookbooks/elasticsearch -xf cookbook-elasticsearch-master.tar.gz
  sudo apt-get install bison zlib1g-dev libopenssl-ruby1.9.1 libssl-dev libyaml-0-2 libxslt-dev libxml2-dev libreadline-gplv2-dev libncurses5-dev file ruby1.9.1-dev git --yes --fix-missing
  sudo /opt/chef/embedded/bin/gem install berkshelf --version 1.4.5 --no-rdoc --no-ri
  sudo /opt/chef/embedded/bin/berks install --path=/var/chef/cookbooks/ --berksfile=/var/chef/cookbooks/elasticsearch/Berksfile
  sudo chef-solo -N elasticsearch-test-chef-solo -j node.json
END

Verify the installation with:

ssh -i /path/to/your/key.pem ec2-12-45-67-89.compute-1.amazonaws.com "curl localhost:9200"
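
Elasticsearch may need a few seconds after the Chef run before it answers on port 9200, so a single curl can fail even when the installation is fine. A small polling helper, sketched below, can make the check more forgiving. The function name wait_for_http and its default attempt count and delay are illustrative assumptions, not part of the cookbook.

```shell
# Poll a URL until it responds successfully or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# NOTE: the function name and its defaults (30 tries, 2 seconds) are assumptions.
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors as well as connection errors
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

On the target machine you might then run wait_for_http http://localhost:9200 && curl localhost:9200.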

For a full and thorough walkthrough, please read the tutorial on deploying elasticsearch with Chef Solo, which uses this cookbook as an example.

This cookbook comes with a Rake task which allows you to create, bootstrap and configure an Amazon EC2 instance with a single command. Save your node configuration into the tmp/node.json file and run:

time \
 AWS_SSH_KEY_ID=your-key-id \
 AWS_ACCESS_KEY=your-access-key \
 AWS_SECRET_ACCESS_KEY=your-secret-key \
 SSH_KEY=/path/to/your/key.pem \
 NAME=elasticsearch-test-chef-solo-with-rake \
 rake create

Run rake -T for more information about the other available tasks; see the Rakefile for all available options and configurations.

Chef Server

For Chef Server based deployment, include the recipes you want to be executed in a dedicated elasticsearch role, or in the node run_list.
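
A dedicated role could be sketched as a JSON role file like the one below. The role name, description, file path and the attribute values shown are assumptions for illustration; only the recipe names come from this cookbook.

```shell
# Write a role file for Elasticsearch nodes.
# NOTE: the role name, description and ./roles path are illustrative assumptions.
mkdir -p ./roles
cat > ./roles/elasticsearch.json <<'JSON'
{
  "name": "elasticsearch",
  "chef_type": "role",
  "json_class": "Chef::Role",
  "description": "Elasticsearch node",
  "run_list": [
    "recipe[java]",
    "recipe[elasticsearch]"
  ],
  "override_attributes": {
    "elasticsearch": {
      "cluster": { "name": "elasticsearch_test_chef" }
    }
  }
}
JSON
```

Such a file would typically be uploaded with knife role from file roles/elasticsearch.json and then assigned to nodes as role[elasticsearch] in their run list.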

Then, upload the cookbook to the Chef server:

    knife cookbook upload elasticsearch

To enable the Amazon AWS related features, include the elasticsearch::aws recipe. You will need to configure the AWS credentials.

You may do that in the node configuration (with knife node edit MYNODE or in the Chef Server console) or in a role with an override_attributes declaration, but it is arguably most convenient to store the information in an "elasticsearch" data bag:

    mkdir -p ./data_bags/elasticsearch
    echo '{
      "id" : "aws",
      "_default" : {
        "discovery" : { "type": "ec2", "ec2" : { "groups": "elasticsearch" } },

        "cloud"   : {
          "aws"     : { "access_key": "YOUR ACCESS KEY", "secret_key": "YOUR SECRET ACCESS KEY" }
        }
      }
    }' > ./data_bags/elasticsearch/aws.json

Do not forget to upload the data bag to the Chef server:

    knife data bag from file elasticsearch aws.json

To use the EBS related features, use your preferred method of configuring node attributes, or store the configuration in a data bag called elasticsearch/data:

    {
      "elasticsearch": {
        // ...
        "data" : {
          "devices" : {
            "/dev/sda2" : {
              "file_system"      : "ext3",
              "mount_options"    : "rw,user",
              "mount_path"       : "/usr/local/var/data/elasticsearch/disk1",
              "format_command"   : "mkfs.ext3",
              "fs_check_command" : "dumpe2fs",
              "ebs"            : {
                "size"                  : 250,         // In GB
                "delete_on_termination" : true,
                "type"                  : "io1",
                "iops"                  : 2000
              }
            }
          }
        }
      }
    }
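
To create the volume from an existing snapshot, the same configuration can carry a snapshot_id value. The sketch below writes such a configuration to a file; the file name, the placeholder snapshot identifier, and the placement of snapshot_id inside the "ebs" hash are assumptions, so check the attributes/data file before relying on it.

```shell
# Write an EBS device configuration which restores from a snapshot.
# NOTE: "snap-REPLACE_ME" is a placeholder, and putting "snapshot_id" inside
# the "ebs" hash is an assumption; verify against the attributes/data file.
cat > node-ebs-snapshot.json <<'JSON'
{
  "elasticsearch": {
    "data": {
      "devices": {
        "/dev/sda2": {
          "file_system": "ext3",
          "mount_options": "rw,user",
          "mount_path": "/usr/local/var/data/elasticsearch/disk1",
          "format_command": "mkfs.ext3",
          "fs_check_command": "dumpe2fs",
          "ebs": {
            "size": 250,
            "delete_on_termination": true,
            "type": "io1",
            "iops": 2000,
            "snapshot_id": "snap-REPLACE_ME"
          }
        }
      }
    }
  }
}
JSON
```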

Customizing the cookbook

When you want to significantly customize the cookbook (changing the templates, adding specific logic), the best way is to use the "wrapper cookbook" pattern: creating a lightweight cookbook which customizes this one. Let's see how to change the template for the logging.yml file in this way.

First, we need to create our "wrapper" cookbook:

knife cookbook create my-elasticsearch --cookbook-path=. --verbose --yes

Next, we'll include the main cookbook in our default recipe:

cat <<-CONFIG >> ./cookbooks/my-elasticsearch/recipes/default.rb

include_recipe 'java'
include_recipe 'elasticsearch::default'
CONFIG

Then, we'll change the cookbook for the appropriate template resource:

cat <<-CONFIG >> ./cookbooks/my-elasticsearch/recipes/default.rb

logging_template = resources(:template => "logging.yml")
logging_template.cookbook "my-elasticsearch"
CONFIG

Of course, we may redefine the whole logging.yml template definition, or other parts of the cookbook.

Don't forget to put your custom template into the appropriate path:

cat <<-CONFIG >> ./cookbooks/my-elasticsearch/templates/default/logging.yml.erb
# My custom logging template...
CONFIG

Now we can configure a node with our custom cookbook:

echo '{
  "name": "elasticsearch-wrapper-cookbook-test",
  "run_list": [
    "recipe[my-elasticsearch]"
  ]
}
' > node.json

Upload your "wrapper" cookbook to the server, and run Chef on the node, e.g. following the instructions for Chef Solo above:

scp -r ... cookbooks/my-elasticsearch ...
ssh ... "sudo mv --force --verbose /tmp/my-elasticsearch /var/chef/cookbooks/my-elasticsearch"
ssh ... <<END
....
END
ssh ... "sudo chef-solo -N elasticsearch-wrapper-cookbook-test -j node.json"

Nginx Proxy

Usually, you will restrict access to Elasticsearch with firewall rules. However, it's convenient to be able to connect to the Elasticsearch cluster from curl or an HTTP client, or to use a management tool such as BigDesk or Paramedic. (Don't forget to set the node.elasticsearch[:nginx][:allow_cluster_api] attribute to true if you want to access these tools via the proxy.)

To enable authorized access to elasticsearch, you need to include the elasticsearch::proxy recipe, which will install, configure and run Nginx as a reverse proxy, allowing users with proper credentials to connect.
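
A node configuration for the proxy might look like the following sketch, which also turns on the allow_cluster_api attribute mentioned above. The file name node-proxy.json and the node name are assumptions for illustration.

```shell
# Write a node configuration which enables the Nginx proxy
# and allows access to the cluster API through it.
# NOTE: the file name and the node name are illustrative assumptions.
cat > node-proxy.json <<'JSON'
{
  "name": "elasticsearch-proxy-test",
  "run_list": [
    "recipe[java]",
    "recipe[elasticsearch]",
    "recipe[elasticsearch::proxy]"
  ],
  "elasticsearch": {
    "nginx": { "allow_cluster_api": true }
  }
}
JSON
```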

Usernames and passwords may be stored in a data bag elasticsearch/users:

    mkdir -p ./data_bags/elasticsearch
    echo '{
      "id" : "users",
      "_default" : {
        "users" : [
          {"username" : "USERNAME", "password" : "PASSWORD"},
          {"username" : "USERNAME", "password" : "PASSWORD"}
        ]
      }
    }
    ' > ./data_bags/elasticsearch/users.json

Again, do not forget to upload the data bag to the Chef server:

    knife data bag from file elasticsearch users.json

After you have configured the node and uploaded all the information to the Chef server, run chef-client on the node(s):

    knife ssh 'name:elasticsearch*' 'sudo chef-client'

Please note that all data bags must have attributes enclosed in an environment (use the _default environment), as suggested by the Chef documentation.
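
Because a data bag item without the environment wrapper is easy to upload by mistake, a quick check before running knife data bag from file can help. The helper below is a rough sketch: the function name check_data_bag is an assumption, and it only looks for the "id" and "_default" keys as text rather than parsing the JSON.

```shell
# Crude sanity check for a data bag item file before uploading it.
# NOTE: check_data_bag is a hypothetical helper; it greps for key names
# and does not validate the JSON syntax or nesting.
check_data_bag() {
  file=$1
  grep -q '"id"' "$file" || {
    echo "$file: missing \"id\" key" >&2
    return 1
  }
  grep -q '"_default"' "$file" || {
    echo "$file: attributes are not enclosed in the \"_default\" environment" >&2
    return 1
  }
}
```

For example: check_data_bag ./data_bags/elasticsearch/users.json && knife data bag from file elasticsearch users.json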

Vagrant Integration

The cookbook comes with a Vagrantfile, which allows you to test-drive the installation and configuration with Vagrant, a tool for building virtualized infrastructures.

NOTE: Currently, the integration supports only the "gem" variant of Vagrant, i.e. 1.0.x.

First, make sure you have both VirtualBox and Vagrant installed.

Then, clone this repository into an elasticsearch directory on your development machine:

    git clone git://github.com/elasticsearch/cookbook-elasticsearch.git elasticsearch

Switch to the cloned repository:

    cd elasticsearch

Install the necessary gems with Bundler:

    gem install bundler
    bundle install

All the required third-party cookbooks will be automatically installed via the Berkshelf integration. If you want to install them locally (e.g. to inspect them), use the berks command:

    berks install --path ./tmp/cookbooks

The Vagrantfile supports four Linux distributions:

  • Ubuntu Precise 64 bit
  • Ubuntu Lucid 32 bit
  • Ubuntu Lucid 64 bit
  • CentOS 6 32 bit

Use the vagrant status command for more information.

We will use the Ubuntu Precise 64 box for the purpose of this demo. You may want to test-drive this cookbook on a different distribution; check out the available boxes at http://vagrantbox.es or build a custom one with veewee.

Launch the virtual machine (it will download the box unless you already have it):

    time CHEF=latest bundle exec vagrant up precise64

The machine will be started and automatically provisioned with chef-solo. (Note: You may substitute latest with a specific Chef version. Set the UPDATE environment variable to update packages on the machine as well.)

You'll see Chef debug messages flying by in your terminal, downloading, installing and configuring Java, Nginx, Elasticsearch, and all the other components. The process should take less than 10 minutes on a reasonable machine and internet connection.

After the process is done, you may connect to elasticsearch via the Nginx proxy from the outside:

    curl 'http://USERNAME:PASSWORD@33.33.33.10:8080/test_chef_cookbook/_search?pretty&q=*'

Of course, you should connect to the box with SSH and check things out:

    bundle exec vagrant ssh precise64

    ps aux | grep elasticsearch
    service elasticsearch status --verbose
    curl 'http://localhost:9200/_cluster/health?pretty'
    sudo monit status elasticsearch

To change the system after the installation, you can just update the node attributes and run the vagrant provision command. Instead of changing the attributes/default.rb file or the Vagrantfile, you can provide a separate JSON file with the node configuration.

For example, let's upgrade the Elasticsearch version. First, we have to create the node configuration file:

    echo '{
      "elasticsearch" : {
        "version" : "1.0.0.Beta2"
      }
    }
    ' > node.json

Now, pass the path to the configuration file to the vagrant provision command:

    time CONFIG=node.json bundle exec vagrant provision precise64

Verify that Elasticsearch has in fact been upgraded to 1.0.0.Beta2:

    curl '33.33.33.10:9200?pretty'
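
If you want to check the version in a script rather than by eye, the version number can be pulled out of the response with sed. The sketch below assumes the response contains a "number" : "..." field inside its "version" section; the function name es_version is hypothetical.

```shell
# Print the Elasticsearch version number found in a JSON response on stdin.
# NOTE: es_version is a hypothetical helper, and the "number" field layout
# is an assumption about the response format.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

For example: curl -s '33.33.33.10:9200?pretty' | es_version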

Tutorial

You can follow a comprehensive tutorial, "Deploying Elasticsearch with Chef Solo", which walks through the process of installing a production-ready Elasticsearch system on Amazon EC2.

Cookbook Integration Tests

The cookbook provides test cases in the files/default/tests/minitest/ directory, which are executed as a part of the Chef run in Vagrant (via the Minitest Chef Handler support). They check the basic installation mechanics, populate the test_chef_cookbook index with some sample data, perform a simple search, etc.

To run the tests, set the TEST environment variable when running Vagrant:

    time CHEF=latest TEST=yes bundle exec vagrant up precise64

Repository

http://github.com/elasticsearch/cookbook-elasticsearch

License

Author: Karel Minarik (karmi@elasticsearch.com) and contributors

License: Apache