
aws-terraform-ansible-kvm-hadoop

Install and configure Hadoop on KVM machines with Ansible, bootstrapped by Terraform on AWS.

Here is an example of an HDFS storage cluster running with this project.

Screenshots: DFS storage types tab, and Hadoop cluster live datanodes.

A note about KVM in the Cloud

You can't run KVM on a classic AWS EC2 instance: nested virtualization is not supported, so you must choose a bare-metal *.metal instance type (:moneybag:).

This repo is for educational purposes. If you use Cloud providers, only use KVM if you absolutely NEED TO. It requires a more costly infrastructure and time-consuming instantiations, and it adds a layer of complexity already managed by Cloud providers (network, machine configuration), making it a burden to maintain. This architecture is only useful if you have big machines that must host strictly partitioned VMs.

As Scaleway Elastic Metal machines are far less expensive than AWS bare-metal instances, you will find Terraform plans and instructions for both AWS and Scaleway.

1. Instantiate the infrastructure

Architecture schema

In this architecture, we will set up a VPN server so the KVM guests can communicate with each other. After setting up and connecting the Hadoop nodes through the VPN network, a client will mount an HDFS space through FUSE to use it as a regular file system.

The ResourceManager node described here will take on the roles of ResourceManager, NodeManager, and MapReduce Job History Server.
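As a hedged sketch of what co-locating these roles implies (the hostname is an assumption, not taken from this repo's playbooks), the same host ends up referenced in both the YARN and the MapReduce history configuration:

```xml
<!-- yarn-site.xml: ResourceManager host (hostname "resourcemanager" is an assumption) -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resourcemanager</value>
</property>

<!-- mapred-site.xml: Job History Server runs on the same host (10020 is Hadoop's default port) -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>resourcemanager:10020</value>
</property>
```

Because the node also acts as a NodeManager, it both schedules containers and runs them, which is fine for a small educational cluster but not how you would lay out a production one.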

👉 Using AWS (price: $8,545.09/month)

⚠️ 💰 Please be very careful running the Terraform plans, as prices for bare-metal instances are very high. The indicated cost corresponds to the least expensive bare-metal instance found in the North Virginia (us-east-1) region.

  1. Create AWS credentials and set them inside the ~/.aws/credentials file:

    # ~/.aws/credentials
    [default]
    aws_access_key_id = my-access-key
    aws_secret_access_key = my-secret-key
  2. Import your public key under the name main in the EC2 Key Pairs menu

  3. Make sure there are no errors by running the init and plan commands

    Make sure the main key pair from step 2 is present in your AWS account.

    terraform -chdir=./plans/aws init
    terraform -chdir=./plans/aws plan
  4. Execute the plan

    This command will generate our global.ini inventory file:

    terraform -chdir=./plans/aws apply

    To terminate the instances and avoid unintended spending, run terraform destroy
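For reference, a generated Ansible INI inventory usually looks like the sketch below; the group names and addresses here are purely illustrative assumptions, not the exact output of these plans:

```ini
; hypothetical global.ini — group names and IPs are illustrative only
[vpn_server]
203.0.113.10 ansible_user=ubuntu

[kvm_hosts]
203.0.113.11 ansible_user=ubuntu
203.0.113.12 ansible_user=ubuntu
```

The playbooks in step 2 consume this file via the -i inventories/global.ini flag.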

👉 Using Scaleway (price: $303.37/month)
  1. Go to your Scaleway account > IAM > API Keys and create a new API key named terraform-ansible-kvm-hadoop

  2. Run the following export commands, replacing the values with your own

    export TF_VAR_SCW_PROJECT_ID="my-project-id" # Find this in console.scaleway.com/project/settings
    export TF_VAR_SCW_ACCESS_KEY="my-access-key"
    export TF_VAR_SCW_SECRET_KEY="my-secret-key"

    Tip: append these variables to your ~/.bashrc file so they persist across sessions

  3. Make sure there are no errors by running the init and plan commands

    Make sure you have an SSH key whose description is main in your Scaleway account.

    terraform -chdir=./plans/scaleway init
    terraform -chdir=./plans/scaleway plan
  4. Execute the plan

    This command will generate our global.ini inventory file:

    terraform -chdir=./plans/scaleway apply

    To terminate the instances and avoid unintended spending, run terraform destroy

2. Setup the infrastructure and install services

Chaining of Ansible's playbook actions

  1. Install the OpenVPN server

    ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t vpn-server
  2. Install KVM on each host and create guests

    ansible-galaxy collection install community.libvirt
    ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t kvm-install
  3. Connect all machines so they can communicate with each other (OpenVPN clients)

    Connect to and retrieve the IP of each KVM guest:

    # Start a fresh SSH agent and remove any previously loaded keys
    eval `ssh-agent` && ssh-add -D
    ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t vpn-client
  4. Install Hadoop cluster

    ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t hadoop
  5. Install HDFS FUSE client

    ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t hdfs-fuse-clients
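Once all playbooks have completed, you can sanity-check the cluster from a node with the Hadoop binaries on its PATH; the /mnt/hdfs mount point below is an assumption — substitute whatever mount point the FUSE client was configured with:

```shell
# List live datanodes as seen by the namenode (requires a running cluster)
hdfs dfsadmin -report

# Confirm the HDFS FUSE mount is visible and browsable
# (mount point /mnt/hdfs is an assumption)
df -h /mnt/hdfs
ls -l /mnt/hdfs
```

The dfsadmin report should show the same live datanode count as the NameNode web UI pictured at the top of this README.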

Inspirations

Great repos that helped build this one:
