Install and configure Hadoop on KVM machines with Ansible, bootstrapped by Terraform on AWS.
Here is an example of an HDFS storage cluster running with this project.
You can't run KVM on a classic AWS EC2 instance: you need a bare-metal one, so you must choose a *.metal
instance type (:moneybag:).
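Whether a host exposes hardware virtualization can be checked before installing anything. The sketch below is not part of this repo: it counts the logical CPUs that advertise Intel VT-x (`vmx`) or AMD-V (`svm`) in `/proc/cpuinfo`, which only applies to x86 Linux hosts. The function name `count_virt_cpus` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): count the logical CPUs whose
# "flags" line lists vmx (Intel VT-x) or svm (AMD-V).
# Optional argument: path to a cpuinfo file (defaults to /proc/cpuinfo).
count_virt_cpus() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # grep -c prints the number of matching lines; it exits non-zero when
  # nothing matches, so "|| true" keeps the function from failing on 0.
  grep -E -c '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo" || true
}
```

Running `count_virt_cpus` on the target machine and getting `0` suggests that KVM will not be usable on that instance type.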
This repo is for educational purposes. If you use cloud providers, only use KVM if you absolutely need to. It requires costlier infrastructure and slower instantiation, and it adds a layer of complexity that cloud providers already manage (network, machine configuration), which makes it a burden to maintain. This architecture is only useful if you have big machines that must host strictly partitioned VMs.
As Scaleway Elastic Metal machines are way less expensive than AWS, you will find Terraform plans and instructions for both AWS and Scaleway.
In this architecture, we will set up a VPN server so that KVM guests can communicate. After setting up the Hadoop nodes and connecting them through the VPN network, a client will try to mount an HDFS space through FUSE to use it as a file system.
The ResourceManager described here will take on the roles of ResourceManager, NodeManager and MapReduce Job History server.
👉 Using AWS (price: 8545.09$/month)
- Create AWS credentials and set them inside the `~/.aws/credentials` file:

      # ~/.aws/credentials
      [default]
      aws_access_key_id = my-access-key
      aws_secret_access_key = my-secret-key
- Import your public key with the name `main` in EC2's Key Pairs menu
- Make sure there's no error by running the `init` and `plan` commands:

      terraform -chdir=./plans/aws init
      terraform -chdir=./plans/aws plan
- Execute the plan. This command will generate our `global.ini` inventory file:

      terraform -chdir=./plans/aws apply

To terminate the instances and avoid unintended spending, use `terraform -chdir=./plans/aws destroy`.
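Before running the Terraform commands above, it can help to confirm that the credentials file has the expected shape. This is a minimal sketch, not part of this repo; the function name `check_aws_credentials` is hypothetical, and the check only looks for a `[default]` header and the two key names, without validating the keys against AWS.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): succeed only if the file has
# a [default] profile header and both key entries somewhere in the file.
# It does not verify that the keys sit under [default] or that they are valid.
check_aws_credentials() {
  creds="${1:-$HOME/.aws/credentials}"
  [ -r "$creds" ] || { echo "cannot read $creds" >&2; return 1; }
  grep -q '^\[default\]' "$creds" \
    || { echo "no [default] profile in $creds" >&2; return 1; }
  for key in aws_access_key_id aws_secret_access_key; do
    grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$creds" \
      || { echo "missing $key in $creds" >&2; return 1; }
  done
}
```

For example, `check_aws_credentials && terraform -chdir=./plans/aws plan` only reaches Terraform when the file looks complete.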
👉 Using Scaleway (price: 303.37$/month)
- Go to your Scaleway account > IAM > API Keys and create a new API key `terraform-ansible-kvm-hadoop`
- Run the following `export` commands, replacing the values with yours:

      export TF_VAR_SCW_PROJECT_ID="my-project-id" # Find this in console.scaleway.com/project/settings
      export TF_VAR_SCW_ACCESS_KEY="my-access-key"
      export TF_VAR_SCW_SECRET_KEY="my-secret-key"

  Tip: append these variables to your `~/.bashrc` file
- Make sure there's no error by running the `init` and `plan` commands. Make sure to have an SSH key whose description is `main` in your Scaleway account.

      terraform -chdir=./plans/scaleway init
      terraform -chdir=./plans/scaleway plan
- Execute the plan. This command will generate our `global.ini` inventory file:

      terraform -chdir=./plans/scaleway apply

To terminate the instances and avoid unintended spending, use `terraform -chdir=./plans/scaleway destroy`.
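The three `TF_VAR_SCW_*` variables from the steps above are easy to forget in a fresh shell. The following sketch, which is not part of this repo, reports which of them are unset before Terraform is called; the function name `check_scw_vars` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): return non-zero and name the
# culprit on stderr if any of the three Scaleway variables is unset or empty.
check_scw_vars() {
  missing=0
  for name in TF_VAR_SCW_PROJECT_ID TF_VAR_SCW_ACCESS_KEY TF_VAR_SCW_SECRET_KEY; do
    # POSIX-compatible indirect lookup of the variable named in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_scw_vars && terraform -chdir=./plans/scaleway plan` stops early with a readable message instead of a Terraform prompt for missing variables.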
- Install the OpenVPN server

      ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t vpn-server
- Install KVM on each host and create guests

      ansible-galaxy collection install community.libvirt
      ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t kvm-install
- Connect all machines to communicate with each other (OpenVPN clients). Connect and retrieve the IP of each KVM guest:

      eval `ssh-agent` && ssh-add -D
      ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t vpn-client
- Install Hadoop cluster

      ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t hadoop
- Install HDFS FUSE client

      ansible-playbook -i inventories/global.ini ./playbooks/install.yml --extra-vars @./vars/all.yml -t hdfs-fuse-clients
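The playbook steps above all share the same command and differ only by tag, so they can be wrapped in a small helper. This is a sketch under assumptions, not part of this repo: `run_tags` and the `RUNNER` variable are made up for the example, and the preparatory commands from the steps above (`ansible-galaxy collection install community.libvirt` and the `ssh-agent` line) are not included and still need to be run at the right moment.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): run install.yml once per tag,
# in the order given, and stop at the first failure.
# Set RUNNER=echo to print the commands instead of executing them.
run_tags() {
  runner="${RUNNER:-}"
  for tag in "$@"; do
    case "$tag" in
      vpn-server|kvm-install|vpn-client|hadoop|hdfs-fuse-clients) ;;
      *) echo "unknown tag: $tag" >&2; return 2 ;;
    esac
    # $runner is left unquoted on purpose so that an empty value disappears.
    $runner ansible-playbook -i inventories/global.ini ./playbooks/install.yml \
      --extra-vars @./vars/all.yml -t "$tag" || return 1
  done
}
```

For example, `RUNNER=echo run_tags vpn-server kvm-install` prints the two commands without touching any machine, which is a cheap way to review what will run.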
Great repos that helped build this one:
- OpenVPN : robertdebock/ansible-role-openvpn
- Hadoop : andiveloper/ansible-hadoop
- KVM : noahbailey/ansible-qemu-kvm