Trino deployment on Kubernetes - ECE Paris Big Data Course Project

About

This repository provides everything required to deploy a distributed data processing environment on Kubernetes across four virtual machines.

It was developed by ECE students Luka, Cléa & Mathias as a project for the Big Data Ecosystem course.

It aims to deploy Trino on Kubernetes, with MinIO providing the object storage.

Usage

  1. Install Vagrant, VirtualBox and Ansible on your host machine.
  2. Clone this repository.
  3. Adjust the machines' resources in the Vagrantfile to fit your host computer's limits. (A worker node ideally needs 6 GB of RAM, and MinIO requires at least 4 hard disks to function.)
  4. Run vagrant up from the root of the repository.
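
Before running vagrant up, it can help to confirm that the required host tools are installed. The snippet below is a minimal sketch and is not part of this repository: `missing_tools` is a hypothetical helper, the executable names `vagrant`, `VBoxManage` and `ansible` are assumed to be how the three tools appear on your PATH, and `<repository-url>` is a placeholder because the URL is not stated here.

```shell
# Hypothetical helper (not part of this repository):
# print every tool passed as an argument that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Typical flow, shown as comments so that sourcing this file has no side effects:
#   missing_tools vagrant VBoxManage ansible   # prints nothing if all are installed
#   git clone <repository-url>                 # placeholder URL
#   cd <cloned-directory>
#   vagrant up
```

An empty output from `missing_tools` means every listed tool was found. Any name it prints should be installed before provisioning the VMs.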

Architecture we used

We used a virtual machine hosted on a friend's server, running Ubuntu 22.04. The specs are as follows:

  • CPU: 16 cores
  • RAM: 94 GB
  • Storage: 256 GB

We first tried to run this project in a Linux Container (LXC) with the same specs, but the container supported neither VirtualBox nor KVM/QEMU. We therefore switched to a traditional VM, which supports VirtualBox.
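
A quick check of this kind can show early whether an environment is likely to run hypervisor-based VMs. The sketch below is an assumption about how one might test this on Linux, not the diagnosis we performed: `has_virt_flags` and `has_kvm_device` are hypothetical helpers. The first looks for the Intel VT-x (`vmx`) or AMD-V (`svm`) CPU flag, the second for the `/dev/kvm` device node.

```shell
# Hypothetical helpers (not part of this repository).

# Succeed if the cpuinfo file (default: /proc/cpuinfo) lists the
# vmx (Intel VT-x) or svm (AMD-V) flag as a whole word.
has_virt_flags() {
  grep -Eq '(^|[[:space:]])(vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}"
}

# Succeed if the KVM device node (default: /dev/kvm) exists.
has_kvm_device() {
  [ -e "${1:-/dev/kvm}" ]
}

# Example:
#   has_virt_flags && echo "CPU virtualization flags present"
#   has_kvm_device || echo "/dev/kvm is not available here"
```

Note that a container may expose the host's CPU flags while still lacking `/dev/kvm`, so both checks are worth running.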

Project report

We have written a dedicated project report for our evaluation at ECE.

Documentation index

We have documented all of our attempts for each part of the project, which is divided as follows:

  1. Creating & provisioning VMs with Vagrant
  2. Deploying Kubernetes with kubeadm, kubelet, kubectl
  3. Deploying basic Kubernetes services (CNI)
  4. Deploying MinIO to provide object storage
  5. Deploying Trino to perform distributed computation
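
Once the Kubernetes steps above are complete, one way to sanity-check the cluster is to count the nodes that report Ready. This is a minimal sketch under stated assumptions: `count_ready_nodes` is a hypothetical helper, it expects the default column layout of `kubectl get nodes --no-headers` (name first, status second), and the expected count depends on how many of the VMs join the cluster.

```shell
# Hypothetical helper (not part of this repository).
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# number of nodes whose STATUS column is exactly "Ready".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example:
#   kubectl get nodes --no-headers | count_ready_nodes
```

Nodes shown as `NotReady` or `Ready,SchedulingDisabled` are deliberately not counted, so a lower number than expected points to a node that needs attention.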

Contributors

| Name | Email |
| --- | --- |
| Luka BIGOT | luka.bigot@edu.ece.fr |
| Cléa DEDUIT | clea.deduit@edu.ece.fr |
| Mathias SERICOLA | mathias.sericola@edu.ece.fr |