SIM-PIPE generates and simulates a deployment configuration for the final deployment that conforms to the hardware requirements and includes any necessary middleware code for inter-step communication. Finally, the tool provides pipeline testing functionality, including a sandbox for evaluating the performance of individual pipeline steps and a simulator for determining the performance of the overall Big Data pipeline. Specifically, SIM-PIPE provides the following high-level features:
- Deploying each step of a pipeline and running it in a sandbox with sample input
- Evaluating pipeline step performance by recording and analysing execution metrics, in order to identify bottlenecks and steps to be optimised
- Identifying the resource requirements of a pipeline by calculating each step's performance per resource used
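As an illustration of the last feature, a "performance per resource" figure can be as simple as a step's throughput divided by the resources it consumed. The sketch below is hypothetical: the function name `throughput_per_cpu` and the choice of records per second per CPU core are assumptions made for illustration, not SIM-PIPE's actual metric.

```bash
# Hypothetical helper (not part of SIM-PIPE): throughput of one pipeline step
# per CPU core used.
#   $1 = number of records the step processed
#   $2 = wall-clock duration of the step, in seconds
#   $3 = average number of CPU cores the step used
throughput_per_cpu() {
  awk -v n="$1" -v t="$2" -v c="$3" 'BEGIN { printf "%.2f\n", n / t / c }'
}

# A step that processed 12000 records in 60 s on half a core:
throughput_per_cpu 12000 60 0.5   # prints 400.00
```

Comparing such a figure across steps is one way to spot the steps that use their resources least efficiently.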
If you use macOS or a Debian-based Linux distribution, run the following commands to install and start SIM-PIPE:
```bash
python install.py
python start.py
# or
python3 install.py
python3 start.py
```
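The two variants above differ only in the interpreter name. A small helper can pick whichever interpreter is available; it is a hypothetical convenience, not part of SIM-PIPE, and it prefers `python3` because `python` may point to Python 2 on older systems.

```bash
# Hypothetical helper: print the name of the first Python interpreter found on PATH.
pick_python() {
  for candidate in python3 python; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  echo "No Python interpreter found on PATH" >&2
  return 1
}

# Usage, from the root of the SIM-PIPE repository:
#   PY="$(pick_python)" && "$PY" install.py && "$PY" start.py
```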
Please note that this is an opinionated installation script. You may want to install SIM-PIPE manually instead.
Use the following command to expose the various SIM-PIPE services:
```bash
python forwarding.py
```
You can check the advanced installation section for more details on the installation process.
After starting SIM-PIPE, and while the `python forwarding.py` script is running, browse to http://localhost:8088/ to access the SIM-PIPE GUI.
Build the hello-world software container image locally:
```bash
# Example using Docker
docker buildx build -t hello-world examples/hello-world
# or (if the previous command fails)
docker-buildx build -t hello-world examples/hello-world
```
Run the hello-world pipeline:
```bash
argo submit --watch examples/hello-world.yaml
```
Check the logs of the hello-world pipeline:
```bash
argo logs @latest
```
The macOS installation is automated using Homebrew (`brew`) and the `python install.py` script. You need to install Homebrew first. Note that the `python start.py` script automatically installs the dependencies first.
The macOS installation runs a Linux virtual machine named `simpipe` using colima. When `simpipe` starts, the default Kubernetes context is set to the `simpipe` Kubernetes cluster.
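Because `argo` and `helm` act on whichever cluster the current Kubernetes context points to, it can be worth checking the context before running them. The guard below is a hypothetical sketch; it assumes the context name contains `simpipe`, which holds for colima's usual `colima-<profile>` naming but may not hold for other setups.

```bash
# Hypothetical guard: succeed only if the given Kubernetes context name
# looks like the SIM-PIPE cluster (i.e. it contains "simpipe").
require_simpipe_context() {
  case "$1" in
    *simpipe*) return 0 ;;
    *)
      echo "Current context '$1' does not look like the simpipe cluster" >&2
      return 1
      ;;
  esac
}

# Usage, before running argo or helm commands:
#   require_simpipe_context "$(kubectl config current-context)" && argo list
```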
The Linux installation is also automated using the `python install.py` script. We focus only on Debian-based Linux distributions for now. We tested on Debian and Ubuntu; other distributions may work with small modifications.
The installation script first installs Ansible and then uses Ansible to install everything else.
If you don't wish to use the Python installation script, you can also run the Ansible playbooks directly:
```bash
sudo ansible-galaxy install -r ./ansible/requirements.yaml
# docker_users is passed as a one-element list containing the current user
sudo ansible-playbook -i localhost, -c local -e "docker_users=['$(whoami)']" ./ansible/install-everything.yaml
```
If you are already running a Kubernetes cluster on your machine, it may be easier to install SIM-PIPE on it directly using Helm as explained in the following section.
You can install SIM-PIPE on any Kubernetes cluster using the Helm chart in the `charts/simpipe` folder, or a released Helm chart from the OCI registry at `oci://ghcr.io/datacloud-project/sim-pipe`.
Please note that it is recommended to use a clean Kubernetes cluster for the installation.
SIM-PIPE has been developed and tested on Kubernetes 1.27 with the K3s distribution. The default configuration uses the `default` namespace and has opinionated settings for Argo Workflows and the various secrets and role bindings.
You may want to change the configuration of the Helm chart to match your needs.
```bash
# Using the latest release
helm install simpipe oci://ghcr.io/datacloud-project/sim-pipe
# or using the local folder
helm install simpipe ./charts/simpipe
```
SIM-PIPE runs anywhere Linux runs. If you are using Windows, you can install SIM-PIPE in the Windows Subsystem for Linux, version 2 (WSL2): select a Debian-based Linux distribution and proceed as normal.
You may have to apply the workaround described in a comment on microsoft/WSL#6655 to make Docker work. The installation script attempts to apply it for you.
Please consult the `ARCHITECTURE.md` document for more details on the SIM-PIPE architecture.
SIM-PIPE is designed to only allow trusted users to deploy pipelines.
DO NOT expose the SIM-PIPE API to the public Internet without authenticating and authorising your users.
The default installation of SIM-PIPE IS NOT secure. You need to configure the authentication and authorisation mechanisms yourself.
In practice, SIM-PIPE is best run on your local machine. When port forwarding, make sure you do not expose the SIM-PIPE API to an untrusted network. By default, the forwarded services are bound to localhost only.
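To check that a forwarded port is reachable only from your own machine, you can inspect the addresses it listens on. The sketch below is a hypothetical helper, not part of SIM-PIPE: it reads `address:port` pairs, one per line, and succeeds only if every address is a loopback address (it also fails when nothing is listening). Feeding it from `ss` assumes a Linux host with iproute2; port 8088 is the GUI port used earlier in this README.

```bash
# Hypothetical helper: read "address:port" lines on stdin and succeed only if
# at least one line was given and every address is a loopback address.
is_loopback_only() {
  local found=0 addr host
  while IFS= read -r addr; do
    [ -z "$addr" ] && continue
    found=1
    host=${addr%:*}    # strip the trailing ":port"
    case "$host" in
      127.*|"[::1]"|::1|localhost) ;;   # loopback: fine
      *) return 1 ;;                    # e.g. 0.0.0.0, [::], * or a LAN address
    esac
  done
  [ "$found" -eq 1 ]
}

# Usage on Linux: check what is listening on the GUI port 8088.
#   ss -ltnH 'sport = :8088' | awk '{print $4}' | is_loopback_only \
#     && echo "8088 is bound to localhost only"
```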
Before raising a pull request, please read our contributing guide.
SIM-PIPE is released as open source software under the Apache License 2.0.