This installation process is meant for simple on-premise deployments.
We currently support Ubuntu 20.04, 22.04, and 24.04 LTS. Other Linux distributions are likely to work but have not been tested.
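As an optional sanity check, you can confirm the release of the target machine with standard Ubuntu tooling (nothing KAWA-specific is assumed here):

```bash
# Print the Ubuntu release of the target machine
lsb_release -a

# Alternatively, read the os-release file directly
cat /etc/os-release
```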
Here is what you will need:
- An account with the ability to run sudo on the target machine
- Access and credentials to our registry here: Gitlab registry
- A valid KAWA license
RAM
For small amounts of data (up to ~200 GB compressed), it is best to use as much memory as the volume of data. For larger amounts of data, and when processing interactive (online) queries, you should use a reasonable amount of RAM (128 GB or more) so that the hot data subset fits in the page cache. Even for data volumes of ~50 TB per server, using 128 GB of RAM significantly improves query performance compared to 64 GB.
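To see how much memory the target machine has (a generic Linux check):

```bash
# Show total and available memory in human-readable units
free -h
```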
CPU
KAWA will use all available CPUs to maximize performance, so the more cores, the better. For processing hundreds of millions to billions of rows, we recommend at least 64 cores. Only the AMD64 architecture is supported.
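To check the core count and confirm the architecture (generic commands; x86_64 corresponds to AMD64):

```bash
# Number of available CPU cores
nproc

# Machine architecture; should report x86_64 on AMD64 systems
uname -m
```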
Storage Subsystem
SSDs are preferred. HDDs are the second-best option; 7200 RPM SATA HDDs will do. The capacity of the storage subsystem directly depends on the target analytics perimeter.
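To see whether the attached disks are SSDs, the rotational flag reported by lsblk is a quick indicator (ROTA 0 means non-rotational, i.e. SSD):

```bash
# List block devices with their rotational flag, size, and model
lsblk -d -o NAME,ROTA,SIZE,MODEL
```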
The installation procedure will install all the KAWA components:
- A PostgreSQL database
- A ClickHouse data warehouse
- The KAWA server
- The KAWA script runner
All these components can be installed separately if you wish.
- Clone this repository on the target machine:
git clone https://github.com/kawa-analytics/kawa-install.git
cd kawa-install
- Input your token:
echo 'gldt-*******' > configuration/deploy.token
- Run the installation script as root:
sudo ./install.sh
⚡ Important: During the installation process, you will be prompted for the password of the ClickHouse system user. Keep it safe; it will be needed later in the installation.
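Once the script finishes, a quick way to confirm that the components were installed is to ask systemd for the KAWA units it now manages (see the service section below for the exact unit names):

```bash
# List the KAWA-related systemd units created by the installer
systemctl list-units --all 'kawa*'
```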
Connect to the web server from a web browser to test the installation:
By default, KAWA will listen on port 8080.
The default credentials are:
login: setup-admin@kawa.io
password: changeme
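You can also test from the command line before opening a browser; a plain HTTP request against the default port should return a response (replace localhost with the server address when testing remotely):

```bash
# Check that the KAWA web server answers on its default port
curl -I http://localhost:8080
```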
The initial configuration can be done by following the documentation hosted here: KYWY doc github.
Follow the README and then the Initial setup Notebook.
Please refer to the full documentation here: https://github.com/kawa-analytics/kawa-docker-install
The KAWA server and the KAWA Python runner both run under the kawa-system user.
Both are managed as systemd services.
sudo systemctl status kawa
sudo systemctl status kawa-python-runner

You can use stop, start, and restart to control the services.
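For example, to restart the services and follow their output through the systemd journal (standard systemctl/journalctl usage):

```bash
# Restart both KAWA services
sudo systemctl restart kawa
sudo systemctl restart kawa-python-runner

# Follow the journal of the main server service
sudo journalctl -u kawa -f
```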
The log files can be found here: /var/log/kawa.
- The server generates the kawa-standalone.log file.
- The Python runner generates the kawapythonserver.log file.
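To follow the logs in real time:

```bash
# Tail the server and Python runner logs
sudo tail -f /var/log/kawa/kawa-standalone.log
sudo tail -f /var/log/kawa/kawapythonserver.log
```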
The configuration files are located in the /etc/kawa directory.
The main parameters are located in the kawa.env file.
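A minimal sketch of applying a configuration change, assuming edits to kawa.env take effect after a service restart (check the full documentation before relying on this):

```bash
# Edit the main configuration file
sudo nano /etc/kawa/kawa.env

# Restart the services so the new values are picked up
sudo systemctl restart kawa kawa-python-runner
```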
The /var/lib/kawa directory will contain user data such as scripts and uploaded CSVs.
Please make sure that it has enough free space.
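To check the free space on the filesystem backing /var/lib/kawa and the directory's current size:

```bash
# Free space on the filesystem that holds /var/lib/kawa
df -h /var/lib/kawa

# Current size of the KAWA user-data directory
sudo du -sh /var/lib/kawa
```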
KAWA is compatible with the following data warehouses/data lakes:
- ClickHouse
- Snowflake
- Trino
- BigQuery
- StarRocks
To configure them, please refer to the kawa.env file, which contains more details.
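As a purely illustrative sketch of what such a connection usually involves (the variable names below are hypothetical placeholders, not the actual keys; the actual keys are documented in kawa.env itself):

```bash
# Hypothetical example only: the real variable names are listed in /etc/kawa/kawa.env
# WAREHOUSE_KIND=clickhouse
# WAREHOUSE_HOST=warehouse.example.internal
# WAREHOUSE_PORT=8123
# WAREHOUSE_USER=kawa
# WAREHOUSE_PASSWORD=********
```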
Please contact support@kawa.ai for assistance regarding this configuration.
