Commit

feat: remove superset container and update Readme (#64)
* feat: remove superset container and update Readme

* Update README.md

Co-authored-by: Phil Mwago <41321750+Phil-Mwago@users.noreply.github.com>

njuguna-n and Phil-Mwago authored Jan 19, 2024
1 parent cfd5d3d commit 8acbc93
Showing 7 changed files with 8 additions and 67 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/test.yml
@@ -23,8 +23,6 @@ jobs:
 COUCHDB_HOST: "couchdb"
 COUCHDB_PORT: 5984
 COUCHDB_SECURE: false
-SUPERSET_PASSWORD: "password"
-SUPERSET_ADMIN_EMAIL: "user@superset.com"
 steps:
 - uses: actions/checkout@v2
 - uses: actions/setup-node@v2
21 changes: 7 additions & 14 deletions README.md
@@ -1,6 +1,6 @@
# CHT Sync

-CHT Sync is a bundled solution consisting of [Logstash](https://www.elastic.co/logstash/), [CouchDB](https://couchdb.apache.org/), [PostgREST](https://postgrest.org/en/stable/), [DBT](https://www.getdbt.com/), and [Superset](https://superset.apache.org/). Its purpose is to synchronize data from CouchDB to PostgreSQL, facilitating analytics on a Superset dashboard. This synchronization occurs in real-time, ensuring that the data displayed on the dashboard is always up-to-date. CHT Sync copies data from CouchDB to PostgreSQL, enabling seamless integration and timely analytics.
+CHT Sync is a bundled solution consisting of [Logstash](https://www.elastic.co/logstash/), [CouchDB](https://couchdb.apache.org/), [PostgREST](https://postgrest.org/en/stable/), and [DBT](https://www.getdbt.com/). Its purpose is to synchronize data from CouchDB to PostgreSQL, facilitating analytics on a dashboard. This synchronization occurs in real-time, ensuring that the data displayed on the dashboard is always up-to-date. CHT Sync copies data from CouchDB to PostgreSQL, enabling seamless integration and timely analytics.

**WARNING!** The schema differs from couch2pg. See [`./postgres/init-dbt-resources.sh`](./postgres/init-dbt-resources.sh).

@@ -16,10 +16,8 @@ At the core of the CHT Sync are Logstash, PostgREST, and DBT. Logstash plays a k

Once the data is synchronized and stored in PostgreSQL, it undergoes transformation using predefined DBT models from the [cht-pipeline](https://github.com/medic/cht-pipeline). DBT plays a crucial role in preparing the data in a format that is optimized for querying and analysis, ensuring the data is readily available for analytics purposes.

-CHT Sync also leverages Superset, an analytics and dashboarding platform, to provide intuitive visualizations and interactive analytics on the synchronized data stored in PostgreSQL. Superset empowers users to explore and gain valuable insights from the data, enabling informed decision-making and data-driven actions.
-
-The overall architecture of CHT-sync is driven by the seamless integration of these technologies. CouchDB serves as the source database, containing the original data to be synchronized. Logstash, PostgREST, and DBT facilitate the data flow from CouchDB to PostgreSQL, transforming it into a queriable format. PostgreSQL acts as the centralized repository for the synchronized and transformed data, while Superset provides the interface for users to explore and visualize the analytics.
-
+The overall architecture of CHT-sync is driven by the seamless integration of these technologies. CouchDB serves as the source database, containing the original data to be synchronized. Logstash, PostgREST, and DBT facilitate the data flow from CouchDB to PostgreSQL, transforming it into a queriable format. PostgreSQL acts as the centralized repository for the synchronized and transformed data.
+We suggest using Superset for creating your dashboards, data visualization, or infographics.
## Getting Started

CHT Sync has been specifically designed to work in both local development environments for testing models or workflows, gamma environment, as well as in production environments. Each setup accommodates the needs of different stages or environments.
@@ -34,7 +32,6 @@ There are four environment variable groups in the `.env.template` file. To succe
1. Postgresql and Postgres: Are used to establish the Postgres database to synchronize CouchDB data. They also define the schema and table names to store the CouchDB data. The main objective is to define the environment where the raw CouchDB data will be copied.
2. DBT: These environment variables are exclusive to the DBT configuration. The main objective is to define the environment where the tables and views for the models defined in `CHT_PIPELINE_BRANCH_URL` will be created. It is important to separate this environment from the previous group. `DBT_POSTGRES_USER` and `DBT_POSTGRES_SCHEMA` must be different from `POSTGRES_USER` and `POSTGRES_SCHEMA`. `DBT_POSTGRES_HOST` has to be the Postgres instance created with the environment variables set in the first group.
3. The following environment variables define the CouchDB instance we want to sync with. With `COUCHDB_DBS`, we can specify a list of databases to sync.
-4. Superset: These environment variables are exclusive to the Superset configuration.

### Local Setup

@@ -49,7 +46,7 @@ COUCHDB_DBS=<dbs-to-sync> # space separated list of databases you want to sync e
2. Install the dependencies and run the Docker containers locally:

```sh
-# starts: logstash, superset, postgres, postgrest, data-generator, couchdb and dbt
+# starts: logstash, postgres, postgrest, data-generator, couchdb and dbt
npm install
npm run local
```
@@ -82,18 +79,14 @@ COUCHDB_DBS=<dbs-to-sync> # space separated list of databases you want to sync e
COUCHDB_HOST=<your-couchdb-host>
COUCHDB_PORT=<your-couchdb-port>
COUCHDB_SECURE=false
-# superset: required environment variables for 'gamma', 'prod' and 'local'
-SUPERSET_PASSWORD=<your-superset-password>
-SUPERSET_ADMIN_EMAIL=<your-superset-emaild>
```

If `CHT_PIPELINE_BRANCH_URL` is pointing to a private repo then you need to provide an access token in the url i.e. `https://<PAT>@github.com/medic/cht-pipeline.git#main`. In this example you will replace `<PAT>` with an access token from Github. Instruction on how to generate one can be found [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens).

2. Install the dependencies and run the Docker containers locally:

```sh
-# starts: logstash, superset, postgres, postgrest, data-generator, couchdb and dbt
+# starts: logstash, postgres, postgrest, data-generator, couchdb and dbt
npm install
npm run local
```
@@ -125,7 +118,7 @@ COUCHDB_SECURE=false

2. Install the dependencies and start the Docker containers:
```sh
-# starts: logstash, superset, postgres, postgrest, and dbt
+# starts: logstash, postgres, postgrest, and dbt
npm install
npm run gamma
```
@@ -169,7 +162,7 @@ docker-compose -f docker-compose.postgres.yml -f docker-compose.yml up postgres

3. Install the dependencies and start the Docker containers:
```sh
-# starts: logstash, superset, postgrest and dbt
+# starts: logstash, postgrest and dbt
npm install
npm run prod
```
9 changes: 0 additions & 9 deletions docker-compose.yml
@@ -14,15 +14,6 @@ services:
 - COUCHDB_SECURE=${COUCHDB_SECURE:-true}
 - HTTP_ENDPOINT=postgrest:3000

-superset:
-build:
-context: ./superset/
-args:
-SUPERSET_PASSWORD: ${SUPERSET_PASSWORD:-password}
-SUPERSET_ADMIN_EMAIL: ${SUPERSET_ADMIN_EMAIL:-user@superset.com}
-ports:
-- 8080:8088
-
 dbt:
 platform: linux/amd64
 image: medicmobile/dataemon:latest
4 changes: 0 additions & 4 deletions env.template
@@ -22,7 +22,3 @@ COUCHDB_DBS="couchdb couchdb_sentinel" # space separated list of databases you w
COUCHDB_HOST=couchdb
COUCHDB_PORT=5984
COUCHDB_SECURE=false
-
-# superset: required environment variables for 'gamma', 'prod' and 'local'
-SUPERSET_PASSWORD=password
-SUPERSET_ADMIN_EMAIL=user@superset.com
5 changes: 0 additions & 5 deletions scripts/config.ts
@@ -11,11 +11,6 @@ export const POSTGRES = {
schema: process.env.POSTGRES_SCHEMA
};

-export const SUPERSET = {
-username: process.env.SUPERSET_ADMIN_EMAIL || 'user@superset.com',
-password: process.env.SUPERSET_PASSWORD || 'password',
-};
-
export const DBT_POSTGRES = {
schema: process.env.DBT_POSTGRES_SCHEMA || 'dbt'
}
20 changes: 0 additions & 20 deletions superset/Dockerfile

This file was deleted.

14 changes: 1 addition & 13 deletions tests/e2e-test.spec.ts
@@ -1,7 +1,7 @@
import { Client } from "ts-postgres";
import { rootConnect } from "./postgres-utils";
import request from 'supertest';
-import { POSTGRES, SUPERSET, DBT_POSTGRES } from "../scripts/config";
+import { POSTGRES, DBT_POSTGRES } from "../scripts/config";

describe("Main workflow Test Suite", () => {
let client: Client;
@@ -20,16 +20,4 @@ describe("Main workflow Test Suite", () => {
let personTableResult = await client.query("SELECT * FROM " + DBT_POSTGRES.schema + ".person");
expect(personTableResult.rows.length).toBeGreaterThan(0);
});
-
-it("should be able to login to superset dashboard", async () => {
-const supersetDashboardResponse = await request('http://localhost:8080')
-.post('/api/v1/security/login')
-.send({
-password: SUPERSET.password,
-provider: "db",
-refresh: true,
-username: SUPERSET.username
-});
-expect(supersetDashboardResponse.status).toBe(200);
-});
});
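
The README text in this commit states that `DBT_POSTGRES_USER` and `DBT_POSTGRES_SCHEMA` must be different from `POSTGRES_USER` and `POSTGRES_SCHEMA`. As an illustration only, that rule could be checked before starting the containers with something like the sketch below. The helper `checkEnvSeparation` and the `Env` type are hypothetical and are not part of this repository; the sketch assumes a pair is only in conflict when both variables are set and equal.

```typescript
// Hypothetical helper, not part of cht-sync: reports violations of the README
// rule that the DBT user and schema differ from the raw-data user and schema.
type Env = Record<string, string | undefined>;

function checkEnvSeparation(env: Env): string[] {
  const problems: string[] = [];
  // Each pair is [DBT variable, raw-data variable], as named in the README.
  const pairs: Array<[string, string]> = [
    ["DBT_POSTGRES_USER", "POSTGRES_USER"],
    ["DBT_POSTGRES_SCHEMA", "POSTGRES_SCHEMA"],
  ];
  for (const [dbtKey, rawKey] of pairs) {
    const dbtValue = env[dbtKey];
    // Assumption: an unset variable is not a conflict (it may have a default).
    if (dbtValue !== undefined && dbtValue === env[rawKey]) {
      problems.push(`${dbtKey} must differ from ${rawKey}`);
    }
  }
  return problems;
}
```

Calling `checkEnvSeparation(process.env)` in a Node.js script would return an empty array when the two groups are kept separate, and one message per conflicting pair otherwise.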
