Build a custom data-catalog in minutes
- CatalogBuilder is a simple tool to generate & deploy a documentation website for your data assets.
- It enables anyone at your company to quickly find the trusted data they are looking for.
There are many open-source projects (Amundsen, OpenMetadata, DataHub, Metacat, Atlas) for building such a catalog in-house. But because they offer many advanced features, they are hard to deploy and manage if you are not a tech expert. They can be even harder to customize.
dbt docs is great for generating a documentation website on top of your dbt assets, but:
- it focuses on dbt only (while you may be interested in other sources and metadata)
- it is very hard to customize (unless you are an Angular expert)
- it can be slow.
👉 CatalogBuilder aims at offering a lightweight alternative to generate a documentation website on top of your data assets. It focuses on read-only data discovery and:
- ✔️ can be easily customized and deployed by non-technical people
- ✔️ can therefore be adapted to the very specific needs of your company
- ✔️ is fast and lightweight
- ✔️ is built on top of mkdocs-material, a very popular Python library that many projects (such as FastAPI) use to deploy their documentation.
catalog is the CLI (command-line interface) of CatalogBuilder, used to generate, show & deploy the documentation.
pip install catalog-builder
catalog download dbt_gitlab_data_team
To get started, let's download a catalog configuration example from the GitHub repo and play with it. The above command will download the catalogs/dbt_gitlab_data_team folder on your laptop.
You will find in the folder:
- assets file: a file containing the list of the assets you want to put in your documentation. It can be a parquet file named assets.parquet or a JSON Lines file named assets.jsonl. Each asset in the file must have the following fields:
  - asset_type: for example table.
  - documentation_path: the path of the asset page in the generated documentation. For example dataset_name/table_name.
  - data: a dict of attributes used to generate the documentation. For example {"name": "foo"}.
- generate_assets_file.py: the python script used to (re)generate the assets file.
- requirements.txt: the python requirements needed by generate_assets_file.py.
- templates: a folder which includes a jinja-template markdown file for each asset_type. These templates are used to generate a markdown documentation file for each asset.
- source_docs: a folder which includes files to include as-is in the documentation.
- mkdocs.yml: the mkdocs configuration file used by mkdocs to build the documentation website from the generated markdown files.
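To make the expected shape of the assets file concrete, here is a minimal Python sketch that writes a small assets.jsonl. Only the three top-level field names (asset_type, documentation_path, data) come from the description above; the asset values, the keys inside data, and the helper name write_assets_file are made up for illustration.

```python
import json
from pathlib import Path

# Hypothetical assets: the three top-level keys are the required fields,
# everything else (values, keys inside "data") is invented for this example.
ASSETS = [
    {
        "asset_type": "table",
        "documentation_path": "sales/orders",
        "data": {"name": "orders", "description": "One row per order."},
    },
    {
        "asset_type": "table",
        "documentation_path": "sales/customers",
        "data": {"name": "customers", "description": "One row per customer."},
    },
]

REQUIRED_FIELDS = ("asset_type", "documentation_path", "data")


def write_assets_file(assets, path):
    """Write one JSON object per line (JSON Lines) and return the file path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for asset in assets:
            missing = [key for key in REQUIRED_FIELDS if key not in asset]
            if missing:
                raise ValueError(f"asset is missing fields: {missing}")
            f.write(json.dumps(asset) + "\n")
    return path
```

Calling write_assets_file(ASSETS, "assets.jsonl") from inside a catalog folder would produce a file with one asset per line.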
catalog build dbt_gitlab_data_team
- For each asset of the assets file, the jinja template of asset_type will be rendered using the asset data to generate a markdown file, which will be written into catalogs/dbt_gitlab_data_team/docs/ at documentation_path.
- All files in catalogs/dbt_gitlab_data_team/source_docs/ are copied into catalogs/dbt_gitlab_data_team/docs/.
- Mkdocs will then build the documentation website from the markdown files into catalogs/dbt_gitlab_data_team/site (using the mkdocs.yml configuration file).
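The first two build steps can be sketched in a few lines of Python. This is not CatalogBuilder's actual implementation: to keep the example dependency-free it uses string.Template from the standard library as a stand-in for jinja, the function names are hypothetical, and appending a .md suffix to documentation_path is an assumption. The final mkdocs step is not reproduced here.

```python
import shutil
from pathlib import Path
from string import Template


def render_assets(assets, templates, docs_dir):
    """Render each asset with the template of its asset_type into docs_dir."""
    docs_dir = Path(docs_dir)
    written = []
    for asset in assets:
        # Pick the template matching the asset type (stand-in for a jinja template).
        template = Template(templates[asset["asset_type"]])
        markdown = template.substitute(asset["data"])
        # Assumption: the page is stored as <documentation_path>.md under docs_dir.
        target = docs_dir / (asset["documentation_path"] + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written


def copy_source_docs(source_docs_dir, docs_dir):
    """Copy hand-written files as-is next to the generated pages."""
    shutil.copytree(source_docs_dir, docs_dir, dirs_exist_ok=True)
```

With a template such as "# $name\n\n$description\n" registered for the table asset type, each table asset yields one markdown page whose location mirrors its documentation_path.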
catalog serve dbt_gitlab_data_team
You can now see the generated documentation website at http://localhost:8000.
A. To deploy on GitHub pages:
catalog deploy github-pages dbt_gitlab_data_team
Mkdocs will deploy the site on GitHub pages (this only works if you are in a GitHub repository).
B. To deploy on Google Cloud Storage Bucket:
catalog deploy gcs dbt_gitlab_data_team
Mkdocs will copy all the files in catalogs/dbt_gitlab_data_team/site to the bucket defined by the site_url value of catalogs/dbt_gitlab_data_team/mkdocs.yml. For instance, if the site url is http://catalogs.unytics.io/dbt_gitlab_data_team/ it will copy all files under catalogs/dbt_gitlab_data_team/site to gs://catalogs.unytics.io/dbt_gitlab_data_team/
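The mapping from site_url to a bucket location can be illustrated with a short Python sketch. The function name site_url_to_gcs_uri is hypothetical and the logic is inferred from the example above (host becomes the bucket, path becomes the prefix); it is not taken from CatalogBuilder's code.

```python
from urllib.parse import urlparse


def site_url_to_gcs_uri(site_url):
    """Turn a site URL into a gs:// URI: host -> bucket name, path -> object prefix."""
    parsed = urlparse(site_url)
    if not parsed.hostname:
        raise ValueError(f"site_url has no host: {site_url!r}")
    return f"gs://{parsed.hostname}{parsed.path}"


# The example from the text above.
print(site_url_to_gcs_uri("http://catalogs.unytics.io/dbt_gitlab_data_team/"))
# → gs://catalogs.unytics.io/dbt_gitlab_data_team/
```

One consequence of this convention is that the bucket name must match the host part of site_url.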
C. To deploy elsewhere:
You can follow these instructions from mkdocs.
To generate a documentation website for your own dbt project, do the following:
- Change directory to your dbt project directory.
- Download the catalogs/dbt documentation example by running catalog download dbt.
- Run dbt docs generate to compute target/manifest.json and target/catalog.json.
- Generate the assets file by running python catalogs/dbt/generate_assets_file.py. The script will parse target/manifest.json and target/catalog.json to generate the assets file in the expected format.
- Run catalog serve dbt to build the website and show it locally.
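To give an idea of what such a script has to do, here is a simplified Python sketch that turns the models of a dbt manifest into assets. It is not the generate_assets_file.py shipped with the example: the function name, the choice to map every model to the table asset type, the schema/name documentation path, and the keys placed in data are all assumptions, and target/catalog.json is ignored for brevity.

```python
import json


def manifest_to_assets(manifest):
    """Build one asset per dbt model found in a parsed manifest.json."""
    assets = []
    for node in manifest.get("nodes", {}).values():
        # Keep models only; tests, seeds, snapshots, etc. are skipped in this sketch.
        if node.get("resource_type") != "model":
            continue
        assets.append({
            "asset_type": "table",  # assumption: every model is documented as a table
            "documentation_path": f"{node['schema']}/{node['name']}",
            "data": {
                "name": node["name"],
                "description": node.get("description", ""),
                "columns": sorted(node.get("columns", {})),
            },
        })
    return assets


def load_manifest(path="target/manifest.json"):
    """Read the manifest produced by dbt docs generate."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The resulting list can then be written out as one JSON object per line to obtain assets.jsonl.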
Join our Slack to ask questions, get help getting started, report a bug, suggest improvements, or simply have a chat 🙂.
Any contribution is more than welcome 🤗!
- Add a ⭐ on the repo to show your support
- Join our Slack and talk with us
- Open an issue to report a bug or suggest improvements
- Open a PR!
