THREDDS: add more options to configure catalog.xml (#472)
## Overview

- The default THREDDS configuration creates two datasets: the Service Data dataset and the Main dataset. The Service Data dataset is used internally and hosts WPS outputs. The Main dataset is where users access data served by THREDDS. Both are configured to serve files with the following extensions: .nc .ncml .txt .md .rst .csv

- To allow the THREDDS server to serve files with additional extensions, this introduces two new variables:
  - `THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS`: lets users add [filter elements](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html#including-only-desired-files) to the Service Data dataset. This is especially useful if a WPS writes files with an extension other than the defaults (e.g. .h5) to the `wps_outputs/` directory.
  - `THREDDS_DATASET_DATASETSCAN_BODY`: lets users specify the whole body of the Main dataset's [`<datasetScan>`](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html) element, so that they can fully customize how this dataset serves files.

- The Service Data dataset has fewer configuration options than the Main dataset because it requires a basic configuration in order to serve WPS outputs properly. Significant changes to this configuration could have unexpected negative impacts on WPS usage.

- The defaults for these new variables are fully backwards compatible. If they are left unchanged, the THREDDS server should behave exactly as before.
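
As a rough illustration of the two variables described above, the sketch below sets them the way one might in `env.local` and prints where the extra filters would land. The `.png` and `.h5` extensions are examples only, and `render_service_data_filter` is a hypothetical stand-in for the template substitution step, not the project's actual mechanism.

```shell
# Hypothetical env.local fragment; the .png and .h5 extensions are examples only.
# Single quotes on the outside keep the XML attribute double quotes intact.
export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS='<include wildcard="*.png" />
    <include wildcard="*.h5" />'

# Replace the whole body of the Main dataset's <datasetScan> element.
export THREDDS_DATASET_DATASETSCAN_BODY='
    <metadata inherited="true">
      <serviceName>all</serviceName>
    </metadata>
    <filter>
      <include wildcard="*.nc" />
      <include wildcard="*.h5" />
    </filter>
'

# Stand-in for template substitution (an assumption for illustration):
# shows the extra filters appended after a default <include> inside <filter>.
render_service_data_filter() {
    cat <<EOF
<filter>
    <include wildcard="*.csv" />
    ${THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS}
</filter>
EOF
}

render_service_data_filter
```

Because `THREDDS_DATASET_DATASETSCAN_BODY` replaces the entire body, any default extension that should still be served has to be listed again in the new `<filter>`.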

## Changes

**Non-breaking changes**
- Adds more configuration options to THREDDS catalog configuration
(backwards compatible)

**Breaking changes**
- None


## Related Issue / Discussion


## Additional Information

## CI Operations

<!--
The test suite can be run using a different DACCS config with
``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use
``birdhouse_skip_ci`` set to ``true`` in the PR description.

Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the
commit message will override ``birdhouse_skip_ci`` from the PR
description.
Such commit command can be used to override the PR description behavior
for a specific commit update.
However, a commit message cannot 'force run' a PR which the description
turns off the CI.
To run the CI, the PR should instead be updated with a ``true`` value,
and a running message can be posted in following PR comments to trigger
tests once again.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
mishaschwartz authored Oct 31, 2024
2 parents 3d7c8d6 + 15f1264 commit 11a589f
Showing 11 changed files with 170 additions and 62 deletions.
6 changes: 3 additions & 3 deletions .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 2.5.3
current_version = 2.5.4
commit = True
tag = False
tag_name = {new_version}
@@ -30,11 +30,11 @@ search = {current_version}
replace = {new_version}

[bumpversion:file:RELEASE.txt]
search = {current_version} 2024-09-11T22:57:09Z
search = {current_version} 2024-10-31T15:06:05Z
replace = {new_version} {utcnow:%Y-%m-%dT%H:%M:%SZ}

[bumpversion:part:releaseTime]
values = 2024-09-11T22:57:09Z
values = 2024-10-31T15:06:05Z

[bumpversion:file(version):birdhouse/components/canarie-api/docker_configuration.py.template]
search = 'version': '{current_version}'
35 changes: 35 additions & 0 deletions CHANGES.md
@@ -17,6 +17,41 @@

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)

[2.5.4](https://github.com/bird-house/birdhouse-deploy/tree/2.5.4) (2024-10-31)
------------------------------------------------------------------------------------------------------------------

## Changes

- THREDDS: add more options to configure `catalog.xml`
- The default THREDDS configuration creates two default datasets, the *Service Data* dataset and the
*Main* dataset. The *Service Data* dataset is used internally and hosts WPS outputs. The *Main* dataset is the
place where users can access data served by THREDDS. Both of these are configured to serve files with the following
extensions: .nc .ncml .txt .md .rst .csv

- In order to allow the THREDDS server to serve files with additional extensions, this introduces two new
variables:
- `THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS`: this allows users to specify additional [filter
elements](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html#including-only-desired-files) to the *Service Data* dataset. This is especially useful if a WPS
outputs files with an extension other than the default (eg: .h5) to the `wps_outputs/` directory.
- `THREDDS_DATASET_DATASETSCAN_BODY`: this allows users to specify the whole body of the *Main* dataset's
[`<datasetScan>`](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html) element.
This allows users to fully customize how this dataset serves files.

- We limit the configuration options for the *Service Data* dataset more than the *Main* dataset because the *Service
Data* dataset requires a basic configuration in order to properly serve WPS outputs. Making significant changes
to this configuration could have unexpected negative impacts on WPS usage.

- In order to allow customization of the Magpie THREDDS configuration in case new file extensions are added we introduce
two additional variables:
- `THREDDS_MAGPIE_EXTRA_METADATA_PREFIXES`: additional file prefixes (ie. regular expression match patterns) that Magpie
should treat as metadata (accessible with "browse" permissions).
- `THREDDS_MAGPIE_EXTRA_DATA_PREFIXES`: additional file prefixes (ie. regular expression match patterns) that Magpie
should treat as data (accessible with "read" permissions).

- The defaults for these new variables are fully backwards compatible. Without changing these variables, the THREDDS
server should behave exactly the same as before except that .md files and .rst files are now considered metadata
files according to the Magpie configuration, meaning that they can now be viewed with "browse" permissions.
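
The two Magpie variables expect a comma-delimited list of double-quoted regular expressions. A minimal sketch of building such a value from a list of extensions follows; `make_magpie_prefixes` is a hypothetical helper that is not part of birdhouse-deploy, and the `h5` and `grib2` extensions are illustrative.

```shell
# Hypothetical helper (not part of birdhouse-deploy): print a comma-delimited
# list of double-quoted patterns of the form ".+\\.<ext>", one per argument.
# The function body runs in a subshell so its variables do not leak.
make_magpie_prefixes() (
    out=""
    for ext in "$@"; do
        # Four backslashes inside double quotes yield two literal backslashes,
        # matching the doubled escaping used by the default values.
        out="${out:+$out, }\".+\\\\.${ext}\""
    done
    printf '%s\n' "$out"
)

# Treat .h5 and .grib2 files as data ("read" permission) in all cases.
export THREDDS_MAGPIE_EXTRA_DATA_PREFIXES="$(make_magpie_prefixes h5 grib2)"
printf '%s\n' "$THREDDS_MAGPIE_EXTRA_DATA_PREFIXES"
```

Called with no arguments, the helper yields an empty string, which matches the default for `THREDDS_MAGPIE_EXTRA_DATA_PREFIXES`.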

[2.5.3](https://github.com/bird-house/birdhouse-deploy/tree/2.5.3) (2024-09-11)
------------------------------------------------------------------------------------------------------------------

2 changes: 1 addition & 1 deletion Makefile
@@ -1,7 +1,7 @@
# Generic variables
override SHELL := bash
override APP_NAME := birdhouse-deploy
override APP_VERSION := 2.5.3
override APP_VERSION := 2.5.4

# utility to remove comments after value of an option variable
override clean_opt = $(shell echo "$(1)" | $(_SED) -r -e "s/[ '$'\t'']+$$//g")
8 changes: 4 additions & 4 deletions README.rst
@@ -18,13 +18,13 @@ for a full-fledged production platform.
* - citation
- | |citation|

.. |commits-since| image:: https://img.shields.io/github/commits-since/bird-house/birdhouse-deploy/2.5.3.svg
.. |commits-since| image:: https://img.shields.io/github/commits-since/bird-house/birdhouse-deploy/2.5.4.svg
:alt: Commits since latest release
:target: https://github.com/bird-house/birdhouse-deploy/compare/2.5.3...master
:target: https://github.com/bird-house/birdhouse-deploy/compare/2.5.4...master

.. |latest-version| image:: https://img.shields.io/badge/tag-2.5.3-blue.svg?style=flat
.. |latest-version| image:: https://img.shields.io/badge/tag-2.5.4-blue.svg?style=flat
:alt: Latest Tag
:target: https://github.com/bird-house/birdhouse-deploy/tree/2.5.3
:target: https://github.com/bird-house/birdhouse-deploy/tree/2.5.4

.. |readthedocs| image:: https://readthedocs.org/projects/birdhouse-deploy/badge/?version=latest
:alt: ReadTheDocs Build Status (latest version)
2 changes: 1 addition & 1 deletion RELEASE.txt
@@ -1 +1 @@
2.5.3 2024-09-11T22:57:09Z
2.5.4 2024-10-31T15:06:05Z
birdhouse/components/canarie-api/docker_configuration.py.template
@@ -108,8 +108,8 @@ SERVICES = {
# NOTE:
# Below version and release time auto-managed by 'make VERSION=x.y.z bump'.
# Do NOT modify it manually. See 'Tagging policy' in 'birdhouse/README.rst'.
'version': '2.5.3',
'releaseTime': '2024-09-11T22:57:09Z',
'version': '2.5.4',
'releaseTime': '2024-10-31T15:06:05Z',
'institution': '${BIRDHOUSE_INSTITUTION}',
'researchSubject': '${BIRDHOUSE_SUBJECT}',
'supportEmail': '${BIRDHOUSE_SUPPORT_EMAIL}',
@@ -141,8 +141,8 @@ PLATFORMS = {
# NOTE:
# Below version and release time auto-managed by 'make VERSION=x.y.z bump'.
# Do NOT modify it manually. See 'Tagging policy' in 'birdhouse/README.rst'.
'version': '2.5.3',
'releaseTime': '2024-09-11T22:57:09Z',
'version': '2.5.4',
'releaseTime': '2024-10-31T15:06:05Z',
'institution': '${BIRDHOUSE_INSTITUTION}',
'researchSubject': '${BIRDHOUSE_SUBJECT}',
'supportEmail': '${BIRDHOUSE_SUPPORT_EMAIL}',
30 changes: 10 additions & 20 deletions birdhouse/components/thredds/catalog.xml.template
@@ -4,14 +4,14 @@
xmlns:xlink="http://www.w3.org/1999/xlink" >

<service name="all" serviceType="Compound" base="" >
<service name="http" serviceType="HTTPServer" base="/twitcher/ows/proxy/thredds/fileServer/" />
<service name="odap" serviceType="OpenDAP" base="/twitcher/ows/proxy/thredds/dodsC/" />
<service name="ncml" serviceType="NCML" base="/twitcher/ows/proxy/thredds/ncml/"/>
<service name="uddc" serviceType="UDDC" base="/twitcher/ows/proxy/thredds/uddc/"/>
<service name="iso" serviceType="ISO" base="/twitcher/ows/proxy/thredds/iso/"/>
<service name="wcs" serviceType="WCS" base="/twitcher/ows/proxy/thredds/wcs/" />
<service name="wms" serviceType="WMS" base="/twitcher/ows/proxy/thredds/wms/" />
<service name="subsetServer" serviceType="NetcdfSubset" base="/twitcher/ows/proxy/thredds/ncss/" />
<service name="http" serviceType="HTTPServer" base="${TWITCHER_PROTECTED_PATH}/thredds/fileServer/" />
<service name="odap" serviceType="OpenDAP" base="${TWITCHER_PROTECTED_PATH}/thredds/dodsC/" />
<service name="ncml" serviceType="NCML" base="${TWITCHER_PROTECTED_PATH}/thredds/ncml/"/>
<service name="uddc" serviceType="UDDC" base="${TWITCHER_PROTECTED_PATH}/thredds/uddc/"/>
<service name="iso" serviceType="ISO" base="${TWITCHER_PROTECTED_PATH}/thredds/iso/"/>
<service name="wcs" serviceType="WCS" base="${TWITCHER_PROTECTED_PATH}/thredds/wcs/" />
<service name="wms" serviceType="WMS" base="${TWITCHER_PROTECTED_PATH}/thredds/wms/" />
<service name="subsetServer" serviceType="NetcdfSubset" base="${TWITCHER_PROTECTED_PATH}/thredds/ncss/" />
</service>

<datasetScan name="${THREDDS_SERVICE_DATA_LOCATION_NAME}" ID="${THREDDS_SERVICE_DATA_URL_PATH}" path="${THREDDS_SERVICE_DATA_URL_PATH}" location="${THREDDS_SERVICE_DATA_LOCATION_ON_CONTAINER}">
@@ -27,24 +27,14 @@
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
${THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS}
</filter>

</datasetScan>

<datasetScan name="${THREDDS_DATASET_LOCATION_NAME}" ID="${THREDDS_DATASET_URL_PATH}" path="${THREDDS_DATASET_URL_PATH}" location="${THREDDS_DATASET_LOCATION_ON_CONTAINER}">

<metadata inherited="true">
<serviceName>all</serviceName>
</metadata>

<filter>
<include wildcard="*.nc" />
<include wildcard="*.ncml" />
<include wildcard="*.txt" />
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
</filter>
${THREDDS_DATASET_DATASETSCAN_BODY}

</datasetScan>

37 changes: 20 additions & 17 deletions birdhouse/components/thredds/config/magpie/providers.cfg.template
@@ -15,21 +15,24 @@ providers:
- ".+\\.ncml" # match longest extension first to avoid truncating it by match of shorter '.nc'
- ".+\\.nc"
metadata_type:
prefixes:
- null # note: special YAML value evaluated as `no-prefix`, use quotes if literal value is needed
- "\\w+\\.gif" # threddsIcon, folder icon, etc.
- "\\w+\\.ico" # favicon
- "\\w+\\.txt" # licence
- "\\w+\\.css" # tds.css
- "catalog\\.\\w+" # note: special case for `THREDDS` top-level directory (root) accessed for `BROWSE`
- catalog
- ncml
- uddc
- iso
prefixes: [
null, # note: special YAML value evaluated as `no-prefix`, use quotes if literal value is needed
"\\w+\\.gif", # threddsIcon, folder icon, etc.
"\\w+\\.ico", # favicon
"\\w+\\.css", # tds.css
"catalog\\.\\w+", # note: special case for `THREDDS` top-level directory (root) accessed for `BROWSE`
catalog,
ncml,
uddc,
iso,
${THREDDS_MAGPIE_EXTRA_METADATA_PREFIXES}
]
data_type:
prefixes:
- fileServer
- dodsC
- wcs
- wms
- ncss
prefixes: [
fileServer,
dodsC,
wcs,
wms,
ncss,
${THREDDS_MAGPIE_EXTRA_DATA_PREFIXES}
]
22 changes: 22 additions & 0 deletions birdhouse/components/thredds/default.env
@@ -17,7 +17,25 @@ export THREDDS_SERVICE_DATA_LOCATION_NAME='Birdhouse'
export THREDDS_DATASET_URL_PATH='datasets'
export THREDDS_SERVICE_DATA_URL_PATH='birdhouse'

export THREDDS_MAGPIE_EXTRA_METADATA_PREFIXES='".+\\.txt", ".+\\.md", ".+\\.rst"'
export THREDDS_MAGPIE_EXTRA_DATA_PREFIXES=''

export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS=''

export THREDDS_DATASET_DATASETSCAN_BODY='
<metadata inherited="true">
<serviceName>all</serviceName>
</metadata>

<filter>
<include wildcard="*.nc" />
<include wildcard="*.ncml" />
<include wildcard="*.txt" />
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
</filter>
'

# add any new variables not already in 'VARS' or 'OPTIONAL_VARS' that must be replaced in templates here
VARS="
@@ -28,6 +46,7 @@ VARS="
\$THREDDS_DATASET_LOCATION_NAME
\$THREDDS_DATASET_URL_PATH
\$THREDDS_DATASET_LOCATION_ON_CONTAINER
\$THREDDS_DATASET_DATASETSCAN_BODY
"

OPTIONAL_VARS="
@@ -39,6 +58,9 @@ OPTIONAL_VARS="
\$THREDDS_IMAGE
\$THREDDS_IMAGE_URI
\$THREDDS_ADDITIONAL_CATALOG
\$THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS
\$THREDDS_MAGPIE_EXTRA_METADATA_PREFIXES
\$THREDDS_MAGPIE_EXTRA_DATA_PREFIXES
"

export DELAYED_EVAL="
78 changes: 68 additions & 10 deletions birdhouse/env.local.example
@@ -456,26 +456,84 @@ export GEOSERVER_ADMIN_PASSWORD="${__DEFAULT__GEOSERVER_ADMIN_PASSWORD}"

# Additional catalogs for THREDDS. Add as many datasetScan XML blocks as needed to THREDDS_ADDITIONAL_CATALOG.
# Each block defines a new top-level catalog. See birdhouse/components/thredds/catalog.xml.template for more information.
export THREDDS_ADDITIONAL_CATALOG=""
#export THREDDS_ADDITIONAL_CATALOG="
# <datasetScan name='dataset_location_name' ID='dataset_url_path' path='dataset_url_path' location='dataset_location_on_container'>
export THREDDS_ADDITIONAL_CATALOG=''
#export THREDDS_ADDITIONAL_CATALOG='
# <datasetScan name="dataset_location_name" ID="dataset_url_path" path="dataset_url_path" location="dataset_location_on_container">
#
# <metadata inherited='true'>
# <metadata inherited="true">
# <serviceName>all</serviceName>
# </metadata>
#
# <filter>
# <include wildcard='*.nc' />
# <include wildcard='*.ncml' />
# <include wildcard='*.txt' />
# <include wildcard='*.md' />
# <include wildcard='*.rst' />
# <include wildcard='*.csv' />
# <include wildcard="*.nc" />
# <include wildcard="*.ncml" />
# <include wildcard="*.txt" />
# <include wildcard="*.md" />
# <include wildcard="*.rst" />
# <include wildcard="*.csv" />
# </filter>
#
# </datasetScan>
#'
# It is possible to define additional compound services in the THREDDS_ADDITIONAL_CATALOG variable as well.
# This may be useful if you are creating a catalog that only provides a subset of the services defined in the
# compound service named "all" (see birdhouse/components/thredds/catalog.xml.template).
# DO NOT define any non-compound service in THREDDS_ADDITIONAL_CATALOG that is not an exact copy of one of the
# services defined in "all"! In particular, do not change the "base" attribute of any existing service.
# Doing so may break the way that access permissions are enforced when accessing data through this service.

# Additional file filters to add for the Service Data THREDDS dataset. By default, the Service Data dataset will only
# serve files with the following extensions: .nc .ncml .txt .md .rst .csv
# If you need this dataset to serve other files you should update the THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS to add
# additional file filters.
# This may be useful to set if a WPS outputs files to the wps_outputs/ directory (hosted under the Service Data dataset)
# in a file format other than one of the defaults.
# See the example below which would also enable serving .png and .h5 files.
#export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS='
# <include wildcard="*.png" />
# <include wildcard="*.h5" />
#'

# Set this variable to customize the body of the <datasetScan> XML element for the main THREDDS dataset. This is typically
# the dataset where you would store most of the data served by THREDDS (additional datasets can be configured by setting the
# THREDDS_ADDITIONAL_CATALOG variable).
# By default, the main dataset will only serve files with the following extensions: .nc .ncml .txt .md .rst .csv and will use
# the THREDDS service named "all" (see components/thredds/catalog.xml.template). However this can be customized if desired.
# See the example below which would change the configuration to serve .h5 and .json files and to exclude .md files.
# See the THREDDS documentation for the <datasetScan> element for all configuration options.
#export THREDDS_DATASET_DATASETSCAN_BODY="
# <metadata inherited='true'>
# <serviceName>all</serviceName>
# </metadata>
#
# <filter>
# <include wildcard='*.h5' />
# <include wildcard='*.json' />
# <exclude wildcard='*.md' />
# </filter>
#"

# Files served by THREDDS are considered to either contain data or metadata (or both). The THREDDS Magpie service allows
# us to handle access permissions differently for metadata vs. data. Magpie lets users with "browse" permissions access
# metadata but only users with "read" permissions can access data.
# By accessing files through different THREDDS services (see THREDDS documentation), we can either read the metadata with
# "browse" permissions or the data itself with "read" permissions. For example, by default a NetCDF file can be accessed
# using the NCML service to get its metadata or through the NCSS service to access the data itself.
#
# If you have a file that you would like to be treated as metadata (Magpie will allow users with "browse" permissions to
# access it) no matter which THREDDS service is used to access it, add the file pattern to the `THREDDS_MAGPIE_EXTRA_METADATA_PREFIXES`
# variable. Similarly, if you have a file that you would like to be treated as data no matter which THREDDS service is used
# to access it, add the file pattern to the `THREDDS_MAGPIE_EXTRA_DATA_PREFIXES` variable.
#
# For example, if you want all files with a .h5 extension to be treated as data files in all cases, add '".+\\.h5"' to the
# `THREDDS_MAGPIE_EXTRA_DATA_PREFIXES` variable. Note that values are regular expressions (python) where slashes are double
# escaped. Expressions should be surrounded by double quotes and if multiple expressions are included they should be comma
# delimited.
#
# Current defaults are:
#export THREDDS_MAGPIE_EXTRA_METADATA_PREFIXES='".+\\.txt", ".+\\.md", ".+\\.rst"'
#export THREDDS_MAGPIE_EXTRA_DATA_PREFIXES=''

# Allow using Github as external AuthN/AuthZ provider with Magpie
# To setup Github as login, goto <https://github.com/settings/developers> under section [OAuth Apps]
# and create a new Magpie application with configurations:
4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -69,9 +69,9 @@
# built documents.
#
# The short X.Y version.
version = '2.5.3'
version = '2.5.4'
# The full version, including alpha/beta/rc tags.
release = '2.5.3'
release = '2.5.4'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.