THREDDS: add more options to configure catalog.xml #472

Merged: 8 commits, Oct 31, 2024
CHANGES.md (24 changes: 23 additions & 1 deletion)
@@ -15,7 +15,29 @@
[Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)
## Changes

- THREDDS: add more options to configure catalog.xml
- Currently, the default THREDDS configuration creates two datasets: the Service Data dataset and the
main dataset. The Service Data dataset is used internally and hosts WPS outputs. The main dataset is where
users access data served by THREDDS. Both are configured to serve only files with the following
extensions: .nc .ncml .txt .md .rst .csv

- To allow the THREDDS server to serve files with additional extensions, this change introduces two new
variables:
- `THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS`: this allows users to specify additional [filter
elements](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html#including-only-desired-files) for the Service Data dataset. This is especially useful if a WPS
outputs files with an extension other than the defaults (e.g. .h5) to the `wps_outputs/` directory.
Review comment (Collaborator):

I'm curious about the use case - not against it, just making sure the intended use is appropriate.

Is there any advantage to exposing those HDF5 files via THREDDS rather than accessing them directly from the WPS-outputs dir? If anything, I would expect Nginx to provide much better/faster responses, as well as potential support for Content-Range requests if enabled.

Reply (Collaborator Author):

Good question. I have no intuition about what is better/faster in this case.

- `THREDDS_DATASET_DATASETSCAN_BODY`: this allows users to specify the whole body of the main dataset's
[`<datasetScan>`](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html) element,
giving them full control over how this dataset serves files.

- The Service Data dataset is less configurable than the main dataset because it requires a baseline
configuration to serve WPS outputs properly. Significant changes to that configuration could have
unexpected negative effects on WPS usage.

- The defaults for these new variables are fully backward compatible: if these variables are left unchanged, the THREDDS
server should behave exactly as before.

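The backward-compatibility claim can be illustrated with a small POSIX shell sketch. It assumes that rendering catalog.xml.template behaves like ordinary `${VAR}` substitution; a here-document stands in for the deployment's real rendering step, which this sketch does not reproduce. With the default empty value, the Service Data `<filter>` keeps only its six built-in `<include>` elements, and setting the variable appends extra ones.

```shell
#!/bin/sh
# Sketch only: a here-document stands in for the real template rendering step
# (assumption); it expands ${THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS} like a template would.

render_service_data_filter() {
    # Unquoted EOF delimiter: ${...} is expanded inside the here-document,
    # while the "*.nc"-style wildcards stay literal (no globbing here).
    cat <<EOF
<filter>
  <include wildcard="*.nc" />
  <include wildcard="*.ncml" />
  <include wildcard="*.txt" />
  <include wildcard="*.md" />
  <include wildcard="*.rst" />
  <include wildcard="*.csv" />
${THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS}
</filter>
EOF
}

# Default from default.env: empty, so the placeholder becomes a blank line
# and no extra <include> element is added.
THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS=''
default_filter=$(render_service_data_filter)

# Example override: also serve .h5 files from the Service Data dataset.
THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS='  <include wildcard="*.h5" />'
extended_filter=$(render_service_data_filter)

printf '%s\n\n%s\n' "$default_filter" "$extended_filter"
```

The only difference in the default case is a whitespace-only line inside `<filter>`, which adds no element.
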
[2.5.3](https://github.com/bird-house/birdhouse-deploy/tree/2.5.3) (2024-09-11)
------------------------------------------------------------------------------------------------------------------

birdhouse/components/thredds/catalog.xml.template (14 changes: 2 additions & 12 deletions)
@@ -27,24 +27,14 @@
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
+${THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS}
</filter>

</datasetScan>

<datasetScan name="${THREDDS_DATASET_LOCATION_NAME}" ID="${THREDDS_DATASET_URL_PATH}" path="${THREDDS_DATASET_URL_PATH}" location="${THREDDS_DATASET_LOCATION_ON_CONTAINER}">

-<metadata inherited="true">
-<serviceName>all</serviceName>
-</metadata>
-
-<filter>
-<include wildcard="*.nc" />
-<include wildcard="*.ncml" />
-<include wildcard="*.txt" />
-<include wildcard="*.md" />
-<include wildcard="*.rst" />
-<include wildcard="*.csv" />
-</filter>
+${THREDDS_DATASET_DATASETSCAN_BODY}

</datasetScan>

birdhouse/components/thredds/default.env (17 changes: 17 additions & 0 deletions)
@@ -17,7 +17,22 @@ export THREDDS_SERVICE_DATA_LOCATION_NAME='Birdhouse'
export THREDDS_DATASET_URL_PATH='datasets'
export THREDDS_SERVICE_DATA_URL_PATH='birdhouse'

export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS=''

export THREDDS_DATASET_DATASETSCAN_BODY='
<metadata inherited="true">
<serviceName>all</serviceName>
</metadata>

<filter>
<include wildcard="*.nc" />
<include wildcard="*.ncml" />
<include wildcard="*.txt" />
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
</filter>
'

# add any new variables not already in 'VARS' or 'OPTIONAL_VARS' that must be replaced in templates here
VARS="
@@ -28,6 +43,7 @@ VARS="
\$THREDDS_DATASET_LOCATION_NAME
\$THREDDS_DATASET_URL_PATH
\$THREDDS_DATASET_LOCATION_ON_CONTAINER
\$THREDDS_DATASET_DATASETSCAN_BODY
"

OPTIONAL_VARS="
@@ -39,6 +55,7 @@ OPTIONAL_VARS="
\$THREDDS_IMAGE
\$THREDDS_IMAGE_URI
\$THREDDS_ADDITIONAL_CATALOG
\$THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS
"

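default.env lists THREDDS_DATASET_DATASETSCAN_BODY under VARS and THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS (default: empty) under OPTIONAL_VARS. The file's own comment only says that both lists name variables to be replaced in templates. Assuming VARS entries are treated as required (non-empty) and OPTIONAL_VARS entries may be left empty, a check along these lines would catch an emptied main-dataset body before rendering. `check_required_vars` and the `/data/datasets` path are hypothetical, not birdhouse-deploy code.

```shell
#!/bin/sh
# Hypothetical sketch: assumes VARS lists required (non-empty) variables and
# OPTIONAL_VARS lists variables that may be empty. Not birdhouse-deploy code.

# Same "\$NAME" notation as default.env; the values hold literal '$NAME' strings.
VARS="
\$THREDDS_DATASET_LOCATION_ON_CONTAINER
\$THREDDS_DATASET_DATASETSCAN_BODY
"
OPTIONAL_VARS="
\$THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS
"

# Print every required variable that is unset or empty; return 1 if any are.
# OPTIONAL_VARS is deliberately not checked.
check_required_vars() {
    missing=0
    for entry in $VARS; do
        name=${entry#?}             # drop the leading '$'
        eval "value=\${$name-}"     # indirect lookup (POSIX sh has no ${!name})
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: an emptied main-dataset body is reported; the empty optional
# filter variable is ignored.
THREDDS_DATASET_LOCATION_ON_CONTAINER='/data/datasets'   # illustrative value
THREDDS_DATASET_DATASETSCAN_BODY=''
THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS=''
check_required_vars || echo "configuration incomplete"
```
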
export DELAYED_EVAL="

birdhouse/env.local.example (34 changes: 34 additions & 0 deletions)
@@ -476,6 +476,40 @@ export THREDDS_ADDITIONAL_CATALOG=""
# </datasetScan>
#"

# Additional file filters to add to the Service Data THREDDS dataset. By default, the Service Data dataset will only
# serve files with the following extensions: .nc .ncml .txt .md .rst .csv
# If you need this dataset to serve other file types, set THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS to add
# additional file filters.

Review comment on lines +487 to +488 (Collaborator): Mention the corresponding THREDDS_MAGPIE_... variables as well?

# This may be useful to set if a WPS outputs files to the wps_outputs/ directory (hosted under the Service Data dataset)
# in a file format other than one of the defaults.
# See the example below which would also enable serving .png and .h5 files.
#export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS='
# <include wildcard="*.png" />
# <include wildcard="*.h5" />
#'

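The value of THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS is XML whose attribute values are themselves in double quotes, so the quotes around the whole value matter. This short POSIX shell check shows that single quotes (as used for THREDDS_DATASET_DATASETSCAN_BODY) keep the inner quotes, whereas double quotes are closed and reopened at each inner `"`, so the shell strips them and leaves an unquoted attribute that is not well-formed XML.

```shell
#!/bin/sh
# Compare single vs double quoting of an XML snippet in a shell assignment.

# Single quotes: everything up to the closing ' is literal, inner " included.
single='<include wildcard="*.h5" />'

# Double quotes: the inner " characters close and reopen the quoted string,
# so quote removal deletes them. (Assignments are not subject to pathname
# expansion, so *.h5 itself stays literal.)
double="<include wildcard="*.h5" />"

printf '%s\n' "$single"   # prints <include wildcard="*.h5" />
printf '%s\n' "$double"   # prints <include wildcard=*.h5 />
```
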
# Set this variable to customize the body of the <datasetScan> XML element for the main THREDDS dataset. This is typically
# the dataset where you would store most of the data served by THREDDS (additional datasets can be configured by setting the
# THREDDS_ADDITIONAL_CATALOG variable).
# By default, the main dataset will only serve files with the following extensions: .nc .ncml .txt .md .rst .csv and will use
# the THREDDS service named "all" (see components/thredds/catalog.xml.template). However, this can be customized if desired.
# See the example below, which would change the configuration to serve .h5 and .json files instead of .md and .rst files.
# See the THREDDS documentation for the <datasetScan> element for all configuration options.
#export THREDDS_DATASET_DATASETSCAN_BODY='
# <metadata inherited="true">
# <serviceName>all</serviceName>
# </metadata>
#
# <filter>
# <include wildcard="*.nc" />
# <include wildcard="*.ncml" />
# <include wildcard="*.txt" />
# <include wildcard="*.h5" />
# <include wildcard="*.json" />
# <include wildcard="*.csv" />
# </filter>
#'

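If the list of served extensions changes often, the body could also be generated rather than written by hand. This sketch assumes env.local is sourced as a POSIX shell script (its `export` lines suggest so); `make_datasetscan_body` is a hypothetical helper, not part of birdhouse-deploy. It produces the same metadata and filter elements as the commented example above (.nc, .ncml, .txt, .h5, .json and .csv, served through the "all" service).

```shell
#!/bin/sh
# Hypothetical helper (not part of birdhouse-deploy): build a <datasetScan>
# body that uses one THREDDS service and includes the given file extensions.
# Usage: make_datasetscan_body SERVICE_NAME EXT...
make_datasetscan_body() {
    _service=$1
    shift
    printf '<metadata inherited="true">\n  <serviceName>%s</serviceName>\n</metadata>\n\n<filter>\n' "$_service"
    for _ext in "$@"; do
        # One <include> element per extension; the * is literal in the format.
        printf '  <include wildcard="*.%s" />\n' "$_ext"
    done
    printf '</filter>\n'
}

# Serve .h5 and .json instead of .md and .rst, through the "all" service.
THREDDS_DATASET_DATASETSCAN_BODY=$(make_datasetscan_body all nc ncml txt h5 json csv)
export THREDDS_DATASET_DATASETSCAN_BODY
printf '%s\n' "$THREDDS_DATASET_DATASETSCAN_BODY"
```

Generating the body keeps the XML quoting in one place, at the cost of hiding the final value until it is printed.
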
# Allow using Github as external AuthN/AuthZ provider with Magpie
# To setup Github as login, goto <https://github.com/settings/developers> under section [OAuth Apps]
# and create a new Magpie application with configurations: