Toolbox MVP

Emily Carpenter edited this page Apr 5, 2023 · 11 revisions

Overview

The Unified Workflow is creating a toolbox of standalone tools that can be applied to common technical challenges faced in the implementation of any numerical weather prediction workflow. These tools will be Python-based and will follow modern best practices to determine design and interoperability. Taking this approach, the team will be able to deliver tools on a semi-regular cadence in an effort to add value for UFS Apps as early as possible in the development cycle.

Users

  • UFS Apps Teams: global_workflow, SRW, HAFS, UFS Weather Model
  • UW Team

The UW Toolbox

The tools that the UW Team plans to prioritize over the next several PIs aim to simplify and unify the approach to the general problems (listed below) faced by users of NWP systems. Here are a few of the high-level requirements of such a toolbox.

Requirements

General requirements for all tools are as follows. Specific requirements for each tool are specified in the next section.

  • Each tool has its own user interface.
  • Each tool follows Python standards and design best practices for object-oriented coding. This includes, but is not limited to:
    • PEP-8 standards as checked by pylint
    • DRY Principle
    • Separation of Concerns
    • Composition Over Inheritance
    • YAGNI
    • SOLID
  • Each tool can be called as a utility from existing bash and Python scripts in workflows.
  • Only approved 3rd-party libraries will be adopted, and they must be installable via pip (for NCO use).
  • Documentation should be included with each tool.
  • Each tool is thoroughly tested by unit and functional tests.
  • Tools must interact with a variety of methods of configuration, including global environment variables from the user's shell.
  • Each tool fails, when necessary, with informative user feedback, and exits with a 0 exit status when successful.
  • Each tool performs appropriate logging at various levels – debug, log, quiet, etc.
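The exit-status and logging requirements could look roughly like the following minimal sketch. The function name `run_tool`, the required keys, and the mapping of the "debug", "log", and "quiet" levels onto Python's `logging` levels are all assumptions made for illustration, not part of any UW tool.

```python
import logging
import sys

# Assumed mapping of the levels named above onto standard logging levels.
LOG_LEVELS = {"debug": logging.DEBUG, "log": logging.INFO, "quiet": logging.ERROR}


def run_tool(settings, level="log"):
    """Hypothetical tool entry point: returns 0 on success, 1 on failure."""
    logger = logging.getLogger("uw_tool_sketch")
    logger.setLevel(LOG_LEVELS[level])
    if not logger.handlers:
        # Send log messages to stderr so stdout stays free for tool output.
        logger.addHandler(logging.StreamHandler(sys.stderr))
    logger.debug("Received settings: %s", settings)
    # "input" and "output" are placeholder requirements for this sketch.
    missing = [key for key in ("input", "output") if key not in settings]
    if missing:
        logger.error("Missing required setting(s): %s", ", ".join(missing))
        return 1
    logger.info("Tool finished successfully")
    return 0
```

A command-line wrapper would typically pass the return value to `sys.exit`, so that calling bash scripts can test the exit status.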

The Tools

  • Model configuration management
    • These tools will let users treat configuration files as templates (using Jinja2-type templating). For dictable configuration files (those stored as key/value pairs), users can take full control over the contents by providing any key/value pair to update, add, or remove settings to meet their needs.
    • The existing UFS Weather Model configuration files are summarized here: UFS Weather Model Configuration File Types.
    • Benefits
      • Apps can source configuration files directly from the UFS Weather Model regression tests, reducing the overhead of maintaining copies in App code.
      • All configuration settings can be stored in a single language (YAML is the standard), reducing the need for users to learn which syntax to use for which type of configuration data.
  • Schema tool for configuration validation
    • This tool will provide the basic infrastructure for using the 3rd-party Schema package to validate that a given configuration file meets the needs of a given Schema-enabled tool. For a component to be Schema-enabled, it will need to provide its requirements for each of its configuration options in a Python data file that defines acceptable entries, data types, bounds checking, etc.
    • Benefits
      • Users can perform cheap sanity checks to see whether their configuration file should work with the tool that is being configured
      • When applied to the model configuration files, no model compilation would be required, and no HPC resources would be required.
      • Unifies the approach for validating the configuration of any tool in the UW Toolbox.
  • Batch job card creation
    • This tool enables a user to generate a job card as configured by the Application system.
    • Benefits
      • Users of Rocoto-based systems can readily provide "sandbox" cases to colleagues for easier debugging, along with a job submission script identical to settings used by Rocoto.
      • Provides a unified approach to build these files for Apps that do not use Rocoto.
  • Data ingest
    • Along with a file mover that knows how to move files between various common data archives (disk, HPSS, NODD, cloud storage, FTP, etc.), this tool will interface to a database (not necessarily relational) of the known file locations for data needed by UFS Applications.
    • Benefits
      • Unifies the language around how external data is retrieved
      • Unifying the knowledge base of common datasets reduces maintenance.
      • Testing data retrieval can be decoupled from workflow tests, reducing HPC resources needed to test end-to-end applications.
      • Decouples dataset definition from tooling for easier user interactions.
  • Workflow definition creation
    • Several workflow engines are used among the UFS Apps (ecFlow, Rocoto, and Cylc), each communicating data related to user account settings, platform settings, resource requirements, and task dependencies in a different language. This tool will translate a common definition of such information (YAML is the standard) into the files needed to run a workflow with the desired engine.
    • Benefits
      • Reduced overhead in learning nuances of different workflow engines.
      • Ease of switching between workflow engines for the same App (often required in R2O transition)
  • Run scripts
    • This set of tools unifies the way in which each app runs a given UFS component (weather model, GSI, pre-processing, etc.).
    • Should be designed around the concepts of strict, contractual interfaces and flexible configuration.
    • Benefits
      • Can be replaced one-by-one without impact to existing workflows (requires an appropriate user interface, and following the NCO-required J-Jobs/ex-script relationships)
      • Unification of run scripts (in addition to unification of configuration) provides the same look and feel throughout all UFS Apps.
      • Provides a foundation for interchangeable parts when coupled with the standardization of component configuration.
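As an illustration of the key/value control described under model configuration management, the sketch below updates, adds, and removes settings in a dictable configuration that has already been loaded into a Python dictionary. The function names and the dotted-path convention for removals are hypothetical, and the example keys are invented rather than taken from a real UFS configuration file.

```python
import copy


def _merge(target, updates):
    """Recursively merge updates into target in place."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def update_config(base, updates, remove=()):
    """Return a copy of base with updates merged in and listed keys removed.

    Entries in `remove` are dotted paths such as "model.debug"
    (an assumed convention for this sketch).
    """
    result = copy.deepcopy(base)
    _merge(result, updates)
    for dotted in remove:
        *parents, leaf = dotted.split(".")
        node = result
        for name in parents:
            node = node.get(name)
            if not isinstance(node, dict):
                break  # Path does not exist, so there is nothing to remove.
        else:
            node.pop(leaf, None)
    return result


base = {"model": {"dt": 300, "debug": True}, "io": {"nfiles": 2}}
new = update_config(
    base,
    {"model": {"dt": 150, "layout": [2, 2]}},  # update dt, add layout
    remove=["model.debug"],                    # remove debug
)
```

The base dictionary is left untouched, so the same source configuration (for example, one taken from a regression test) can be reused for several variations.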

Basic Tool Design

Many of the standalone tools will follow a very basic and similar design pattern, as in the diagram below. In most cases, a user might provide some template or base file. This could be a template or namelist for configuration management, a YAML file of known data locations for data retrieval, or a YAML file defining a default cold start workflow. For batch job card creation, this file wouldn't even be necessary.

The user will need to provide a Configuration Object (a Python dictionary) for all tools, and this will be standardized through the use of an interface that builds the dictionary from a variety of sources – one or more dictable files, user environment variables, and command-line arguments.
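One way such an interface might assemble the Configuration Object is sketched below. It assumes the dictable files have already been loaded into dictionaries (for example, by a YAML loader) and that later sources override earlier ones in the order files, environment, command line. That precedence, and every function name here, is an assumption made for illustration.

```python
import os


def deep_merge(base, extra):
    """Return a new dictionary with extra merged over base, recursing into dicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def parse_cli_overrides(args):
    """Turn ["config.a.b=1"] into {"a": {"b": "1"}}; values stay strings."""
    overrides = {}
    for arg in args:
        path, sep, value = arg.partition("=")
        keys = path.split(".")
        if not sep or keys[0] != "config" or len(keys) < 2:
            raise ValueError(f"Expected config.[key]=value, got {arg!r}")
        node = overrides
        for key in keys[1:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return overrides


def build_config(file_configs, env=None, env_keys=(), cli_args=()):
    """Merge sources in increasing precedence: files, environment, CLI."""
    env = os.environ if env is None else env
    config = {}
    for file_config in file_configs:
        config = deep_merge(config, file_config)
    # Only environment variables that the caller names are pulled in.
    env_config = {key: env[key] for key in env_keys if key in env}
    config = deep_merge(config, env_config)
    return deep_merge(config, parse_cli_overrides(cli_args))


config = build_config(
    [{"account": "a", "model": {"dt": 300}}, {"model": {"debug": False}}],
    env={"HOMEDIR": "/home/user"},
    env_keys=["HOMEDIR", "UNSET"],
    cli_args=["config.model.dt=150"],
)
```

Note that command-line values arrive as strings in this sketch; a real interface would need to decide how and where to convert types.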

Once the user provides the desired Configuration Object and optional "base" file, the tool would parse the file, gather any configurables from it or from its knowledge base of required configurables, and validate that the Configuration Object provided the necessary ones. Then it would render the necessary output (in memory) and provide an option to write the result to a file.
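The parse, validate, and render steps above can be sketched with the standard library alone. The regular expression below handles only simple `{{ name }}` placeholders as a stand-in for Jinja2-type templating; it is not the Jinja2 API, and the function names and sample namelist are hypothetical.

```python
import re

# Matches simple placeholders such as {{ dt }} or {{layout}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_configurables(template_text):
    """Return the set of placeholder names used in the template."""
    return set(PLACEHOLDER.findall(template_text))


def render(template_text, config, output_path=None):
    """Validate config against the template, render in memory, optionally write."""
    missing = sorted(find_configurables(template_text) - set(config))
    if missing:
        raise KeyError(f"Configuration Object is missing: {', '.join(missing)}")
    rendered = PLACEHOLDER.sub(lambda m: str(config[m.group(1)]), template_text)
    if output_path is not None:
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(rendered)
    return rendered


template = "&model_nml\n  dt = {{ dt }}\n  layout = {{layout}}\n/\n"
namelist = render(template, {"dt": 300, "layout": "2,2"})
```

Validating before rendering means a missing key is reported by name up front rather than surfacing later as a partially filled output file.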

Template Tool - Generic Tool Design

User Interface

The tools that follow this design will attempt to standardize the command line arguments. They will almost always need these common arguments:

  • -i, --input to describe the input template or base file
  • -o, --output to describe the path to the output file
  • -c, --config to indicate path(s) to an input config file
  • -s, --schema to describe any additional validation or defaults file
  • --dry-run to write the result to standard output instead of the output file
  • config.[key]=value positional arguments to specify command-line arguments that should be wrapped into the Configuration Object
  • -h, --help
  • --values_needed to write to standard output which keys have completed values, which keys have unfilled Jinja2 templates, and which keys are set to empty
  • --input_file_type to convert the given input file to the provided type. Accepts YAML, INI, or F90.
  • --config_file_type to convert the given config file to the provided type. Accepts YAML, INI, or F90.
  • --output_file_type to convert the given output file to the provided type. Accepts YAML, INI, or F90.
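A minimal `argparse` sketch of these common arguments follows. It assumes the long options use the conventional double dash, that `-c` may be repeated to supply several config files, and that the `config.[key]=value` items are collected as trailing positional arguments. The function name `build_parser` and the attribute name `overrides` are hypothetical.

```python
import argparse

FILE_TYPES = ["YAML", "INI", "F90"]


def build_parser():
    """Build a parser for the common arguments listed above (-h is automatic)."""
    parser = argparse.ArgumentParser(description="Sketch of a common tool CLI")
    parser.add_argument("-i", "--input", help="input template or base file")
    parser.add_argument("-o", "--output", help="path to the output file")
    parser.add_argument("-c", "--config", action="append", default=[],
                        help="path to an input config file (repeatable)")
    parser.add_argument("-s", "--schema",
                        help="additional validation or defaults file")
    parser.add_argument("--dry-run", action="store_true",
                        help="write the result to standard output only")
    parser.add_argument("--values_needed", action="store_true",
                        help="report filled, unfilled, and empty keys")
    for name in ("input", "config", "output"):
        parser.add_argument(f"--{name}_file_type", choices=FILE_TYPES,
                            help=f"type to convert the {name} file to")
    parser.add_argument("overrides", nargs="*", metavar="config.KEY=VALUE",
                        help="settings to fold into the Configuration Object")
    return parser


args = build_parser().parse_args(
    ["-i", "model.tmpl", "-c", "a.yaml", "-c", "b.yaml",
     "--dry-run", "--output_file_type", "F90", "config.dt=150"]
)
```

Using a repeatable `-c` rather than a multi-value option keeps the trailing `config.KEY=VALUE` arguments from being mistaken for config file paths.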

Goals and Milestones

PI 6 (Mid Sep - Mid Nov 2022)

  • Complete development on Stage II model configuration management tool
  • Add UI to batch job card creator and file mover
  • Complete development for file mover classes to include interfaces for URL, S3 buckets, and HPSS
  • Add data location definitions for use with file mover
  • Prepare model configuration management tool for release

PI 7 (Mid Jan - Mid Mar 2023)

  • Begin work on validation framework
  • Begin work on workflow definition creation
  • Release tools for model configuration management, file mover, and batch job creator
  • Incorporate tools into UFS Weather Model and Short Range Weather App

Discussion and Feedback

Discussion and feedback pages for the wiki can be found here.