Changelog

Listed here are the changes between each release of SmartSim and SmartRedis.

Jump to :ref:`SmartRedis Changelog <sr_changelog>`

SmartSim

Development branch

To be released at some future point in time

  • Override the sphinx-tabs extension background color

Detailed Notes

  • The sphinx-tabs documentation extension uses a white background for the tabs component. Custom CSS has been added so that those components inherit the overall theme color. (SmartSim-PR453)

0.6.0

Released on 18 December, 2023

Description

  • Conflicting directives in the SmartSim packaging instructions were fixed
  • sacct and sstat errors are now fatal for Slurm-based workflow executions
  • Added documentation section about ML features and TorchScript
  • Added TorchScript functions to Online Analysis tutorial
  • Added multi-DB example to documentation
  • Improved test stability on HPC systems
  • Added support for producing & consuming telemetry outputs
  • Split tests into groups for parallel execution in CI/CD pipeline
  • Change signature of Experiment.summary()
  • Expose first_device parameter for scripts, functions, models
  • Added support for MINBATCHTIMEOUT in model execution
  • Remove support for RedisAI 1.2.5, use RedisAI 1.2.7 commit
  • Add support for multiple databases

Detailed Notes

  • Several conflicting directives between the setup.py and the setup.cfg were fixed to mitigate warnings issued when building the pip wheel. (SmartSim-PR435)
  • When the Slurm functions sacct and sstat returned an error, it would be ignored and SmartSim's state could become inconsistent. To prevent this, errors raised by sacct or sstat now result in an exception. (SmartSim-PR392)
  • A section named ML Features was added to documentation. It contains multiple examples of how ML models and functions can be added to and executed on the DB. TorchScript-based post-processing was added to the Online Analysis tutorial (SmartSim-PR411)
  • An example of how to use multiple Orchestrators concurrently was added to the documentation (SmartSim-PR409)
  • The test infrastructure was improved. Tests on HPC systems are now stable, and issues such as non-stopped Orchestrators or experiments created in the wrong paths have been fixed (SmartSim-PR381)
  • A telemetry monitor was added to check updates and produce events for SmartDashboard (SmartSim-PR426)
  • Split tests into group_a, group_b, slow_tests for parallel execution in CI/CD pipeline (SmartSim-PR417, SmartSim-PR424)
  • Change the format argument to style in Experiment.summary(). This is an API break (SmartSim-PR391)
  • Added support for first_device parameter for scripts, functions, and models. This causes them to be loaded to the first num_devices beginning with first_device (SmartSim-PR394)
  • Added support for MINBATCHTIMEOUT in model execution, which caps the delay waiting for a minimum number of model execution operations to accumulate before executing them as a batch (SmartSim-PR387)
  • RedisAI 1.2.5 is no longer supported. The only supported RedisAI version is now 1.2.7. Because the officially released RedisAI 1.2.7 has a bug that breaks the build process on Mac OSX, SmartSim now uses commit 634916c from RedisAI's GitHub repository, where the bug has been fixed. This applies to all operating systems. (SmartSim-PR383)
  • Add support for creation of multiple databases with unique identifiers. (SmartSim-PR342)
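
The first_device note above can be pictured with a small sketch. The helper name device_range and the "GPU:<n>" label format are assumptions made for illustration. They are not taken from the SmartSim source, and only show the idea of "the first num_devices beginning with first_device".

```python
from typing import List


def device_range(device: str, num_devices: int = 1, first_device: int = 0) -> List[str]:
    """Return the device labels an entity would be loaded to.

    Hypothetical helper: the name and the "GPU:<n>" label format are
    illustrative assumptions, not SmartSim's actual implementation.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if first_device < 0:
        raise ValueError("first_device must be non-negative")
    if device.upper() == "CPU":
        # A CPU target has no device index to offset.
        return ["CPU"]
    # Select num_devices consecutive devices, starting at first_device.
    last = first_device + num_devices
    return [f"{device.upper()}:{i}" for i in range(first_device, last)]
```

Under these assumptions, device_range("GPU", num_devices=2, first_device=1) selects GPU:1 and GPU:2 rather than starting from device 0.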

0.5.1

Released on 14 September, 2023

Description

  • Add typehints throughout the SmartSim codebase
  • Provide support for Slurm heterogeneous jobs
  • Provide better support for PalsMpiexecSettings
  • Allow for easier inspection of SmartSim entities
  • Log ignored error messages from sacct
  • Fix colocated db preparation bug when using JsrunSettings
  • Fix bug when users specify CPU and devices greater than 1
  • Fix bug when get_allocation called with reserved keywords
  • Enabled mypy in CI for better type safety
  • Mitigate additional suppressed pylint errors
  • Update linting support and apply to existing errors
  • Various improvements to the smart CLI
  • Various documentation improvements
  • Various test suite improvements

Detailed Notes

  • Add methods to allow users to inspect files attached to models and ensembles. (SmartSim-PR352)
  • Add a smart info target to provide rudimentary information about the SmartSim installation. (SmartSim-PR350)
  • Remove unnecessary generation producing unexpected directories in the test suite. (SmartSim-PR349)
  • Add support for heterogeneous jobs to SrunSettings by allowing users to set the --het-group parameter. (SmartSim-PR346)
  • Provide clearer guidelines on how to contribute to SmartSim. (SmartSim-PR344)
  • Integrate PalsMpiexecSettings into the Experiment factory methods when using the "pals" launcher. (SmartSim-PR343)
  • Create public properties where appropriate to mitigate protected-access errors. (SmartSim-PR341)
  • Fix a failure to execute _prep_colocated_db due to incorrect named attr check. (SmartSim-PR339)
  • Enabled and mitigated mypy disallow_any_generics and warn_return_any. (SmartSim-PR338)
  • Add a smart validate target to provide a simple smoke test to assess a SmartSim build. (SmartSim-PR336, SmartSim-PR351)
  • Add typehints to smartsim._core.launcher.step.*. (SmartSim-PR334)
  • Log errors reported from the Slurm WLM when attempts to retrieve status fail. (SmartSim-PR331, SmartSim-PR332)
  • Fix incorrectly formatted positional arguments in log format strings. (SmartSim-PR330)
  • Ensure that launchers pass environment variables to unmanaged job steps. (SmartSim-PR329)
  • Add additional tests surrounding the RAI_PATH configuration environment variable. (SmartSim-PR328)
  • Remove unnecessary execution of unescaped shell commands. (SmartSim-PR327)
  • Add an error if a user calls the Slurm get_allocation function with reserved keywords. (SmartSim-PR325)
  • Add error when user requests CPU with devices greater than 1 within add_ml_model and add_script. (SmartSim-PR324)
  • Update documentation surrounding ensemble key prefixing. (SmartSim-PR322)
  • Fix formatting of the Frontier site installation. (SmartSim-PR321)
  • Update pylint dependency, update .pylintrc, mitigate non-breaking issues, suppress api breaks. (SmartSim-PR311)
  • Refactor the smart CLI to use subparsers for better documentation and extension. (SmartSim-PR308)
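
As a rough illustration of the subparser-based CLI layout mentioned above, the following sketch uses Python's standard argparse module. The handlers and the --device flag are hypothetical placeholders. This is not the actual smart CLI code, only one way such a structure could look.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI with one subparser per target (illustrative only)."""
    parser = argparse.ArgumentParser(prog="smart")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each target gets its own subparser, so it carries its own help text
    # and arguments without touching the other targets.
    info = subparsers.add_parser("info", help="show installation details")
    info.set_defaults(handler=lambda args: "info")

    validate = subparsers.add_parser("validate", help="run a smoke test")
    validate.add_argument("--device", default="cpu")  # hypothetical flag
    validate.set_defaults(handler=lambda args: f"validate on {args.device}")

    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse argv and dispatch to the handler of the selected target."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Adding a new target in this layout means registering one more subparser and handler, which is the kind of extension the refactor is meant to make easier.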

0.5.0

Released on 6 July, 2023

Description

A full list of changes and detailed notes can be found below:

  • Update SmartRedis dependency to v0.4.1
  • Fix tests for db models and scripts
  • Fix add_ml_model() and add_script() documentation, tests, and code
  • Remove requirements.txt and other places where dependencies were defined
  • Replace limit_app_cpus with limit_db_cpus for co-located orchestrators
  • Remove wait time associated with Experiment launch summary
  • Update and rename Redis conf file
  • Migrate from redis-py-cluster to redis-py
  • Update full test suite to not require a TF wheel at test time
  • Update doc strings
  • Remove deprecated code
  • Relax the coloredlogs version
  • Update Fortran tutorials for SmartRedis
  • Add support for multiple network interface binding in Orchestrator and Colocated DBs
  • Add typehints and static analysis

Detailed notes

  • Updates SmartRedis to the most current release (SmartSim-PR316)
  • Fixes and enhancements to documentation (SmartSim-PR317, SmartSim-PR314, SmartSim-PR287)
  • Various fixes and enhancements to the test suite (SmartSim-PR315, SmartSim-PR312, SmartSim-PR310, SmartSim-PR302, SmartSim-PR283)
  • Fix a defect in the tests related to database models and scripts that was causing key collisions when testing on workload managers (SmartSim-PR313)
  • Remove requirements.txt and other places where dependencies were defined. (SmartSim-PR307)
  • Fix defect where dictionaries used to create run settings can be changed unexpectedly due to copy-by-ref (SmartSim-PR305)
  • The underlying code for Model.add_ml_model() and Model.add_script() was fixed to correctly handle multi-GPU configurations. Tests were updated to run on non-local launchers. Documentation was updated and fixed. Also, the default testing interface has been changed to lo instead of ipogif. (SmartSim-PR304)
  • Typehints have been added. A makefile target make check-mypy executes static analysis with mypy. (SmartSim-PR295, SmartSim-PR301, SmartSim-PR303)
  • Replace limit_app_cpus with limit_db_cpus for co-located orchestrators. This resolves some incorrect behavior/assumptions about how the application would be pinned. Instead, users should directly specify the binding options in their application using the options appropriate for their launcher (SmartSim-PR306)
  • Simplify code in random_permutations parameter generation strategy (SmartSim-PR300)
  • Remove wait time associated with Experiment launch summary (SmartSim-PR298)
  • Update Redis conf file to conform with Redis v7.0.5 conf file (SmartSim-PR293)
  • Migrate from redis-py-cluster to redis-py for cluster status checks (SmartSim-PR292)
  • Update full test suite to no longer require a tensorflow wheel to be available at test time. (SmartSim-PR291)
  • Correct spelling of colocated in doc strings (SmartSim-PR290)
  • Deprecated launcher-specific orchestrators, constants, and ML utilities were removed. (SmartSim-PR289)
  • Relax the coloredlogs version to be greater than 10.0 (SmartSim-PR288)
  • Update the GitHub Actions runner image from macos-10.15 to macos-12. The former began deprecation in May 2022 and was finally removed in May 2023. (SmartSim-PR285)
  • The Fortran tutorials had not been fully updated to show how to handle return/error codes. These have now all been updated. (SmartSim-PR284)
  • Orchestrator and Colocated DB now accept a list of interfaces to bind to. The argument name is still interface for backward compatibility reasons. (SmartSim-PR281)
  • Typehints have been added to public APIs. A makefile target, make check-mypy, is available to execute static analysis with mypy. (SmartSim-PR295)
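
The copy-by-ref defect noted above (SmartSim-PR305) is a common Python pitfall: two objects built from the same dictionary end up sharing it. The sketch below shows the general remedy of copying on construction. SettingsSketch is a hypothetical stand-in, not SmartSim's run settings class, and the actual fix in the PR may differ.

```python
import copy
from typing import Dict, Optional


class SettingsSketch:
    """Hypothetical stand-in for a run settings class (not SmartSim's code)."""

    def __init__(self, exe: str, run_args: Optional[Dict[str, str]] = None) -> None:
        self.exe = exe
        # Deep-copy the caller's dictionary so later edits on either side
        # do not leak into the other.
        self.run_args = copy.deepcopy(run_args) if run_args else {}


shared_args = {"nodes": "1"}
first = SettingsSketch("echo", shared_args)
second = SettingsSketch("echo", shared_args)

first.run_args["nodes"] = "4"  # only affects `first`
```

Without the copy, the assignment on the last line would also change shared_args and second.run_args.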

0.4.2

Released on April 12, 2023

Description

This release of SmartSim had a focus on polishing and extending existing features already provided by SmartSim. Most notably, this release provides support to allow users to colocate their models with an orchestrator using Unix domain sockets and support for launching models as batch jobs.

Additionally, SmartSim has updated its tool chains to provide a better user experience. Notably, SmartSim can now be used with Python 3.10, Redis 7.0.5, and RedisAI 1.2.7. Furthermore, SmartSim now utilizes SmartRedis's aggregation lists to streamline the use and extension of ML data loaders, making working with popular machine learning frameworks in SmartSim a breeze.

A full list of changes and detailed notes can be found below:

  • Add support for colocating an orchestrator over UDS
  • Add support for Python 3.10, deprecate support for Python 3.7 and RedisAI 1.2.3
  • Drop support for Ray
  • Update ML data loaders to make use of SmartRedis's aggregation lists
  • Allow for models to be launched independently as batch jobs
  • Update Redis to version 7.0.5
  • Add support for RedisAI 1.2.7, PyTorch 1.11.0, TensorFlow 2.8.0, ONNXRuntime 1.11.1
  • Fix bug in colocated database entrypoint when loading PyTorch models
  • Fix test suite behavior with environment variables

Detailed Notes

  • Running some tests could result in some SmartSim-specific environment variables being set. Such environment variables are now reset after each test execution. Also, a warning for environment variable usage in Slurm was added, to make the user aware in case an environment variable will not be assigned the desired value with --export. (SmartSim-PR270)
  • The PyTorch and TensorFlow data loaders were updated to make use of aggregation lists. This breaks their API, but makes them easier to use. (SmartSim-PR264)
  • The support for Ray was dropped, as its most recent versions caused problems when deployed through SmartSim. We plan to release a separate add-on library to accomplish the same results. If you are interested in getting the Ray launch functionality back in your workflow, please get in touch with us! (SmartSim-PR263)
  • Update from Redis version 6.0.8 to 7.0.5. (SmartSim-PR258)
  • Adds support for Python 3.10 without the ONNX machine learning backend. Deprecates support for Python 3.7 as it will stop receiving security updates. Deprecates support for RedisAI 1.2.3. Update the build process to be able to correctly fetch supported dependencies. If a user attempts to build an unsupported dependency, an error message is shown highlighting the discrepancy. (SmartSim-PR256)
  • Models were given a batch_settings attribute. When launching a model through Experiment.start the Experiment will first check for a non-nullish value at that attribute. If the check is satisfied, the Experiment will attempt to wrap the underlying run command in a batch job using the object referenced at Model.batch_settings as the batch settings for the job. If the check is not satisfied, the Model is launched in the traditional manner as a job step. (SmartSim-PR245)
  • Fix bug in colocated database entrypoint stemming from uninitialized variables. This bug affects PyTorch models being loaded into the database. (SmartSim-PR237)
  • The release of RedisAI 1.2.7 allows us to update support for recent versions of PyTorch, TensorFlow, and ONNX (SmartSim-PR234)
  • Make installation of the correct Torch backend more reliable, following the instructions from PyTorch
  • In addition to TCP, add UDS support for colocating an orchestrator with models. Methods Model.colocate_db_tcp and Model.colocate_db_uds were added to expose this functionality. The Model.colocate_db method remains and uses TCP for backward compatibility (SmartSim-PR246)
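
The batch_settings check described above (SmartSim-PR245) amounts to a simple branch at launch time. The following is a minimal sketch of that decision, assuming a plain None check stands in for the "non-nullish" test. ModelSketch, BatchSettingsSketch, and plan_launch are hypothetical names for illustration and do not mirror SmartSim's internal classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BatchSettingsSketch:
    """Hypothetical batch settings holder."""

    batch_command: str  # e.g. "sbatch"


@dataclass
class ModelSketch:
    """Hypothetical model with an optional batch_settings attribute."""

    name: str
    run_command: List[str]
    batch_settings: Optional[BatchSettingsSketch] = None


def plan_launch(model: ModelSketch) -> str:
    """Describe how the model would be launched."""
    command = " ".join(model.run_command)
    if model.batch_settings is not None:
        # Wrap the run command in a batch job that uses the model's settings.
        return f"batch job via {model.batch_settings.batch_command}: {command}"
    # No batch settings: launch in the traditional manner, as a job step.
    return f"job step: {command}"
```

A model created without batch settings keeps the previous behavior, so existing driver scripts would be unaffected under this reading.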

0.4.1

Released on June 24, 2022

Description: This release of SmartSim introduces a new experimental feature to help make SmartSim workflows more portable: the ability to run simulation models in a container via Singularity. This feature has been tested on a small number of platforms and we encourage users to provide feedback on its use.

We have also made improvements in a variety of areas: new utilities to load scripts and machine learning models into the database directly from SmartSim driver scripts, and an install-time choice to use either KeyDB or Redis for the Orchestrator. The RunSettings API is now more consistent across subclasses. Another key focus of this release was to aid new SmartSim users by including more extensive tutorials and improving the documentation. The docker image containing the SmartSim tutorials now also includes a tutorial on online training.

Launcher improvements

Documentation and tutorials

General improvements and bug fixes

Dependency updates

0.4.0

Released on Feb 11, 2022

Description: In this release SmartSim continues to promote ease of use. To this end SmartSim has introduced new portability features that allow users to abstract away their targeted hardware, while providing even more compatibility with existing libraries.

A new feature, co-located orchestrator deployments, has been added. It provides scalable online inference capabilities that overcome previous performance limitations in separated orchestrator/application deployments. For more information on advantages of co-located deployments, see the Orchestrator section of the SmartSim documentation.

The SmartSim build was significantly improved to increase customization of the build toolchain, and the smart command line interface was expanded.

Additional tweaks and upgrades have also been made to ensure an optimal experience. Here is a comprehensive list of changes made in SmartSim 0.4.0.

Orchestrator Enhancements:

Emphasize Driver Script Portability:

  • Add ability to create run settings through an experiment (SmartSim-PR110)
  • Add ability to create batch settings through an experiment (SmartSim-PR112)
  • Add automatic launcher detection to experiment portability functions (SmartSim-PR120)
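
One way to picture automatic launcher detection is probing for workload manager commands on the PATH. The sketch below is only an assumption about how such detection could work: the function name guess_launcher, the probe commands, and their order are hypothetical and are not taken from SmartSim.

```python
import shutil
from typing import Callable, List, Optional, Tuple

# Hypothetical mapping from a probe command to a launcher name. Both the
# commands and the probing order are assumptions for illustration.
PROBES: List[Tuple[str, str]] = [
    ("sbatch", "slurm"),
    ("qsub", "pbs"),
    ("bsub", "lsf"),
]


def guess_launcher(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the first launcher whose probe command is found, else "local"."""
    for command, launcher in PROBES:
        if which(command) is not None:
            return launcher
    # Nothing found: fall back to running on the local machine.
    return "local"
```

Passing the lookup function as a parameter keeps the sketch testable on machines without any workload manager installed.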

Expand Machine Learning Library Support:

Expand Launcher Setting Options:

  • Add ability to use base RunSettings on Slurm, PBS, or Cobalt launchers (SmartSim-PR90)
  • Add ability to use base RunSettings on LSF launcher (SmartSim-PR108)

Deprecations and Breaking Changes

General Improvements and Bug Fixes:

Documentation Updates:

0.3.2

Released on August 10, 2021

Description:

0.3.1

Released on May 5, 2021

Description: This release was dedicated to making the install process easier. SmartSim can now be installed from PyPI, and the smart CLI tool makes installing the machine learning runtimes much easier.

0.3.0

Released on April 1, 2021

Description:

  • Initial 0.3.0 (first public) release of SmartSim

SmartRedis