Listed here are the changes between each release of SmartSim and SmartRedis.
Jump to :ref:`SmartRedis Changelog <sr_changelog>`
To be released at some future point in time
- Override the sphinx-tabs extension background color
Detailed Notes
- The sphinx-tabs documentation extension uses a white background for the tabs component. Custom CSS was added so that those components inherit the overall theme color. (SmartSim-PR453)
Released on 18 December, 2023
Description
- Conflicting directives in the SmartSim packaging instructions were fixed
- sacct and sstat errors are now fatal for Slurm-based workflow executions
- Added documentation section about ML features and TorchScript
- Added TorchScript functions to Online Analysis tutorial
- Added multi-DB example to documentation
- Improved test stability on HPC systems
- Added support for producing & consuming telemetry outputs
- Split tests into groups for parallel execution in CI/CD pipeline
- Change signature of Experiment.summary()
- Expose first_device parameter for scripts, functions, models
- Added support for MINBATCHTIMEOUT in model execution
- Remove support for RedisAI 1.2.5, use RedisAI 1.2.7 commit
- Add support for multiple databases
Detailed Notes
- Several conflicting directives between the setup.py and the setup.cfg were fixed to mitigate warnings issued when building the pip wheel. (SmartSim-PR435)
- When the Slurm functions sacct and sstat returned an error, it would be ignored and SmartSim's state could become inconsistent. To prevent this, errors raised by sacct or sstat now result in an exception. (SmartSim-PR392)
- A section named ML Features was added to documentation. It contains multiple examples of how ML models and functions can be added to and executed on the DB. TorchScript-based post-processing was added to the Online Analysis tutorial (SmartSim-PR411)
- An example of how to use multiple Orchestrators concurrently was added to the documentation (SmartSim-PR409)
- The test infrastructure was improved. Tests on HPC systems are now stable, and issues such as non-stopped Orchestrators or experiments created in the wrong paths have been fixed (SmartSim-PR381)
- A telemetry monitor was added to check updates and produce events for SmartDashboard (SmartSim-PR426)
- Split tests into group_a, group_b, slow_tests for parallel execution in CI/CD pipeline (SmartSim-PR417, SmartSim-PR424)
- Change the format argument to style in Experiment.summary(). This is an API break (SmartSim-PR391)
- Added support for first_device parameter for scripts, functions, and models. This causes them to be loaded to the first num_devices beginning with first_device (SmartSim-PR394)
- Added support for MINBATCHTIMEOUT in model execution, which caps the delay waiting for a minimum number of model execution operations to accumulate before executing them as a batch (SmartSim-PR387)
- RedisAI 1.2.5 is no longer supported. The only supported RedisAI version is now 1.2.7. Because the officially released RedisAI 1.2.7 has a bug that breaks the build process on Mac OSX, SmartSim uses commit 634916c from RedisAI's GitHub repository, where this bug has been fixed. This applies to all operating systems. (SmartSim-PR383)
- Add support for creation of multiple databases with unique identifiers. (SmartSim-PR342)
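The first_device note above can be pictured with a small sketch. The helper name `target_devices` and the `GPU:<n>` labels are assumptions made for illustration and are not SmartSim's API; the sketch only shows which devices a script, function, or model would be loaded to when num_devices and first_device are combined.

```python
from typing import List


def target_devices(device: str, num_devices: int = 1, first_device: int = 0) -> List[str]:
    """Return the device labels an item would be loaded to (hypothetical helper)."""
    if num_devices < 1 or first_device < 0:
        raise ValueError("num_devices must be >= 1 and first_device must be >= 0")
    if device.upper() == "CPU":
        # This changelog notes elsewhere that CPU with more than one device is an error
        if num_devices > 1:
            raise ValueError("CPU cannot be combined with num_devices > 1")
        return ["CPU"]
    # The first num_devices devices, beginning with first_device
    return [f"{device.upper()}:{i}" for i in range(first_device, first_device + num_devices)]


print(target_devices("GPU", num_devices=2, first_device=1))
```

With these assumed labels, two devices starting at index 1 resolve to GPU:1 and GPU:2.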
Released on 14 September, 2023
Description
- Add typehints throughout the SmartSim codebase
- Provide support for Slurm heterogeneous jobs
- Provide better support for PalsMpiexecSettings
- Allow for easier inspection of SmartSim entities
- Log ignored error messages from sacct
- Fix colocated db preparation bug when using JsrunSettings
- Fix bug when a user specifies CPU and devices greater than 1
- Fix bug when get_allocation is called with reserved keywords
- Enabled mypy in CI for better type safety
- Mitigate additional suppressed pylint errors
- Update linting support and apply to existing errors
- Various improvements to the smart CLI
- Various documentation improvements
- Various test suite improvements
Detailed Notes
- Add methods to allow users to inspect files attached to models and ensembles. (SmartSim-PR352)
- Add a smart info target to provide rudimentary information about the SmartSim installation. (SmartSim-PR350)
- Remove unnecessary generation producing unexpected directories in the test suite. (SmartSim-PR349)
- Add support for heterogeneous jobs to SrunSettings by allowing users to set the --het-group parameter. (SmartSim-PR346)
- Provide clearer guidelines on how to contribute to SmartSim. (SmartSim-PR344)
- Integrate PalsMpiexecSettings into the Experiment factory methods when using the "pals" launcher. (SmartSim-PR343)
- Create public properties where appropriate to mitigate protected-access errors. (SmartSim-PR341)
- Fix a failure to execute _prep_colocated_db due to an incorrectly named attribute check. (SmartSim-PR339)
- Enabled and mitigated mypy disallow_any_generics and warn_return_any. (SmartSim-PR338)
- Add a smart validate target to provide a simple smoke test to assess a SmartSim build. (SmartSim-PR336, SmartSim-PR351)
- Add typehints to smartsim._core.launcher.step.*. (SmartSim-PR334)
- Log errors reported from the Slurm WLM when attempts to retrieve status fail. (SmartSim-PR331, SmartSim-PR332)
- Fix incorrectly formatted positional arguments in log format strings. (SmartSim-PR330)
- Ensure that launchers pass environment variables to unmanaged job steps. (SmartSim-PR329)
- Add additional tests surrounding the RAI_PATH configuration environment variable. (SmartSim-PR328)
- Remove unnecessary execution of unescaped shell commands. (SmartSim-PR327)
- Add an error if the user calls the Slurm get_allocation with reserved keywords. (SmartSim-PR325)
- Add error when user requests CPU with devices greater than 1 within add_ml_model and add_script. (SmartSim-PR324)
- Update documentation surrounding ensemble key prefixing. (SmartSim-PR322)
- Fix formatting of the Frontier site installation. (SmartSim-PR321)
- Update pylint dependency, update .pylintrc, mitigate non-breaking issues, suppress API breaks. (SmartSim-PR311)
- Refactor the smart CLI to use subparsers for better documentation and extension. (SmartSim-PR308)
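As an illustration of the --het-group note above, the sketch below formats a list of heterogeneous group indices into the comma-separated form that srun accepts for that option. The helper name `het_group_arg` is hypothetical and is not part of SrunSettings; it only shows the shape of the resulting argument.

```python
from typing import Iterable


def het_group_arg(groups: Iterable[int]) -> str:
    """Format group indices as an srun --het-group argument (hypothetical helper)."""
    ids = sorted(set(groups))  # drop duplicates and keep a stable order
    if not ids:
        raise ValueError("at least one het group index is required")
    if ids[0] < 0:
        raise ValueError("het group indices must be non-negative")
    return "--het-group=" + ",".join(str(g) for g in ids)


print(het_group_arg([1, 0]))
```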
Released on 6 July, 2023
Description
A full list of changes and detailed notes can be found below:
- Update SmartRedis dependency to v0.4.1
- Fix tests for db models and scripts
- Fix add_ml_model() and add_script() documentation, tests, and code
- Remove requirements.txt and other places where dependencies were defined
- Replace limit_app_cpus with limit_db_cpus for co-located orchestrators
- Remove wait time associated with Experiment launch summary
- Update and rename Redis conf file
- Migrate from redis-py-cluster to redis-py
- Update full test suite to not require a TF wheel at test time
- Update doc strings
- Remove deprecated code
- Relax the coloredlogs version
- Update Fortran tutorials for SmartRedis
- Add support for multiple network interface binding in Orchestrator and Colocated DBs
- Add typehints and static analysis
Detailed notes
- Updates SmartRedis to the most current release (SmartSim-PR316)
- Fixes and enhancements to documentation (SmartSim-PR317, SmartSim-PR314, SmartSim-PR287)
- Various fixes and enhancements to the test suite (SmartSim-PR315, SmartSim-PR312, SmartSim-PR310, SmartSim-PR302, SmartSim-PR283)
- Fix a defect in the tests related to database models and scripts that was causing key collisions when testing on workload managers (SmartSim-PR313)
- Remove requirements.txt and other places where dependencies were defined. (SmartSim-PR307)
- Fix defect where dictionaries used to create run settings can be changed unexpectedly due to copy-by-ref (SmartSim-PR305)
- The underlying code for Model.add_ml_model() and Model.add_script() was fixed to correctly handle multi-GPU configurations. Tests were updated to run on non-local launchers. Documentation was updated and fixed. Also, the default testing interface has been changed to lo instead of ipogif. (SmartSim-PR304)
- Typehints have been added. A makefile target make check-mypy executes static analysis with mypy. (SmartSim-PR295, SmartSim-PR301, SmartSim-PR303)
- Replace limit_app_cpus with limit_db_cpus for co-located orchestrators. This resolves some incorrect behavior/assumptions about how the application would be pinned. Instead, users should directly specify the binding options in their application using the options appropriate for their launcher (SmartSim-PR306)
- Simplify code in random_permutations parameter generation strategy (SmartSim-PR300)
- Remove wait time associated with Experiment launch summary (SmartSim-PR298)
- Update Redis conf file to conform with Redis v7.0.5 conf file (SmartSim-PR293)
- Migrate from redis-py-cluster to redis-py for cluster status checks (SmartSim-PR292)
- Update full test suite to no longer require a tensorflow wheel to be available at test time. (SmartSim-PR291)
- Correct spelling of colocated in doc strings (SmartSim-PR290)
- Deprecated launcher-specific orchestrators, constants, and ML utilities were removed. (SmartSim-PR289)
- Relax the coloredlogs version to be greater than 10.0 (SmartSim-PR288)
- Update the GitHub Actions runner image from macos-10.15 to macos-12. The former began deprecation in May 2022 and was finally removed in May 2023. (SmartSim-PR285)
- The Fortran tutorials had not been fully updated to show how to handle return/error codes. These have now all been updated. (SmartSim-PR284)
- Orchestrator and Colocated DB now accept a list of interfaces to bind to. The argument name is still interface for backward compatibility reasons. (SmartSim-PR281)
- Typehints have been added to public APIs. A makefile target, make check-mypy, is available to execute static analysis with mypy. (SmartSim-PR295)
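The copy-by-ref defect mentioned in the notes above (SmartSim-PR305) is a general Python pitfall. The sketch below reproduces it with two hypothetical classes, `SharedArgsSettings` and `CopiedArgsSettings`; neither is SmartSim code, and the actual fix in SmartSim may differ in detail.

```python
import copy
from typing import Any, Dict


class SharedArgsSettings:
    """Hypothetical settings object that keeps a reference to the caller's dict."""

    def __init__(self, run_args: Dict[str, Any]) -> None:
        self.run_args = run_args  # same object as the caller's dictionary


class CopiedArgsSettings:
    """Hypothetical settings object that copies the caller's dict on construction."""

    def __init__(self, run_args: Dict[str, Any]) -> None:
        self.run_args = copy.deepcopy(run_args)  # independent copy


shared_source = {"nodes": 1}
shared = SharedArgsSettings(shared_source)
shared.run_args["nodes"] = 4  # also changes shared_source

copied_source = {"nodes": 1}
copied = CopiedArgsSettings(copied_source)
copied.run_args["nodes"] = 4  # copied_source is untouched

print(shared_source, copied_source)
```

Copying at construction time means a dictionary can be reused to build several settings objects without later edits to one leaking into the others.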
Released on April 12, 2023
Description
This release of SmartSim had a focus on polishing and extending existing features already provided by SmartSim. Most notably, this release provides support to allow users to colocate their models with an orchestrator using Unix domain sockets and support for launching models as batch jobs.
Additionally, SmartSim has updated its tool chains to provide a better user experience. Notably, SmartSim can now be used with Python 3.10, Redis 7.0.5, and RedisAI 1.2.7. Furthermore, SmartSim now utilizes SmartRedis's aggregation lists to streamline the use and extension of ML data loaders, making working with popular machine learning frameworks in SmartSim a breeze.
A full list of changes and detailed notes can be found below:
- Add support for colocating an orchestrator over UDS
- Add support for Python 3.10, deprecate support for Python 3.7 and RedisAI 1.2.3
- Drop support for Ray
- Update ML data loaders to make use of SmartRedis's aggregation lists
- Allow for models to be launched independently as batch jobs
- Update Redis to version 7.0.5
- Add support for RedisAI 1.2.7, PyTorch 1.11.0, TensorFlow 2.8.0, ONNXRuntime 1.11.1
- Fix bug in colocated database entrypoint when loading PyTorch models
- Fix test suite behavior with environment variables
Detailed Notes
- Running some tests could result in some SmartSim-specific environment variables being set. Such environment variables are now reset after each test execution. Also, a warning for environment variable usage in Slurm was added, to make the user aware in case an environment variable will not be assigned the desired value with --export. (SmartSim-PR270)
- The PyTorch and TensorFlow data loaders were updated to make use of aggregation lists. This breaks their API, but makes them easier to use. (SmartSim-PR264)
- The support for Ray was dropped, as its most recent versions caused problems when deployed through SmartSim. We plan to release a separate add-on library to accomplish the same results. If you are interested in getting the Ray launch functionality back in your workflow, please get in touch with us! (SmartSim-PR263)
- Update from Redis version 6.0.8 to 7.0.5. (SmartSim-PR258)
- Adds support for Python 3.10 without the ONNX machine learning backend. Deprecates support for Python 3.7 as it will stop receiving security updates. Deprecates support for RedisAI 1.2.3. Update the build process to be able to correctly fetch supported dependencies. If a user attempts to build an unsupported dependency, an error message is shown highlighting the discrepancy. (SmartSim-PR256)
- Models were given a batch_settings attribute. When launching a model through Experiment.start the Experiment will first check for a non-nullish value at that attribute. If the check is satisfied, the Experiment will attempt to wrap the underlying run command in a batch job using the object referenced at Model.batch_settings as the batch settings for the job. If the check is not satisfied, the Model is launched in the traditional manner as a job step. (SmartSim-PR245)
- Fix bug in colocated database entrypoint stemming from uninitialized variables. This bug affects PyTorch models being loaded into the database. (SmartSim-PR237)
- The release of RedisAI 1.2.7 allows us to update support for recent versions of PyTorch, TensorFlow, and ONNX (SmartSim-PR234)
- Make installation of the correct Torch backend more reliable according to instructions from PyTorch
- In addition to TCP, add UDS support for colocating an orchestrator with models. Methods Model.colocate_db_tcp and Model.colocate_db_uds were added to expose this functionality. The Model.colocate_db method remains and uses TCP for backward compatibility (SmartSim-PR246)
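The batch_settings check described in the notes above (SmartSim-PR245) amounts to a simple branch at launch time. The sketch below mirrors that decision with a hypothetical `SketchModel` dataclass and `launch_mode` function; it does not call SmartSim, and treating "non-nullish" as "not None" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchModel:
    """Hypothetical stand-in for a model with an optional batch_settings attribute."""

    name: str
    batch_settings: Optional[object] = None


def launch_mode(model: SketchModel) -> str:
    """Return how the model would be launched under the rule described above."""
    if model.batch_settings is not None:
        # Wrap the run command in a batch job using the attached batch settings
        return "batch job"
    # No batch settings: launch in the traditional manner
    return "job step"


print(launch_mode(SketchModel("sim", batch_settings=object())))
print(launch_mode(SketchModel("sim")))
```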
Released on June 24, 2022
Description: This release of SmartSim introduces a new experimental feature to help make SmartSim workflows more portable: the ability to run simulation models in a container via Singularity. This feature has been tested on a small number of platforms and we encourage users to provide feedback on its use.
We have also made improvements in a variety of areas: new utilities to load scripts and machine learning models into the database directly from SmartSim driver scripts, and an install-time choice to use either KeyDB or Redis for the Orchestrator. The RunSettings API is now more consistent across subclasses. Another key focus of this release was to aid new SmartSim users by including more extensive tutorials and improving the documentation. The docker image containing the SmartSim tutorials now also includes a tutorial on online training.
Launcher improvements
- New methods for specifying RunSettings parameters (SmartSim-PR166) (SmartSim-PR170)
- Better support for mpirun, mpiexec, and orterun as launchers (SmartSim-PR186)
- Experimental: add support for running models via Singularity (SmartSim-PR204)
Documentation and tutorials
- Tutorial updates (SmartSim-PR155) (SmartSim-PR203) (SmartSim-PR208)
- Add SmartSim Zoo info to documentation (SmartSim-PR175)
- New tutorial for demonstrating online training (SmartSim-PR176) (SmartSim-PR188)
General improvements and bug fixes
- Set models and scripts at the driver level (SmartSim-PR185)
- Optionally use KeyDB for the orchestrator (SmartSim-PR180)
- Ability to specify system-level libraries (SmartSim-PR154) (SmartSim-PR182)
- Fix the handling of LSF gpus_per_shard (SmartSim-PR164)
- Fix error when re-running smart build (SmartSim-PR165)
- Fix generator hanging when tagged configuration variables are missing (SmartSim-PR177)
Dependency updates
- CMake version from 3.10 to 3.13 (SmartSim-PR152)
- Update click to 8.0.2 (SmartSim-PR200)
Released on Feb 11, 2022
Description: In this release SmartSim continues to promote ease of use. To this end SmartSim has introduced new portability features that allow users to abstract away their targeted hardware, while providing even more compatibility with existing libraries.
A new feature, co-located orchestrator deployments, has been added. It provides scalable online inference capabilities that overcome previous performance limitations in separated orchestrator/application deployments. For more information on the advantages of co-located deployments, see the Orchestrator section of the SmartSim documentation.
The SmartSim build was significantly improved to increase customization of the build toolchain, and the smart command line interface was expanded.
Additional tweaks and upgrades have also been made to ensure an optimal experience. Here is a comprehensive list of changes made in SmartSim 0.4.0.
Orchestrator Enhancements:
- Add Orchestrator Co-location (SmartSim-PR139)
- Add Orchestrator configuration file edit methods (SmartSim-PR109)
Emphasize Driver Script Portability:
- Add ability to create run settings through an experiment (SmartSim-PR110)
- Add ability to create batch settings through an experiment (SmartSim-PR112)
- Add automatic launcher detection to experiment portability functions (SmartSim-PR120)
Expand Machine Learning Library Support:
- Data loaders for online training in Keras/TF and PyTorch (SmartSim-PR115) (SmartSim-PR140)
- ML backend versions updated with expanded support for multiple versions (SmartSim-PR122)
- Launch Ray internally using RunSettings (SmartSim-PR118)
- Add Ray cluster setup and deployment to SmartSim (SmartSim-PR50)
Expand Launcher Setting Options:
- Add ability to use base RunSettings on Slurm, PBS, or Cobalt launchers (SmartSim-PR90)
- Add ability to use base RunSettings on LSF launcher (SmartSim-PR108)
Deprecations and Breaking Changes
- Orchestrator classes combined into single implementation for portability (SmartSim-PR139)
- smartsim.constants changed to smartsim.status (SmartSim-PR122)
- smartsim.tf migrated to smartsim.ml.tf (SmartSim-PR115) (SmartSim-PR140)
- TOML configuration option removed in favor of environment variable approach (SmartSim-PR122)
General Improvements and Bug Fixes:
- Improve and extend parameter handling (SmartSim-PR107) (SmartSim-PR119)
- Abstract away non-user facing implementation details (SmartSim-PR122)
- Add various dimensions to the CI build matrix for SmartSim testing (SmartSim-PR130)
- Add missing functions to LSFSettings API (SmartSim-PR113)
- Add RedisAI checker for installed backends (SmartSim-PR137)
- Remove heavy and unnecessary dependencies (SmartSim-PR116) (SmartSim-PR132)
- Fix LSFLauncher and LSFOrchestrator (SmartSim-PR86)
- Fix overly greedy Workload Manager Parsers (SmartSim-PR95)
- Fix Slurm handling of comma-separated env vars (SmartSim-PR104)
- Fix internal method calls (SmartSim-PR138)
Documentation Updates:
- Updates to documentation build process (SmartSim-PR133) (SmartSim-PR143)
- Updates to documentation content (SmartSim-PR96) (SmartSim-PR129) (SmartSim-PR136) (SmartSim-PR141)
- Update SmartSim Examples (SmartSim-PR68) (SmartSim-PR100)
Released on August 10, 2021
Description:
- Upgraded RedisAI backend to 1.2.3 (SmartSim-PR69)
- PyTorch 1.7.1, TF 2.4.2, and ONNX 1.6-7 (SmartSim-PR69)
- LSF launcher for IBM machines (SmartSim-PR62)
- Improved code coverage by adding more unit tests (SmartSim-PR53)
- Orchestrator methods to get address and check status (SmartSim-PR60)
- Added Manifest object that tracks deployables in Experiments (SmartSim-PR61)
- Bug fixes (SmartSim-PR52) (SmartSim-PR58) (SmartSim-PR67) (SmartSim-PR73)
- Updated documentation and examples (SmartSim-PR51) (SmartSim-PR57) (SmartSim-PR71)
- Improved IP address acquisition (SmartSim-PR72)
- Binding database to network interfaces
Released on May 5, 2021
Description:
This release was dedicated to making the install process easier. SmartSim can be installed from PyPI now and the smart cli tool makes installing the machine learning runtimes much easier.
- Pip install (SmartSim-PR42)
- smart cli tool for ML backends (SmartSim-PR42)
- Build Documentation for updated install (SmartSim-PR43)
- Migrate from Jenkins to GitHub Actions CI (SmartSim-PR42)
- Bug fix for setup.cfg (SmartSim-PR35)
Released on April 1, 2021
Description:
- Initial 0.3.0 (first public) release of SmartSim