Release v0.6.0 #655

Closed
@danielvegamyhre

Description

Release Checklist

  • All OWNERS must LGTM the release proposal
  • Verify that the changelog in this issue is up-to-date
  • For major or minor releases (v$MAJ.$MIN.0), create a new release branch.
    • An OWNER creates a vanilla release branch with
      git branch release-$MAJ.$MIN main
    • An OWNER pushes the new release branch to the upstream remote with
      git push <remote> release-$MAJ.$MIN
  • Update things like README, deployment templates, docs, configuration, test/e2e flags.
    Submit a PR against the release branch:
  • An OWNER prepares a draft release
    • Write the change log into the draft release.
    • Run
      make artifacts IMAGE_REGISTRY=registry.k8s.io/jobset GIT_TAG=$VERSION
      to generate the artifacts and upload the files in the artifacts folder
      to the draft release.
  • An OWNER creates a signed tag running
    git tag -s $VERSION
    and inserts the changelog into the tag description.
    To perform this step, you need a PGP key registered on GitHub.
  • An OWNER pushes the tag to the upstream remote with
    git push <remote> $VERSION
    • Triggers prow to build and publish a staging container image
      gcr.io/k8s-staging-jobset/jobset:$VERSION
  • Submit a PR against k8s.io, updating
    k8s.gcr.io/images/k8s-staging-jobset/images.yaml
    to promote the container images to production:
  • Wait for the PR to be merged and verify that the image registry.k8s.io/jobset/jobset:$VERSION is available.
  • Publish the draft release prepared on the GitHub releases page.
  • Add a link to the tagged release in this issue:
  • Send an announcement email to sig-apps@kubernetes.io, sig-scheduling@kubernetes.io and wg-batch@kubernetes.io with the subject [ANNOUNCE] JobSet $VERSION is released
  • Add a link to the release announcement in this issue:
  • For a major or minor release, update README.md and docs/setup/install.md
    in the main branch:
  • For a major or minor release, create an unannotated devel tag in the
    main branch, on the first commit that gets merged after the release
    branch has been created (presumably the README update commit above), and push the tag:
    DEVEL=v$MAJ.$(($MIN+1)).0-devel; git tag $DEVEL main && git push <remote> $DEVEL
    This ensures that the devel builds on the main branch will have a meaningful version number.
  • Close this issue
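
The branch and tag names in the checklist all derive from $VERSION, so a dry-run helper can print the expected git commands before an OWNER runs anything. The following is a minimal sketch, not part of the JobSet repository: the function names and the default remote name "origin" are hypothetical, and it assumes the devel tag is meant to be the next minor version (v0.7.0-devel after v0.6.0).

```shell
#!/usr/bin/env bash
# Sketch only: derives the names used in the checklist from a version string.
# The function names and the default remote "origin" are assumptions,
# not anything defined by the JobSet repository.
set -euo pipefail

# Split a version such as "v0.6.0" into MAJ, MIN and PATCH.
parse_version() {
  local version="${1#v}"
  IFS=. read -r MAJ MIN PATCH <<< "$version"
}

# Release branch name, e.g. release-0.6 for v0.6.0.
release_branch() {
  parse_version "$1"
  echo "release-${MAJ}.${MIN}"
}

# Devel tag for the main branch, assumed to be the next minor version.
devel_tag() {
  parse_version "$1"
  echo "v${MAJ}.$((MIN + 1)).0-devel"
}

# Print the git commands from the checklist instead of running them.
print_release_commands() {
  local version="$1" remote="${2:-origin}"
  local branch devel
  branch="$(release_branch "$version")"
  devel="$(devel_tag "$version")"
  echo "git branch ${branch} main"
  echo "git push ${remote} ${branch}"
  echo "git tag -s ${version}"
  echo "git push ${remote} ${version}"
  echo "git tag ${devel} main"
  echo "git push ${remote} ${devel}"
}

print_release_commands "${1:-v0.6.0}"
```

Printing rather than executing keeps the sketch safe to run anywhere; an OWNER can compare the output with the checklist and substitute the actual name of the upstream remote.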

Changelog

Highlights

  • New JobSet Failure Policy API - allows users to configure different behavior for different types of errors, enabling them to use compute resources more efficiently and improve ML training goodput.
  • Add Coordinator field to JobSet spec, enabling users to define a global coordinator pod for distributed ML/HPC workloads. The stable network endpoint for this pod will be added as a label and annotation to every Job and Pod in the JobSet for easy use in application code. A common use case for this is TPU Multislice training with multiple different Job templates. See linked issue for details.
  • Add global Job index label/annotation to every Job and Pod, which is needed to support TPU Multislice training with multiple different Job templates. See linked issue for details.
  • Added new metrics
  • Improved test coverage
  • Bug fixes
  • New examples and documentation

What's Changed
