58 changes: 52 additions & 6 deletions docs/how-to/enable-service-instances.rst
@@ -16,8 +16,8 @@ Cluster designs that call for extra service instances, however, can be
satisfied by manual means. In addition to the above-listed services, the
following services can be added manually to a node:

* NFS (supports the grouped service model with ``--cluster-id``)
* RGW (`RADOS Gateway service`_) (supports the grouped service model with ``--group-id``)
* cephfs-mirror

This is the purpose of the :command:`enable` command. It manually enables a
@@ -29,9 +29,10 @@ The syntax is:

   sudo microceph enable <service> --target <destination> ...

Where the service value is one of 'mon', 'mds', 'mgr', 'nfs', or 'rgw'. The
NFS and RGW services support the grouped service model, which lets multiple
instances be managed as one logical group. The destination is a node name as
shown in the output of the :command:`status` command:

.. code-block:: none

@@ -72,8 +73,10 @@ View any possible extra parameters for the RGW service:

   sudo microceph enable rgw --help

**Option 1: Enable RGW service (ungrouped, legacy mode)**

To enable the RGW service on node1 without using the grouped service model,
and to specify a value for the extra parameter ``port``:

.. code-block:: none

@@ -93,6 +96,49 @@ Finally, view cluster status again and verify expected changes:
     Services: mds, mgr, mon
     Disks: 0

**Option 2: Enable RGW service with the grouped model**

To enable the RGW service on node1 using the grouped service model with a
specific group ID:

.. code-block:: none

   sudo microceph enable rgw --target node1 --port 8080 --group-id my-rgw-cluster

View cluster status to see the grouped RGW service:

.. code-block:: none

   sudo microceph status

   MicroCeph deployment summary:
   - node1 (10.111.153.78)
     Services: mds, mgr, mon, rgw.my-rgw-cluster, osd
     Disks: 3
   - workbook (192.168.29.152)
     Services: mds, mgr, mon
     Disks: 0

.. note::

   Enabling RGW on multiple nodes with the same ``--group-id`` places the
   running RGW services in the same service group. This follows the same
   pattern as the NFS service.

.. caution::

   A node can run only one RGW service at a time, either grouped or ungrouped.
   To switch between modes or join a different group, you must first disable
   the existing RGW service:

   .. code-block:: none

      # For ungrouped RGW
      sudo microceph disable rgw --target node1

      # For grouped RGW
      sudo microceph disable rgw --group-id my-rgw-cluster --target node1
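
The one-RGW-service-per-node rule can be expressed as a pre-flight check. The Go sketch below is illustrative only. It assumes the service labels shown by :command:`status`: ``rgw`` for an ungrouped service and ``rgw.<group-id>`` for a grouped one. ``hasRGW`` and ``checkCanEnableRGW`` are hypothetical helpers, not part of MicroCeph.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRGW reports whether any service label on a node is an RGW service,
// either ungrouped ("rgw") or grouped ("rgw.<group-id>"). The label format
// is assumed from the `microceph status` output in this guide.
func hasRGW(services []string) (string, bool) {
	for _, svc := range services {
		if svc == "rgw" || strings.HasPrefix(svc, "rgw.") {
			return svc, true
		}
	}
	return "", false
}

// checkCanEnableRGW is a hypothetical pre-flight check enforcing the
// one-RGW-service-per-node rule before enabling RGW on that node.
func checkCanEnableRGW(services []string) error {
	if existing, ok := hasRGW(services); ok {
		return fmt.Errorf("node already runs %q; disable it before enabling another RGW service", existing)
	}
	return nil
}

func main() {
	node1 := []string{"mds", "mgr", "mon", "rgw.my-rgw-cluster", "osd"}
	workbook := []string{"mds", "mgr", "mon"}
	fmt.Println(checkCanEnableRGW(node1))    // error: node1 already runs a grouped RGW
	fmt.Println(checkCanEnableRGW(workbook)) // <nil>: no RGW on this node yet
}
```

The check treats grouped and ungrouped labels the same way. It therefore rejects both cases the caution describes: switching modes and joining a different group.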

Enable an NFS service
---------------------

1 change: 1 addition & 0 deletions docs/how-to/index.rst
@@ -45,6 +45,7 @@ migrate services and more.

   change-log-level
   migrate-auto-services
   migrate-rgw-to-grouped
   remove-disk
   perform-cluster-maintenance
   Enable full disk encryption <enable-fde>
167 changes: 167 additions & 0 deletions docs/how-to/migrate-rgw-to-grouped.rst
@@ -0,0 +1,167 @@
==========================================
Migrating RGW to grouped service model
==========================================

MicroCeph now supports a grouped service model for RGW (RADOS Gateway) services,
similar to the one used by the NFS service. Multiple RGW instances can be
placed in a named group, so that they are identified and managed together.

Background
----------

Previously, RGW services in MicroCeph were deployed as ungrouped services,
meaning each RGW instance was tracked independently without any logical grouping.
The new grouped service model introduces the concept of a ``group-id``, allowing
multiple RGW instances to be part of the same service group.

Benefits of grouped RGW services:

* **Logical grouping**: Multiple RGW instances can be identified as part of the same service cluster
* **Better organization**: Service groups appear with descriptive names (e.g., ``rgw.my-cluster``)
* **Consistent model**: Aligns with the NFS grouped service pattern
* **Future extensibility**: Lays the groundwork for group-wide configuration and management

Migration process
-----------------

.. note::

   There is no automatic migration from ungrouped to grouped RGW services.
   Migration must be performed manually.

The migration process involves disabling the existing ungrouped RGW service and
re-enabling it with a group ID. This requires a brief service interruption.

**Prerequisites:**

* Schedule a maintenance window, because the RGW service will be temporarily unavailable
* Record the current RGW configuration (port, SSL settings, and so on)
* Identify which nodes have RGW services running

Step 1: Check current RGW services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View your current cluster status to identify nodes with RGW services:

.. code-block:: none

   sudo microceph status

   MicroCeph deployment summary:
   - node1 (10.111.153.78)
     Services: mds, mgr, mon, rgw, osd
     Disks: 3
   - node2 (192.168.29.152)
     Services: mds, mgr, mon, rgw
     Disks: 0

In this example, both ``node1`` and ``node2`` have ungrouped RGW services.
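
On a larger cluster, you can scan the status output programmatically. The following Go sketch assumes the layout shown above: a ``- <node> (<address>)`` line followed by an indented ``Services:`` line. ``rgwByNode`` is a hypothetical helper, and it may need adjusting if your output format differs.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rgwByNode maps each node that runs RGW to its group ID ("" = ungrouped).
// It assumes the `microceph status` layout shown in this guide: node lines
// start with "- <name>" and service lines start with "Services:".
func rgwByNode(status string) map[string]string {
	result := make(map[string]string)
	node := ""
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "- "):
			// "- node1 (10.111.153.78)" -> "node1"
			if fields := strings.Fields(line[2:]); len(fields) > 0 {
				node = fields[0]
			}
		case strings.HasPrefix(line, "Services:") && node != "":
			for _, svc := range strings.Split(strings.TrimPrefix(line, "Services:"), ",") {
				svc = strings.TrimSpace(svc)
				switch {
				case svc == "rgw":
					result[node] = "" // ungrouped RGW
				case strings.HasPrefix(svc, "rgw."):
					result[node] = strings.TrimPrefix(svc, "rgw.") // grouped RGW
				}
			}
		}
	}
	return result
}

func main() {
	status := `MicroCeph deployment summary:
- node1 (10.111.153.78)
  Services: mds, mgr, mon, rgw, osd
  Disks: 3
- node2 (192.168.29.152)
  Services: mds, mgr, mon, rgw
  Disks: 0`
	for node, group := range rgwByNode(status) {
		fmt.Printf("%s: rgw group %q\n", node, group)
	}
}
```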

Step 2: Disable the ungrouped RGW service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Disable the existing ungrouped RGW service on each node:

.. code-block:: none

   # On node1
   sudo microceph disable rgw --target node1

   # On node2
   sudo microceph disable rgw --target node2

.. caution::

   The RGW service will be unavailable during this step. Ensure clients are
   prepared for the interruption.

Step 3: Enable RGW with group ID
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re-enable the RGW service with a group ID. Use the same configuration parameters
(port, SSL settings) as before:

.. code-block:: none

   # On node1
   sudo microceph enable rgw --target node1 --port 8080 --group-id main-gateway

   # On node2
   sudo microceph enable rgw --target node2 --port 8080 --group-id main-gateway

.. note::

   The ``--group-id`` must be 3 to 63 characters long, start and end with a
   letter, digit, or underscore, and otherwise contain only letters, digits,
   underscores, dots, and hyphens (for example, ``main-gateway``,
   ``rgw-cluster-1``, or ``my_rgw.cluster``).

Step 4: Verify the migration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Check the cluster status to verify the grouped RGW services are running:

.. code-block:: none

   sudo microceph status

   MicroCeph deployment summary:
   - node1 (10.111.153.78)
     Services: mds, mgr, mon, rgw.main-gateway, osd
     Disks: 3
   - node2 (192.168.29.152)
     Services: mds, mgr, mon, rgw.main-gateway
     Disks: 0

Notice that the RGW services now appear as ``rgw.main-gateway``, indicating
they are part of the same service group.
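
For clusters with several RGW nodes, you can generate the commands for Steps 2 and 3 instead of typing them by hand. This Go sketch uses a hypothetical helper (``migrationCommands`` is not part of MicroCeph) that only prints the commands for review. It carries over ``--port`` and nothing else, so add any other flags your deployment used, such as SSL settings.

```go
package main

import "fmt"

// migrationCommands is a hypothetical helper that lists the commands from
// Steps 2 and 3: first disable every ungrouped RGW service, then re-enable
// each one in the target group on the same port. Only --port is carried
// over; any other original flags must be added by hand.
func migrationCommands(nodes []string, port int, groupID string) []string {
	cmds := make([]string, 0, 2*len(nodes))
	for _, node := range nodes {
		cmds = append(cmds, fmt.Sprintf("sudo microceph disable rgw --target %s", node))
	}
	for _, node := range nodes {
		cmds = append(cmds, fmt.Sprintf("sudo microceph enable rgw --target %s --port %d --group-id %s", node, port, groupID))
	}
	return cmds
}

func main() {
	for _, cmd := range migrationCommands([]string{"node1", "node2"}, 8080, "main-gateway") {
		fmt.Println(cmd)
	}
}
```

Printing the commands rather than executing them keeps a person in the loop during the maintenance window. The disables come before the enables to match the order of the steps above.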

Managing grouped RGW services
------------------------------

After you migrate to the grouped model, you must include the ``--group-id``
flag whenever you disable a grouped RGW service:

.. code-block:: none

   # Disable grouped RGW
   sudo microceph disable rgw --group-id main-gateway --target node1

   # This will NOT work for grouped services (only for ungrouped)
   sudo microceph disable rgw --target node1

Backward compatibility
----------------------

MicroCeph maintains backward compatibility with ungrouped RGW services. You can:

* Continue to use ungrouped RGW services without migration
* Deploy new ungrouped RGW services by omitting the ``--group-id`` flag
* Mix ungrouped and grouped RGW services in the same cluster (on different nodes)

However, a single node can run only one RGW service at a time, either grouped
or ungrouped.

Troubleshooting
---------------

**Issue**: Cannot enable grouped RGW service, error about existing service

**Solution**: Ensure the node doesn't already have an RGW service (grouped or
ungrouped). Disable the existing service first.

**Issue**: Group ID validation error

**Solution**: Ensure your ``--group-id`` follows the required pattern: it must
be 3 to 63 characters long, start and end with a letter, digit, or underscore,
and otherwise contain only letters, digits, underscores, dots, and hyphens.

**Issue**: Services not showing as grouped in status

**Solution**: Verify that you used the same ``--group-id`` on all nodes that
should be part of the group.
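
When many nodes are involved, a quick consistency check helps spot a mistyped group ID. The Go sketch below assumes you already have each node's RGW group ID, for example read from the ``rgw.<group-id>`` labels in :command:`status` output, with an empty string for ungrouped. ``nodesOutsideGroup`` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// nodesOutsideGroup returns, sorted by name, the nodes whose RGW group ID
// differs from want ("" in the input means an ungrouped RGW service).
func nodesOutsideGroup(groups map[string]string, want string) []string {
	var odd []string
	for node, group := range groups {
		if group != want {
			odd = append(odd, node)
		}
	}
	sort.Strings(odd) // map iteration order is random; sort for stable output
	return odd
}

func main() {
	groups := map[string]string{"node1": "main-gateway", "node2": "main-gatway", "node3": ""}
	fmt.Println(nodesOutsideGroup(groups, "main-gateway")) // prints [node2 node3]
}
```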

.. LINKS

.. _RADOS Gateway service: https://docs.ceph.com/en/latest/radosgw/
33 changes: 27 additions & 6 deletions microceph/api/services.go
@@ -3,19 +3,19 @@ package api
import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"path"

	"github.com/canonical/lxd/lxd/response"
	"github.com/canonical/microcluster/v2/rest"
	"github.com/canonical/microcluster/v2/state"

	"github.com/canonical/microceph/microceph/api/types"
	"github.com/canonical/microceph/microceph/ceph"
	"github.com/canonical/microceph/microceph/database"
	"github.com/canonical/microceph/microceph/interfaces"
	"github.com/canonical/microceph/microceph/logger"
)

// /1.0/services endpoint.
@@ -204,7 +204,28 @@ func cmdNFSDeleteService(s state.State, r *http.Request) response.Response {
}

func cmdRGWServiceDelete(s state.State, r *http.Request) response.Response {
	var svc types.RGWService
	var groupID string

	// Try to decode JSON body - if it's empty (EOF), treat as ungrouped service for backward compatibility
	if r.Body != nil {
		err := json.NewDecoder(r.Body).Decode(&svc)
		if err != nil && err != io.EOF {
			logger.Errorf("failed decoding disable service request: %v", err)
			return response.InternalError(err)
		}
		groupID = svc.GroupID
	}

	// Validate GroupID if provided
	if groupID != "" {
		if !types.NFSClusterIDRegex.MatchString(groupID) {
			err := fmt.Errorf("expected group_id to be valid (regex: '%s')", types.NFSClusterIDRegex.String())
			return response.SmartError(err)
		}
	}

	err := ceph.DisableRGW(r.Context(), interfaces.CephState{State: s}, groupID)
	if err != nil {
		logger.Errorf("Failed disabling RGW: %v", err)
		return response.SmartError(err)
5 changes: 3 additions & 2 deletions microceph/api/types/services.go
@@ -37,8 +37,9 @@ var NFSClusterIDRegex = regexp.MustCompile(`^[\w][\w.-]{1,61}[\w]$`)
// RGWService holds a port number, an enable/disable flag, and an optional
// group ID for the grouped service model.
type RGWService struct {
	Service
	Port    int    `json:"port" yaml:"port"`
	Enabled bool   `json:"enabled" yaml:"enabled"`
	GroupID string `json:"group_id" yaml:"group_id"`
}

// MonitorStatus holds the status of all monitors
33 changes: 26 additions & 7 deletions microceph/ceph/rgw.go
@@ -80,28 +80,47 @@ func EnableRGW(s interfaces.StateInterface, port int, sslPort int, sslCertificat
}

// DisableRGW disables the RGW service on the cluster.
// If groupID is provided, it removes the grouped service; otherwise, it removes the ungrouped service.
func DisableRGW(ctx context.Context, s interfaces.StateInterface, groupID string) error {
	pathConsts := constants.GetPathConst()

	// If GroupID is provided, check if the grouped service exists
	if groupID != "" {
		exists, err := database.GroupedServicesQuery.ExistsOnHost(ctx, s, "rgw", groupID)
		if err != nil {
			return fmt.Errorf("failed to verify the node's RGW service GroupID: %w", err)
		} else if !exists {
			return fmt.Errorf("RGW service with GroupID '%s' not found on node '%s'", groupID, s.ClusterState().Name())
		}
	}

	err := stopRGW()
	if err != nil {
		return fmt.Errorf("Failed to stop RGW service: %w", err)
	}

	// Remove database records based on service type
	if groupID != "" {
		err = database.GroupedServicesQuery.RemoveForHost(ctx, s, "rgw", groupID)
		if err != nil {
			return err
		}
	} else {
		err = removeServiceDatabase(ctx, s, "rgw")
		if err != nil {
			return err
		}
	}

	// Remove the keyring symlink.
	err = os.Remove(filepath.Join(pathConsts.ConfPath, "ceph.client.radosgw.gateway.keyring"))
	if err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("failed to remove RGW keyring symlink: %w", err)
	}

	// Remove the keyring.
	err = os.Remove(filepath.Join(pathConsts.DataPath, "radosgw", "ceph-radosgw.gateway", "keyring"))
	if err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("failed to remove RGW keyring: %w", err)
	}

@@ -117,7 +136,7 @@ func DisableRGW(ctx context.Context, s interfaces.StateInterface) error {

	// Remove the configuration.
	err = os.Remove(filepath.Join(pathConsts.ConfPath, "radosgw.conf"))
	if err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("failed to remove RGW configuration: %w", err)
	}
