@gtenev gtenev commented Jul 14, 2020

Fixed problems with initialization of cache volumes when at least
one volume is being forced to a specific "exclusive" span.

Problem description:

Start with cleared disks and the following configuration, where volume sizes
are specified as percentages and one of the volumes is forced to a
specific span (disk):

storage.config:
  /dev/disk1
  /dev/disk2 volume=3 # <- exclusive span forced to a specific volume

volume.config:
  volume=1 scheme=http size=50%
  volume=2 scheme=http size=50%
  volume=3 scheme=http size=512 # <- volume forced to an exclusive span

During the first start ATS identifies the cleared disks and does the following:

  1. creates and spreads new volume 1 and 2 blocks across disk1 and disk2
  2. deletes all volume 1 and 2 blocks from disk2 to make space for volume 3
  3. creates new volume 3 that takes over the whole disk2.

In step (1) volumes 1 and 2 are calculated larger than they should be and
spread onto disk2, only to be deleted from it in step (2) to make space
for the forced volume 3.
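
To make the over-sizing in step (1) concrete, here is a small arithmetic sketch. The span sizes (1000 MB and 512 MB) are made-up numbers, and `percent_sizes` is a hypothetical helper, not an ATS function; it only illustrates what happens when percentages are taken of a total that still includes the exclusive span.

```python
# Hypothetical span sizes in MB; the real sizes are not given in this description.
SPANS_MB = {"disk1": 1000, "disk2": 512}
EXCLUSIVE = {"disk2"}     # spans forced to a specific volume in storage.config
PERCENT = {1: 50, 2: 50}  # volume number -> size=N% from volume.config


def percent_sizes(total_mb):
    """Turn each percentage into an absolute size against the given total."""
    return {vol: total_mb * pct // 100 for vol, pct in PERCENT.items()}


# Before the fix: percentages are taken of every span, exclusive or not.
before = percent_sizes(sum(SPANS_MB.values()))

# Space actually left for volumes 1 and 2 once disk2 is handed to volume 3.
shared_mb = sum(size for name, size in SPANS_MB.items() if name not in EXCLUSIVE)

print(before)                            # {1: 756, 2: 756}
print(sum(before.values()) - shared_mb)  # 512 MB that cannot stay on disk1
```

With these assumed numbers, volumes 1 and 2 together ask for exactly as much extra space as disk2 provides, which is the space that step (2) then takes away again.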

During the initial start the global volume list cp_list would end up
containing "zombie" CacheVol instances that correspond to the volume 1
and 2 blocks deleted from disk2 to make space for volume 3. The
mapping of domains to volumes (hosting.config) could then end up
pointing to any of the deleted volume blocks.

This problem disappears after a restart, since cp_list is then initialized
from the disks and contains only valid CacheVol instances.
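
The stale-entry effect can be shown with a toy model. This is only an illustration: the `(disk, volume)` tuples stand in for CacheVol instances and on-disk volume blocks, and the variable names other than cp_list are made up. The real ATS structures are more involved.

```python
# Toy model only: tuples stand in for CacheVol instances / on-disk volume blocks.
on_disk = {("disk1", 1), ("disk1", 2), ("disk2", 1), ("disk2", 2)}  # after step 1
cp_list = sorted(on_disk)  # in-memory list built while the blocks are created

# Step 2: blocks on disk2 are deleted, but in this model nothing prunes cp_list.
on_disk = {block for block in on_disk if block[0] != "disk2"}

# Step 3: volume 3 takes over disk2.
on_disk.add(("disk2", 3))
cp_list.append(("disk2", 3))

# Entries that no longer have a matching block on disk are the "zombies".
zombies = [entry for entry in cp_list if entry not in on_disk]
print(zombies)  # [('disk2', 1), ('disk2', 2)]

# After a restart the list is rebuilt from what is actually on disk.
cp_list_after_restart = sorted(on_disk)
print([e for e in cp_list_after_restart if e not in on_disk])  # []
```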

The fix:

The fix prevents this in two ways. First, all volumes meant to have
"exclusive" disks are created first, so that span free space is updated
correctly. Second, the size of the "exclusive" disks is excluded from
the total cache size used for volume size calculations when sizes are
specified in percentages (volume.config).
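
The two parts of the fix can be sketched as a small planning function. This is a minimal model under stated assumptions, not the actual ATS code: `Span`, `VolumeConfig` and `plan_volumes` are hypothetical names, the span sizes are made up, each forced volume is assumed to own exactly one span, and a forced volume is assumed to take the whole span as described in step (3).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    size_mb: int
    forced_volume: Optional[int] = None  # "volume=N" in storage.config


@dataclass
class VolumeConfig:
    number: int
    size: int         # percentage if in_percent is True, otherwise MB
    in_percent: bool


def plan_volumes(spans, volumes):
    """Return (volume number, size in MB, span names) in creation order."""
    exclusive = {s.forced_volume: s for s in spans if s.forced_volume is not None}
    shared = [s for s in spans if s.forced_volume is None]
    # Exclusive spans are left out of the total used for percentages.
    shared_total = sum(s.size_mb for s in shared)

    # Stable sort: volumes with an exclusive span are created first.
    ordered = sorted(volumes, key=lambda v: v.number not in exclusive)

    plan = []
    for vol in ordered:
        if vol.number in exclusive:
            span = exclusive[vol.number]
            plan.append((vol.number, span.size_mb, [span.name]))
        else:
            size = shared_total * vol.size // 100 if vol.in_percent else vol.size
            plan.append((vol.number, size, [s.name for s in shared]))
    return plan


spans = [Span("disk1", 1000), Span("disk2", 512, forced_volume=3)]
volumes = [
    VolumeConfig(1, 50, True),
    VolumeConfig(2, 50, True),
    VolumeConfig(3, 512, False),
]
for entry in plan_volumes(spans, volumes):
    print(entry)
# (3, 512, ['disk2'])
# (1, 500, ['disk1'])
# (2, 500, ['disk1'])
```

In this model volumes 1 and 2 never touch disk2, so there is nothing to delete afterwards and no stale entries are left behind.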

(cherry picked from commit 17ee97a)

@gtenev gtenev added the Core, Cache, Backport (marked for backport for an LTS patch release), and Bug labels Jul 14, 2020
@gtenev gtenev added this to the 8.1.0 milestone Jul 14, 2020
@gtenev gtenev requested review from SolidWallOfCode and scw00 July 14, 2020 21:07
@gtenev gtenev self-assigned this Jul 14, 2020
@zwoop zwoop merged commit 0512b07 into apache:8.1.x Jul 15, 2020
@zwoop zwoop modified the milestones: 8.1.0, Backported Jul 15, 2020
masaori335 pushed a commit to masaori335/trafficserver that referenced this pull request Mar 31, 2021
* asf/8.1.x:
  Updated Changes
  Fix volume/stripe calcs when using forced volumes (apache#6995) (apache#7001)