Skip to content

some non-unique indexes in dbinit.sql are fishy #1497

Open
@davepacheco

Description

@davepacheco

We have a bunch of non-unique indexes in the database. I'm a little skeptical of these because:

  • They're not good for looking up a specific item in the table because the lookup would be linear in the number of duplicates in the index.
  • They're not good for pagination for the same reason. At best, if a page boundary lands in a run of duplicates, then the next page request will have to do a linear scan of the duplicated items because it has no way to quickly get to the one it needs to start with. At worst, the database will think it can't use the index at all because the items in it do not necessarily appear to satisfy the ORDER BY that's used for pagination.
  • In most cases, you can make these indexes good for both lookup and pagination by adding the primary key (usually id) as the last column in the index. (I believe that CockroachDB implicitly includes the primary key already and it's possible it uses this to address the above problems? But (a) if we're relying on that, we should verify it, and (b) if so, then we probably still want to be consistent. Right now, a bunch of our indexes explicitly include the id for this purpose and a bunch of indexes don't.)

There's at least one good use case for non-unique indexes, which is when you expect to fetch a bunch of rows based on some property, you expect there could be a large number of these, and you don't care which ones you get. Examples of this include the index on console_session.time_created that's intended to be used to expire old sessions. This is fine because we'll wind up querying for sessions older than, say, a day ago, and we don't care which ones we find. (In fact, we'll probably keep querying and expiring everything we find until we run out.)

We should take a pass through the non-unique indexes and audit them.

Examples that look intended for pagination or lookup but that I think won't work for the above reasons:

  • sled.rack_id (needs sled's id too and then it'll be unique)
  • service.sled_id (needs service's id too and then it'll be unique)
  • region.volume_id (needs region's id too and then it'll be unique)
  • disk.attach_instance_id (needs disk's id too and then it'll be unique)

These I think are wrong only because they have the fields in the wrong order:

  • identity_provider.(id, silo_id) -- but I think this probably needs to be (silo_id, id) because otherwise it can't be used to quickly list IdPs in a Silo
  • identity_provider.(name, silo_id) -- but I think this probably needs to be (silo_id, name) for the same reason
  • saml_identity_provider.(id, silo_id) -- same

These I think are unique, just not constrained to be:

  • metric_producer.(oximeter_id, id)

Ones I'm not sure about without more investigation:

  • Dataset.size_used
  • ip_pool_range.project_id
  • instance_external_ip.(ip_pool_id,ip_pool_range_id)
  • instance_external_ip.instance_id

Here are all of them:

/* Add an index which lets us look up sleds on a rack */
CREATE INDEX ON omicron.public.sled (
    rack_id
) WHERE time_deleted IS NULL;
/* Add an index which lets us look up the services on a sled */
CREATE INDEX ON omicron.public.service (
    sled_id
);
/* Create an index on the size usage for Crucible's allocation */
CREATE INDEX on omicron.public.Dataset (
    size_used
) WHERE size_used IS NOT NULL AND time_deleted IS NULL AND kind = 'crucible';
/*
 * Allow all regions belonging to a disk to be accessed quickly.
 */
CREATE INDEX on omicron.public.Region (
    volume_id
);
CREATE INDEX ON omicron.public.identity_provider (
    id,
    silo_id
) WHERE
    time_deleted IS NULL;

CREATE INDEX ON omicron.public.identity_provider (
    name,
    silo_id
) WHERE
    time_deleted IS NULL;
CREATE INDEX ON omicron.public.saml_identity_provider (
    id,
    silo_id
) WHERE
    time_deleted IS NULL;
CREATE INDEX ON omicron.public.disk (
    attach_instance_id
) WHERE
    time_deleted IS NULL AND attach_instance_id IS NOT NULL;
CREATE INDEX ON omicron.public.metric_producer (
    oximeter_id,
    id
);
/*
 * Index supporting allocation of IPs out of a Pool reserved for a project.
 */
CREATE INDEX ON omicron.public.ip_pool_range (
    project_id
) WHERE
    time_deleted IS NULL;
/*
 * Index used to support quickly looking up children of the IP Pool range table,
 * when checking for allocated addresses during deletion.
 */
CREATE INDEX ON omicron.public.instance_external_ip (
    ip_pool_id,
    ip_pool_range_id
)
    WHERE time_deleted IS NULL;
CREATE INDEX ON omicron.public.instance_external_ip (
    instance_id
)
    WHERE instance_id IS NOT NULL AND time_deleted IS NULL;
CREATE INDEX ON omicron.public.console_session (
    time_created
);
/* This index is used to quickly find outdated artifacts. */
CREATE INDEX ON omicron.public.update_available_artifact (
    targets_role_version
);

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions