3 changes: 3 additions & 0 deletions nexus-config/src/nexus_config.rs
@@ -254,6 +254,9 @@ pub struct TimeseriesDbConfig {
     /// The native TCP address of the ClickHouse server.
     #[serde(default, skip_serializing_if = "Option::is_none")]
     pub address: Option<SocketAddr>,
+    /// The max size of the connection pool.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub max_slots: Option<usize>,
 }

 /// Configuration for the `Dendrite` dataplane daemon.
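To make the new knob concrete, here is a minimal standalone sketch (not omicron code) of how `max_slots` could be expressed in TOML and picked up through serde. The struct below mirrors only the fields touched by this diff, the `serde` and `toml` crates are assumed as dependencies, and the address value is purely illustrative.

```rust
// Standalone sketch: a minimal mirror of the TimeseriesDbConfig fields
// changed above. Both keys are optional; leaving `max_slots` out keeps
// the pool's default policy (see nexus/src/app/mod.rs below).
use std::net::SocketAddr;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct TimeseriesDbConfig {
    /// The native TCP address of the ClickHouse server.
    #[serde(default)]
    address: Option<SocketAddr>,
    /// The max size of the connection pool.
    #[serde(default)]
    max_slots: Option<usize>,
}

fn main() {
    // Illustrative values only: a ClickHouse native-protocol address and
    // a cap of 32 pooled connections.
    let cfg: TimeseriesDbConfig = toml::from_str(
        "address = \"[fd00:1122:3344:101::e]:9000\"\nmax_slots = 32\n",
    )
    .expect("valid timeseries_db table");
    assert_eq!(cfg.max_slots, Some(32));
    println!("{cfg:?}");
}
```

Where this table sits in the full Nexus config file is not shown in the diff (the code path is `config.pkg.timeseries_db`), so the sketch parses just the table body rather than a complete config document.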
21 changes: 14 additions & 7 deletions nexus/src/app/mod.rs
@@ -37,6 +37,8 @@ use omicron_common::api::external::Error;
 use omicron_common::api::internal::shared::SwitchLocation;
 use omicron_uuid_kinds::OmicronZoneUuid;
 use oximeter_producer::Server as ProducerServer;
+use qorb::policy::Policy;
+use qorb::resolvers::fixed::FixedResolver;
 use sagas::common_storage::PooledPantryClient;
 use sagas::common_storage::make_pantry_connection_pool;
 use slog::Logger;
@@ -405,14 +407,19 @@ impl Nexus {
             .map_err(|e| e.to_string())?;

         // Client to the ClickHouse database.
-        let timeseries_client = match &config.pkg.timeseries_db.address {
-            None => {
-                let native_resolver =
-                    qorb_resolver.for_service(ServiceName::OximeterReader);
-                oximeter_db::Client::new_with_resolver(native_resolver, &log)
-            }
-            Some(address) => oximeter_db::Client::new(*address, &log),
+        let timeseries_resolver = match &config.pkg.timeseries_db.address {
+            Some(address) => Box::new(FixedResolver::new([*address])),
+            None => qorb_resolver.for_service(ServiceName::OximeterReader),
         };
+        let mut timeseries_policy = Policy::default();
+        if let Some(max_slots) = config.pkg.timeseries_db.max_slots {
+            timeseries_policy.max_slots = max_slots;

Review thread on the `max_slots` assignment above:

Collaborator:
This is the maximum across all backends. Is that what you want to cap? Or a maximum for each backend?

Contributor Author:
Here's what I was thinking. I think I have queries queueing up when I run the otel receiver against oximeter, and I wanted to see if increasing the connection pool size might help. I don't have a good mental model of qorb: are there multiple backends when we're managing a connection pool for a single database instance, as we are here? For my particular use case, I'm interested in bumping whichever cap is throttling my queries, but maybe we should add knobs for both caps for generality.

As an aside, is there a simple way to check whether we're saturating either the policy or backend's max connections?

Collaborator:
We just recently switched back to single-node ClickHouse, in which case there is one backend and so the total cap on slots and the count per backend are the same. When we go back to multinode, we probably want to configure this on a per-backend basis.

> As an aside, is there a simple way to check whether we're saturating either the policy or backend's max connections?

I'd probably use the USDT probes to do this. For example, if there is substantial time between claim-start and claim-done, then the connections are all in use since we're spending time queued. You could also use handle-claimed and handle-returned to estimate the spare capacity in the pool over time.

+        }
+        let timeseries_client = oximeter_db::Client::new_with_pool_policy(
+            timeseries_resolver,
+            timeseries_policy,
+            &log,
+        );

         // TODO-cleanup We may want to make the populator a first-class
         // background task.