add --drain-time-s and --parent-shutdown-time-s CLI options (#195)
Allows configurable drain and shutdown times during hot restart.
mattklein123 authored Nov 4, 2016
1 parent 5a7e1f9 commit 6ddb28c
Showing 11 changed files with 70 additions and 14 deletions.
6 changes: 4 additions & 2 deletions docs/intro/arch_overview/hot_restart.rst
@@ -17,8 +17,10 @@ hot restart functionality has the following general architecture:
the old process. The new process starts listening and then tells the old process to start
draining.
* During the draining phase, the old process attempts to gracefully close existing connections. How
-this is done depends on the configured filters. The drain time is configurable and as more time
-passes draining becomes more aggressive.
+this is done depends on the configured filters. The drain time is configurable via the
+:option:`--drain-time-s` option and as more time passes draining becomes more aggressive.
+* After the drain sequence, the new Envoy process tells the old Envoy process to shut itself down.
+This time is configurable via the :option:`--parent-shutdown-time-s` option.
* Envoy’s hot restart support was designed so that it will work correctly even if the new Envoy
process and the old Envoy process are running inside different containers. Communication between
the processes takes place only using unix domain sockets.
16 changes: 16 additions & 0 deletions docs/operations/cli.rst
@@ -70,3 +70,19 @@ following are the command line options that Envoy supports.
the interval has elapsed, whichever comes first. Adjusting this setting is useful
when tailing :ref:`access logs <arch_overview_http_access_logs>` in order to
get more (or less) immediate flushing.

.. option:: --drain-time-s <integer>

*(optional)* The time in seconds that Envoy will drain connections during a hot restart. See the
:ref:`hot restart overview <arch_overview_hot_restart>` for more information. Defaults to 600
seconds (10 minutes). Generally the drain time should be less than the parent shutdown time
set via the :option:`--parent-shutdown-time-s` option. How the two settings are configured
depends on the specific deployment. In edge scenarios, it might be desirable to have a very long
drain time. In service to service scenarios, it might be possible to make the drain and shutdown
time much shorter (e.g., 60s/90s).

.. option:: --parent-shutdown-time-s <integer>

*(optional)* The time in seconds that Envoy will wait before shutting down the parent process
during a hot restart. See the :ref:`hot restart overview <arch_overview_hot_restart>` for more
information. Defaults to 900 seconds (15 minutes).
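The ordering guidance for the two options can be expressed as a small check. The sketch below is illustrative only: `drainFitsBeforeShutdown` is a hypothetical helper, and nothing shown in this commit validates the two values against each other.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not part of this commit. Returns true when the drain
// window ends before the parent process is told to shut down, which is the
// ordering the documentation above recommends.
inline bool drainFitsBeforeShutdown(std::chrono::seconds drain_time,
                                    std::chrono::seconds parent_shutdown_time) {
  // Assumption of this sketch: both durations are non-negative.
  assert(drain_time >= std::chrono::seconds::zero());
  assert(parent_shutdown_time >= std::chrono::seconds::zero());
  return drain_time < parent_shutdown_time;
}
```

The defaults (600 s and 900 s) and the service-to-service example (60 s and 90 s) both satisfy the check; swapping the two values would not.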
11 changes: 11 additions & 0 deletions include/envoy/server/options.h
@@ -24,6 +24,11 @@ class Options {
*/
virtual uint32_t concurrency() PURE;

/**
* @return the number of seconds that envoy will perform draining during a hot restart.
*/
virtual std::chrono::seconds drainTime() PURE;

/**
* @return const std::string& the path to the configuration file.
*/
@@ -34,6 +39,12 @@ class Options {
*/
virtual spdlog::level::level_enum logLevel() PURE;

/**
* @return the number of seconds that envoy will wait before shutting down the parent envoy during
* a hot restart. Generally this will be longer than the drainTime() option.
*/
virtual std::chrono::seconds parentShutdownTime() PURE;

/**
* @return the restart epoch. 0 indicates the first server start, 1 the second, and so on.
*/
13 changes: 5 additions & 8 deletions source/server/drain_manager_impl.cc
@@ -9,9 +9,6 @@

namespace Server {

-const std::chrono::minutes DrainManagerImpl::DEFAULT_DRAIN_TIME{10};
-const std::chrono::minutes DrainManagerImpl::DEFAULT_PARENT_SHUTDOWN_TIME{15};
-
DrainManagerImpl::DrainManagerImpl(Instance& server) : server_(server) {}

bool DrainManagerImpl::drainClose() {
@@ -26,15 +23,15 @@ bool DrainManagerImpl::drainClose() {

// We use the tick time as an increasing chance that we shut down connections.
return static_cast<uint64_t>(drain_time_completed_.count()) >
-(server_.random().random() % DEFAULT_DRAIN_TIME.count());
+(server_.random().random() % server_.options().drainTime().count());
}

void DrainManagerImpl::drainSequenceTick() {
log_trace("drain tick #{}", drain_time_completed_.count());
-ASSERT(drain_time_completed_ < DEFAULT_DRAIN_TIME);
+ASSERT(drain_time_completed_ < server_.options().drainTime());
drain_time_completed_ += std::chrono::seconds(1);

-if (drain_time_completed_ < DEFAULT_DRAIN_TIME) {
+if (drain_time_completed_ < server_.options().drainTime()) {
drain_tick_timer_->enableTimer(std::chrono::milliseconds(1000));
}
}
@@ -53,8 +50,8 @@ void DrainManagerImpl::startParentShutdownSequence() {
server_.hotRestart().terminateParent();
});

-parent_shutdown_timer_->enableTimer(
-std::chrono::duration_cast<std::chrono::milliseconds>(DEFAULT_PARENT_SHUTDOWN_TIME));
+parent_shutdown_timer_->enableTimer(std::chrono::duration_cast<std::chrono::milliseconds>(
+server_.options().parentShutdownTime()));
}

} // Server
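The comparison at the end of `drainClose()` can be read in isolation as follows. This is a simplified, standalone sketch rather than the code from the commit: `shouldDrainClose` is a made-up name, and `random_value` stands in for the result of `server_.random().random()`. For roughly uniform random values, the chance of returning true grows approximately as elapsed / drain time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Standalone restatement of the drainClose() comparison shown in the hunk above.
// Hypothetical helper; names and signature are assumptions of this sketch.
inline bool shouldDrainClose(std::chrono::seconds elapsed, std::chrono::seconds drain_time,
                             uint64_t random_value) {
  // Assumption: the drain time is positive. A zero drain time would make the
  // modulo below undefined, so the sketch rejects it up front.
  assert(drain_time.count() > 0);
  const uint64_t threshold = random_value % static_cast<uint64_t>(drain_time.count());
  // The longer draining has been running, the more thresholds are exceeded.
  return static_cast<uint64_t>(elapsed.count()) > threshold;
}
```

At zero elapsed seconds the function never closes, and once the elapsed time reaches the drain time it always does, since the threshold is at most one less than the drain time.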
3 changes: 0 additions & 3 deletions source/server/drain_manager_impl.h
@@ -26,9 +26,6 @@ class DrainManagerImpl : Logger::Loggable<Logger::Id::main>, public DrainManager
private:
void drainSequenceTick();

-static const std::chrono::minutes DEFAULT_DRAIN_TIME;
-static const std::chrono::minutes DEFAULT_PARENT_SHUTDOWN_TIME;
-
Instance& server_;
Event::TimerPtr drain_tick_timer_;
std::chrono::seconds drain_time_completed_{};
7 changes: 7 additions & 0 deletions source/server/options_impl.cc
@@ -39,6 +39,11 @@ OptionsImpl::OptionsImpl(int argc, char** argv, const std::string& hot_restart_v
TCLAP::ValueArg<uint64_t> file_flush_interval_msec("", "file-flush-interval-msec",
"Interval for log flushing in msec", false,
10000, "uint64_t", cmd);
TCLAP::ValueArg<uint64_t> drain_time_s("", "drain-time-s", "Hot restart drain time in seconds",
false, 600, "uint64_t", cmd);
TCLAP::ValueArg<uint64_t> parent_shutdown_time_s("", "parent-shutdown-time-s",
"Hot restart parent shutdown time in seconds",
false, 900, "uint64_t", cmd);

try {
cmd.parse(argc, argv);
@@ -68,4 +73,6 @@ OptionsImpl::OptionsImpl(int argc, char** argv, const std::string& hot_restart_v
service_node_ = service_node.getValue();
service_zone_ = service_zone.getValue();
file_flush_interval_msec_ = std::chrono::milliseconds(file_flush_interval_msec.getValue());
drain_time_ = std::chrono::seconds(drain_time_s.getValue());
parent_shutdown_time_ = std::chrono::seconds(parent_shutdown_time_s.getValue());
}
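The two conversions this commit relies on (CLI integer to `std::chrono::seconds` here, then seconds to milliseconds when the drain manager arms its timer) can be sketched in one place. `timerDelayFromCliSeconds` is a hypothetical helper introduced only for illustration, and the overflow guard is an assumption of the sketch, not something the commit does.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper mirroring the conversion chain in this commit:
// CLI integer -> std::chrono::seconds -> std::chrono::milliseconds.
inline std::chrono::milliseconds timerDelayFromCliSeconds(uint64_t cli_seconds) {
  // Sketch-only guard: keep the value small enough that multiplying by 1000
  // cannot overflow the millisecond representation.
  assert(cli_seconds <=
         static_cast<uint64_t>(std::chrono::milliseconds::max().count() / 1000));
  const std::chrono::seconds as_seconds(cli_seconds);
  return std::chrono::duration_cast<std::chrono::milliseconds>(as_seconds);
}
```

With the default parent shutdown time of 900 seconds this yields 900000 milliseconds, the value the updated drain manager test expects to be passed to `enableTimer`.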
4 changes: 4 additions & 0 deletions source/server/options_impl.h
@@ -14,7 +14,9 @@ class OptionsImpl : public Server::Options {
uint64_t baseId() { return base_id_; }
uint32_t concurrency() override { return concurrency_; }
const std::string& configPath() override { return config_path_; }
std::chrono::seconds drainTime() override { return drain_time_; }
spdlog::level::level_enum logLevel() override { return log_level_; }
std::chrono::seconds parentShutdownTime() override { return parent_shutdown_time_; }
uint64_t restartEpoch() override { return restart_epoch_; }
const std::string& serviceClusterName() override { return service_cluster_; }
const std::string& serviceNodeName() override { return service_node_; }
@@ -31,4 +33,6 @@ class OptionsImpl : public Server::Options {
std::string service_node_;
std::string service_zone_;
std::chrono::milliseconds file_flush_interval_msec_;
std::chrono::seconds drain_time_;
std::chrono::seconds parent_shutdown_time_;
};
2 changes: 2 additions & 0 deletions test/integration/server.h
@@ -22,7 +22,9 @@ class TestOptionsImpl : public Options {
uint64_t baseId() override { return 0; }
uint32_t concurrency() override { return 1; }
const std::string& configPath() override { return config_path_; }
std::chrono::seconds drainTime() override { return std::chrono::seconds(0); }
spdlog::level::level_enum logLevel() override { NOT_IMPLEMENTED; }
std::chrono::seconds parentShutdownTime() override { return std::chrono::seconds(0); }
uint64_t restartEpoch() override { return 0; }
const std::string& serviceClusterName() override { return cluster_name_; }
const std::string& serviceNodeName() override { return node_name_; }
2 changes: 2 additions & 0 deletions test/mocks/server/mocks.h
@@ -32,7 +32,9 @@ class MockOptions : public Options {
MOCK_METHOD0(baseId, uint64_t());
MOCK_METHOD0(concurrency, uint32_t());
MOCK_METHOD0(configPath, const std::string&());
MOCK_METHOD0(drainTime, std::chrono::seconds());
MOCK_METHOD0(logLevel, spdlog::level::level_enum());
MOCK_METHOD0(parentShutdownTime, std::chrono::seconds());
MOCK_METHOD0(restartEpoch, uint64_t());
MOCK_METHOD0(serviceClusterName, const std::string&());
MOCK_METHOD0(serviceNodeName, const std::string&());
4 changes: 3 additions & 1 deletion test/server/drain_manager_impl_test.cc
@@ -10,11 +10,13 @@ namespace Server {

TEST(DrainManagerImplTest, All) {
NiceMock<MockInstance> server;
+ON_CALL(server.options_, drainTime()).WillByDefault(Return(std::chrono::seconds(600)));
+ON_CALL(server.options_, parentShutdownTime()).WillByDefault(Return(std::chrono::seconds(900)));
DrainManagerImpl drain_manager(server);

// Test parent shutdown.
Event::MockTimer* shutdown_timer = new Event::MockTimer(&server.dispatcher_);
-EXPECT_CALL(*shutdown_timer, enableTimer(_));
+EXPECT_CALL(*shutdown_timer, enableTimer(std::chrono::milliseconds(900000)));
drain_manager.startParentShutdownSequence();

EXPECT_CALL(server.hot_restart_, terminateParent());
16 changes: 16 additions & 0 deletions test/server/options_impl_test.cc
@@ -19,6 +19,10 @@ TEST(OptionsImplTest, All) {
argv.push_back("zone");
argv.push_back("--file-flush-interval-msec");
argv.push_back("9000");
argv.push_back("--drain-time-s");
argv.push_back("60");
argv.push_back("--parent-shutdown-time-s");
argv.push_back("90");
OptionsImpl options(argv.size(), const_cast<char**>(&argv[0]), "1", spdlog::level::warn);
EXPECT_EQ(2U, options.concurrency());
EXPECT_EQ("hello", options.configPath());
@@ -28,4 +32,16 @@ TEST(OptionsImplTest, All) {
EXPECT_EQ("node", options.serviceNodeName());
EXPECT_EQ("zone", options.serviceZone());
EXPECT_EQ(std::chrono::milliseconds(9000), options.fileFlushIntervalMsec());
EXPECT_EQ(std::chrono::seconds(60), options.drainTime());
EXPECT_EQ(std::chrono::seconds(90), options.parentShutdownTime());
}

TEST(OptionsImplTest, DefaultParams) {
std::vector<const char*> argv;
argv.push_back("envoy");
argv.push_back("-c");
argv.push_back("hello");
OptionsImpl options(argv.size(), const_cast<char**>(&argv[0]), "1", spdlog::level::warn);
EXPECT_EQ(std::chrono::seconds(600), options.drainTime());
EXPECT_EQ(std::chrono::seconds(900), options.parentShutdownTime());
}
