start post-deployment

polyfractal · polyfractal · commit e3e67e3dcaf2 · 2014-08-28T23:27:45.000-04:00
diff --git a/510_Deployment.asciidoc b/510_Deployment.asciidoc
@@ -9,36 +9,9 @@ include::510_Deployment/30_other.asciidoc[]
 
 include::510_Deployment/40_config.asciidoc[]
 
+include::510_Deployment/45_dont_touch.asciidoc[]
+
 include::510_Deployment/50_heap.asciidoc[]
 
 include::510_Deployment/60_file_descriptors.asciidoc[]
 
-=== Post-Deployment
-
--Prereqs
-
-    - Hardware
-        -memory, cpu, etc
-        - disable swap on node and VM host
-        - ssd noop
-
-    - Talking to the cluster
-        - language clients
-        - TransportClient vs NodeClient
-        - Proxies, load balancers, etc
-    - JVM
-
-- Configs to change before production
-    - elasticsearch.yml
-    - cluster settings api
-        - changing logging dynamically
-    - don't touch these:
-        - GC, threadpools
-
-- index performance tips
-    - Mike's blog
-- security
-    - none lololol
-
--snapshot/restore
-    - fs must be accessible from all nodes
diff --git a/510_Deployment/45_dont_touch.asciidoc b/510_Deployment/45_dont_touch.asciidoc
@@ -68,17 +68,17 @@ will happen on the same core.  If you are unlucky, the switch may migrate to a
 different core and require transport on inter-core communication bus.
 
 This context switching eats up cycles simply doing administrative housekeeping
- -- estimates can peg it as high as 30us on modern CPUs.  So unless the thread
- will be blocked for longer than 30us, it is highly likely that that time would
- have been better spent just processing and finishing early.
+-- estimates can peg it as high as 30μs on modern CPUs.  So unless the thread
+will be blocked for longer than 30μs, it is highly likely that that time would
+have been better spent just processing and finishing early.
 
- People routinely set threadpools to silly values.  On 8 core machines, we have
- run across configs with 60, 100 or even 1000 threads.  These settings will simply
- thrash the CPU more than getting real work done.
+People routinely set threadpools to silly values.  On 8 core machines, we have
+run across configs with 60, 100 or even 1000 threads.  These settings will simply
+thrash the CPU more than getting real work done.
 
- So. Next time you want to tweak a threadpool...please don't.  And if you
- _absolutely cannot resist_, please keep your core count in mind and perhaps set
- the count to double.  More than that is just a waste.
+So. Next time you want to tweak a threadpool...please don't.  And if you
+_absolutely cannot resist_, please keep your core count in mind and perhaps set
+the count to double.  More than that is just a waste.
 
 
 
diff --git a/510_Deployment/80_cluster_settings.asciidoc b/510_Deployment/80_cluster_settings.asciidoc
@@ -0,0 +1,2 @@
+
+===
diff --git a/520_Post_Deployment.asciidoc b/520_Post_Deployment.asciidoc
@@ -0,0 +1,23 @@
+[[post_deploy]]
+== Post-Deployment
+
+Once you have deployed your cluster in production, there are some tools and
+best practices to keep your cluster running in top shape.  In this short
+section, we'll talk about configuring settings dynamically, how to tweak
+logging levels, indexing performance tips and how to back up your cluster.
+
+include::520_Post_Deployment/10_dynamic_settings.asciidoc[]
+
+include::520_Post_Deployment/20_logging.asciidoc[]
+
+
+
+- index performance tips
+    - Mike's blog
+- security
+    - none lololol
+
+-snapshot/restore
+    - fs must be accessible from all nodes
+
+rolling restarts 
diff --git a/520_Post_Deployment/10_dynamic_settings.asciidoc b/520_Post_Deployment/10_dynamic_settings.asciidoc
@@ -0,0 +1,37 @@
+
+=== Changing settings dynamically
+
+Many settings in Elasticsearch are dynamic, and modifiable through the API.
+Configuration changes that force a node (or cluster) restart are strenuously avoided.
+And while it's possible to make the changes through the static configs, we
+recommend that you use the API instead.
+
+The _Cluster Update_ API operates in two modes:
+
+- Transient: these changes are in effect until the cluster restarts.  Once
+a full cluster restart takes place, these settings are erased.
+
+- Persistent: these changes are permanently in place unless explicitly changed.
+They will survive full cluster restarts and override the static configuration files.
+
+Transient and persistent settings are supplied under separate keys in the JSON body:
+
+[source,js]
+----
+PUT /_cluster/settings
+{
+    "persistent" : {
+        "discovery.zen.minimum_master_nodes" : 2 <1>
+    },
+    "transient" : {
+        "indices.store.throttle.max_bytes_per_sec" : "50mb" <2>
+    }
+}
+----
+<1> This persistent setting will survive full cluster restarts
+<2> While this transient setting will be removed after the first full cluster 
+restart
+
+A complete list of settings that are dynamically updateable can be found in the
+http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html[online reference docs].
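+
+To confirm which values are currently applied, the settings can be read back
+from the same endpoint.  This is a minimal sketch that assumes the request above
+has already been sent; the response is omitted here because its exact layout
+can vary by version:
+
+[source,js]
+----
+GET /_cluster/settings
+----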
+
diff --git a/520_Post_Deployment/20_logging.asciidoc b/520_Post_Deployment/20_logging.asciidoc
@@ -0,0 +1,80 @@
+
+=== Logging
+
+Elasticsearch emits a number of logs, which by default are placed in `ES_HOME/logs`.
+The default logging level is INFO.  It provides a moderate amount of information,
+but is designed to be rather light so that your logs are not enormous.
+
+When debugging problems, particularly problems with node discovery (since this
+often depends on finicky network configurations), it can be helpful to bump
+up the logging level to DEBUG.
+
+You _could_ modify the `logging.yml` file and restart your nodes...but that is
+tedious and leads to unnecessary downtime.  Instead, you can update logging
+levels through the Cluster Settings API that we just learned about.
+
+To do so, take the logger you are interested in and prepend `logger.` to it.
+Let's turn up the discovery logging:
+
+[source,js]
+----
+PUT /_cluster/settings
+{
+    "transient" : {
+        "logger.discovery" : "DEBUG"
+    }
+}
+---- 
+
+While this setting is in effect, Elasticsearch will begin to emit DEBUG-level
+logs for the `discovery` module.
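+
+Once you have finished debugging, the same request can lower the level again.
+A minimal sketch, assuming you want to return to the default INFO level
+mentioned earlier:
+
+[source,js]
+----
+PUT /_cluster/settings
+{
+    "transient" : {
+        "logger.discovery" : "INFO"
+    }
+}
+----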
+
+NOTE: Avoid TRACE.  It is extremely verbose, to the point where the logs
+are no longer useful.
+
+==== Slowlog
+
+There is another log called the _Slowlog_.  The purpose of this log is to catch
+queries and indexing requests that take longer than a certain threshold of time.
+It is useful for hunting down user-generated queries that are particularly slow.
+
+By default, the slowlog is not enabled.  It can be enabled by defining the action
+(query, fetch or index), the level that you want the event logged at (WARN, DEBUG,
+etc) and a time threshold.
+
+This is an index-level setting, which means it is applied to individual indices:
+
+[source,js]
+----
+PUT /my_index/_settings
+{
+    "index.search.slowlog.threshold.query.warn" : "10s", <1>
+    "index.search.slowlog.threshold.fetch.debug": "500ms", <2>
+    "index.indexing.slowlog.threshold.index.info": "5s" <3>
+}
+---- 
+<1> Emit a WARN log when queries are slower than 10s
+<2> Emit a DEBUG log when fetches are slower than 500ms
+<3> Emit an INFO log when indexing takes longer than 5s
+
+You can also define these thresholds in your `elasticsearch.yml` file.  Indices
+that do not have a threshold set will inherit whatever is configured in the
+static config.
+
+Once the thresholds are set, you can toggle the logging level like any other
+logger:
+
+[source,js]
+----
+PUT /_cluster/settings
+{
+    "transient" : {
+        "logger.index.search.slowlog" : "DEBUG", <1>
+        "logger.index.indexing.slowlog" : "WARN" <2>
+    }
+}
+---- 
+<1> Set the search slowlog to DEBUG level
+<2> Set the indexing slowlog to WARN level
+
+
diff --git a/book.asciidoc b/book.asciidoc
@@ -107,6 +107,8 @@ include::500_Cluster_Admin.asciidoc[]
 
 include::510_Deployment.asciidoc[]
 
+include::520_Post_Deployment.asciidoc[]
+
 [[TODO]]
 [appendix]
 = TODO