
Commit dfab6c1

closing thoughts to aggs section
1 parent 6931053 commit dfab6c1

2 files changed: +46 -0 lines changed

300_Aggregations/125_Conclusion.asciidoc

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
== Closing thoughts

This section covered a lot of ground, and a lot of deeply technical issues.
Aggregations bring a power and flexibility to Elasticsearch that is hard to
overstate. The ability to nest buckets and metrics, to quickly approximate
cardinality and percentiles, and to find statistical anomalies in your data, all
while operating on near-real-time data and in parallel with full-text search...
these are game-changers for many organizations.
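
For example, a single request can bucket your data and nest approximate metrics
inside each bucket. Here is a minimal sketch, assuming example data along the
lines of the car transactions used earlier in this part of the book (documents
with `color`, `make`, and `price` fields):

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : { "field" : "color" },              <1>
            "aggs" : {
                "distinct_makes" : {
                    "cardinality" : { "field" : "make" }  <2>
                },
                "price_percentiles" : {
                    "percentiles" : { "field" : "price" } <3>
                }
            }
        }
    }
}
--------------------------------------------------
<1> One bucket per unique color
<2> An approximate count of distinct makes inside each color bucket
<3> Approximate price percentiles inside each color bucket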

Once you start using aggregations, you'll quickly find dozens of other candidate
uses. Real-time reporting and analytics are central to many organizations (be it
over business intelligence data or server logs).

But with great power comes great responsibility, and for Elasticsearch that often
means proper memory stewardship. Memory is often the limiting factor in
Elasticsearch deployments, particularly those that heavily utilize aggregations.
Because aggregation data is loaded into fielddata -- an in-memory data
structure -- managing memory efficiently is very important.
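
Because fielddata usage matters so much, you will want to keep an eye on it. A
minimal sketch of the kind of requests you can use (the exact breakdown in the
response varies by version and by which fields have been loaded):

[source,js]
--------------------------------------------------
GET /_stats/fielddata?fields=*                <1>

GET /_nodes/stats/indices/fielddata?fields=*  <2>
--------------------------------------------------
<1> Fielddata memory usage summarized per index, broken down by field
<2> The same information, reported per node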

The management of this memory can take several different forms, depending on your
particular use case:

- At the data level, by making sure your fields are analyzed (or left
`not_analyzed`) appropriately so that they are memory-friendly
- At index time, by configuring heavy fields to use disk-based Doc Values instead
of in-memory fielddata (see the mapping sketch after this list)
- At search time, by utilizing approximate aggregations and data filtering
- At the node level, by setting hard memory limits and dynamic circuit-breaker
limits
- At the operations level, by monitoring memory usage and controlling slow
garbage-collection cycles, potentially by adding more nodes to the cluster
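
To make the first two options concrete, here is a minimal mapping sketch for a
hypothetical string field named `color`. The exact syntax depends on your
Elasticsearch version, but the idea is to leave the field `not_analyzed` and to
have its values stored on disk as doc values instead of being loaded into
in-memory fielddata:

[source,js]
--------------------------------------------------
PUT /cars
{
    "mappings" : {
        "transactions" : {
            "properties" : {
                "color" : {
                    "type"       : "string",
                    "index"      : "not_analyzed", <1>
                    "doc_values" : true            <2>
                }
            }
        }
    }
}
--------------------------------------------------
<1> A `not_analyzed` string keeps one token per value, which is far more
memory-friendly for aggregations
<2> Doc values build the same data structure on disk at index time, so the field
never needs to occupy heap as fielddata

The trade-off is a slightly larger index and a little extra work at index time,
in exchange for aggregations that no longer compete for heap.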

Most deployments will use one or more of the above methods. The exact combination
is highly dependent on your particular environment. Some organizations need
blisteringly fast responses and opt to simply add more nodes. Other organizations
are limited by budget and choose Doc Values and approximate aggregations.

Whatever path you take, it is important to assess the available options and to
create both a short-term and a long-term plan. Decide where your memory situation
stands today and what (if anything) needs to be done. Then decide what will happen
in 6 months, a year, and beyond as your data grows: what methods will you use to
continue scaling?

It is better to plan out these life cycles of your cluster ahead of time than to
find yourself panicking at 3am because your cluster is at 90% heap utilization.

306_Practical_Considerations.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -14,3 +14,4 @@ include::300_Aggregations/115_eager.asciidoc[]
 
 include::300_Aggregations/120_breadth_vs_depth.asciidoc[]
 
+include::300_Aggregations/125_Conclusion.asciidoc[]
