|
| 1 | + |
| 2 | +== Closing thoughts |
| 3 | + |
| 4 | +This section covered a lot of ground, and a lot of deeply technical issues. |
| 5 | +Aggregations bring a power and flexibility to Elasticsearch that is hard to |
| 6 | +overstate. The ability to nest buckets and metrics, to quickly approximate |
| 7 | +cardinality and percentiles, to find statistical anomalies in your data, all |
| 8 | +while operating on near-real-time data and in parallel to full-text search: |
| 9 | +these are game-changers for many organizations. |
| 10 | + |
| 11 | +Once you start using aggregations, you'll find dozens of other candidate uses |
| 12 | +for them. Real-time reporting and analytics are central to many |
| 13 | +organizations (be it for business intelligence or server logs). |
| 14 | + |
| 15 | +But with great power comes great responsibility, and for Elasticsearch that often |
| 16 | +means proper memory stewardship. Memory is often the limiting factor in |
| 17 | +Elasticsearch deployments, particularly those that heavily utilize aggregations. |
| 18 | +Because aggregation data is loaded into fielddata, which is an in-memory data |
| 19 | +structure, managing memory efficiently is very important. |
| 20 | + |
| 21 | +The management of this memory can take several different forms depending on your |
| 22 | +particular use-case: |
| 23 | + |
| 24 | +- At a data level, by making sure you analyze your data (or mark it `not_analyzed`) |
| 25 | +appropriately so that it is memory-friendly |
| 26 | +- During indexing, by configuring heavy fields to use disk-based Doc Values instead |
| 27 | +of in-memory fielddata |
| 28 | +- At search time, by utilizing approximate aggregations and data filtering |
| 29 | +- At a node level, by setting hard memory limits and dynamic circuit breaker limits |
| 30 | +- At an operations level, by monitoring memory usage and controlling slow garbage |
| 31 | +collection cycles, potentially by adding more nodes to the cluster |
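
Several of the options above can be sketched as request bodies. The following is a minimal illustration, not a recommended configuration: it assumes the 1.x-era mapping syntax (`string` fields with `not_analyzed`), the field name `user_id` and all helper function names are hypothetical, and the percentage values are placeholders to be tuned for your own cluster.

```python
import json


def doc_values_mapping():
    """Mapping fragment for a heavy field: not analyzed, stored as Doc Values.

    Assumes 1.x-style syntax; later versions use different field types.
    """
    return {"type": "string", "index": "not_analyzed", "doc_values": True}


def approximate_distinct_count(field, precision_threshold=100):
    """Search body that uses the approximate `cardinality` aggregation."""
    return {
        "size": 0,  # only the aggregation result is wanted, not the hits
        "aggs": {
            "distinct_values": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def node_memory_limits(cache_size="40%", breaker_limit="60%"):
    """Fielddata cache size and circuit breaker limit (placeholder values)."""
    return {
        "indices.fielddata.cache.size": cache_size,
        "indices.breaker.fielddata.limit": breaker_limit,
    }


if __name__ == "__main__":
    # "user_id" is a hypothetical field name used only for illustration.
    print(json.dumps({"user_id": doc_values_mapping()}, indent=2))
    print(json.dumps(approximate_distinct_count("user_id"), indent=2))
    print(json.dumps(node_memory_limits(), indent=2))
```

Keeping each concern in its own small function mirrors the list: the mapping is decided at index time, the aggregation body at search time, and the limits at the node or cluster level.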
| 32 | + |
| 33 | +Most deployments will use one or more of the above methods. The exact combination |
| 34 | +is highly dependent on your particular environment. Some organizations need |
| 35 | +blisteringly fast responses and opt to simply add more nodes. Other organizations |
| 36 | +are limited by budget and choose Doc Values and approximate aggregations. |
| 37 | + |
| 38 | +Whatever the path you take, it is important to assess the available options and |
| 39 | +create both a short- and long-term plan. Assess where your memory situation stands |
| 40 | +today and what (if anything) needs to be done. Then decide what will happen in |
| 41 | +six months, one year, and beyond as your data grows. What methods will you use to |
| 42 | +continue scaling? |
| 43 | + |
| 44 | +It is better to plan out the life cycle of your cluster ahead of time than to |
| 45 | +panic at 3 a.m. because your cluster is at 90% heap utilization. |