|
| 1 | +--- |
| 2 | +layout: case-study |
| 3 | +hide_title: true # so we have control in case-study layout, but can still use page |
| 4 | +title: Real Time Session Aggregation at Optimizely |
| 5 | +study_domain: optimizely.com |
| 6 | +menu_title: Optimizely |
| 7 | +excerpt_separator: <!--more--> |
| 8 | +--- |
| 9 | +<!-- |
| 10 | + Licensed to the Apache Software Foundation (ASF) under one or more |
| 11 | + contributor license agreements. See the NOTICE file distributed with |
| 12 | + this work for additional information regarding copyright ownership. |
| 13 | + The ASF licenses this file to You under the Apache License, Version 2.0 |
| 14 | + (the "License"); you may not use this file except in compliance with |
| 15 | + the License. You may obtain a copy of the License at |
| 16 | +
|
| 17 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 18 | +
|
| 19 | + Unless required by applicable law or agreed to in writing, software |
| 20 | + distributed under the License is distributed on an "AS IS" BASIS, |
| 21 | + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 22 | + See the License for the specific language governing permissions and |
| 23 | + limitations under the License. |
| 24 | +--> |
| 25 | + |
| 26 | +Testing the excerpt |
| 27 | + |
| 28 | +<!--more--> |
| 29 | + |
| 30 | +Optimizely is the world’s leading experimentation platform, enabling businesses to deliver continuous experimentation and personalization across websites, mobile apps and connected devices. At Optimizely, billions of events are tracked on a daily basis. Session metrics are among the key metrics provided to its end users in real time. Prior to introducing Samza for real-time computation, the engineering team at Optimizely used HBase to store and serve experimentation data, and Druid for personalization data including session metrics. As business requirements evolved, the Druid-based solution became increasingly challenging: |
| 31 | + |
| 32 | +- Long delays in session metrics caused by M/R jobs |
| 33 | +- Reprocessing of events due to the inability to incrementally update the Druid index |
| 34 | +- Difficulties in scaling dimensions and cardinality |
| 35 | +- Queries spanning long time periods are expensive |
| 36 | + |
| 37 | +The engineering team at Optimizely decided to move away from Druid and focus on HBase as the store, and introduced stream processing to pre-aggregate and deduplicate session events. They evaluated multiple stream processing platforms and chose Samza. In their solution, every session event is tagged with an identifier for up to 30 minutes; upon receiving a session event, the Samza job updates the session metadata and aggregates counters for the session, which are stored in a local RocksDB state store. At the end of each one-minute window, the aggregated session metrics are written to HBase. With the new solution: |
| 38 | + |
| 39 | +- The median query latency was reduced from 40+ ms to 5 ms |
| 40 | +- Session metrics are now available in real time |
| 41 | +- HBase query response time is improved due to the reduced write rate |
| 42 | +- HBase storage requirements are drastically reduced |
| 43 | +- Development effort is lower thanks to out-of-the-box Kafka integration |
| 44 | + |
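The per-event update and one-minute flush described above can be sketched in a few lines. This is a minimal, plain-Python simulation and not the Samza API or Optimizely's implementation: the `SessionAggregator` class, its method names, the event fields, and the output row layout are all hypothetical. A dictionary stands in for the local RocksDB store, a callable stands in for the HBase writer, and the 30-minute limit is assumed to be an inactivity timeout.

```python
from dataclasses import dataclass, field

# Assumption: a session expires after 30 minutes without events.
SESSION_TIMEOUT_S = 30 * 60


@dataclass
class Session:
    session_id: str
    start_ts: int
    last_ts: int
    event_count: int = 0
    seen_event_ids: set = field(default_factory=set)


class SessionAggregator:
    """Hypothetical stand-in for a stateful stream task."""

    def __init__(self, sink):
        self.store = {}     # visitor_id -> Session (role of the local state store)
        self.dirty = set()  # visitors whose session changed in the current window
        self.sink = sink    # callable receiving rows (role of the HBase writer)

    def process(self, visitor_id, event_id, ts):
        """Update session metadata and counters for one incoming event."""
        session = self.store.get(visitor_id)
        if session is None or ts - session.last_ts > SESSION_TIMEOUT_S:
            # No live session for this visitor: start a new one.
            session = Session(f"{visitor_id}:{ts}", start_ts=ts, last_ts=ts)
            self.store[visitor_id] = session
        if event_id in session.seen_event_ids:
            return  # duplicate delivery, already counted
        session.seen_event_ids.add(event_id)
        session.event_count += 1
        session.last_ts = max(session.last_ts, ts)
        self.dirty.add(visitor_id)

    def window(self):
        """Flush sessions touched since the last flush; call once per minute."""
        for visitor_id in sorted(self.dirty):
            s = self.store[visitor_id]
            self.sink({
                "session_id": s.session_id,
                "events": s.event_count,
                "duration_s": s.last_ts - s.start_ts,
            })
        self.dirty.clear()


rows = []
agg = SessionAggregator(rows.append)
agg.process("v1", "e1", 0)
agg.process("v1", "e1", 5)    # duplicate of e1, ignored
agg.process("v1", "e2", 30)
agg.window()                  # one row for v1 despite three deliveries
```

Because only one row per changed session is emitted per window, the store receives far fewer writes than events arrive, which is consistent with the reduced HBase write rate listed above.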
| 45 | +Here is a testimonial from Optimizely: |
| 46 | + |
| 47 | +“At Optimizely, we have built the world’s leading experimentation platform, which ingests billions of click-stream events a day from millions of visitors for analysis. Apache Samza has been a great asset to Optimizely's Event ingestion pipeline allowing us to perform large scale, real time stream computing such as aggregations (e.g. session computations) and data enrichment on a multiple billion events / day scale. The programming model, durability and the close integration with Apache Kafka fit our needs perfectly,” said Vignesh Sukumar, Senior Engineering Manager at Optimizely. |
| 48 | + |
| 49 | +At Optimizely, stream processing is also applied to other use cases such as data enrichment, event stream partitioning and metrics processing. |
| 50 | + |
| 51 | +Key Samza features: *Stateful processing*, *Windowing*, *Kafka integration*, *Scalability*, *Fault tolerance* |
| 52 | + |
| 53 | +More information |
| 54 | + |
| 55 | +- [https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-1-aed2051dd7a3](https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-1-aed2051dd7a3) |
| 56 | + |
| 57 | +- [https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-2-b596350a7820](https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-2-b596350a7820) |
| 58 | + |