Commit 0b0a0ca

Clean-up the case-studies page for Ebay, add a diagram
Author: Jagadish <jvenkatraman@linkedin.com>
Reviewers: Jagadish <jagadish@apache.org>

Closes apache#724 from vjagadish1989/website-reorg17
1 parent e5ea9be commit 0b0a0ca

2 files changed (+35, -22 lines)

docs/_case-studies/ebay.md

Lines changed: 35 additions & 22 deletions
@@ -1,7 +1,7 @@
 ---
 layout: case-study
 hide_title: true # so we have control in case-study layout, but can still use page
-title: Low Latency Web Scale Fraud Prevention
+title: Low Latency Web-Scale Fraud Prevention
 study_domain: ebay.com
 menu_title: eBay
 excerpt_separator: <!--more-->
@@ -27,30 +27,43 @@ How Samza powers low-latency, web-scale fraud prevention at Ebay?
 
 <!--more-->
 
-eBay Enterprise is the world’s largest omni-channel commerce provider with
-hundreds millions of units shipped annually, as commerce gets more
-convenient and complex, so does fraud. The engineering team at eBay
-Enterprise selected Samza as the platform to build the horizontally
-scalable, realtime (sub-seconds) and fault tolerant abnormality detection
-system. For example, the system computes and evaluates key metrics to
-detect abnormal behaviors
+eBay Enterprise is the world’s largest omni-channel commerce provider. The engineering team at eBay chose Apache Samza to build _PreCog_, their
+horizontally scalable anomaly detection system.
 
-- Transaction velocity (#tnx/day) and change (#tnx/day vs #tnx/day over n days)
-- Amount velocity ($tnx/day) and change ($tnx/day vs $tnx/day over n days)
+_PreCog_ extensively leverages Samza's high-performance, fault-tolerant local storage. Its architecture had the following requirements, all of which Samza met: <br/>
 
-A wide range of realtime and historical adjunct data from various sources
-including people, places, interests, social and connections are ingested
-through Kafka, and stored in local RocksDB state store with changelog
-enabled for recovery. Incoming transaction data is aggregated using
-windowing and then joined with adjunct data stores in multiple stages.
-The system generates potential fraud cases for review real time. Finally,
-the engineering team at eBay Enterprise has built an OpenTSDB and Grafana
-based monitoring system using metrics collected through JMX.
+_Web-scale:_ Scale to a large number of users and a large volume of data per user, and scale horizontally by adding commodity hardware. <br/>
+_Low-latency:_ Process customer interactions in real time, reacting in milliseconds rather than hours. <br/>
+_Fault-tolerance:_ Tolerate hardware failures gracefully. <br/>
 
-Key Samza features: *Stateful processing*, *Windowing*, *Kafka-integration*,
-*JMX-metrics*
+![diagram-large](/img/{{site.version}}/learn/documentation/case-study/ebay.png)
 
-More information
+The _PreCog_ anomaly-detection system comprises multiple tiers; each tier consists of multiple Samza jobs that process the output of the previous tier.
+
+_Ingestion tier:_ In this tier, a variety of historical and real-time data from various
+sources, including people, places, etc., is ingested into Kafka.
+
+_Fan-out tier:_ This tier consists of Samza jobs that process the Kafka events, fan them out, and re-partition them by various
+facets such as email address, IP address, credit-card number, and shipping address.
+
+_Compute tier:_ The Samza jobs in this tier consume messages from the fan-out tier and compute various key metrics and derived features. Features used to evaluate fraud include:
+
+1. Number of transactions per customer per day <br/>
+2. Change in the number of daily transactions over the past few days <br/>
+3. Dollar amount ($) of transactions per day <br/>
+4. Change in the dollar amount of transactions over a sliding time window <br/>
+5. Number of transactions per shipping address
+
+_Assembly tier:_ This tier consists of Samza jobs that join the output of the compute tier with additional data sources
+and make a final determination on whether a transaction is fraudulent.
+
+To monitor the _PreCog_ pipeline, eBay uses Samza's [JMXMetricsReporter](/learn/documentation/{{site.version}}/operations/monitoring.html) and ingests the reported metrics into OpenTSDB/HBase. The metrics are then
+visualized using [Grafana](https://grafana.com/).
+
+
+Key Samza features: *Stateful processing*, *Windowing*, *Kafka-integration*, *JMX-metrics*
+
+More information:
 
 - [https://www.slideshare.net/edibice/extremely-low-latency-web-scale-fraud-prevention-with-apache-samza-kafka-and-friends](https://www.slideshare.net/edibice/extremely-low-latency-web-scale-fraud-prevention-with-apache-samza-kafka-and-friends)
-- [http://ebayenterprise.com/](http://ebayenterprise.com/)
+- [http://ebayenterprise.com/](http://ebayenterprise.com/)
26.4 KB
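
The fan-out and compute tiers described in the new case-study text can be illustrated with a small, self-contained Python sketch. This is not PreCog's code and uses no Samza or Kafka APIs. The `Txn` record, its field names, the `FACETS` tuple, and every helper function are hypothetical. The features are also computed in one batch over a list, not incrementally over a stream with windowed state as a Samza job would do.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical transaction record: these field names are assumptions made for
# illustration, not PreCog's actual schema.
@dataclass(frozen=True)
class Txn:
    customer: str
    email: str
    ip: str
    card: str
    shipping_address: str
    amount: float
    day: int  # day index, e.g. days since an arbitrary epoch

# Facets used to re-partition events in the fan-out tier (illustrative).
FACETS = ("email", "ip", "card", "shipping_address")


def fan_out(txns):
    """Fan-out tier: group each transaction under every (facet, value) key,
    mimicking re-partitioning one stream by several keys."""
    partitions = defaultdict(list)
    for t in txns:
        for facet in FACETS:
            partitions[(facet, getattr(t, facet))].append(t)
    return partitions


def daily_counts(txns):
    """Compute tier, feature 1: transactions per customer per day."""
    counts = defaultdict(int)
    for t in txns:
        counts[(t.customer, t.day)] += 1
    return counts


def count_change(counts, customer, day, n_days):
    """Feature 2: today's count minus the mean daily count over the previous n_days."""
    history = [counts.get((customer, d), 0) for d in range(day - n_days, day)]
    return counts.get((customer, day), 0) - sum(history) / n_days


def window_amount(txns, customer, end_day, window_days):
    """Features 3-4: total amount for a customer in the window (end_day - window_days, end_day]."""
    return sum(t.amount for t in txns
               if t.customer == customer and end_day - window_days < t.day <= end_day)


def txns_per_shipping_address(partitions):
    """Feature 5: read directly off the shipping-address partitions."""
    return {value: len(ts) for (facet, value), ts in partitions.items()
            if facet == "shipping_address"}


txns = [
    Txn("c1", "a@example.com", "10.0.0.1", "card-1", "addr-1", 50.0, 1),
    Txn("c1", "a@example.com", "10.0.0.1", "card-1", "addr-1", 20.0, 2),
    Txn("c1", "a@example.com", "10.0.0.2", "card-1", "addr-2", 100.0, 3),
    Txn("c1", "a@example.com", "10.0.0.2", "card-1", "addr-2", 200.0, 3),
    Txn("c2", "b@example.com", "10.0.0.3", "card-2", "addr-2", 10.0, 3),
]
partitions = fan_out(txns)
counts = daily_counts(txns)
print(count_change(counts, "c1", 3, 2))  # 2 today vs. mean of 1 -> 1.0
print(txns_per_shipping_address(partitions))  # {'addr-1': 2, 'addr-2': 3}
print(window_amount(txns, "c1", 3, 1) - window_amount(txns, "c1", 2, 1))  # 300.0 - 20.0 -> 280.0
```

Keying by facet first means each downstream feature only needs the events for one key. This is why the text describes the fan-out tier as re-partitioning before the compute tier runs.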
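
The new text credits Samza's fault-tolerant local storage. The paragraph removed in this diff also mentions a local RocksDB store "with changelog enabled for recovery", and the assembly tier joins compute-tier output with other data sources. The toy sketch below shows the idea behind changelog-backed state: every write to the local store is also appended to a changelog, so a replacement instance can rebuild the store by replaying it. The class, its methods, the `assemble` rule, and its thresholds are all hypothetical. None of them is Samza's store API or eBay's fraud logic.

```python
# Toy model of changelog-backed local state; class and method names are
# hypothetical, not Samza's actual store API.
class ChangelogBackedStore:
    """In-memory key-value store that mirrors every write to a changelog.
    The changelog (a plain list here) stands in for a log-compacted Kafka topic."""

    def __init__(self, changelog):
        self._data = {}
        self._changelog = changelog  # outlives any single store instance

    def put(self, key, value):
        self._data[key] = value
        self._changelog.append((key, value))

    def delete(self, key):
        self._data.pop(key, None)
        self._changelog.append((key, None))  # None acts as a tombstone

    def get(self, key, default=None):
        return self._data.get(key, default)

    @classmethod
    def restore(cls, changelog):
        """Rebuild state after a failure by replaying the changelog in order."""
        store = cls(changelog)
        for key, value in changelog:
            if value is None:
                store._data.pop(key, None)
            else:
                store._data[key] = value  # later writes overwrite earlier ones
        return store


def assemble(features, profiles, max_daily_txns=5, max_amount_change=1000.0):
    """Hypothetical assembly-tier rule: join compute-tier features with adjunct
    profile data from local state, then flag suspicious transactions.
    The thresholds are made-up illustration values."""
    profile = profiles.get(features["customer"], {})
    if profile.get("on_watchlist", False):
        return True
    return (features["daily_count"] > max_daily_txns
            or features["amount_change"] > max_amount_change)


changelog = []
store = ChangelogBackedStore(changelog)
store.put("c1", {"on_watchlist": False})
store.put("c2", {"on_watchlist": True})
store.put("c3", {"on_watchlist": True})
store.delete("c3")

# Simulate losing the local store and recovering it from the changelog.
restored = ChangelogBackedStore.restore(changelog)
print(restored.get("c3"))  # None: the tombstone was replayed
print(assemble({"customer": "c1", "daily_count": 2, "amount_change": 280.0}, restored))  # False
print(assemble({"customer": "c2", "daily_count": 1, "amount_change": 0.0}, restored))  # True
```

A list is used instead of a Kafka topic so the sketch runs anywhere. It keeps every write rather than compacting. Replay still reaches the same final state, because later writes for a key win.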