hide_title: true # so we have control in case-study layout, but can still use page
4
-
title: Low Latency WebScale Fraud Prevention
4
+
title: Low Latency Web-Scale Fraud Prevention
5
5
study_domain: ebay.com
6
6
menu_title: eBay
7
7
excerpt_separator: <!--more-->
@@ -27,30 +27,43 @@ How Samza powers low-latency, web-scale fraud prevention at Ebay?
27
27
28
28
<!--more-->
29
29
30
-
eBay Enterprise is the world’s largest omni-channel commerce provider with
31
-
hundreds millions of units shipped annually, as commerce gets more
32
-
convenient and complex, so does fraud. The engineering team at eBay
33
-
Enterprise selected Samza as the platform to build the horizontally
34
-
scalable, realtime (sub-seconds) and fault tolerant abnormality detection
35
-
system. For example, the system computes and evaluates key metrics to
36
-
detect abnormal behaviors
30
+
eBay Enterprise is the world’s largest omni-channel commerce provider. The engineering team at eBay chose Apache Samza to build _PreCog_, their
31
+
horizontally scalable anomaly detection system.
37
32
38
-
- Transaction velocity (#tnx/day) and change (#tnx/day vs #tnx/day over n days)
39
-
- Amount velocity ($tnx/day) and change ($tnx/day vs $tnx/day over n days)
33
+
_PreCog_ extensively leverages Samza's high-performance, fault-tolerant local storage. Its architecture had the following requirements, for which Samza perfectly fit the bill: <br/>
40
34
41
-
A wide range of realtime and historical adjunct data from various sources
42
-
including people, places, interests, social and connections are ingested
43
-
through Kafka, and stored in local RocksDB state store with changelog
44
-
enabled for recovery. Incoming transaction data is aggregated using
45
-
windowing and then joined with adjunct data stores in multiple stages.
46
-
The system generates potential fraud cases for review real time. Finally,
47
-
the engineering team at eBay Enterprise has built an OpenTSDB and Grafana
48
-
based monitoring system using metrics collected through JMX.
35
+
_Web-scale:_ Scale to a large number of users and a large volume of data per user. Additionally, it should be possible to scale horizontally by adding more commodity hardware. <br/>
36
+
_Low-latency:_ Process customer interactions in real time, reacting in milliseconds instead of hours. <br/>
37
+
_Fault-tolerance:_ Gracefully tolerate and handle hardware failures. <br/>
The PreCog anomaly-detection system comprises multiple tiers. Each tier consists of multiple Samza jobs that process the output of the previous tier.
42
+
43
+
_Ingestion tier:_ In this tier, a variety of historical and real-time data from various
44
+
sources (people, places, etc.) is ingested into Kafka.
45
+
46
+
_Fanout tier:_ This tier consists of Samza jobs that process the Kafka events, fan them out, and re-partition them based on various
47
+
facets such as email address, IP address, credit-card number, shipping address, etc.
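
The fan-out and re-partitioning step described above can be sketched roughly as follows. This is a minimal illustration, not the PreCog implementation: the event field names, the `fan_out` and `partition_for` helpers, and the MD5-based hash (a stand-in for whatever partitioner the real pipeline uses) are all assumptions made for the example.

```python
import hashlib

# Hypothetical facet fields; the source names the facets but not the event schema.
FACETS = ("email", "ip_address", "card_number", "shipping_address")


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (stand-in for a real partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def fan_out(event: dict, num_partitions: int = 8) -> list:
    """Emit one keyed copy of the event for each facet present in it."""
    out = []
    for facet in FACETS:
        value = event.get(facet)
        if value is None:
            continue  # this event carries no value for the facet
        key = f"{facet}:{value}"
        out.append({
            "facet": facet,
            "key": key,
            "partition": partition_for(key, num_partitions),
            "event": event,
        })
    return out
```

Because the partition depends only on the facet key, all events that share, say, an email address land on the same partition, so a downstream job can aggregate them in local state.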
48
+
49
+
_Compute tier:_ The Samza jobs in this tier consume messages from the fan-out tier and compute various key metrics and derived features. Features used to evaluate fraud include:
50
+
51
+
1. Number of transactions per-customer per-day <br/>
52
+
2. Change in the number of daily transactions over the past few days <br/>
53
+
3. Total amount ($$) of transactions per-day <br/>
54
+
4. Change in the amount value of transactions over a sliding time-window <br/>
55
+
5. Number of transactions per shipping-address
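
As a rough illustration of the features listed above, the sketch below computes them over an in-memory list of transactions. It is not the PreCog code: the record fields (`customer`, `day`, `amount`, `shipping_address`), the helper names, and the definition of "change" as today's value minus the mean of the previous `n_days` days are assumptions chosen for the example.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def daily_stats(transactions):
    """Return {(customer, day): (transaction_count, total_amount)}; day is a datetime.date."""
    stats = defaultdict(lambda: [0, 0.0])
    for txn in transactions:
        entry = stats[(txn["customer"], txn["day"])]
        entry[0] += 1
        entry[1] += txn["amount"]
    return {key: (count, total) for key, (count, total) in stats.items()}


def change_vs_trailing_mean(series, day, n_days):
    """Value on `day` minus the mean of the previous n_days days (missing days count as 0)."""
    previous = [series.get(day - timedelta(days=i), 0) for i in range(1, n_days + 1)]
    return series.get(day, 0) - sum(previous) / n_days


def count_by(transactions, field):
    """Count transactions per distinct value of `field`, e.g. per shipping address."""
    return Counter(txn[field] for txn in transactions)
```

The same `change_vs_trailing_mean` helper works for both counts and amounts: pass it a `{day: value}` series for one customer built from the output of `daily_stats`.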
56
+
57
+
_Assembly tier:_ This tier comprises Samza jobs that join the output of the compute tier with additional data sources
58
+
and make a final determination on transaction-fraud.
59
+
60
+
For monitoring the _PreCog_ pipeline, eBay leverages Samza's [JMXMetricsReporter](/learn/documentation/{{site.version}}/operations/monitoring.html) and ingests the reported metrics into OpenTSDB/HBase. The metrics are then