Financial crime teams face a trade-off: rules that catch more fraud also generate more false positives. Too many alerts waste investigation time and money, while overly strict thresholds let fraud slip through.
The synthetic dataset consists of:
- 500 customers (KYC risk, PEP flag, SAR history)
- 10,000 transactions (amount, country, MCC, channel, device, IP, timestamp)
- ~3% of transactions labelled as fraud (fraud_y) for evaluation
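A minimal sketch of loading the data with pandas. transactions_fp.csv is the file name used by the tests below; customers.csv is an assumed name for the customer table, not confirmed by the repo:

```python
import pandas as pd

# Assumed file names: transactions_fp.csv comes from the test checks;
# customers.csv is a guess for the customer table.
customers = pd.read_csv("data/customers.csv")
transactions = pd.read_csv("data/transactions_fp.csv", parse_dates=["timestamp"])

print(customers.shape)                   # expected ~500 rows
print(transactions.shape)                # expected ~10,000 rows
print(transactions["fraud_y"].mean())    # expected ~0.03
```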
Engineered features (see the sketch after this list):
- amount_z: z-score of transaction amount relative to the customer's own history
- tx_count_1d, tx_count_7d: customer activity velocity over 1 and 7 days
- geo_mismatch: transaction country differs from the customer's home country
- device_fanout: distinct customers per device in the last 7 days
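A hedged sketch of how these features could be derived with pandas. The column names (customer_id, amount, country, home_country, device_id, timestamp) are assumptions based on the data description above, not the notebook's exact code, and device_fanout is simplified to full history rather than a strict 7-day window:

```python
import pandas as pd

def add_features(tx: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Illustrative feature engineering; timestamp must already be datetime."""
    tx = tx.sort_values("timestamp").reset_index(drop=True).copy()

    # amount_z: z-score of amount relative to the customer's own history
    grp = tx.groupby("customer_id")["amount"]
    std = grp.transform("std").fillna(0.0)          # single-tx customers -> std 0
    tx["amount_z"] = (tx["amount"] - grp.transform("mean")) / std.replace(0.0, 1.0)

    # tx_count_1d / tx_count_7d: rolling transaction counts per customer
    by_time = tx.set_index("timestamp")
    for col, window in [("tx_count_1d", "1D"), ("tx_count_7d", "7D")]:
        counts = by_time.groupby("customer_id")["amount"].transform(
            lambda s, w=window: s.rolling(w).count()
        )
        tx[col] = counts.to_numpy()

    # geo_mismatch: transaction country differs from the customer's home country
    tx = tx.merge(customers[["customer_id", "home_country"]], on="customer_id", how="left")
    tx["geo_mismatch"] = (tx["country"] != tx["home_country"]).astype(int)

    # device_fanout: distinct customers per device (simplified to full history)
    tx["device_fanout"] = tx.groupby("device_id")["customer_id"].transform("nunique")
    return tx
```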
Detection rules:
- r1: amount > 95th percentile per customer
- r2: tx_count_1d >= 10
- r3: geo_mismatch == 1 and amount > 80th percentile globally

Final alert logic: r1 OR (r2 AND r3)
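A minimal sketch of the rule logic under the same assumed column names. The r1 percentile is exposed as a parameter `p` because the results below sweep it:

```python
import pandas as pd

def apply_rules(tx: pd.DataFrame, p: float = 0.95) -> pd.DataFrame:
    """Flag transactions with r1 OR (r2 AND r3); p is the per-customer percentile for r1."""
    tx = tx.copy()

    # r1: amount above the p-th percentile of the customer's own history
    per_cust_threshold = tx.groupby("customer_id")["amount"].transform(lambda s: s.quantile(p))
    tx["r1"] = tx["amount"] > per_cust_threshold

    # r2: at least 10 transactions for this customer in the last day
    tx["r2"] = tx["tx_count_1d"] >= 10

    # r3: geo mismatch combined with an amount above the global 80th percentile
    tx["r3"] = (tx["geo_mismatch"] == 1) & (tx["amount"] > tx["amount"].quantile(0.80))

    # final alert logic
    tx["alert"] = tx["r1"] | (tx["r2"] & tx["r3"])
    return tx
```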
Key results:
- baseline (r1 percentile p = 0.95): 304 alerts, precision ~1.6%, recall ~24%, weekly cost ~£12.5k
- tuned (p ≈ 0.70): 329 alerts, precision ~2.1%, recall ~33%, weekly cost ~£11.9k
- drift monitoring showed stable r1 firing rates, a fluctuating r3, and higher alert rates in the online/mobile channels
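A hedged sketch of the kind of threshold sweep behind these numbers, building on the apply_rules sketch above. The cost model (a fixed investigation cost per alert plus a cost per missed fraud) and the specific figures are illustrative assumptions, not the project's actual parameters:

```python
import pandas as pd

def evaluate(tx: pd.DataFrame, p: float,
             cost_per_alert: float = 35.0,      # assumed analyst cost per investigated alert
             cost_per_missed: float = 400.0):   # assumed loss per missed fraud
    """Precision, recall, and weekly cost for a given r1 percentile p (illustrative cost model)."""
    flagged = apply_rules(tx, p=p)
    alerts = int(flagged["alert"].sum())
    tp = int((flagged["alert"] & (flagged["fraud_y"] == 1)).sum())
    fn = int((~flagged["alert"] & (flagged["fraud_y"] == 1)).sum())

    precision = tp / alerts if alerts else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    weekly_cost = alerts * cost_per_alert + fn * cost_per_missed
    return {"p": p, "alerts": alerts, "precision": precision,
            "recall": recall, "weekly_cost": weekly_cost}

# Sweep candidate thresholds (tx = feature-engineered transactions from the earlier sketch)
results = pd.DataFrame([evaluate(tx, p) for p in [0.70, 0.80, 0.90, 0.95]])
```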
This project uses synthetic data. It is an educational prototype, not a production AML engine or legal advice.
```
false_positive_lab/
    README.md           # project overview and instructions
    requirements.txt    # dependencies
    .gitignore          # files to ignore in git
    LICENSE             # open source license (MIT)
    data/               # synthetic customer and transaction datasets
    docs/               # charts (cost curve, PR curve, drift monitoring)
    notebooks/          # main analysis notebook
    outputs/            # CSV of threshold evaluation
    tests/              # simple pytest scripts
```
Basic validation checks are included in /tests/test_features.py:
- transactions_fp.csv exists in /data
- amount_z exists and has no nulls
- geo_mismatch is binary (0/1)
- device_fanout is always >= 1
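An illustrative version of what these checks could look like; the actual test code in tests/test_features.py may differ, and only the data path is taken from the checks above:

```python
import os
import pandas as pd
import pytest

DATA_PATH = "data/transactions_fp.csv"

def test_transactions_file_exists():
    assert os.path.exists(DATA_PATH)

@pytest.fixture
def tx():
    return pd.read_csv(DATA_PATH)

def test_amount_z_present_and_not_null(tx):
    assert "amount_z" in tx.columns
    assert tx["amount_z"].notna().all()

def test_geo_mismatch_is_binary(tx):
    assert set(tx["geo_mismatch"].unique()) <= {0, 1}

def test_device_fanout_at_least_one(tx):
    assert (tx["device_fanout"] >= 1).all()
```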
Run the tests with:

```bash
pytest -v
```

