Curated papers, articles, and blogs on data science & machine learning in production. ⚙️
Some people collect stamps. I collect these. 😅
Have a new ML project and need a starting point? These resources on ML applied in production share:
- How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
- What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
- Why it works: the science behind it, with research, literature, and references 📂
- What real-world results were achieved (so you can better assess ROI ⏰💰📈)
Table of Contents
- Data Quality
- Data Engineering
- Classification
- Regression
- Recommendation
- Search/Ranking
- Natural Language Processing
- Sequence Modelling
- Forecasting
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Validation and A/B Testing
- Practices
- Failures
Data Quality
- Monitoring Data Quality at Scale with Statistical Modeling - Uber
- An Approach to Data Quality for Netflix Personalization Systems - Netflix
- Automating Large-Scale Data Quality Verification - Amazon
- Meet Hodor — Gojek’s Upstream Data Quality Tool - Gojek
- Reliable and Scalable Data Ingestion at Airbnb - Airbnb
Data Engineering
- Zipline: Airbnb’s Machine Learning Data Management Platform - Airbnb
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering - Airbnb
- Feast: Bridging ML Models and Data - Gojek
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform - Lyft
- Making Netflix’s Data Infrastructure Cost-Effective - Netflix
- Shopify's Data Science & Engineering Foundations - Shopify
- Metacat: Making Big Data Discoverable and Meaningful at Netflix - Netflix
Classification
- High-Precision Phrase-Based Document Classification on a Modern Scale - LinkedIn
- Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing - WalmartLabs
- Large-scale Item Categorization for e-Commerce - DianPing, eBay
- Categorizing Products at Scale - Shopify
- Learning to Diagnose with LSTM Recurrent Neural Networks - Google
- Discovering and Classifying In-app Message Intent at Airbnb - Airbnb
- How We Built the Good First Issues Feature - GitHub
- Testing Firefox More Efficiently with Machine Learning - Mozilla
- Prediction of Advertiser Churn for Google AdWords - Google
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb - Airbnb
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions - Better
- Using Machine Learning to Predict the Value of Ad Requests - Twitter
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering - Amazon
- Recommending Complementary Products in E-Commerce Push Notifications with Mixture Models - Alibaba
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba - Alibaba
- Session-based Recommendations with Recurrent Neural Networks - Telefonica
- How 20th Century Fox uses ML to predict a movie audience (paper) - 20th Century Fox
- Deep Neural Networks for YouTube Recommendations - YouTube
- Personalized Recommendations for Experiences Using Deep Learning - TripAdvisor
- E-commerce in Your Inbox: Product Recommendations at Scale - Yahoo
- Product Recommendations at Scale - Yahoo
- Powered by AI: Instagram’s Explore recommender system - Facebook
- Artwork Personalization at Netflix - Netflix
- To Be Continued: Helping you find shows to continue watching on Netflix - Netflix
- Learning a Personalized Homepage - Netflix
- Calibrated Recommendations - Netflix
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations - Uber
- How Music Recommendation Works — And Doesn’t Work - Spotify
- Music recommendation at Spotify - Spotify
- Recommending Music on Spotify with Deep Learning - Spotify
- For Your Ears Only: Personalizing Spotify Home with Machine Learning - Spotify
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months - Spotify
- The Evolution of Kit: Automating Marketing Using Machine Learning - Shopify
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits - Spotify
- Using Machine Learning to Predict what File you Need Next (Part 1) - Dropbox
- Using Machine Learning to Predict what File you Need Next (Part 2) - Dropbox
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) - LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) - LinkedIn
- A recommender system in 30 lines of Clojure - Findka
- How TikTok recommends videos #ForYou - ByteDance
Search/Ranking
- Amazon Search: The Joy of Ranking Products - Amazon
- How Lazada Ranks Products to Improve Customer Experience and Conversion - Lazada
- Using Deep Learning at Scale in Twitter’s Timelines - Twitter
- Machine Learning-Powered Search Ranking of Airbnb Experiences - Airbnb
- Applying Deep Learning To Airbnb Search - Airbnb
- Managing Diversity in Airbnb Search - Airbnb
- Ranking Relevance in Yahoo Search - Yahoo
- An Ensemble-Based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy - Etsy
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? - Amazon
- The AI Behind LinkedIn Recruiter search and recommendation systems - LinkedIn
- AI at Scale in Bing - Microsoft
- Query Understanding Engine in Traveloka Universal Search - Traveloka
- Search-based User Interest Modeling with Lifelong Sequential Behavior Data for CTR Prediction - Alibaba
- The Secret Sauce Behind Search Personalisation - Gojek
Embeddings
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba - Alibaba
- Embeddings@Twitter - Twitter
- Listing Embeddings in Search Ranking (Paper) - Airbnb
- Understanding Latent Style - Stitch Fix
Natural Language Processing
- Abusive Language Detection in Online User Content - Yahoo
- How natural language processing helps LinkedIn members get support easily - LinkedIn
- Building Smart Replies for Member Messages - LinkedIn
- Smart Reply: Automated Response Suggestion for Email - Google
- SmartReply for YouTube Creators - Google
- Using Neural Networks to Find Answers in Tables - Google
- A Scalable Approach to Reducing Gender Bias in Google Translate - Google
- Assistive AI Makes Replying Easier - Microsoft
- AI Advances to Better Detect Hate Speech - Facebook
- A State-of-the-Art Open Source Chatbot - Facebook
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs - Facebook
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting - Amazon
- How Gojek Uses NLP to Name Pickup Locations at Scale - Gojek
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want - Stitch Fix
- The State-of-the-art Open-Domain Chatbot in Chinese and English - Baidu
Sequence Modelling
- Recommending Complementary Products in E-Commerce Push Notifications with Mixture Models - Alibaba
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction - Alibaba
- Learning to Diagnose with LSTM Recurrent Neural Networks - Google
- Deep Learning for Understanding Consumer Histories - Zalando
- Continual Prediction of Notification Attendance with Classical and Deep Network Approaches - Telefonica
Forecasting
- Forecasting at Uber: An Introduction - Uber
- Engineering Extreme Event Forecasting at Uber with RNN - Uber
- Under the Hood of Gojek’s Automated Forecasting Tool - Gojek
Computer Vision
- Categorizing Listing Photos at Airbnb - Airbnb
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb - Airbnb
- Powered by AI: Advancing product understanding and building new shopping experiences - Facebook
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning - Dropbox
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors - Deepomatic
- A Neural Weather Model for Eight-Hour Precipitation Forecasting - Google
- Converting text to images for product discovery - Amazon
- How Disney uses PyTorch for animated character recognition - Disney
- Machine Learning-based Damage Assessment for Disaster Relief - Google
- RepNet: Counting Repetitions in Videos - Google
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding - Alibaba
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning - Alibaba
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising - Alibaba
- Productionizing Deep Reinforcement Learning with Spark and MLflow - Zynga
- Building AI Trading Systems - Denny Britz
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments - Netflix
- Detecting and preventing abuse on LinkedIn using isolation forests - LinkedIn
- Uncovering Insurance Fraud Conspiracy with Network Learning - Ant Financial
- How does spam protection work on Stack Exchange? - Stack Exchange
Graph
- Retail Graph — Walmart’s Product Knowledge Graph - Walmart
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations - Uber
- AliGraph: A Comprehensive Graph Neural Network Platform - Alibaba
Optimization
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats - Uber
- Next-Generation Optimization for Dasher Dispatch at DoorDash - DoorDash
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) - Lyft
- The Data and Science behind GrabShare Carpooling - Grab
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description - Rakuten
- Information Extraction from Receipts with Graph Convolutional Networks - Nanonets
- Using Machine Learning to Index Text from Billions of Images - Dropbox
- Extracting Structured Data from Templatic Documents - Google
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale - Google
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code - Intel
- Overton: A Data System for Monitoring and Improving Machine-Learned Products - Apple
- Bootstrapping Conversational Agents with Weak Supervision - IBM
Validation and A/B Testing
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis - Google
- A/B Testing with Hierarchical Models in Python - Domino
- Detecting Interference: An A/B Test of A/B Tests - LinkedIn
- Building Inclusive Products Through A/B Testing - LinkedIn
- Experimenting to solve cramming - Twitter
- Announcing a New Framework for Designing Optimal Experiments with Pyro - Uber
- Enabling 10x More Experiments with Traveloka Experiment Platform - Traveloka
- Large scale experimentation at StitchFix (Paper) - Stitch Fix
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures - Yoshua Bengio
- Machine Learning: The High Interest Credit Card of Technical Debt - Google
- Rules of Machine Learning: Best Practices for ML Engineering - Google
- Hidden Technical Debt in Machine Learning Systems - Google
- On Challenges in Machine Learning Model Management - Amazon
- Machine Learning in production: the Booking.com approach - Booking
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com - Booking
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department - Stitch Fix
Failures
- 160k+ High School Students Will Graduate Only If a Model Allows Them to - International Baccalaureate
- When It Comes to Gorillas, Google Photos Remains Blind - Google