Curated papers, articles, and blogs sharing how data science & machine learning are applied in production. ⚙️
Have a favourite piece you're not seeing here? Want to contribute? Make a pull request! 😄
Table of Contents
- Data Quality
- Data Engineering
- Classification
- Regression
- Recommendation
- Search/Ranking
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Validation and A/B Testing
- Practices

Data Quality
- Monitoring Data Quality at Scale with Statistical Modeling (Uber)
- An Approach to Data Quality for Netflix Personalization Systems (Netflix)

Data Engineering
- Zipline: Airbnb’s Machine Learning Data Management Platform (Airbnb)
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering (Airbnb)
- Feast: Bridging ML Models and Data (Gojek)

Classification
- High-precision phrase-based document classification on a modern scale (LinkedIn)
- Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing (WalmartLabs)
- Large-scale Item Categorization for e-Commerce (DianPing, eBay)
- Categorizing Products at Scale (Shopify)
- Learning to Diagnose with LSTM Recurrent Neural Networks (Google)
- Discovering and Classifying In-app Message Intent at Airbnb (Airbnb)
- How we built the good first issues feature (GitHub)

Regression
- Using Machine Learning to Predict Value of Homes On Airbnb (Airbnb)
- Modeling conversion rates and saving millions of dollars using Kaplan-Meier and gamma distributions (Better)
- Using machine learning to predict the value of ad requests (Twitter)

Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Amazon)
- Recommending Complementary Products in E-Commerce Push Notifications with a Mixture Model Approach (Alibaba)
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Alibaba)
- Session-based Recommendations with Recurrent Neural Networks (Telefonica)
- Deep Neural Networks for YouTube Recommendations (YouTube)
- Personalized Recommendations for Experiences Using Deep Learning (TripAdvisor)
- E-commerce in Your Inbox: Product Recommendations at Scale (Yahoo)
- Powered by AI: Instagram’s Explore recommender system (Facebook)
- Artwork Personalization at Netflix (Netflix)
- To Be Continued: Helping you find shows to continue watching on Netflix (Netflix)
- Learning a Personalized Homepage (Netflix)
- https://eng.uber.com/uber-eats-graph-learning/ (Uber)
- Calibrated Recommendations (Netflix)
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Spotify)
- For Your Ears Only: Personalizing Spotify Home with Machine Learning (Spotify)
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months (Spotify)
- The Evolution of Kit: Automating Marketing Using Machine Learning (Shopify)
- Using machine learning to predict what file you need next (Part 1) (Dropbox)
- Using machine learning to predict what file you need next (Part 2) (Dropbox)
- A closer look at the AI behind course recommendations on LinkedIn Learning (Part 1) (LinkedIn)
- A closer look at the AI behind course recommendations on LinkedIn Learning (Part 2) (LinkedIn)

Search/Ranking
- Amazon Search: The Joy of Ranking Products (Amazon)
- How Lazada Ranks Products to Improve Customer Experience and Conversion (Lazada)
- Using Deep Learning at Scale in Twitter’s Timelines (Twitter)
- Machine Learning-Powered Search Ranking of Airbnb Experiences (Airbnb)
- Applying Deep Learning To Airbnb Search (Airbnb)
- Ranking Relevance in Yahoo Search (Yahoo)
- An Ensemble-Based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Etsy)
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Amazon)
- The AI Behind LinkedIn Recruiter search and recommendation systems (LinkedIn)
- AI at Scale in Bing (Microsoft)
- Query Understanding Engine in Traveloka Universal Search (Traveloka)
- Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction (Alibaba)
- The Secret Sauce Behind Search Personalisation (Gojek)

Embeddings
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Alibaba)
- Embeddings@Twitter (Twitter)
- Listing Embeddings in Search Ranking (Paper) (Airbnb)

Natural Language Processing
- Abusive Language Detection in Online User Content (Yahoo)
- How natural language processing helps LinkedIn members get support easily (LinkedIn)
- Building Smart Replies for Member Messages (LinkedIn)
- Smart Reply: Automated Response Suggestion for Email (Google)
- Assistive AI Makes Replying Easier (Microsoft)
- AI advances to better detect hate speech (Facebook)
- Using Neural Networks to Find Answers in Tables (Google)
- A Scalable Approach to Reducing Gender Bias in Google Translate (Google)
- A state-of-the-art open source chatbot (Facebook)
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Amazon)
- How Gojek Uses NLP to Name Pickup Locations at Scale (Gojek)

Sequence Modelling
- Recommending Complementary Products in E-Commerce Push Notifications with a Mixture Model Approach (Alibaba)
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Alibaba)
- Learning to Diagnose with LSTM Recurrent Neural Networks (Google)
- Deep Learning for Understanding Consumer Histories (Zalando)
- Continual Prediction of Notification Attendance with Classical and Deep Network Approaches (Telefonica)

Computer Vision
- Categorizing Listing Photos at Airbnb (Airbnb)
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb (Airbnb)
- Powered by AI: Advancing product understanding and building new shopping experiences (Facebook)
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning (Dropbox)
- How we improved computer vision metrics by more than 5% only by cleaning labelling errors (Deepomatic)
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Google)
- Converting text to images for product discovery (Amazon)

Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Alibaba)
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Alibaba)
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Alibaba)
- Productionizing Deep Reinforcement Learning with Spark and MLflow (Zynga)

Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments (Netflix)
- Detecting and preventing abuse on LinkedIn using isolation forests (LinkedIn)
- Uncovering Insurance Fraud Conspiracy with Network Learning (Ant Financial)

Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Rakuten)
- Information Extraction from Receipts with Graph Convolutional Networks (Nanonets)

Validation and A/B Testing
- The reusable holdout: Preserving validity in adaptive data analysis (Google)
- A/B Testing with Hierarchical Models in Python (Domino)
- Detecting interference: An A/B test of A/B tests (LinkedIn)
- Building inclusive products through A/B testing (LinkedIn)
- Experimenting to solve cramming (Twitter)
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Uber)
- Enabling 10x More Experiments with Traveloka Experiment Platform (Traveloka)

Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Yoshua Bengio)
- Machine Learning: The High Interest Credit Card of Technical Debt (Google)
- Rules of Machine Learning: Best Practices for ML Engineering (Google)
- Hidden Technical Debt in Machine Learning Systems (Google)
- On Challenges in Machine Learning Model Management (Amazon)
- 150 successful Machine Learning models: 6 lessons learned at Booking.com (Booking.com)