Welcome to Discussion! This post is a running list of frequently asked questions.
If you have questions, comments, suggestions, or practical problems (when applying this script to your datasets) that are not addressed in this list, feel free to open a discussion or comment on my Medium article.
For bugs/errors in code, please open an issue. Expect an issue to be addressed the following weekend.
Code sharing and contributions to this repo are very welcome: please open a discussion with the tag "ideas" or "Show and Tell", or create a pull request.
Question Guidelines
Please specify your model formula and variables.
There are two types of MMM.
Multiplicative (this repo): log y = b0 + b1 * log x1 + ... + bn * log xn
Additive: y = b0 + b1 * x1 + ... + bn * xn
If you are not asking about the multiplicative MMM in this repo, please do specify your model and variables.
Give an example if you want to better illustrate your question.
Short code snippets are fine, but large chunks of code are unnecessary for Q&A, as I'm unable to debug them.
FAQ
Q1: Model Selection: If I have to benchmark different models, which metrics do you suggest to use?
Some criteria for model selection:
Adstock parameters: are they reasonable and in line with your domain knowledge?
MAPE: in general I consider below 15% or 20% acceptable, sometimes up to 30%, depending on your case.
Rhat and n_eff (effective sample size) of the parameters: these show whether each parameter has converged well (Rhat = 1 at convergence).
Domain knowledge and existing theories/findings.
If you still cannot select a model, go ahead and test the findings of all candidates. Whatever an MMM finds is only a mathematical solution; the real world does not necessarily behave this way, so the findings have to be validated by A/B testing.
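To make the MAPE and Rhat criteria concrete, here is a minimal sketch in plain Python. It is not code from this repo: the helper names, the sample numbers, and the Rhat threshold of 1.1 (a common rule of thumb, not something this FAQ prescribes) are all assumptions for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Rows where actual == 0 are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is 0")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def poorly_converged(rhat_by_param, threshold=1.1):
    """Return the parameters whose Rhat exceeds the (assumed) threshold."""
    return sorted(name for name, rhat in rhat_by_param.items() if rhat > threshold)


# Hypothetical weekly sales and the fitted values of two candidate models.
actual = [100.0, 120.0, 80.0, 110.0]
model_a = [90.0, 126.0, 88.0, 99.0]
model_b = [70.0, 150.0, 60.0, 140.0]

mape_a = mape(actual, model_a)
mape_b = mape(actual, model_b)

# Hypothetical Rhat values, e.g. copied by hand from a fit summary.
flagged = poorly_converged({"beta_tv": 1.00, "beta_sem": 1.35, "decay_tv": 1.02})

print(f"MAPE A: {mape_a:.2f}%  MAPE B: {mape_b:.2f}%  check convergence of: {flagged}")
```

A lower MAPE alone does not settle the choice; a model whose parameters are flagged here or whose adstock parameters look implausible still deserves scrutiny.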
Q2: How to do budget optimization?
The idea is:
Set your annual budget for each channel
Set a constraint for each channel on how much change you allow, e.g., sem: [-20%, +20%], tv: [-10%, +10%], display: [-10%, +10%]
Move budget from low-mROAS channels to high-mROAS channels in a greedy way. For example:
sem -20%
move to: tv +10%
-> if tv doesn't use up the money, give the rest to: display +10%, and so on until there is no budget left to move.
Here you get the optimized budget plan.
Break down the original and optimized budget plans into weekly spending in proportion and plug the weekly spending into the MMM. You will get the contribution (how much sales each plan ends up with) of the two plans and can see how much lift the optimized plan gives.
(This hasn't been implemented and I currently have no time to do it. If you're willing to share your wisdom to help others - aka contribute to this repo - I'm happy to add you as a collaborator!)
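The greedy idea above could be sketched roughly as follows. This is an illustration only, not an implementation from this repo: the function name, the budgets, and the mROAS values are hypothetical, and it treats each channel's mROAS as fixed, whereas in a real plan mROAS changes as spend changes.

```python
def greedy_reallocate(budget, mroas, max_change):
    """Move budget from low-mROAS to high-mROAS channels within per-channel limits.

    budget:     channel -> annual budget
    mroas:      channel -> marginal ROAS (assumed constant here)
    max_change: channel -> allowed change as a fraction, e.g. 0.2 for [-20%, +20%]
    """
    plan = dict(budget)
    order = sorted(budget, key=lambda ch: mroas[ch])  # lowest mROAS first
    room_cut = {ch: budget[ch] * max_change[ch] for ch in budget}  # donors
    room_add = {ch: budget[ch] * max_change[ch] for ch in budget}  # receivers

    lo, hi = 0, len(order) - 1
    while lo < hi:
        donor, receiver = order[lo], order[hi]
        moved = min(room_cut[donor], room_add[receiver])
        plan[donor] -= moved
        plan[receiver] += moved
        room_cut[donor] -= moved
        room_add[receiver] -= moved
        if room_cut[donor] <= 1e-9:     # donor cannot be cut any further
            lo += 1
        if room_add[receiver] <= 1e-9:  # receiver is full, go to next best channel
            hi -= 1
    return plan


# Hypothetical inputs mirroring the example: sem has the lowest mROAS, tv the highest.
budget = {"sem": 1000.0, "tv": 1000.0, "display": 1000.0}
mroas = {"sem": 0.8, "tv": 2.0, "display": 1.5}
max_change = {"sem": 0.2, "tv": 0.1, "display": 0.1}

plan = greedy_reallocate(budget, mroas, max_change)
print(plan)
```

Here sem gives up 20%, tv absorbs what its +10% limit allows, and the remainder spills over to display, while the total budget stays unchanged.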
Q3: "RuntimeError: Initialization failed" when executing sm.sampling()
This is a mysterious error with a vague message; it may be associated with input data quality. If you have solved it, it would be very helpful if you could share your solution.
Solution from reader:
"I realized the issue is with Y variable. I made sure there were no missing data and all positive in Y. But there are good amount of Y which are 0. upon imputing them with mean, I could build the model."
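Based on the reader's report, a quick check of the target variable before sampling might look like the sketch below. The helper names and sample data are hypothetical, and imputing with the mean of the strictly positive values is one possible reading of "imputing them with mean"; whether zeros are the cause in your case is not guaranteed.

```python
import math


def summarize_target(y):
    """Count values in y that are missing (None/NaN), zero, or negative."""
    missing = sum(1 for v in y if v is None or math.isnan(v))
    valid = [v for v in y if v is not None and not math.isnan(v)]
    return {
        "missing": missing,
        "zero": sum(1 for v in valid if v == 0),
        "negative": sum(1 for v in valid if v < 0),
    }


def impute_zeros_with_mean(y):
    """Replace zeros with the mean of the strictly positive values (an assumption)."""
    positive = [v for v in y if v > 0]
    if not positive:
        raise ValueError("no positive values to compute a mean from")
    mean_pos = sum(positive) / len(positive)
    return [mean_pos if v == 0 else v for v in y]


sales = [120.0, 0.0, 95.0, 0.0, 110.0]  # hypothetical weekly sales
report = summarize_target(sales)
cleaned = impute_zeros_with_mean(sales)
print(report, cleaned)
```

Imputation changes the data the model sees, so it is worth checking why the zeros are there (no sales versus missing records) before replacing them.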
Q4: Normalization: Why do you center the variables? Will MinMax transformation be a good approach?
I use mean centering because 1) I want the model to focus on the trend - how changes in X influence y - rather than on the absolute numbers; 2) it avoids negative values for log1p.
I feel MinMax is less related to the trend because it's not proportional to the original data. But you're free to try MinMax; maybe I'm wrong.
Normalization is optional for regression analysis. You can also build the model without normalization.
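The proportionality point can be seen in a small sketch. It assumes that mean centering here means dividing each value by the series mean (which keeps the result proportional to the original data); the function names and numbers are made up for illustration.

```python
import math


def mean_center(x):
    """Scale a series by its mean (assumed meaning of mean centering here)."""
    mu = sum(x) / len(x)
    return [v / mu for v in x]


def min_max(x):
    """Rescale a series to [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


impressions = [200.0, 400.0, 600.0, 800.0]  # hypothetical weekly impressions
centered = mean_center(impressions)
scaled = min_max(impressions)

# Week 2 has twice the impressions of week 1; dividing by the mean keeps that ratio.
ratio_original = impressions[1] / impressions[0]
ratio_centered = centered[1] / centered[0]

# MinMax maps the smallest value to 0, so ratios between weeks are not preserved.
smallest_scaled = scaled[0]

# Non-negative inputs stay non-negative after dividing by a positive mean,
# so log1p of the centered values is non-negative too.
logged = [math.log1p(v) for v in centered]
print(ratio_original, ratio_centered, smallest_scaled, logged)
```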
Q5: Multiplicative MMM vs Additive MMM, differences between this repo and Google's paper.
Google's paper builds an additive MMM - both media effects and control effects are additive.
In this project, I built a multiplicative MMM - media effects are multiplicative, control effects are additive. So I have two separate models for control effects and media effects (the first and second model).
Both have pros and cons; try to figure out which is a better fit for you.
Google's paper uses media spend to predict sales; I use media impressions, because impressions are more directly related to sales and are usually trackable. The third model - the diminishing return model - brings in media spend to calculate ROAS and mROAS.
Q6: Why isn’t this a pure regression problem? (Why can't this be solved by sklearn/OLS regression?)
Because there are constraints on media coefficients. Media coefficients indicate how much a channel contributes to sales and how elastic it is, so they should be positive. A negative coefficient means that running ads in this channel constantly hurts your sales: the more you spend on ads, the more sales you lose. That’s counterintuitive.
But if you run a plain linear regression, nothing prevents it from returning negative or zero coefficients. Linear regression only minimizes the error; it doesn’t care about the sign of the coefficients.
MMM is a Bayesian regression problem, where we have prior knowledge about the parameters. I specify the prior distribution of each parameter and constrain it to be positive, negative, or within a certain range. What Stan or any MCMC sampler does is draw samples of the parameters from this constrained space many times (e.g., 1000 iterations), yielding 1000 possible parameter sets. I use the mean of each parameter's samples as its estimated value.
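As a toy illustration of "constrain with a prior, sample, then take the mean", here is a minimal random-walk Metropolis sketch for a single positive coefficient. It is not the repo's model and not how Stan samples (Stan uses a more efficient Hamiltonian Monte Carlo variant, NUTS); the data, the half-normal prior, the noise scale, and all names are assumptions for illustration.

```python
import math
import random


def log_posterior(b, xs, ys, sigma=1.0, prior_scale=1.0):
    """Unnormalized log posterior of slope b in y = b * x + noise, half-normal prior."""
    if b <= 0:
        return -math.inf  # the prior rules out non-positive coefficients
    log_lik = -sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)
    log_prior = -b ** 2 / (2 * prior_scale ** 2)
    return log_lik + log_prior


def metropolis(xs, ys, n_iter=1000, step=0.1, start=0.5, seed=0):
    """Draw n_iter samples of b with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    b = start
    lp = log_posterior(b, xs, ys)
    draws = []
    for _ in range(n_iter):
        proposal = b + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio); otherwise keep current b.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            b, lp = proposal, lp_new
        draws.append(b)
    return draws


# Hypothetical data generated around a slope of roughly 0.3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.35, 0.55, 0.95, 1.15, 1.55]

draws = metropolis(xs, ys)
b_hat = sum(draws) / len(draws)  # posterior mean used as the point estimate
print(f"estimated coefficient: {b_hat:.3f}")
```

Because proposals with b <= 0 are always rejected, every draw, and therefore the mean, respects the positivity constraint, which an unconstrained least-squares fit cannot guarantee.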
Q7: Meaning of media coefficients.
Multiplicative MMM: log y = b0 + b1 * log x1 + b2 * log x2 + ... + bn * log xn
Media coefficients indicate the elasticity of the channel: the % increase in y (sales) when a media variable increases by 1%. If a channel's coefficient is 0.02, a 1% increase in the channel's impressions (or spend) leads to about a 0.02% increase in sales, because a small change in the natural log is approximately the percentage change.
The multiplicative model structure captures the diminishing effect (as impressions increase, the incremental sales they bring decrease) as long as the coefficient is between 0 and 1; see the Cobb–Douglas function and the visualization for details.
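A quick numeric check of both statements, as a sketch with hypothetical impression levels: in the multiplicative form, y is proportional to x ** b for each channel with the other variables held fixed, so the effect of a 1% increase and the shrinking increments can be computed directly.

```python
b = 0.02  # media coefficient (elasticity) from the example above

# Exact relative change in y when x rises by 1%, other variables held fixed.
pct_change_y = (1.01 ** b - 1) * 100


def sales_index(x, b):
    """Channel's multiplicative factor in y, up to a constant: x ** b."""
    return x ** b


# Incremental effect of each additional 100 impressions (hypothetical units).
levels = [100, 200, 300, 400]
increments = [
    sales_index(hi, b) - sales_index(lo, b)
    for lo, hi in zip(levels, levels[1:])
]

print(f"1% more impressions -> {pct_change_y:.4f}% more sales")
print("increments per extra 100 impressions:", increments)
```

The first number comes out very close to 0.02%, and each successive increment is smaller than the previous one, which is the diminishing return the Cobb–Douglas form implies for 0 < b < 1.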