Request for more masking tutorials #3187
@fritzo I am particularly puzzled about this: why, when i) using the context manager poutine.mask, does the trace show the used mask tensor under "mask", whereas ii) using .mask() or obs_mask leaves the trace's "mask" as None? I am also getting different results when using something like this:

```python
mask_t = torch.tensor([True, True])
logits = torch.tensor([3., 4.])
```

Case A: the trace shows the tensor mask_t under "mask" and `'fn': Independent(Categorical(logits: torch.Size([2])), 1)`:

```python
with pyro.poutine.mask(mask=mask_t):
    pyro.sample("c", dist.Categorical(logits=logits).to_event(1))
```

Case B: the trace shows None under "mask" and `'fn': Independent(MaskedDistribution(), 1)`:

```python
pyro.sample("c", dist.Categorical(logits=logits).mask(mask_t).to_event(1))
```

Is this expected behaviour? Shouldn't there be two different mask types? Thanks :)
IIRC using ...
@fritzo Thanks, that is very nice to know. I was aware of the need for it. This also leads me to my next concern (which led me to try to find the mask above), because I get different results between the following two cases:

```python
mask_t = torch.Tensor([True, True])
logits = torch.Tensor([3., 4.])
targets = torch.tensor([0., 1.])
```

With the poutine mask, where the mask is all True --> good results:

```python
with pyro.poutine.mask(mask=mask_t):  # the mask is all True
    pyro.sample("c", dist.Categorical(logits=logits).to_event(1), obs=targets)
```

With no poutine mask (therefore all values should be used in the computation?) --> bad results:

```python
pyro.sample("c", dist.Categorical(logits=logits).to_event(1), obs=targets)
```

I am guessing this has to do with the `.to_event(1)`.
Just curious, why `.to_event(1)`? Regarding mask, the rule of thumb is that mask only applies to batch dimensions. Assume you have some univariate distributions and a mask with shape ...
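For illustration, a minimal sketch of that rule of thumb (a hypothetical Normal example, not code from the thread): the mask lines up with the batch dimensions, and masked-out elements simply contribute zero log-probability.

```python
import torch
import pyro.distributions as dist

# three univariate Normals -> batch_shape (3,), event_shape ()
base = dist.Normal(torch.zeros(3), torch.ones(3))
mask = torch.tensor([True, False, True])   # one flag per batch element

masked = base.mask(mask)
x = torch.tensor([0.1, 0.2, 0.3])
print(base.log_prob(x))    # three log-densities
print(masked.log_prob(x))  # middle entry zeroed out by the mask
```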
@fehiepsi "Just curious, why ..." Well, that is actually a relief to hear, because the `.to_event(1)` is doing something (in combination with the poutine.mask), but I am not sure what, and I did not expect that to happen (I am not familiar enough with which distributions have batch or event shape, though). Yes, I figured that the order matters. And I now understand that the mask only applies to batch dimensions.
Looking at your code, ... will raise an error, see this line. If it works for your code, then please raise a separate issue with small reproducible code.
@fehiepsi Noted, looking into making a reproducible example, brb.
@fehiepsi Nevermind, it did not raise an error because I had `pyro.enable_validation(False)`. It is still weird that it gives good results with

```python
with pyro.poutine.mask(mask=mask_t):
    pyro.sample("c", dist.Categorical(logits=logits).to_event(1))
```

However, when using

```python
pyro.sample("c", dist.Categorical(logits=logits).to_event(1))
```

or

```python
with pyro.poutine.mask(mask=mask_t):
    pyro.sample("c", dist.Categorical(logits=logits))
```

the results are random. I will try to code everything back again with ...
I think you can't use to_event here: ... would raise an error. Maybe you can check the shapes of your logits.
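A quick way to check this (illustrative shapes, not the thread's actual data) is to print batch_shape and event_shape before and after .to_event:

```python
import torch
import pyro.distributions as dist

d = dist.Categorical(logits=torch.tensor([3., 4.]))
print(d.batch_shape, d.event_shape)        # torch.Size([]) torch.Size([])

d2 = dist.Categorical(logits=torch.ones(5, 2))
print(d2.batch_shape, d2.event_shape)      # torch.Size([5]) torch.Size([])
d3 = d2.to_event(1)
print(d3.batch_shape, d3.event_shape)      # torch.Size([]) torch.Size([5])
```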
@fehiepsi Oh, ok, interesting. Well, my logits simply have shape ...
@fehiepsi Should I open another issue with these examples? They fail when ...

```python
import torch
from torch import tensor
from pyro import sample, plate
import pyro.distributions as dist
import pyro.poutine as poutine
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import ClippedAdam
import pyro


def model1(x, obs_mask, x_class, class_mask):
    """
    :param x: Data [N,L,feat_dim]
    :param obs_mask: Data sites to mask [N,L]
    :param x_class: Target values [N,]
    :param class_mask: Target values mask [N,]
    :return:
    """
    z = sample("z", dist.Normal(torch.zeros((2, 5)), torch.ones((2, 5))).to_event(2))
    logits = torch.Tensor([[[10, 2, 3], [8, 2, 1], [3, 6, 1]],
                           [[1, 2, 7], [0, 2, 1], [2, 7, 8]]])
    aa = sample("x", dist.Categorical(logits=logits), obs=x)
    with pyro.poutine.mask(mask=class_mask):
        c = sample("c", dist.Categorical(logits=torch.Tensor([[3, 5], [10, 8]])).to_event(1), obs=x_class)
    return z, c, aa


def model2(x, obs_mask, x_class, class_mask):
    """
    :param x: Data [N,L,feat_dim]
    :param obs_mask: Data sites to mask [N,L]
    :param x_class: Target values [N,]
    :param class_mask: Target values mask [N,]
    :return:
    """
    z = sample("z", dist.Normal(torch.zeros((2, 5)), torch.ones((2, 5))).to_event(2))
    logits = torch.Tensor([[[10, 2, 3], [8, 2, 1], [3, 6, 1]],
                           [[1, 2, 7], [0, 2, 1], [2, 7, 8]]])
    aa = sample("x", dist.Categorical(logits=logits).mask(obs_mask).to_event(1), obs=x)
    c = sample("c", dist.Categorical(logits=torch.Tensor([[3, 5], [10, 8]])).to_event(1), obs=x_class)
    return z, c


def model3(x, obs_mask, x_class, class_mask):
    """
    :param x: Data [N,L,feat_dim]
    :param obs_mask: Data sites to mask [N,L]
    :param x_class: Target values [N,]
    :param class_mask: Target values mask [N,]
    :return:
    """
    z = sample("z", dist.Normal(torch.zeros((2, 5)), torch.ones((2, 5))).to_event(2))
    logits = torch.Tensor([[[10, 2, 3], [8, 2, 1], [3, 6, 1]],
                           [[1, 2, 7], [0, 2, 1], [2, 7, 8]]])
    aa = sample("x", dist.Categorical(logits=logits).to_event(1), obs=x)
    c = sample("c", dist.Categorical(logits=torch.Tensor([[3, 5], [10, 8]])).to_event(1), obs=x_class)
    return z, c, aa


def model4(x, obs_mask, x_class, class_mask):
    """
    :param x: Data [N,L,feat_dim]
    :param obs_mask: Data sites to mask [N,L]
    :param x_class: Target values [N,]
    :param class_mask: Target values mask [N,]
    :return:
    """
    z = sample("z", dist.Normal(torch.zeros((2, 5)), torch.ones((2, 5))).to_event(2))
    logits = torch.Tensor([[[10, 2, 3], [8, 2, 1], [3, 6, 1]],
                           [[1, 2, 7], [0, 2, 1], [2, 7, 8]]])
    aa = sample("x", dist.Categorical(logits=logits), obs=x, obs_mask=obs_mask)  # partial observations is what I am looking for here
    c = sample("c", dist.Categorical(logits=torch.Tensor([[3, 5], [10, 8]])).mask(class_mask), obs=x_class)  # in the fully supervised approach no mask here, but in the semi-supervised case I would need to fully mask some observations
    return z, c, aa


def model5(x, obs_mask, x_class, class_mask):
    """
    :param x: Data [N,L,feat_dim]
    :param obs_mask: Data sites to mask [N,L]
    :param x_class: Target values [N,]
    :param class_mask: Target values mask [N,]
    :return:
    """
    with pyro.plate("plate_batch", dim=-1):
        z = sample("z", dist.Normal(torch.zeros((2, 5)), torch.ones((2, 5))).to_event(1))
        logits = torch.Tensor([[[10, 2, 3], [8, 2, 1], [3, 6, 1]],
                               [[1, 2, 7], [0, 2, 1], [2, 7, 8]]])
        aa = sample("x", dist.Categorical(logits=logits), obs=x, obs_mask=obs_mask)  # partial observations is what I am looking for here
        c = sample("c", dist.Categorical(logits=torch.Tensor([[3, 5], [10, 8]])).mask(class_mask), obs=x_class)
    return z, c, aa


def guide(x, obs_mask, x_class, class_mask):
    """
    :param x: Data [N,L,feat_dim]
    :param obs_mask: Data sites to mask [N,L]
    :param x_class: Target values [N,]
    :param class_mask: Target values mask [N,]
    """
    z = sample("z", dist.Normal(torch.zeros((2, 5)), torch.ones((2, 5))).to_event(2))
    return z


if __name__ == "__main__":
    pyro.enable_validation(False)
    x = tensor([[0, 2, 1],
                [0, 1, 1]])
    obs_mask = tensor([[1, 0, 0], [1, 1, 0]], dtype=bool)  # Partial observations
    x_class = tensor([0, 1])
    class_mask = tensor([True, False], dtype=bool)  # keep/skip some observations
    models_dict = {"model1": model1,
                   "model2": model2,
                   "model3": model3,
                   "model4": model4,
                   "model5": model5,
                   }
    for model in models_dict.keys():
        print("Using {}".format(model))
        guide_tr = poutine.trace(guide).get_trace(x, obs_mask, x_class, class_mask)
        model_tr = poutine.trace(poutine.replay(models_dict[model], trace=guide_tr)).get_trace(x, obs_mask, x_class, class_mask)
        monte_carlo_elbo = model_tr.log_prob_sum() - guide_tr.log_prob_sum()
        print("MC ELBO estimate: {}".format(monte_carlo_elbo))
        try:
            pyro.clear_param_store()
            svi = SVI(models_dict[model], guide, loss=Trace_ELBO(), optim=ClippedAdam(dict()))
            svi.step(x, obs_mask, x_class, class_mask)
            print("Test passed")
        except Exception:
            print("Test failed")
```

By the way, I think I want something like ...
Your last example is different from the previous one. Now your logits have shape ...

I'm not sure if there is an issue here. ... will give you a distribution/log_prob with `batch_shape = mask.shape` and `event_shape = (N,)`. Hope that this clarifies the semantics of ..., which is the same as ...

In other words, you are scaling the log likelihood by a factor ...
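To see the scaling effect concretely, a minimal sketch (a hypothetical Normal site, not the models above): an all-True mask whose shape is larger than the site's batch shape is broadcast against the log-density, which inflates the trace's log_prob_sum by the corresponding factor.

```python
import torch
import pyro
import pyro.distributions as dist
import pyro.poutine as poutine

obs = torch.tensor(0.5)

def unmasked():
    pyro.sample("c", dist.Normal(0., 1.), obs=obs)        # scalar log-density

def masked():
    with poutine.mask(mask=torch.tensor([True, True])):   # (2,)-shaped all-True mask
        pyro.sample("c", dist.Normal(0., 1.), obs=obs)

print(poutine.trace(unmasked).get_trace().log_prob_sum())
print(poutine.trace(masked).get_trace().log_prob_sum())
# if the (2,)-shaped mask broadcasts against the scalar log-density (as described above),
# the second sum comes out twice the first -- the "scaling" effect
```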
@fehiepsi Oh, ok, I see where the misunderstanding with the ... comes from. I need a fresh mind to reflect on the last part, because that would mean I accidentally scaled up the likelihood and therefore made the training more efficient? That is so interesting. Then I want to do the same with the variable "x" hahaha (but keeping the partial observations).
Hi!
As discussed here https://forum.pyro.ai/t/more-doubts-on-masking-runnable-example/5044/6 and here https://forum.pyro.ai/t/vae-classification/5017/10, it might not be very clear when and how to use the different masking options, especially regarding the differences between masking in the model vs. the guide, or masking with enumeration.
Thanks! :)