Refactor pretty-printing and fix latex underscores #6533
base: main
Conversation
Forgot to mention: as a qualitative test, I also checked that the "monolithic" model in test_printing.py renders correctly in both MathJax and KaTeX.
Codecov Report

Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main    #6533      +/-   ##
    ==========================================
    - Coverage   94.74%   85.24%    -9.51%
    ==========================================
      Files         147      147
      Lines       27861    27855       -6
    ==========================================
    - Hits        26397    23745    -2652
    - Misses       1464     4110    +2646
pymc/printing.py (Outdated)

    elif "latex" in formatting:
        return rf"\text{{<{var_type}>}}"
    # Recurse
    return _str_for_input_var(var.owner.inputs[0], formatting=formatting)
I carried this over from the previous version.
When would we receive a DimShuffle here, and can we be sure that only the first input is relevant?
I pushed some changes. Mainly, they clarify the exceptions that are thrown, add a number of printing tests, and fix an edge-case bug or two. I've also updated the original post to reflect the latest. I won't be changing this PR any more till review (though there are a few questions or comments that I will add to the code).
    else:
        return f"<{var_type} {var_data.shape}>"
    elif _has_owner(var):
        if isinstance(var.owner.op, DimShuffle):
Still have this question. Is the DimShuffle Op case still one that we need to handle? If so, let's add a test for it.
I think so. My understanding is that this is to retrieve the original RV that has been resized (or DimShuffled).
I see, cool. What would be an example of a model where the original RV gets resized or dimshuffled? (@ricardoV94 )
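For context on the recursion being discussed: `_str_for_input_var` walks up the graph through the owning Op's first input whenever that Op is a DimShuffle. A minimal stand-in in plain Python (the `Var`, `Apply`, and `find_named_root` names here are illustrative, not PyTensor's actual classes) sketches the pattern:

```python
class DimShuffle:
    """Stand-in for pytensor's DimShuffle Op (illustrative only)."""

class Apply:
    # Mimics a graph node: an Op applied to some inputs
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs

class Var:
    # Mimics a graph variable that may or may not have an owner
    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner

def find_named_root(var):
    # Recurse through DimShuffle Ops back to the underlying variable,
    # mirroring the `var.owner.inputs[0]` step in the quoted code
    if var.owner is not None and isinstance(var.owner.op, DimShuffle):
        return find_named_root(var.owner.inputs[0])
    return var

rv = Var(name="x")
# e.g. a transpose like x.T introduces a DimShuffle node above the RV
transposed = Var(owner=Apply(DimShuffle(), [rv]))
assert find_named_root(transposed) is rv
```

Relying on `inputs[0]` only works because a DimShuffle has a single input; for multi-input Ops this recursion would need a different rule.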
    with Model() as self.model:
        # TODO: some variables commented out here as they're not working properly
        # in v4 yet (9-jul-2021), so doesn't make sense to test str/latex for them

        # Priors for unknown model parameters
        alpha = Normal("alpha", mu=0, sigma=10)
        b = Normal("beta", mu=0, sigma=10, size=(2,), observed=beta)
        # TODO why is this observed?
Shouldn't be a huge deal, but it seems curious that b/beta has been observed.
Hmm, I've never noticed this. I also don't think that we should have beta be observed? It does not seem to affect the test or its purpose though... Curious indeed.
    # Simulate outcome variable
    Y = alpha + X.dot(beta) + np.random.randn(size) * sigma
    Y = alpha0 + X.dot(beta0) + np.random.randn(size) * sigma0

    with Model() as self.model:
        # TODO: some variables commented out here as they're not working properly
Is this still a TODO or is it outdated?
Hi @covertg, sorry for the delay in writing back and thanks for your PR :) I'm currently travelling, but I intend to review your PR within the next few days.
Thanks again for your PR @covertg and for your patience. It sounds like you've put a lot of work into this, especially for a heavily utilized piece of PyMC functionality that is often not a focal point of development.
Overall, this submodule is due for a heavy makeover, as you can tell and have started. I provided some comments and questions but, overall, a dispatch-based approach for printing is probably going to be the best solution in the end. Evidence for this is that there are a lot of if/else statements going on; a more general solution would be to implement a printing mechanism for each Op. That way, I don't think we would have to delineate between potentials, deterministics, etc. See some references below. I previously attempted to refactor the whole module in #6447, but I finally decided against it.
The PR is still good. Even if we decide to immediately go for a dispatch-based approach, not all effort will be lost, and I personally think that what you have is a good start. I will need to go over the code again before merging. I'm happy to bounce back some iterations if you have questions or concerns. I can even set aside time if you would like to do some pair programming on this work :)
    @@ -197,7 +197,7 @@ class SymbolicRandomVariable(OpFromGraph):
        """

        _print_name: Tuple[str, str] = ("Unknown", "\\operatorname{Unknown}")
    -    """Tuple of (name, latex name) used for for pretty-printing variables of this type"""
    +    "Tuple of (name, latex name) used for for pretty-printing variables of this type"
Any specific reason for this change?
Ah, sorry. Refreshing on PEP 257, I've realized triple quotes are probably preferred!
    alpha, sigma = 1, 1
    beta = [1, 2.5]

    alpha0, sigma0, beta0 = 1, 1, [1, 2.5]
Maybe alpha_true, sigma_true, beta_true? This notation is also okay though.
    # This is a bit hacky but seems like the best we got
    if (
        hasattr(var, "str_repr")
        and callable(var.str_repr)
If hasattr(var, "str_repr"), it must necessarily be callable, right?
    @@ -1116,7 +1116,7 @@ class _LKJCholeskyCovBaseRV(RandomVariable):
        ndim_supp = 1
        ndims_params = [0, 0, 1]
        dtype = "floatX"
    -    _print_name = ("_lkjcholeskycovbase", "\\operatorname{_lkjcholeskycovbase}")
    +    _print_name = ("_lkjcholeskycovbase", r"\operatorname{\_lkjcholeskycovbase}")
Why the addition of \ in front of _? All the other random variables follow ("rv_name", "\\operatorname{rv_name}"). Without the \, do you obtain a strange-looking string?
Oops, thinking about this again... the answer is probably in the title of this PR and related to issue #6508
Yep exactly!
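To make the connection to #6508 concrete: in LaTeX math mode a bare `_` is a subscript marker, so a literal underscore inside `\operatorname{...}` must be escaped as `\_`, which is what the `r"\operatorname{\_lkjcholeskycov}"` change does. A tiny sketch of the escaping (the `escape_latex_name` helper is hypothetical, not a function in pymc):

```python
def escape_latex_name(name: str) -> str:
    # Escape literal underscores so they typeset as "_" rather than
    # being parsed as subscript markers in math mode
    return name.replace("_", r"\_")

assert escape_latex_name("_lkjcholeskycovbase") == r"\_lkjcholeskycovbase"
print(rf"\operatorname{{{escape_latex_name('_lkjcholeskycov')}}}")
# -> \operatorname{\_lkjcholeskycov}
```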
    @@ -1164,7 +1164,7 @@ def rng_fn(self, rng, n, eta, D, size):
    # be safely resized. Because of this, we add the thin SymbolicRandomVariable wrapper
    class _LKJCholeskyCovRV(SymbolicRandomVariable):
        default_output = 1
    -    _print_name = ("_lkjcholeskycov", "\\operatorname{_lkjcholeskycov}")
    +    _print_name = ("_lkjcholeskycov", r"\operatorname{\_lkjcholeskycov}")
Same comment as above
    warnings.warn(
        "The `include_params` argument has been deprecated. Future versions will always include parameter values.",
        FutureWarning,
        stacklevel=2,
Your point about deprecating include_params seems valid to me. IIRC from browsing this file a few weeks or months ago, include_params was used to delineate whether variables were printed as arguments to other distributions or not. You seem to have addressed this in _get_varname_distname_args.
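A minimal sketch of the deprecation path quoted above, showing how the FutureWarning fires when the deprecated argument is passed (the `str_repr_sketch` function is a hypothetical stand-in, not pymc's actual `str_repr`):

```python
import warnings

def str_repr_sketch(var_name, include_params=None):
    # Hypothetical stand-in: passing the deprecated include_params
    # argument triggers the warning quoted from the diff above
    if include_params is not None:
        warnings.warn(
            "The `include_params` argument has been deprecated. "
            "Future versions will always include parameter values.",
            FutureWarning,
            stacklevel=2,
        )
    return var_name

# Capture the warning to show it is actually emitted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    str_repr_sketch("alpha", include_params=True)

assert any(issubclass(w.category, FutureWarning) for w in caught)
```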
    def _is_potential_or_deterministic(var: Variable) -> bool:
        # This is a bit hacky but seems like the best we got
Interesting function 😅. I never thought of checking if a variable is created via Potential or Deterministic this way. If it works, this sounds great but, as you suggest, it does look hackish...

(Previous _is_potential_or_deterministic below for reference)

    def _is_potential_or_deterministic(var: Variable) -> bool:
        try:
            return var.str_repr.__func__.func is str_for_potential_or_deterministic
        except AttributeError:
            # in case other code overrides str_repr, fallback
            return False
Yep, I know I know....... 😅
I figured that it's not that far removed from the original implementation, since in the big picture they're both inspecting whether a var's str_repr is a functools partial that was made in a certain way.
Anyways, a dispatch-based approach, or at least requiring a model object, would obviate this type of approach.
        return text


    def _is_potential_or_deterministic(var: Variable) -> bool:
Can we use the model object for this? E.g. if var in model.deterministics or var in model.potentials. Just pass the model variable along.
    rv_reprs = [rv.str_repr(formatting=formatting, include_params=include_params) for rv in all_rv]
    rv_reprs = [rv_repr for rv_repr in rv_reprs if "TransformedDistribution()" not in rv_repr]

    rv_reprs = [rv.str_repr(formatting=formatting, **kwargs) for rv in all_rv]
I know it's not the point of this PR, but can we already remove the assumption that model RVs have this str_repr attached to them? Just call the respective function str_for_dist or str_for_deterministic on the RV. It would fix #6311.
Interesting, I hadn't been aware of that issue. I don't think that we can quite yet remove that assumption, essentially because right now the str_repr function (str_for_model_var in this PR, or str_for_dist and str_for_deterministic in mainline) doesn't know how to distinguish between deterministics and potentials.
At the moment that info is inserted using functools.partial whenever the Model registers a new deterministic or potential. But, related to the above comment: currently we don't require that a model object is on the context stack in order to pretty-print a model variable. Requiring a model object would simplify this, though.
Yes it would be fine to require a model to be on the context for the fancier pretty printing.
We can have a much more limited pretty printing for standalone variables (or none at all)
@larryshamalama of course, thank you for the review! I understand this PR ballooned a good deal, and I also recognize that I don't have the full big-picture vision that y'all have. My bad for missing your previous work on the dispatch-based approach. So thanks for the comments; I did my best to follow up on the specific ones that y'all have left. Moving forward, I'd be very glad to help work on this in whatever way would best serve pymc in the long run. It'd be fun to work on that together! If I may ask, what led you to decide against the work in #6447?
Also a big-picture question for us: do we want model graphing to remain separate from model printing? This came to me when considering the question (in comments above) of whether we need to pretty-print pymc variables without a model on the context stack. If the focus is on pretty-printing a full model, it seems like graph-visualizing and model-pretty-printing could possibly be unified. But perhaps that is its own big can of worms.
The big value of the pretty-printing is in printing the whole Model, not individual variables, so I would say that we don't need to try too hard to pretty-print individual variables.
What is this PR about?

printing.py had developed a decent amount of duplicated code. I wanted to work on #6508 but found it difficult to parse through the logic. Through the course of understanding how the submodule works, I ended up refactoring it somewhat. This PR fixes the LaTeX underscore bug as discussed in that issue and also seeks to clean up pretty-printing. In particular, it consolidates duplicated logic into str_for_model_var and str_for_potential_or_deterministic, deprecates the include_params argument in printing.py as well as in tests (but I could re-incorporate it if preferred), validates the formatting arg a bit more strictly, and renders constants as e.g. <constant (1,2)>.

Checklist

Major / Breaking Changes
- Passing include_params to the str_repr function now raises a FutureWarning.
- The formatting argument is now expected to be exactly either "plain" or "latex".

New features
- str_for_model uses ~ and \sim to align the model str, and it's annoying to typeset a plain tilde character in LaTeX.

Bugfixes
- Handles variables that lack an .owner attribute. (I'm not sure whether this would ever happen in the wild, but model-building allows for this case, so I went ahead and included it.)

Documentation

Maintenance
- ~~While working on this I noticed that the _repr_latex_() methods assigned to Distributions and Models are essentially the same as their str_repr functions, just with latex enforced. Nothing in the pymc codebase uses these methods. Candidate for removal?~~ I've realized that this method is necessary for latex to automagically appear when supported by ipython.
- The inputs to Potentials and Deterministics have been printed as a function-like expression, e.g. Deterministic(f(arg1, ...)). In the updated submodule, we maintain this behavior, but it would be easy to drop the f() wrapper for this case if wanted. On reflection it seems pretty sensible to keep the f().
- ~~I also noticed that we don't have any tests for printing the model name.~~ Added some tests involving this. This includes the case for nested models (one pm.Model() created within the context of another). I've never done this myself in pymc so could use a sanity check on if we need/want to be testing this.