-
Notifications
You must be signed in to change notification settings - Fork 8
Add ananke graph export functionality #63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Did we reach a consensus of whether we should have a pywhy_nx.DAG class? I feel like while it would duplicate some nx functionality, for the sake of consistency it would be nice to have. A related issue is that in Ananke we support all combinations of directed, bidirected, and undirected graphs. However, while we represent undirected and bidirected graphs through an nx.Graph, these don't behave the same way in downstream algorithms (e.g. m-separation works differently in these graphs) so we would need to wrap these in MixedEdgeGraphs. Down the line it would be nice to provide classes directly for some of these graphs rather than calling MixedEdgeGraph directly. |
…d edges Signed-off-by: Jaron Lee <jaron2005@gmail.com>
For now I'd say let's just use networkx until we integrate pywhy graphs with some downstream functionality.
WDYM here? Can you clarify? The purpose of export is to convert other graph packages to ours and vice versa. When in ours they should work nicely with all the algorithms. |
Yea all I'm saying is that for certain graph classes like |
OK - but e.g. in causallearn.py it looks like we are using pywhy_nx.ADMG for DAGs vs nx.DiGraph. Should we patch that up? |
Oh yeah. I think probably a good call to convert that to a nx.DiGraph for now. The reason I'm inclined to hold off is I feel like once we "use the pywhy-graphs API" more, we'll get a sense of the different shortcomings, so don't want to commit to adding a whole new class too early. |
Ah I see. Yeah we don't have a chain graph, so maybe for now don't support the exporting. Can you educate me a bit and specify what's the difference between a chaingraph and ADMG/PAG/CPDAG? |
Yeah so a chain graph just has directed and undirected edges. Directed edges mean what they usually mean in a DAG, and the undirected edges represent an equilibrium process between two variables. There's a condition that you can't have a partially directed cycle (must not be able to orient an undirected edge to form a directed cycle). If that is satisfied, then this graph has a factorization that basically looks like the factorization of a DAG and a Markov random field squashed together (first you factorize according to the directed edges, which gives you blocks of undirected connected components, and then you factorize each block as you would an MRF). These actually do come up from time to time in causality so I think it might be worth adding eventually. I ended up implementing the export functionality directly from MixedEdgeGraph (if you create a MixedEdgeGraph with only directed and undirected edges it will be recognized as a chain graph in Ananke, and be imported back as such a MixedEdgeGraph), I suppose it's probably fine to leave it in? |
I'm not familiar with the MRF part but the rest sounds like a CPDAG. Are they the same thing? |
Graphically it looks like a CPDAG, yes. If we are just talking the graphical representation it might be fine to subclass the CPDAG object with the added partially directed cycle constraint. I think we do want these to be distinct classes because e.g. we will have functions that might take a chain graph but not a CPDAG and vice versa, but we would need to tell them apart. Also it would be bad form to suggest to the user via typing that you could e.g. take an output from PC and then pass it to something that parameterizes a chain graph, since this makes no sense. |
I think the CPDAG implicitly has the acyicity constraint on directed edges. We just don't enforce it because it's expensive to do so. E.g. running a So I guess another question is what algorithms downstream would a ChainGraph eat but a CPDAG wouldn't and vice versa? Sorry asking all these questions, cuz idk what a chain graph is haha (rip causality community coming up w/ these disparate things) |
Yeah it's not just acyclicity on the directed edges, it's like A -> B -> C, A - C is forbidden because if we oriented the undirected edge C -> A then you get a cycle. I don't think CPDAGs check for this (and it doesn't matter) but for chain graphs it turns out there is no way to provide a likelihood for such graphs.
|
I think a CPDAG also implicitly has the no almost directed cycle tho right? E.g. Also, if a check of that sort of acyclicity is required, we do what networkx does and put the responsibility of checking on the user. E.g. in their algorithms section for bipartite graphs, it is on the user to verify that their graph passed in is indeed bipartite. With that said, I suppose we could "rename" all graphs, E.g. ADMG, PAG, CPDAG, but idk of a generalization where the acyclicity is not enforced. I'm guessing it's probably some form of cyclic graph that generalizes all of them and we feed those to algorithms that check for cyclicality vs acyclicty. So I guess a solution is renaming CPDAG to a class that is compatible with both CPDAG and chain graph cuz imo it still sounds quite similar? If we need to add a likelihood, that wouldn't occur on the graph itself I'm guessing, so that should eat this "directed and undirected graph class" and check that it meets the chain graph assumptions by running a check for acylicity and almost acyclicity. Alternatively, if there is implicitly some structure that we can enforce w/o explicitly running a possibly expensive graph operation every time edges change, then yeah I agree we have two classes: CPDAG and ChainGraph. Happy to chat about this online if that's easier and we need to hash out some confusion. |
I think if the philosophy is to put responsibility onto the user then the "directed + bidirected" graph class makes the most sense. I like the solution of having a more generic graph class that can represent both CPDAGs and chain graphs (which is really just a rename of CPDAG to be more inclusive) because that best reflects the underlying situation - the graphical abstraction being useful in different contexts. I'm not sure how expensive the operation is for checking no partially directed cycles but it's probably at least as expensive as checking ancestors. I'll create an issue for this separately. In the meantime I will use CPDAG as a stand-in for chain graph. |
Also the windows and mac CI checks are failing, what was the command that I needed to run to fix that? |
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Since we want ananke as an "optional" dependency, this can be fixed by moving the import statements for ananke package into the function itself. |
Yeah from what I loosely understand, a chain graph is a generalization of CPDAG, so it would be nice if we have a good reference explaining all this. Lmk if you have one in mind and I can skim it. |
There are two kinds of chain graphs that I've seen in the literature. One is the Lauritzen, Wermuth, Friedman (LWF) chain graph (see also These are the original papers that introduce the chain graph as is commonly used in the literature. There are some more modern papers that focus on the relevance to causal inference - e.g. http://www.di.unito.it/~radicion/tmp_firb__2009/chainGraphs.pdf, which I found easier to understand as an overview of this area. |
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Codecov Report
@@ Coverage Diff @@
## main #63 +/- ##
==========================================
+ Coverage 84.26% 84.59% +0.32%
==========================================
Files 33 34 +1
Lines 2510 2570 +60
Branches 671 690 +19
==========================================
+ Hits 2115 2174 +59
Misses 250 250
- Partials 145 146 +1
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor nitpicking, but otw LGTM once these are fixed.
Co-authored-by: Adam Li <adam2392@gmail.com> Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com> Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com> Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com> Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
… into new_export_ananke
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Thanks @jaron-lee ! |
Fixes #57 for the
ananke-causal
libraryChanges proposed in this pull request:
ananke-causal
graphsBefore submitting
section of the
CONTRIBUTING
docs.Writing docstrings section of the
CONTRIBUTING
docs.After submitting