Skip to content

Add ananke graph export functionality #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 14, 2023

Conversation

jaron-lee
Copy link
Collaborator

@jaron-lee jaron-lee commented Mar 9, 2023

Fixes #57 for the ananke-causal library

Changes proposed in this pull request:

  • Implements an export and import API for ananke-causal graphs

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Jaron Lee <jaron2005@gmail.com>
@jaron-lee jaron-lee changed the title add ananke files with commented tigramite dependency Add ananke graph export functionality Mar 9, 2023
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
@jaron-lee
Copy link
Collaborator Author

Did we reach a consensus of whether we should have a pywhy_nx.DAG class? I feel like while it would duplicate some nx functionality, for the sake of consistency it would be nice to have.

A related issue is that in Ananke we support all combinations of directed, bidirected, and undirected graphs. However, while we represent undirected and bidirected graphs through an nx.Graph, these don't behave the same way in downstream algorithms (e.g. m-separation works differently in these graphs) so we would need to wrap these in MixedEdgeGraphs. Down the line it would be nice to provide classes directly for some of these graphs rather than calling MixedEdgeGraph directly.

…d edges

Signed-off-by: Jaron Lee <jaron2005@gmail.com>
@adam2392
Copy link
Collaborator

adam2392 commented Mar 9, 2023

Did we reach a consensus of whether we should have a pywhy_nx.DAG class? I feel like while it would duplicate some nx functionality, for the sake of consistency it would be nice to have.

For now I'd say let's just use networkx until we integrate pywhy graphs with some downstream functionality.

A related issue is that in Ananke we support all combinations of directed, bidirected, and undirected graphs. However, while we represent undirected and bidirected graphs through an nx.Graph, these don't behave the same way in downstream algorithms (e.g. m-separation works differently in these graphs) so we would need to wrap these in MixedEdgeGraphs. Down the line it would be nice to provide classes directly for some of these graphs rather than calling MixedEdgeGraph directly.

WDYM here? Can you clarify?

The purpose of export is to convert other graph packages to ours and vice versa. When in ours they should work nicely with all the algorithms.

@jaron-lee
Copy link
Collaborator Author

WDYM here? Can you clarify?

The purpose of export is to convert other graph packages to ours and vice versa. When in ours they should work nicely with all the algorithms.

Yea all I'm saying is that for certain graph classes like ADMG or PAG we have pywhy_graph classes for them, but e.g chain graphs (directed + undirected) currently lack a corresponding class. So I'm forced to call MixedEdgeGraph directly. At some point we should change that, although for now I think it's fine

@jaron-lee
Copy link
Collaborator Author

jaron-lee commented Mar 9, 2023

For now I'd say let's just use networkx until we integrate pywhy graphs with some downstream functionality.

OK - but e.g. in causallearn.py it looks like we are using pywhy_nx.ADMG for DAGs vs nx.DiGraph. Should we patch that up?

@adam2392
Copy link
Collaborator

adam2392 commented Mar 9, 2023

OK - but e.g. in causallearn.py it looks like we are using pywhy_nx.ADMG for DAGs vs nx.DiGraph. Should we patch that up?

Oh yeah. I think probably a good call to convert that to a nx.DiGraph for now. The reason I'm inclined to hold off is I feel like once we "use the pywhy-graphs API" more, we'll get a sense of the different shortcomings, so don't want to commit to adding a whole new class too early.

@adam2392
Copy link
Collaborator

adam2392 commented Mar 9, 2023

Yea all I'm saying is that for certain graph classes like ADMG or PAG we have pywhy_graph classes for them, but e.g chain graphs (directed + undirected) currently lack a corresponding class. So I'm forced to call MixedEdgeGraph directly. At some point we should change that, although for now I think it's fine

Ah I see. Yeah we don't have a chain graph, so maybe for now don't support the exporting.

Can you educate me a bit and specify what's the difference between a chaingraph and ADMG/PAG/CPDAG?

@jaron-lee
Copy link
Collaborator Author

Yea all I'm saying is that for certain graph classes like ADMG or PAG we have pywhy_graph classes for them, but e.g chain graphs (directed + undirected) currently lack a corresponding class. So I'm forced to call MixedEdgeGraph directly. At some point we should change that, although for now I think it's fine

Ah I see. Yeah we don't have a chain graph, so maybe for now don't support the exporting.

Can you educate me a bit and specify what's the difference between a chaingraph and ADMG/PAG/CPDAG?

Yeah so a chain graph just has directed and undirected edges. Directed edges mean what they usually mean in a DAG, and the undirected edges represent an equilibrium process between two variables. There's a condition that you can't have a partially directed cycle (must not be able to orient an undirected edge to form a directed cycle). If that is satisfied, then this graph has a factorization that basically looks like the factorization of a DAG and a Markov random field squashed together (first you factorize according to the directed edges, which gives you blocks of undirected connected components, and then you factorize each block as you would an MRF).

These actually do come up from time to time in causality so I think it might be worth adding eventually.

I ended up implementing the export functionality directly from MixedEdgeGraph (if you create a MixedEdgeGraph with only directed and undirected edges it will be recognized as a chain graph in Ananke, and be imported back as such a MixedEdgeGraph), I suppose it's probably fine to leave it in?

@adam2392
Copy link
Collaborator

adam2392 commented Mar 9, 2023

I'm not familiar with the MRF part but the rest sounds like a CPDAG. Are they the same thing?

@jaron-lee
Copy link
Collaborator Author

Graphically it looks like a CPDAG, yes. If we are just talking the graphical representation it might be fine to subclass the CPDAG object with the added partially directed cycle constraint.

I think we do want these to be distinct classes because e.g. we will have functions that might take a chain graph but not a CPDAG and vice versa, but we would need to tell them apart. Also it would be bad form to suggest to the user via typing that you could e.g. take an output from PC and then pass it to something that parameterizes a chain graph, since this makes no sense.

@adam2392
Copy link
Collaborator

adam2392 commented Mar 9, 2023

I think we do want these to be distinct classes because e.g. we will have functions that might take a chain graph but not a CPDAG and vice versa, but we would need to tell them apart. Also it would be bad form to suggest to the user via typing that you could e.g. take an output from PC and then pass it to something that parameterizes a chain graph, since this makes no sense.

I think the CPDAG implicitly has the acyicity constraint on directed edges. We just don't enforce it because it's expensive to do so. E.g. running a nx.is_directed_acyclic function every time an edge is removed/added would be bad.

So I guess another question is what algorithms downstream would a ChainGraph eat but a CPDAG wouldn't and vice versa? Sorry asking all these questions, cuz idk what a chain graph is haha (rip causality community coming up w/ these disparate things)

@jaron-lee
Copy link
Collaborator Author

jaron-lee commented Mar 9, 2023

Yeah it's not just acyclicity on the directed edges, it's like A -> B -> C, A - C is forbidden because if we oriented the undirected edge C -> A then you get a cycle. I don't think CPDAGs check for this (and it doesn't matter) but for chain graphs it turns out there is no way to provide a likelihood for such graphs.

So I guess another question is what algorithms downstream would a ChainGraph eat but a CPDAG wouldn't and vice versa? Sorry asking all these questions, cuz idk what a chain graph is haha (rip causality community coming up w/ these disparate things)
[EDIT improved my answer]
Yeah one simple example is that a chain graph defines a model (we can give it a likelihood) so you can do things like score it or parameterize it to generate data from it. You can score CPDAGs too but that works by picking a DAG from the equivalence class so you would get different numbers in general. So here's an example where you can have the same dataset, the graphs look the same, but depending on whether it's a chain graph or CPDAG you will get different answers

@adam2392
Copy link
Collaborator

adam2392 commented Mar 9, 2023

I think a CPDAG also implicitly has the no almost directed cycle tho right? E.g. A -> B -> C with A -- C is not possible based on R2 of Meek's rules. I agree tho we do not "check" this if someone constructs a CPDAG from scratch.

Also, if a check of that sort of acyclicity is required, we do what networkx does and put the responsibility of checking on the user. E.g. in their algorithms section for bipartite graphs, it is on the user to verify that their graph passed in is indeed bipartite. With that said, I suppose we could "rename" all graphs, E.g. ADMG, PAG, CPDAG, but idk of a generalization where the acyclicity is not enforced. I'm guessing it's probably some form of cyclic graph that generalizes all of them and we feed those to algorithms that check for cyclicality vs acyclicty.

So I guess a solution is renaming CPDAG to a class that is compatible with both CPDAG and chain graph cuz imo it still sounds quite similar? If we need to add a likelihood, that wouldn't occur on the graph itself I'm guessing, so that should eat this "directed and undirected graph class" and check that it meets the chain graph assumptions by running a check for acylicity and almost acyclicity.

Alternatively, if there is implicitly some structure that we can enforce w/o explicitly running a possibly expensive graph operation every time edges change, then yeah I agree we have two classes: CPDAG and ChainGraph.

Happy to chat about this online if that's easier and we need to hash out some confusion.

@jaron-lee
Copy link
Collaborator Author

I think if the philosophy is to put responsibility onto the user then the "directed + bidirected" graph class makes the most sense. I like the solution of having a more generic graph class that can represent both CPDAGs and chain graphs (which is really just a rename of CPDAG to be more inclusive) because that best reflects the underlying situation - the graphical abstraction being useful in different contexts.

I'm not sure how expensive the operation is for checking no partially directed cycles but it's probably at least as expensive as checking ancestors.

I'll create an issue for this separately. In the meantime I will use CPDAG as a stand-in for chain graph.

@jaron-lee
Copy link
Collaborator Author

Also the windows and mac CI checks are failing, what was the command that I needed to run to fix that?

Signed-off-by: Jaron Lee <jaron2005@gmail.com>
@adam2392
Copy link
Collaborator

Also the windows and mac CI checks are failing, what was the command that I needed to run to fix that?

Since we want ananke as an "optional" dependency, this can be fixed by moving the import statements for ananke package into the function itself.

@adam2392
Copy link
Collaborator

I think if the philosophy is to put responsibility onto the user then the "directed + bidirected" graph class makes the most sense. I like the solution of having a more generic graph class that can represent both CPDAGs and chain graphs (which is really just a rename of CPDAG to be more inclusive) because that best reflects the underlying situation - the graphical abstraction being useful in different contexts.

Yeah from what I loosely understand, a chain graph is a generalization of CPDAG, so it would be nice if we have a good reference explaining all this. Lmk if you have one in mind and I can skim it.

@jaron-lee
Copy link
Collaborator Author

There are two kinds of chain graphs that I've seen in the literature. One is the Lauritzen, Wermuth, Friedman (LWF) chain graph (see also
while the other is the Alternative Markov Factorization (AMP) chain graph.

These are the original papers that introduce the chain graph as is commonly used in the literature.

There are some more modern papers that focus on the relevance to causal inference - e.g. http://www.di.unito.it/~radicion/tmp_firb__2009/chainGraphs.pdf, which I found easier to understand as an overview of this area.

Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Mar 13, 2023

Codecov Report

Merging #63 (36d8dfd) into main (67858e2) will increase coverage by 0.32%.
The diff coverage is 98.33%.

@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
+ Coverage   84.26%   84.59%   +0.32%     
==========================================
  Files          33       34       +1     
  Lines        2510     2570      +60     
  Branches      671      690      +19     
==========================================
+ Hits         2115     2174      +59     
  Misses        250      250              
- Partials      145      146       +1     
Impacted Files Coverage Δ
pywhy_graphs/export/ananke.py 98.33% <98.33%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nitpicking, but otw LGTM once these are fixed.

Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
jaron-lee and others added 6 commits March 13, 2023 21:25
Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
Signed-off-by: Jaron Lee <jaron2005@gmail.com>
@adam2392 adam2392 marked this pull request as ready for review March 14, 2023 13:33
@adam2392 adam2392 merged commit 20661f1 into py-why:main Mar 14, 2023
@adam2392
Copy link
Collaborator

Thanks @jaron-lee !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conversion API to other causal libraries' graph
3 participants