Skip to content

Conversation

@DesmetQuentin
Copy link

@DesmetQuentin DesmetQuentin commented Jun 3, 2024

Hi there,

Well first of all thank you for this bigtree package, it's been helping me a lot for my own application so far. I am currently working on a subclass of Node aiming to implement weighted trees for aggregating data using weighted averages from the leaves to the roots. I am quite grateful that you have used TypeVar("T", bound=Node) most of the time. It made my coding much simpler and saved me a lot of time. However, in my current implementation, mypy complains because even though I set up node_type = WeightedNode (my child class of Node), the output of, e.g. list_to_tree is always a Node. Is there a reason why construct.py did not follow this convention of using a TypeVar("T", bound=Node)? If not, then I propose to apply the convention to this file too, through the present pull request.

Otherwise, I have some other construct function propositions if you are interested, such as a dir_to_tree building a tree from an existing folder architecture, or an enhanced version of list_to_tree which employs the same convention as the bash command mkdir with the braces. We can discuss about it further if you are interested.

Regards,
Q. Desmet

@kayjan
Copy link
Owner

kayjan commented Jun 4, 2024

Thanks for spotting this! A few amendments to the commit

  • Can keep the docstrings in argument and return type as Node as it would be clearer than having a type hint of T, and will show up properly in mkdocs as Node type
  • For arguments with Type[Node] = Node, can retain it as such (instead of Type[T] = T), as it is still true that it needs to be constructed with a class inheriting from Node, and default is Node
  • nested_dict_to_tree and some other functions I think the return type is kept to Node instead of T, did you forgot to change those?

I will review more when I get to my laptop, but on your end can you run hatch run mypy-type and make sure there is no error. Thank
you for your help and spotting this!

Edit: You mentioned about dir_to_file (I think it should be more accurately dir_to_tree since it is a tree construct method) but I’m happy to let users extract their directory to a list of strings and use the existing list_to_tree. This way they can also perform filters on their directory exampled here.

For the enhanced list_to_file idea, this seems interesting. Do you mean you have a list of string depicting a folder structure and it recursively creates the exact folder structure? What about file names? I would park this under tree export methods such as tree_to_dir instead of list_to_file.

@DesmetQuentin
Copy link
Author

DesmetQuentin commented Jun 4, 2024

Amazing responsiveness!

  • Alright. Yes, I actually mistyped the proposed functions: I meant dir_to_tree (which you already answered, and I quite agree); and list_to_tree. I come back on the later at the end of the message (let's focus on the type hints for this pull request).
  • For the Type[Node] = Node, I was simply not aware of the implications of typing's Type. But as I understand, keeping the original hint is indeed relevant.
  • For the doc, I did hesitate to change it but I also agree with you, I'll reset those parts.
  • For the nested_dict_to_tree, indeed I forgot to change the return value. Will proceed now.

Back to list_to_tree: I've coded something that might not be very rigourous (a bit rude maybe; I mean I literally run mkdir 😆) but I show it as is, so you can get the idea:

from typing import TypeVar, Union, List
import re
from os import walk as os_walk
from os.path import relpath as os_path_relpath, join as os_path_join
from pathlib import Path
from tempfile import mkdtemp
from subprocess import run as subprocess_run
from shutil import rmtree

from bigtree.node.basenode import BaseNode
from bigtree.node.node import Node
from bigtree.tree.construct import list_to_tree as bigtree_list_to_tree

from mytree.node.weightednode import WeightedNode

__all__ = [
    "dir_to_tree",
    "list_to_tree",
]

T = TypeVar("T", bound=WeightedNode)
PathType = Union[str, Path]


def _read_tree_dir(root_path: PathType) -> List[str]:
    """root_path/root_name/... (root_path excludes the root)"""

    root_path = Path(root_path)

    output_dirs = []
    for root, dirs, files in os_walk(root_path):
        for dir in dirs:
            full_path = os_path_relpath(os_path_join(root, dir), root_path)
            output_dirs.append(full_path)

    return output_dirs


def dir_to_tree(root_path: PathType) -> T:
    """root_path/root_name/... (root_path excludes the root)"""

    return bigtree_list_to_tree(
        _read_tree_dir(root_path),
        sep="/",
        duplicate_name_allowed=True,
        node_type=WeightedNode,
    )


def _expand_braces(input_string: str) -> List[str]:

    # Create a temporary directory
    temp_dir = mkdtemp()

    try:
        # Construct the mkdir -p command
        command = f"mkdir -p {os_path_join(temp_dir, input_string)}"

        # Run the command
        subprocess_run(command, shell=True, check=True)

        # Walk through the created directories and collect them
        return _read_tree_dir(temp_dir)
    finally:
        # Clean up the temporary directory
        rmtree(temp_dir)


def list_to_tree(
    paths: List[str],
    sep: str = "/",
    duplicate_name_allowed: bool = True,
    expand: bool = True,
) -> T:

    if expand:
        paths = sum([_expand_braces(path) for path in paths], [])

    return bigtree_list_to_tree(
        paths, sep=sep, duplicate_name_allowed=True, node_type=WeightedNode
    )

This way, the example of the doc could be done in one string with:

root = list_to_tree(["a/{b/{d,e/{g,h}},c/f}"])

My trees are quite big with lots of duplicate names, so it's very useful.

@kayjan
Copy link
Owner

kayjan commented Jun 4, 2024

Thanks for making the amendments so quickly 👍 . I just got on my laptop and running hatch run mypy-type gave me 50 errors and the checks using the pre-commit hooks also returned 24 errors for mypy. Do follow the contributing guideline to pass all the checks on your local laptop 😄

Attaching the mypy errors below.

bigtree/tree/construct.py:54: error: Name "TypeVar" is not defined  [name-defined]
bigtree/tree/construct.py:54: note: Did you forget to import it from "typing"? (Suggestion: "from typing import TypeVar")
bigtree/tree/construct.py:58: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:58: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:63: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:63: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:104: error: T? has no attribute "root"  [attr-defined]
bigtree/tree/construct.py:137: error: Returning Any from function declared to return T?  [no-any-return]
bigtree/tree/construct.py:141: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:141: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:145: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:145: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:200: error: T? has no attribute "root"  [attr-defined]
bigtree/tree/construct.py:210: error: Returning Any from function declared to return T?  [no-any-return]
bigtree/tree/construct.py:213: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:213: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:249: error: T? has no attribute "node_name"  [attr-defined]
bigtree/tree/construct.py:251: error: T? has no attribute "node_name"  [attr-defined]
bigtree/tree/construct.py:253: error: T? has no attribute "set_attrs"  [attr-defined]
bigtree/tree/construct.py:259: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:259: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:265: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:265: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:339: error: T? has no attribute "root"  [attr-defined]
bigtree/tree/construct.py:351: error: Returning Any from function declared to return T?  [no-any-return]
bigtree/tree/construct.py:355: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:355: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:359: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:359: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:423: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:423: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:429: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:429: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:505: error: T? has no attribute "root"  [attr-defined]
bigtree/tree/construct.py:517: error: Returning Any from function declared to return T?  [no-any-return]
bigtree/tree/construct.py:521: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:521: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:525: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:525: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:592: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:592: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:661: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:661: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:721: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:721: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:770: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:770: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:860: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:860: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:906: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:906: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:907: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:907: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:940: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:940: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1046: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1046: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1143: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1143: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1149: error: T? has no attribute "node_name"  [attr-defined]
bigtree/tree/construct.py:1174: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1174: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1281: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1281: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1378: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1378: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1384: error: T? has no attribute "node_name"  [attr-defined]
bigtree/tree/construct.py:1408: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1408: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1465: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1465: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1472: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1472: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1477: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1477: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1480: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1480: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1482: error: Variable "bigtree.tree.construct.T" is not valid as a type  [valid-type]
bigtree/tree/construct.py:1482: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
bigtree/tree/construct.py:1502: error: T? has no attribute "set_attrs"  [attr-defined]
bigtree/tree/construct.py:1592: error: T? has no attribute "set_attrs"  [attr-defined]
bigtree/tree/construct.py:1667: error: T? has no attribute "set_attrs"  [attr-defined]
Found 50 errors in 2 files (checked 28 source files)

Actually for TypeVar, the T is to symbolize that the input type and output type must be the same type, hence would make sense to type hint add_xx_to_tree_xx functions as it accepts a tree input and returns a tree output and T type hints would fit. Whereas for functions that are xx_to_tree, it just returns T (which is too ambiguous) vs. returning Node currently (which is too restrictive and caused your mypy issues). Let me figure out what is the best way to perform return types 🤔

@DesmetQuentin
Copy link
Author

DesmetQuentin commented Jun 4, 2024

Ok, I'm very sorry for the mess. I'm going to follow the guidelines now and come back to you then.
I find the process nice and I am learning about the workflow for contributing, I'll start again the pull request properly!

@DesmetQuentin DesmetQuentin closed this by deleting the head repository Jun 4, 2024
@kayjan
Copy link
Owner

kayjan commented Jun 4, 2024

No worries, I'll also look into this. If you don't mind, can send how you are using WeightedNode and the steps to recreate the mypy error you are facing?

@kayjan
Copy link
Owner

kayjan commented Jun 4, 2024

I implemented a fix in v0.18.3, do upgrade the package with pip install --upgrade bigtree and test out if your mypy is still throwing error. Thanks for using bigtree and testing this out! Will keep your tree_to_dir as a TODO item to look into as well 😄

@kayjan kayjan mentioned this pull request Jun 5, 2024
@kayjan
Copy link
Owner

kayjan commented Jun 9, 2024

@DesmetQuentin Hi following up on the tree_to_dir idea, I think there is a function os.makedirs that is able to recursively create folders (even intermediate folders if they do not exist) without the need to send command line commands using shutil and subprocess. Then the logic for converting tree to directory structure is quite trivial and does not involve much logic.

However I see that your code has a few extra stuff such as WeightedNode and also the interpreting of curly braces notation - let me know if those use cases are worth diving into!

@DesmetQuentin
Copy link
Author

DesmetQuentin commented Jun 10, 2024

@kayjan I thank you for questioning this. I've looked a bit more into it and now give you a "report".

Topic: Just to be sure of what we are talking about, I'm talking about construct functions here, not export. I'm interested into using the unix-style brace expansion for constructing tree faster, in cases like mines where I have many duplicate names in big trees. It's essentially about an enhanced list_to_tree function.

Previouly: To do this, I initially lazily came up with applying the mkdir function, which already parses such brace-based expressions, inside a temporary directory --> retrieve the created directories --> make a tree out of it --> delete the temporary directory. This involved indeed subprocess, tempfile, shutil as well as some os functions. I agree with you that it's not very satisfactory to send command line for this.

Now: I actually found this braceexpand package which does exactly what I want, so now it simply looks like this:

from typing import TypeVar, List, Type
from braceexpand import braceexpand
from bigtree import Node, list_to_tree as bigtree_list_to_tree

T = TypeVar("T", bound=Node)

def list_to_tree(
    paths: List[str],
    sep: str = "/",
    duplicate_name_allowed: bool = True,
    expand: bool = True,
    node_type: Type[T] = Node,  # type: ignore[assignment]
) -> T:

    if expand:
        paths = sum([list(braceexpand(path)) for path in paths], [])

    return bigtree_list_to_tree(
        paths, sep=sep, duplicate_name_allowed=duplicate_name_allowed, node_type=node_type
    )

Application: In practice, I prefer to construct my tree from a file storing a unix-brace-based expression, but with break lines and comments. Very concretely here is my file:

Final/{
    {
    "Temporal variability"/  # here it is a nice comment
        {PAC,AND,SUN,SCS,SCB,IND,JAV,SAH},
    "Spatial variability"/
        {DJF,JJA}
    }/
            {CC,NSD}  # hey, modify here!
    ,
    'Mean value'/{
        "Temporal aspect"/
            {PAC,AND,SUN,SCS,SCB,IND,JAV,SAH},
        "Spatial aspect"/
            {DJF,JJA}
        }/
                MB
}

After parsing the file:

Final/{{Temporal variability/{PAC,AND,SUN,SCS,SCB,IND,JAV,SAH},Spatial variability/{DJF,JJA}}/{CC,NSD},Mean value/{Temporal aspect/{PAC,AND,SUN,SCS,SCB,IND,JAV,SAH},Spatial aspect/{DJF,JJA}}/MB}

After applying braceexpand (what I should give to list_to_tree in its current implementation):

['Final/Temporal variability/PAC/CC', 'Final/Temporal variability/PAC/NSD', 'Final/Temporal variability/AND/CC', 'Final/Temporal variability/AND/NSD', 'Final/Temporal variability/SUN/CC', 'Final/Temporal variability/SUN/NSD', 'Final/Temporal variability/SCS/CC', 'Final/Temporal variability/SCS/NSD', 'Final/Temporal variability/SCB/CC', 'Final/Temporal variability/SCB/NSD', 'Final/Temporal variability/IND/CC', 'Final/Temporal variability/IND/NSD', 'Final/Temporal variability/JAV/CC', 'Final/Temporal variability/JAV/NSD', 'Final/Temporal variability/SAH/CC', 'Final/Temporal variability/SAH/NSD', 'Final/Spatial variability/DJF/CC', 'Final/Spatial variability/DJF/NSD', 'Final/Spatial variability/JJA/CC', 'Final/Spatial variability/JJA/NSD', 'Final/Mean value/Temporal aspect/PAC/MB', 'Final/Mean value/Temporal aspect/AND/MB', 'Final/Mean value/Temporal aspect/SUN/MB', 'Final/Mean value/Temporal aspect/SCS/MB', 'Final/Mean value/Temporal aspect/SCB/MB', 'Final/Mean value/Temporal aspect/IND/MB', 'Final/Mean value/Temporal aspect/JAV/MB', 'Final/Mean value/Temporal aspect/SAH/MB', 'Final/Mean value/Spatial aspect/DJF/MB', 'Final/Mean value/Spatial aspect/JJA/MB']

And here is my hshow output:

                                                      ┌─  CC
                                  ┌─       PAC       ─┤
                                  │                   └─ NSD
                                  │                   ┌─  CC
                                  ├─       AND       ─┤
                                  │                   └─ NSD
                                  │                   ┌─  CC
                                  ├─       SUN       ─┤
                                  │                   └─ NSD
                                  │                   ┌─  CC
                                  ├─       SCS       ─┤
         ┌─ Temporal variability ─┤                   └─ NSD
         │                        │                   ┌─  CC
         │                        ├─       SCB       ─┤
         │                        │                   └─ NSD
         │                        │                   ┌─  CC
         │                        ├─       IND       ─┤
         │                        │                   └─ NSD
         │                        │                   ┌─  CC
         │                        ├─       JAV       ─┤
         │                        │                   └─ NSD
         │                        │                   ┌─  CC
         │                        └─       SAH       ─┤
─ Final ─┤                                            └─ NSD
         │                                            ┌─  CC
         │                        ┌─       DJF       ─┤
         ├─ Spatial variability  ─┤                   └─ NSD
         │                        │                   ┌─  CC
         │                        └─       JJA       ─┤
         │                                            └─ NSD
         │                                            ┌─ PAC ─── MB
         │                                            ├─ AND ─── MB
         │                                            ├─ SUN ─── MB
         │                        ┌─ Temporal aspect ─┼─ SCS ─── MB
         │                        │                   ├─ SCB ─── MB
         │                        │                   ├─ IND ─── MB
         └─      Mean value      ─┤                   ├─ JAV ─── MB
                                  │                   └─ SAH ─── MB
                                  │                   ┌─ DJF ─── MB
                                  └─  Spatial aspect ─┤
                                                      └─ JJA ─── MB

Then? After this, I apply my weights and stuff for my own study case (still in development), but you got the idea. I don't know if you want to include this inside bigtree or if you want to let it up to users. This is up to you! Whatever you choose, I'll eventually do that on my side. I just think that the format of the strings within the lists of list_to_tree makes it very inviting to insert braces 😃

@kayjan
Copy link
Owner

kayjan commented Jun 10, 2024

The braceexpand package seems to make the code much cleaner and integrates well with list_to_tree (also without the need to create file directories)!

Is this braces notation a formal type of tree representation? There is another form of tree representation with Newick which is similar but uses a different type of bracketing, and also not as "automated" as yours in the sense where /<something> means <something> is added to all the parent at one go which is pretty neat.

@DesmetQuentin
Copy link
Author

I don't know if it was thought for making trees, but it can be used for it. In a unix terminal, mkdir -p a/{b,c}/d would create a folder a containing two folders b and c, each containing a d folder. I think it's very powerful but I did not invent anything! You can check bash's brace expansion doc for reference.
I did not know about Newick but it's something similar. I personally prefer to build a tree from the root though (Newick starts from the leaves).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants