Custom parsers not supported when reading? #801

@lentinj

Hello!

According to the docs, I should be able to pass a parser dict when creating a tree:

ete/ete4/core/tree.pyx

Lines 47 to 50 in 75f2c62

:param parser: A description of how to parse a newick to
create a tree. It can be a single number specifying the
format or a structure with a fine-grained description of
how to interpret nodes (see ``newick.pyx``).

More generally, ``parser`` can be a dictionary that specifies in
detail how to read/write each field. It must say, for leaf and internal
nodes, what ``p0:p1`` means (which properties they are, including how
to read and write them). For example, the default parser looks like::
    PARSER_DEFAULT = {
        'leaf': [NAME, DIST],  # ((name:dist)x:y);
        'internal': [SUPPORT, DIST],  # ((x:y)support:dist);
    }

But if I pass in such a dict, extract_data_parser raises a TypeError:

>>> import ete4
>>> import ete4.parser.newick
>>> ete4.parser.newick.PARSER_DEFAULT
{'leaf': [{'pname': 'name', 'read': <cyfunction unquote at 0x7fccfb52fe80>, 'write': <cyfunction quote at 0x7fccfe4c8b80>}, {'pname': 'dist', 'read': <class 'float'>, 'write': <cyfunction <lambda> at 0x7fccfb52ff40>}], 'internal': [{'pname': 'support', 'read': <class 'float'>, 'write': <cyfunction <lambda> at 0x7fccfb5e4040>}, {'pname': 'dist', 'read': <class 'float'>, 'write': <cyfunction <lambda> at 0x7fccfb52ff40>}]}
>>> ete4.Tree("(a);", parser=ete4.parser.newick.PARSER_DEFAULT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ete4/core/tree.pyx", line 78, in ete4.core.tree.Tree.__init__
  File ".venv/lib/python3.10/site-packages/ete4/parser/extract.py", line 39, in extract_data_parser
    if (parser == 'newick' or parser in newick.PARSERS or
TypeError: unhashable type: 'dict'

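It looks like the conditions in extract.py are in the wrong order: `parser in newick.PARSERS` is evaluated before the `type(parser) is dict` check, so a dict parser raises on the membership test and the type check is never reached. The following change fixes it: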

diff --git a/ete4/parser/extract.py b/ete4/parser/extract.py
index a7575985..74cde6c5 100644
--- a/ete4/parser/extract.py
+++ b/ete4/parser/extract.py
@@ -36,8 +36,8 @@ def extract_data_parser(data, parser):
     elif force == 'data':  # data is just raw data, not a path
         pass
     else:  # guess if it is a path to a file depending on data and format
-        if (parser == 'newick' or parser in newick.PARSERS or
-            type(parser) is dict):  # for newick format
+        if (type(parser) is dict or parser == 'newick' or
+            parser in newick.PARSERS):  # for newick format
             if (not data.lstrip('\n').startswith('(') and
                 not data.rstrip().endswith(';')):
                 data = open(data).read()  # probably a file name - open it
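
To show why the order of the conditions matters, here is a minimal standalone sketch that does not import ete4. `PARSERS` below is a made-up stand-in (I'm assuming the real `newick.PARSERS` is a dict or similar hash-based container, which is what the traceback suggests), and both helper functions are hypothetical names used only for this illustration:

```python
# Stand-in for newick.PARSERS. Assumption: the real object is hash-based
# (e.g. a dict keyed by format number), so "x in PARSERS" has to hash x.
PARSERS = {0: "format 0", 1: "format 1"}


def is_newick_original(parser):
    # Original order: the membership test runs before the dict check.
    return parser == 'newick' or parser in PARSERS or type(parser) is dict


def is_newick_patched(parser):
    # Patched order: "or" short-circuits, so a dict never reaches
    # the membership test and is never hashed.
    return type(parser) is dict or parser == 'newick' or parser in PARSERS


custom_parser = {'leaf': [], 'internal': []}  # shaped like a parser dict

try:
    is_newick_original(custom_parser)
    original_raised = False
except TypeError:
    original_raised = True  # dicts are unhashable

print(original_raised)
print(is_newick_patched(custom_parser))
print(is_newick_patched('newick'), is_newick_patched(1))
```

With the original order the dict is hashed by the `in` test and fails; with the patched order the type check returns first, while string and numeric parsers behave as before.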

I can submit a pull request if that would be useful. Would you also like a test along the lines of the above, and if so, where should it go?
