Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a "class" row factory #42

Open
deppe opened this issue Nov 17, 2022 · 14 comments
Open

Provide a "class" row factory #42

deppe opened this issue Nov 17, 2022 · 14 comments

Comments

@deppe
Copy link

deppe commented Nov 17, 2022

Is your feature request related to a problem? Please describe.
Currently, comdb2 module provides dict and namedtuple factories. It would also be nice to provide a "class" row factory which would allow rows returned as client-defined dataclasses, pydantic models, etc.

Describe the solution you'd like
See a similar concept in psycopg's class_row factory: https://www.psycopg.org/psycopg3/docs/api/rows.html#psycopg.rows.class_row
This also allows nice static type checking: https://www.psycopg.org/psycopg3/docs/advanced/typing.html#example-returning-records-as-pydantic-models

@ms2892
Copy link

ms2892 commented Jan 10, 2024

One way to design the "class" row factory would be to have a the class_row_factory as a class. This would mean it would look something like the following below


In comdb2/factories.py

class ClassRowFactory:
    def __init__(self, class_pointer: Any) -> None:
        self.class_pointer = class_pointer

    def __call__(self,col_names) -> Any:
        _raise_on_duplicate_column_names(col_names)
       def class_row(col_values):
           key_value_mapping = dict(zip(col_names, col_vals))
           resulting_class = self.class_pointer(**key_value_mapping)
           return resulting_class
       return class_row

So this will try to fit in to the existing code structure that is present in the factories.py file. The use of it will be something like

Example:
       >>> @dataclass
       >>> class ABC:
       >>>     x: int
       >>>     y: int
       .......
        >>> conn.row_factory = ClassRowFactory(ABC)
        >>> row = conn.cursor().execute("select 1 as x, 2 as y").fetchone()
        >>> print(row)
        <__main__.ABC object at 0x000001CBE9CC8790>
        >>> print(row.x)
        1 

@ms2892
Copy link

ms2892 commented Jan 11, 2024

#57

Here's the PR for the addition

@godlygeek
Copy link
Contributor

key_value_mapping = dict(zip(col_names, col_vals))
resulting_class = self.class_pointer(**key_value_mapping)

Is this the way that psycopg3 does it, with all columns names being passed to the class constructor as keyword arguments?

@ms2892
Copy link

ms2892 commented Jan 12, 2024

The column names aren't passed as a class constructor but the pointer to the particular class is passed to the constructor. To make it behave like a function (as it is shown in the existing format of comdb) I used the specified format in the __call__ method. But if you wish to see how its done in psycopg, it does approach it in a similar fashion where :

def class_row(cls: Type[T]) -> BaseRowFactory[T]:
    r"""Generate a row factory to represent rows as instances of the class `!cls`.

    The class must support every output column name as a keyword parameter.

    :param cls: The class to return for each row. It must support the fields
        returned by the query as keyword arguments.
    :rtype: `!Callable[[Cursor],` `RowMaker`\[~T]]
    """
    def class_row_(cursor: "BaseCursor[Any, Any]") -> "RowMaker[T]":
        names = _get_names(cursor)
        if names is None:
            return no_result

        def class_row__(values: Sequence[Any]) -> T:
            return cls(**dict(zip(names, values)))            < --------- This is how it is converting it to a class and returning it

        return class_row__

    return class_row_

I adopted a similar method and to make it easy to understand and read I put it in a class

@ms2892
Copy link

ms2892 commented Jan 12, 2024

the __call__ enables the class instance to be called like a function. So essentially what is being passed conn.row_factory = ClassRowFactory(ABC) is like class instance that can also behave like a reference to a function

@ms2892
Copy link

ms2892 commented Jan 12, 2024

Screenshot from 2024-01-12 00-12-13

I recommended this way because it doesn't break the format in which other factories are being used in the factories.py file.

@godlygeek
Copy link
Contributor

The column names aren't passed as a class constructor but the pointer to the particular class is passed to the constructor

Right, I understand that, I was asking about how that class is called. Does psycopg3 turn every column name into a keyword argument for that class constructor?

@ms2892
Copy link

ms2892 commented Jan 12, 2024

Does psycopg3 turn every column name into a keyword argument for that class constructor?

In its source code it does get the column names using another function and zips it to keyword argument of values.
How it gets the column names is through a cursor functionality implemented within it.

@ms2892
Copy link

ms2892 commented Jan 12, 2024

Its easy to convert a dictionary to a class object. If there is a blueprint for getting the dictionary within the factories.py then can't that dictionary factory be extended to just be passed into its respective class?

@godlygeek
Copy link
Contributor

godlygeek commented Jan 12, 2024

My concerns about a class row factory have never been about ease of implementation, they're about performance and about robustness.

On the performance side, this approach to a class row factory means that for each row a zip object gets created, and a dict (or I think actually 2 dicts, under the covers, due to the way the interpreter implements **), and an instance of the user-provided class. That's a lot of work, but maybe it's justified for the convenience.

On the robustness side, I'm wondering how useful it is to assume that column names will exactly match the parameter names of the __init__ of the class that was provided to the factory-factory's constructor.

If psycopg3 has done this the same way - with a dict being created for each row and used to provide keyword arguments to the class constructor based on the column names - that helps to set my mind at ease about both the performance and robustness aspect. That's why I'm inquiring about the approach they took.

@ms2892
Copy link

ms2892 commented Jan 12, 2024

On the robustness side, I'm wondering how useful it is to assume that column names will exactly match the parameter names of the __init__ of the class that was provided to the factory-factory's constructor.

The column names and values are returned as a dictionary

{'col1':value1,'col2':val2}

These will match with the parameters of the class specified if the parameter name and the key names (in this case the column name) will be the same

so the class should look something like

@dataclass
class ABC:
   col1: data_type1
   col2: data_type2

Just how it has been described in the original psycopg implementation

@ms2892
Copy link

ms2892 commented Jan 12, 2024

Also important thing. Datatype of the attributes of the dataclass should be the same as the datatype of the corresponding values of the dictionary.

@sarahmonod sarahmonod linked a pull request Jan 25, 2024 that will close this issue
@ms2892 ms2892 mentioned this issue Jan 25, 2024
@godlygeek
Copy link
Contributor

godlygeek commented Apr 19, 2024

I'm pretty sure I don't want to expose this feature, at least as things stand today. There's a few different things that make this make more sense for psycopg3 than it makes for Comdb2, but the biggest one is that I'm reluctant to add more code that makes static type checking difficult.

The psycopg3 Connection and Cursor classes are generic, so a type checker like mypy or pylance can catch a bug like this:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

conn = psycopg.connect(row_factory=class_row(Point))
cur = conn.cursor()
cur.execute("select 1, 2")
for point in cur.fetchall():
    print(f"x={point.x} y={point.y} z={point.z}")

The type checker knows that Connection[Point].cursor().fetchall() returns a Sequence[Point], and it knows that a Point doesn't have a z attribute, so the last line fails type analysis because it tries to access point.z.

Compare that against this example that uses python-comdb2, using the interface proposed in #57:

from dataclasses import dataclass

from comdb2.dbapi2 import connect
from comdb2.factories import ClassRowFactory

@dataclass
class Point:
    x: int
    y: int

conn = connect("mattdb")
conn.row_factory = ClassRowFactory(Point)
cur = conn.cursor()
cur.execute("select 1, 2")
for point in cur.fetchall():
    print(f"x={point.x} y={point.y} z={point.z}")

In the case of python-comdb2, our Connection and Cursor classes aren't currently generic, and so the type checker wouldn't catch the same bug - cur.fetchall() is typed as returning Sequence[Any] in the comdb2 world, and accessing the z attribute of an object of type Any isn't a type error.

It seems a bit unfair to dislike this because of an unrelated feature that python-comdb2 is lacking, except that there's an alternative way to write this code where a type checker would catch the bug. If that Comdb2 example were rewritten to this:

from dataclasses import dataclass

from comdb2.dbapi2 import connect
from comdb2.factories import dict_row_factory

@dataclass
class Point:
    x: int
    y: int

conn = connect("mattdb")
conn.row_factory = dict_row_factory
cur = conn.cursor()
cur.execute("select 1, 2")
for row in cur.fetchall():
    point = Point(**row)
    print(f"x={point.x} y={point.y} z={point.z}")

then the type checker would catch the bug. That's only one more line of code, and it makes it so that type checkers can usefully check the processing of each row. If someone wants to convert each row to a class, I'd rather they do it this way so that the type checker understands what's happening.

That's not a "no" for doing this proposed change, but I think it is a "not yet". I don't think this change makes sense unless and until our Connection and Cursor types become generic.

@godlygeek
Copy link
Contributor

And sorry for taking so long to reply here. I was hoping I'd find time to try making our Connection and Cursor classes generic, and never managed to get around to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants