Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First class Cockroach DB support. #607

Closed
6 tasks done
gnat opened this issue Aug 30, 2022 · 74 comments
Closed
6 tasks done

First class Cockroach DB support. #607

gnat opened this issue Aug 30, 2022 · 74 comments

Comments

@gnat
Copy link
Member

gnat commented Aug 30, 2022

Friction-free Cockroach DB support in Piccolo ORM.

PR: #608

We'd like to see Cockroach Labs promote Piccolo as one of, if not the recommended async ORM. asyncpg has some support built-in to help us get there.

In the mean time I will be adding notes on how to use CockroachDB with Piccolo, and general progress.

Quick local setup for testing

  • Download/extract (single binary): https://www.cockroachlabs.com/docs/stable/install-cockroachdb-linux.html
  • Start Cockroach: ./cockroach start-single-node --insecure --store=node1 --listen-addr=localhost:26257 --http-addr=localhost:8080 --background
  • Enter SQL client: ./cockroach sql --insecure
    • Can create database ...; use ...; from here.
  • CTRL-C or killall cockroach to send graceful termination signal.

Default account is root without password. Fancy stats dashboard at http://localhost:8080

Both Piccolo Examples and Piccolo Admin are working.

image

What we still need

🐛 Known issues in CockroachDB or Asyncpg

[ ] ARRAY support between asyncpg ↔️ cockroach.

cannot drop UNIQUE constraint "..." using ALTER TABLE DROP CONSTRAINT, use DROP INDEX CASCADE instead

column ... is of type int[] and thus is not indexable / column ... is of type varchar[] and thus is not indexable

🌔 Future nice to haves

CockroachDB executes ALTER COLUMN queries asynchronously.

  • In the short term this can be mitigated by keeping migrations small.
  • At odds with Piccolo's assumption that the database is altered before the next migration operation begins. CockroachDB will give a warning: table <...> is currently undergoing a schema change if a later operation tries to modify the table before the asynchronous query finishes. CockroachDB may fix this in the future.
@gnat
Copy link
Member Author

gnat commented Aug 30, 2022

✔️ Historical Issues / Solutions of Note

✔️ All issues are resolved in #608 but are here if you need to apply them manually.

✔️ 🐛 expected DEFAULT expression to have type timestamp, but 'current_timestamp() + '01:00:00'' has type timestamptz

return f"CURRENT_TIMESTAMP + INTERVAL '{interval_string}'"

return f"CURRENT_TIMESTAMP::timestamp + INTERVAL '{interval_string}'"

return "current_timestamp"

return "current_timestamp::timestamp"

✔️ 🐛 asyncpg.exceptions.FeatureNotSupportedError: unimplemented: extension "uuid-ossp" is not yet supported

extensions = ["uuid-ossp"]

#extensions = ["uuid-ossp"]

Or add extensions=[] in piccolo_config.py when creating DB = PostgresEngine(...

✔️ 🐛 ALTER COLUMN TYPE is not supported inside a transaction

SET CLUSTER SETTING sql.defaults.experimental_alter_column_type.enabled = true;

Same strategy employed by django-cockroach

@dantownsend
Copy link
Member

No worries, we'll use this issue instead as it's more detailed.

@sinisaos
Copy link
Member

I experimented a bit and researched about this.

Tests / CI

  • Add Cockroach to the Piccolo QA tooling.

For Github actions only found an example of how Peewee ORM does it (basically downloading CockroachDB, starting with start-single-node and creating a test database using CockroachDB sql - same as a local installation but I don't know how it works in practice). Peewee also has special field-types for CockroachDB specifics (primary key and uuid).

I was able to run the tests locally using PostgresEngine and created a few abstract methods e.g

# in columns/base.py
class Default(ABC):
    @abstractproperty
    def postgres(self) -> str:
        pass

    @abstractproperty
    def sqlite(self) -> str:
        pass

    @abstractproperty
    def cockroach(self) -> str:
        pass

# and then in columns/timestamp.py
class TimestampNow(Default):
    @property
    def postgres(self):
        return "current_timestamp"

    @property
    def sqlite(self):
        return "current_timestamp"

    @property
    def cockroach(self):
        return "current_timestamp::timestamp"

for the CockroachDB specifics (because without that Postgres tests are failing). The result is that a lot of the tests failed (DuplicateTableError, InterfaceError etc.) and there is also a few errors. Also all tests that return a serial ID in the response fail because CockroachDB has a different primary key implementation.

test_result

Tests run 10 times slower than postgres (even with start-single-node) or I don't know how to fine tune it (I use this command to start single node cockroach start-single-node --insecure --store=node1 --store=type=mem,size=0.5 --listen-addr=localhost:26257 on my local machine (4 cores - 4GB RAM + 2GB swap)). After the tests were completed, I had to use DROP DATABASE piccolo CASCADE and again CREATE DATABASE piccolo because the database is not cleaned after the test suite is finished?

Sorry for the long comment, I hope this is useful.

@gnat
Copy link
Member Author

gnat commented Aug 31, 2022

For reference, in my experience, django-cockroach is also a fairly fleshed out postresql ➡️ cockroachdb layer, and may also serve as a good reference point for common solutions.

Peewee also has special field-types for CockroachDB specifics (primary key and uuid)

Yeah I do believe this is to set the correct default value (Ex: DEFAULT unique_rowid() and DEFAULT gen_random_uuid()). Here it is in django-cockroach: https://github.com/cockroachdb/django-cockroachdb/blob/0d44aaed3b414e263ee9f9ffb7ed71f585fdb9a1/django_cockroachdb/base.py#L34

@sinisaos
Copy link
Member

I think we could do something like this, but Daniel would know better. First create new CockroachEngine engine

from __future__ import annotations
import typing as t
from piccolo.engine.postgres import PostgresEngine

class CockroachEngine(PostgresEngine):
    def __init__(
        self,
        config: t.Dict[str, t.Any],
        extensions: t.Sequence[str] = (),
    ) -> None:
        self.config = config
        self.extensions = extensions
        super().__init__(config=config, extensions=extensions)
       
        engine_type = "cockroach"
        ...

and then change Serial column to this

class Serial(Column):
    """
    An alias to an autoincrementing integer column in Postgres.
    """

    @property
    def column_type(self):
        engine_type = self._meta.engine_type
        if engine_type == "postgres":
            return "SERIAL"
        elif engine_type == "sqlite":
            return "INTEGER"
        elif engine_type == "cockroach":
            return "DEFAULT unique_rowid()"
        raise Exception("Unrecognized engine type")

And then do the same for UUID column.

I have no experience with CockroachDB, but do you know why the tests take so long? I think this is a problem for GH actions.

@dantownsend
Copy link
Member

@sinisaos Yeah, something like that will work.

This needs changing too:

@property
def querystrings(self) -> t.Sequence[QueryString]:
"""
Calls the correct underlying method, depending on the current engine.
"""
if self._frozen_querystrings is not None:
return self._frozen_querystrings
engine_type = self.engine_type
if engine_type == "postgres":
try:
return self.postgres_querystrings
except NotImplementedError:
return self.default_querystrings
elif engine_type == "sqlite":
try:
return self.sqlite_querystrings
except NotImplementedError:
return self.default_querystrings
else:
raise Exception(
f"No querystring found for the {engine_type} engine."
)

So the logic for cockroach is:

  • If cockroach query exists, then use it
  • Otherwise, if postgres query exists, then use it
  • Otherwise, use default query

@gnat
Copy link
Member Author

gnat commented Aug 31, 2022

--store=type=mem,size=0.5

Would bump that to at least 2GB. CockroachDB tends to choke up on anything less.

Tests are running fairly quickly here, but I'm currently hanging on tests/table/instance/test_save.py

@gnat
Copy link
Member Author

gnat commented Aug 31, 2022

On the bright side the tests are running here in 2:13

image
Currently running into this issue of Cockroach sending back cannot perform operation: another operation is in progress. Think this may be because Cockroach is performing operations asynchronously and we aren't retrying.

EDIT: Actually, probably running into the "multiple active portals" issue: MagicStack/asyncpg#738
MagicStack/asyncpg#580
cockroachdb/cockroach#83164

@sinisaos
Copy link
Member

sinisaos commented Aug 31, 2022

@dantownsend You're right. I just gave an example of what we could do. Maybe you should install CockroachDB when you have time and try to run the tests.
@gnat You're right. If I run with just cockroach start-single-node --insecure --store=node1 result is much better - 8 min but that's probably because of my old and slow laptop :)

@gnat
Copy link
Member Author

gnat commented Aug 31, 2022

I currently "work around" the cannot perform operation: another operation is in progress in my own internal ORM by replacing fetchval with fetch-- but I do not see fetchval being used in Piccolo.. so it's something else triggering the "multiple active portals" issue..

@gnat
Copy link
Member Author

gnat commented Aug 31, 2022

@dantownsend any ideas?

gnat added a commit to gnat/piccolo that referenced this issue Aug 31, 2022
@dantownsend
Copy link
Member

dantownsend commented Aug 31, 2022

@gnat I browsed some issues on cockroachdb (like this cockroachdb/cockroach#40195). I don't have a definite answer, just some ideas:

  • Could be something to do with transactions.
  • Or maybe Piccolo isn't closing a connection somewhere.
  • A bug with asyncpg?

I need to play around with it a bit more.

@gnat
Copy link
Member Author

gnat commented Aug 31, 2022

So, some tests are problematic for Cockroach (anything testing sequential keys starting at 1), because even though we can run Cockroach in "compatibility mode", most Cockroach users will never want to do this because it's terrible for performance in Cockroach (the cluster would have to perform a very expensive "full cluster" consensus on INSERT and essentially have a cluster-wide lock 😱)

Notably "id": 1 and "manager": 1 here:

band = Band(name="Pythonistas", popularity=1000, manager=manager)
band.save().run_sync()

self.assertEqual(
    Band.select().run_sync(),
    [
        {
            "id": 1,
            "name": "Pythonistas",
            "manager": 1,
            "popularity": 1000,
        }
    ],
)

So we have a few options:

Add to the current test suite.

  • Perhaps some decorators like @cockroach_only and @cockroach_never (needed to exclude tests guaranteed to fail), and have extra tests just for Cockroach.
  • I think this is the way to go, but our test suite will get bigger.
  • Any cleaner ways to approach this?

Run compatibility mode.

  • Add DEFAULT nextval() for Serial at Table creation time just for Cockroach tests.
  • Really not sure how to do this just at Table creation time without the new DEFAULT being used everywhere else and breaking other tests. @dantownsend
  • As explained above, most people are going to want DEFAULT unique_rowid() or UUID, but neither of these will pass the current test suite.
  • Explained here: https://www.cockroachlabs.com/docs/stable/serial.html

Remove sequential keys from the current assert checks.

  • Kinda easy but not very desirable. Mangles current tests.

Ignore large swaths of tests for Cockroach.

  • Easiest but not very desirable.

@gnat
Copy link
Member Author

gnat commented Aug 31, 2022

Major work in progress, but figured I'd start pushing up progress so people can try things out as we go.

@dantownsend
Copy link
Member

Notably "id": 1 and "manager": 1 here:

Yes, that's a tricky problem.

We currently have an approach for just running certain tests for certain databases:

piccolo/tests/base.py

Lines 33 to 39 in ed93feb

postgres_only = pytest.mark.skipif(
not is_running_postgres(), reason="Only running for Postgres"
)
sqlite_only = pytest.mark.skipif(
not is_running_sqlite(), reason="Only running for SQLite"
)

We can create a cockroach_only decorator. It's might not be sufficient though, if we have loads of tests which don't run on Cockroach - we would also need to mark a bunch of tests as not_cockroach. A cleaner approach might be to create a new decorator called for_engines. For example:

@for_engines('sqlite', 'postgres')
class Test1(TestCase):
    ...

But as you said, ignoring large numbers of tests isn't ideal.

Run compatibility mode.
Add DEFAULT nextval() for Serial at Table creation time just for Cockroach tests.
Really not sure how to do this just at Table creation time without the new DEFAULT being used everywhere else and breaking other tests. @dantownsend
As explained above, most people are going to want DEFAULT unique_rowid() or UUID, but neither of these will pass the current test suite.
Explained here: https://www.cockroachlabs.com/docs/stable/serial.html

It's a smart idea. I'm not sure how best to implement it though. Whenever we create tables in test, we could do it via a custom function which set the tables up correctly if using cockroach.

class Test1(Table):
    def setUp(self):
        create_test_tables(Band)

I need to think about it a bit more ...

@gnat
Copy link
Member Author

gnat commented Sep 1, 2022

@for_engines('sqlite', 'postgres')
class Test1(TestCase):
    ...

This is smart. Allows us to pick and choose specific tests per engine and easily add other engines in the future. Would allow us to repurpose the Cockroach tests for other massively distributed databases as well.

If you don't mind, what does the implementation of this @for_engines decorator look like? I am having trouble with it. 😅

Here's an example of extending our test suite with @for_engines would look like.

Keeps the test suite comprehensive and the division of tests are clear.

I've re-worked that test for Cockroach (it just ignores "id" and "manager") but of course, feedback and ideas on how to do it better are much appreciated.

test_save.py

from unittest import TestCase

from piccolo.table import create_db_tables_sync, drop_db_tables_sync
from tests.example_apps.music.tables import Band, Manager
from tests.base import cockroach_only, no_cockroach


class TestSave(TestCase):
    def setUp(self):
        create_db_tables_sync(Manager, Band)

    def tearDown(self):
        drop_db_tables_sync(Manager, Band)

    def test_save_new(self):
        """
        Make sure that saving a new instance works.
        """
        manager = Manager(name="Maz")

        query = manager.save()
        print(query)
        self.assertTrue("INSERT" in query.__str__())

        query.run_sync()

        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz" in names)

        manager.name = "Maz2"
        query = manager.save()
        print(query)
        self.assertTrue("UPDATE" in query.__str__())

        query.run_sync()
        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz2" in names)
        self.assertTrue("Maz" not in names)

    @for_engines('sqlite', 'postgres')
    def test_save_specific_columns(self):
        """
        Make sure that we can save a subset of columns.
        """
        manager = Manager(name="Guido")
        manager.save().run_sync()

        band = Band(name="Pythonistas", popularity=1000, manager=manager)
        band.save().run_sync()

        self.assertEqual(
            Band.select().run_sync(),
            [
                {
                    "id": 1,
                    "name": "Pythonistas",
                    "manager": 1,
                    "popularity": 1000,
                }
            ],
        )

        band.name = "Pythonistas 2"
        band.popularity = 2000
        band.save(columns=[Band.name]).run_sync()

        # Only the name should update, and not the popularity:
        self.assertEqual(
            Band.select().run_sync(),
            [
                {
                    "id": 1,
                    "name": "Pythonistas 2",
                    "manager": 1,
                    "popularity": 1000,
                }
            ],
        )

        # Also test it using strings to identify columns
        band.name = "Pythonistas 3"
        band.popularity = 3000
        band.save(columns=["popularity"]).run_sync()

        # Only the popularity should update, and not the name:
        self.assertEqual(
            Band.select().run_sync(),
            [
                {
                    "id": 1,
                    "name": "Pythonistas 2",
                    "manager": 1,
                    "popularity": 3000,
                }
            ],
        )

    @for_engines('cockroach')
    def test_save_specific_columns(self):
        """
        Make sure that we can save a subset of columns.
        """
        manager = Manager(name="Guido")
        manager.save().run_sync()

        band = Band(name="Pythonistas", popularity=1000, manager=manager)
        band.save().run_sync()

        self.assertEqual(
            Band.select().run_sync()[0] | {
                "name":"Pythonistas",
                "popularity": 1000,
            },
            Band.select().run_sync()[0]
        )

        band.name = "Pythonistas 2"
        band.popularity = 2000
        band.save(columns=[Band.name]).run_sync()

        # Only the name should update, and not the popularity:
        self.assertEqual(
            Band.select().run_sync()[0] | {
                "name":"Pythonistas 2",
                "popularity": 1000,
            },
            Band.select().run_sync()[0]
        )

        # Also test it using strings to identify columns
        band.name = "Pythonistas 3"
        band.popularity = 3000
        band.save(columns=["popularity"]).run_sync()

        # Only the popularity should update, and not the name:
        self.assertEqual(
            Band.select().run_sync()[0] | {
                "name":"Pythonistas 2",
                "popularity": 3000,
            },
            Band.select().run_sync()[0]
        )

@gnat
Copy link
Member Author

gnat commented Sep 1, 2022

Another alternative.

We can use if for_engines('sqlite', 'postgres'): and if for_engines('cockroach'): on the individual self.assert..

Direct comparison to the previous for test_save.py. Your thoughts, ideas?

from unittest import TestCase

from piccolo.table import create_db_tables_sync, drop_db_tables_sync
from tests.example_apps.music.tables import Band, Manager
from tests.base import for_engines


class TestSave(TestCase):
    def setUp(self):
        create_db_tables_sync(Manager, Band)

    def tearDown(self):
        drop_db_tables_sync(Manager, Band)

    def test_save_new(self):
        """
        Make sure that saving a new instance works.
        """
        manager = Manager(name="Maz")

        query = manager.save()
        print(query)
        self.assertTrue("INSERT" in query.__str__())

        query.run_sync()

        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz" in names)

        manager.name = "Maz2"
        query = manager.save()
        print(query)
        self.assertTrue("UPDATE" in query.__str__())

        query.run_sync()
        names = [i["name"] for i in Manager.select(Manager.name).run_sync()]
        self.assertTrue("Maz2" in names)
        self.assertTrue("Maz" not in names)

    def test_save_specific_columns(self):
        """
        Make sure that we can save a subset of columns.
        """
        manager = Manager(name="Guido")
        manager.save().run_sync()

        band = Band(name="Pythonistas", popularity=1000, manager=manager)
        band.save().run_sync()

        if for_engines('sqlite', 'postgres'):
            self.assertEqual(
                Band.select().run_sync(),
                [
                    {
                        "id": 1,
                        "name": "Pythonistas",
                        "manager": 1,
                        "popularity": 1000,
                    }
                ],
            )

        if for_engines('cockroach'):
            self.assertEqual(
                Band.select().run_sync()[0] | {
                    "name":"Pythonistas",
                    "popularity": 1000,
                },
                Band.select().run_sync()[0]
            )

        band.name = "Pythonistas 2"
        band.popularity = 2000
        band.save(columns=[Band.name]).run_sync()

        # Only the name should update, and not the popularity:
        if for_engines('sqlite', 'postgres'):
            self.assertEqual(
                Band.select().run_sync(),
                [
                    {
                        "id": 1,
                        "name": "Pythonistas 2",
                        "manager": 1,
                        "popularity": 1000,
                    }
                ],
            )

        if for_engines('cockroach'):
            self.assertEqual(
                Band.select().run_sync()[0] | {
                    "name":"Pythonistas 2",
                    "popularity": 1000,
                },
                Band.select().run_sync()[0]
            )

        # Also test it using strings to identify columns
        band.name = "Pythonistas 3"
        band.popularity = 3000
        band.save(columns=["popularity"]).run_sync()

        # Only the popularity should update, and not the name:
        if for_engines('sqlite', 'postgres'):
            self.assertEqual(
                Band.select().run_sync(),
                [
                    {
                        "id": 1,
                        "name": "Pythonistas 2",
                        "manager": 1,
                        "popularity": 3000,
                    }
                ],
            )

        if for_engines('cockroach'):
            self.assertEqual(
                Band.select().run_sync()[0] | {
                    "name":"Pythonistas 2",
                    "popularity": 3000,
                },
                Band.select().run_sync()[0]
            )

@sinisaos
Copy link
Member

sinisaos commented Sep 1, 2022

Sorry to interrupt, but we may have additional issues until asyncpg is fully supported in Cockroach. I started running individual tests and a few exceptions are currently not supported. For example, tests for array columns throw exception asyncpg.exceptions._base.UnknownPostgresError: unable to decorrelate subquery. I don't think it's implemented yet issue.

I also haven't found an alternative to the Postgres FORMAT() function used for readable columns.

@property
def postgres_string(self) -> str:
return self._get_string(operator="FORMAT")

There are probably more such situations that we haven't discovered yet.

@gnat
Copy link
Member Author

gnat commented Sep 1, 2022

Yeah I encountered that the other day as well, thought I'd come back to it as it seems non-essential, maybe @dantownsend can comment. We may be able to just stub it for cockroach for now or put in a placeholder.

I don't see many viable alternatives to postgres format() or sqlite printf() in the function reference unless I'm missing something. Odd since there's tons of other string formatting options.

@knz Any thoughts?

@dantownsend
Copy link
Member

Instead of format we might be able to use || which concatenates strings and values.

We would have to refactor this:

class Readable(Selectable):

But I think it's possible.

@dantownsend
Copy link
Member

If there are lots of problems with asyncpg, another solution is to build the CockroachDB engine based off psycopg3 instead, which seems to be highly compatible with CockRoachDB.

The engines in Piccolo aren't super complex, so it's possible.

@dantownsend
Copy link
Member

If you don't mind, what does the implementation of this @for_engines decorator look like? I am having trouble with it. 😅

I think it looks something like this:

ENGINE = engine_finder()

def for_engines(*engine_names: str):
    if ENGINE:
        current_engine_name = ENGINE.engine_type
        if current_engine_name not in engine_names:

            def wrapper(func):
                return pytest.mark.skip(
                    f"Not running for {current_engine_name}"
                )(func)

            return wrapper
        else:

            def wrapper(func):
                return func

            return wrapper
    else:
        raise ValueError("Engine not found")

@dantownsend
Copy link
Member

We can use if for_engines('sqlite', 'postgres'): and if for_engines('cockroach'): on the individual self.assert..

We could do that. We would end up with quite a bit of branching in the tests, but in the end it might be the best solution.

I'm still hopefully that there's a super clean solution we haven't thought of yet ... 🤔

@gnat
Copy link
Member Author

gnat commented Sep 1, 2022

We would end up with quite a bit of branching in the tests

This is what I'm experiencing.

Your decorator creates much flatter tests, and the functions themselves are shorter, which is nice.

Also I'm realising I'm needing to do RETURNING id on many of the INSERT, because the id is semi-random, which clutters the postgres and sqlite tests a bit too much for my taste.

Thankfully many tests pass fine unmodified (ideal situation).

@gnat
Copy link
Member Author

gnat commented Sep 1, 2022

If there are lots of problems with asyncpg, another solution is to build the CockroachDB engine based off psycopg3 instead, which seems to be highly compatible with CockRoachDB.

The engines in Piccolo aren't super complex, so it's possible.

Interesting. On that note, it seems like @dvarrazzo has committed extra time into ensuring a smooth CRDB story in psycopg: psycopg/psycopg#313

@knz
Copy link

knz commented Sep 1, 2022

@knz Any thoughts?

  • "cannot decorrelate subquery": we're extremely interested to get that fixed. CockroachDB is supposed to be able to decorrelate a lot nowadays, so your use case is probably a bug. Can you tell us more and then file an issue on the crdb repo?

  • format: see here builtins: support postgre's FORMAT function cockroachdb/cockroach#87260
    and indeed using || and the ::string cast conversion will help as workaround.

@gnat
Copy link
Member Author

gnat commented Sep 1, 2022

@dantownsend Any advice on how to handle the default type checks tests in tests/apps/migrations/auto/integration/test_migrations.py ?

Example failed test:

    def test_bigint_column(self):
        self._test_migrations(
            table_snapshots=[
                [self.table(column)]
                for column in [
                    BigInt(),
                    BigInt(default=1),
                    BigInt(default=integer_default),
                    BigInt(null=True),
                    BigInt(null=False),
                    BigInt(index=True),
                    BigInt(index=False),
                ]
            ],
            test_function=lambda x: all(
                [
                    x.data_type == "bigint",
                    x.is_nullable == "NO",
                    x.column_default == "0",
                ]
            ),
        )

Fails with:

AssertionError: False is not true : Meta is incorrect: RowMeta(column_default='0:::INT8', column_name='my_column', is_nullable='NO', table_name='my_table', character_maximum_length=None, data_type='bigint', numeric_precision=64, numeric_scale=0, numeric_precision_radix=2)

CockroachDB is adding type casting aka '0:::INT8'

Do I really need to write all new tests with the casts included?

Or is there a way to override or work around this? Suggestions appreciated.

@dantownsend
Copy link
Member

B) When using Cockroach engine, automatically map to the alternative types. Example: JSON() would just call JSONB() instead when using Cockroach.

@gnat I think this is my preferred solution at the moment too.

@gnat
Copy link
Member Author

gnat commented Sep 8, 2022

@dantownsend any good ways to get the engine type from the module scope in column_types.py as described here? #607 (comment) a bit stuck on a reasonable way to do this. Open to other methods of remapping the column types, too.

@dantownsend
Copy link
Member

dantownsend commented Sep 8, 2022

@gnat A column can find out which engine it is using via this:

self._meta.engine_type

However, this only works after the Table metaclass has done its magic. So trying to use it within a column's constructor might not work.

If you want to warn the user instead against using JSON with CockroachDB, we can add something to the metaclass:

piccolo/piccolo/table.py

Lines 264 to 265 in 7754498

if isinstance(column, (JSON, JSONB)):
json_columns.append(column)

Something like this:

if db.engine_type =='cockroachdb':
    warnings.warn("We recommend using JSONB, not JSON for CockroachDB")

Or we just add it to the docs.

We could even convert the type from JSON to JSONB in the Table metaclass, but it gets a bit complex.

@gnat
Copy link
Member Author

gnat commented Sep 10, 2022

Progress: information_schema.column directly inside cockroach sql does indeed report INT2 for smallint.

The problem is when piccolo queries information_schema.column for smallint we get back the column_default as :::INT8... hence all the confusion. But this isn't actually the type, just how we interpret the column_default inside piccolo.

On the bright side, I don't think this is actually a functional problem, and should only effect tests- it clears up some confusion. Same thing with all varchar column_default coming back as :::STRING etc.

@knz
Copy link

knz commented Sep 10, 2022

that column_default representation looks like a bug. We need to fix that inside crdb. I'll file an issue.

@knz
Copy link

knz commented Sep 10, 2022

cockroachdb/cockroach#87774

@gnat
Copy link
Member Author

gnat commented Sep 10, 2022

Nice one. You're right. Thanks for the catch @knz ! Very helpful.

@gnat
Copy link
Member Author

gnat commented Sep 12, 2022

Not sure if anyone can help out here but I wanted to leave some progress notes:

As mentioned above, Cockroach assumes SERIAL / INTEGER is INT8 unless cluster setting default_int_size is changed from 8 to 4.

So yeah while technically Cockroach is capable of storing these types as INT4 it isn't the default, and is problematic for us because apps/schema/test/generate.py assumes things map cleanly to our Integer type etc. and we're not easily able to remap Integer to BigInt in Piccolo. 😐 (and setting default_int_size to 4 actually throws a warning in Cockroach, so I'm going to assume it isn't a happy path there either. )

That said, I totally understand that Cockroach needs to be using INT8 or UUID for keys to make the scaling technology work. This is a problem on our end. (Sharded Postgres uses INT8 as well for keys.)

Since we cannot "re-map" types in Piccolo currently when using CockroachEngine #607 (comment) I think the best compromise is to simply throw a warning when users try to use Integer() JSON() etc. when using CockroachEngine.

Keep in mind https://github.com/cockroachdb/django-cockroachdb actually re-maps the types, so we're not alone here. See: https://github.com/cockroachdb/django-cockroachdb/blob/master/django_cockroachdb/features.py#L52

This means we're gonna need a custom MegaTable() in the test suite, removing the Integer() and JSON() fields (replacing them with BigInt() and JSONB()), because it's becoming mission impossible to pass the test suite without weird breaking workarounds in Piccolo to "accept" Integer() and JSON() when using CockroachEngine (I cant find a way that isn't super hacky).

@gnat
Copy link
Member Author

gnat commented Sep 12, 2022

It would be super easy if we had some way to re-map these column types (not just switch the column_type representation, that's not enough to effect apps/schema/test/generate.py) based on the engine we're using, but I'm assuming it would require some re-architecting of Piccolo. So it may be just best to avoid the problematic types entirely for the Cockroach tests and move forward.

@gnat
Copy link
Member Author

gnat commented Sep 12, 2022

Another strategy that may help is re-mapping types from inside apps/schema/test/generate.py when using CockroachEngine, so stuff we get back from a default configuration of Cockroach, when compared to our own generators, will match.

@gnat
Copy link
Member Author

gnat commented Sep 12, 2022

Alrighty gentlemen, the most recent build is ✔️

That said, there's a few hacks in there I'm not too proud of, and I feel like this PR is intensive enough that ideally @dantownsend could give this a refinement pass to bring it over the finish line 🏁

I think Piccolo needs a little bit of architectural evolution for this (and future distributed databases) to be added more cleanly, perhaps for 1.0. This may be a good time to lay that groundwork as we get Cockroach in.

PS: I could not add the #607 (comment) warnings because engine_type is not defined at the time we need it. 👀

@gnat
Copy link
Member Author

gnat commented Sep 18, 2022

Btw, been using the M2M support lately. Works fine except for the as_list feature (asyncpg and Cockroach generally play nice, but not with ARRAYS right now, see OP).

Even in the PR's current state, Piccolo is a very strong ORM for Cockroach.

@dantownsend
Copy link
Member

@gnat Yeah, we need to merge it in soon - we can say Piccolo has experimental / beta support for Cockroach. I just need to try it out a bit more to make sure there are no rough edges which would impact the existing Postgres support.

@gnat
Copy link
Member Author

gnat commented Sep 19, 2022

Time Travel queries (AS OF SYSTEM TIME) via mixin for SELECT is now in, with new tests. Updated the OP.

Example: Product.select().as_of('-10m') Get all Products as of 10 minutes ago.

@dantownsend would be good to get a review for that particular commit: 50aaa8b

Also wrapping up hash sharded indexes, but I'm finding that's a tougher nut to crack.

@gnat
Copy link
Member Author

gnat commented Sep 19, 2022

Added first pass on the docs: 347a487

@gnat
Copy link
Member Author

gnat commented Sep 20, 2022

The new sharded= option to enable Sharded Indexes across clusters is completed, with tests, but it should go into its own PR immediately after initial Cockroach support is merged to make the process more manageable. Divide and conquer. Updated OP.

And with that, I think we're done here aside from general review @dantownsend

@gnat
Copy link
Member Author

gnat commented Sep 21, 2022

Hey @dantownsend gonna need some help with mypy. I'm at a total loss as to what to do with these errors. Everything else is green: https://github.com/gnat/piccolo/actions/runs/3098763410/jobs/5017070875

Satisfying the linters has been a pretty frustrating experience.

image

@dantownsend
Copy link
Member

@gnat Yeah, we have a lot of linters! Don't worry too much for now. I'll see if I can figure them out while reviewing it.

Some of the MyPy errors seem to be due to duplicate test names - test_dump_load, and test_alter_column_drop_default.

@gnat
Copy link
Member Author

gnat commented Sep 21, 2022

Yeah, when wrapping tests with engine decorators, I've been trying to keep the actual test names the same so people know it's a variation of the same test.. But the linters do not like that. Example:

@engines_only("postgres", "sqlite")
def test_dump_load(self):
    pass
@engines_only("cockroach")
def test_dump_load(self):
    pass

Thoughts? @dantownsend

@dantownsend
Copy link
Member

@gnat Ah, I see.

In that situation then test_dump_load_cockroach should be OK. Otherwise, we can use # type: ignore.

@gnat
Copy link
Member Author

gnat commented Sep 21, 2022

@dantownsend All 🟢 across the board. I believe we're essentially ready to ship this. Thanks for your help with the linters today!

https://github.com/gnat/piccolo/actions/runs/3099551477/jobs/5018820456

🥳 🎉

image

@rafiss
Copy link

rafiss commented Sep 21, 2022

That's great news! We encourage you to add this tool to the list at https://www.cockroachlabs.com/docs/stable/community-tooling.html (click the Contribute button or add a PR at https://github.com/cockroachdb/docs/)

dantownsend added a commit that referenced this issue Oct 11, 2022
* Cockroach config for tests.

* Added feedback from #607

* New CockroachDB engine.

* No more extension=[] !

* Running tests as postgres.

* Initial build for CockroachDB

* Running tests.

* Run workflow.

* Naming consistency. Making roughly equivalent to Postgres.

* Timestamp bug.

* DDL for Cockroach. for_engines decorator.

* Progress.

* Added database prep for cluster settings.

* Tests passing.

* Passing migration type checks.

* Cockroach ALTER COLUMN TYPE moved outside of transaction until #49351 is solved.

* New test helpers.

* Fixtures.

* Cockroach specific stuff.

* Array tests.

* Column tests.

* JSON tests.

* M2M tables. as_list is effected by cockroachdb/cockroach#71908

* Skip test for numeric.

* Pool test.

* Transaction tests.

* Related readable tests.

* Save tests.

* To dict tests.

* JOIN tests.

* Output tests.

* Raw tests.

* Repr tests.

* Select tests.

* "As string" tests.

* Cockroach having trouble with one specific operator.

* Skipping until CRDB issue is resolved. Should work though!

* Minor updates.

* Test dump load. Other CRDB things.

* Migration manager tests.

* ALTER tests.

* SQL Generator tests.

* Serialization tests.

* Test helpers.

* Cockroach Specific stuff.

* Numerics all the same for CRDB.

* Cockroach specific stuff.

* Cleanup.

* Skipping timestamp tests.

* DDL resolution stuff for Cockroach.

* WIP.

* Cohesion and special test tables for Cockroach.

* WIP

* Special table for Cockroach.

* Removed hack.

* Should pass now.

* Removed some debug stuff.

* LGTM stuff.

* Feedback.

* Feedback.

* Feedback.

* More readable "postgres" or "cockroach.

* Added Cockroach overrides for COLUMN_TYPE_MAP, COLUMN_DEFAULT_PARSER

* Restore SmallInt functionality for Cockroach.

* Restored type() comparison for reflection / generators.

* Updated to latest Cockroach release.

* Cleanup.

* Cleanup.

* Build stuff.

* Cockroach Serial now uses unique_rowid() as default value always instead of DEFAULT.

* Removed first_id() test helper.

* Time Travel queries using AS OF SYSTEM TIME

* Added documentation.

* Small docs fix.

* Test update.

* Refactored out unnecessary code because we use unique_rowid() as default for Cockroach Serial and BigSerial.

* BIGINT instead of BIGSERIAL for Cockroach.

* LGTM stuff

* LGTM...

* Made compatible with older Python versions.

* Get Cockroach passing without Python 3.9 stuff.

* Removed unused variable.

* Fixed test for SQLite.

* Re-add Python 3.7 for CRDB + new CRDB version.

* Consolidated cockroach_skip() into engines_skip() everywhere.

* Re-add SQLite DDL.

* Moved back to original engine DDL selector.

* Reverted mistake in test suite.

* Remove migration that should not be available.

* Moving back to postgres_only for specific test. Probably cannot detect engine type at this stage.

* postgres_only() to engine_only()

* Set sane default for JSONB because we override column_type in JSON for some engines.

* Ran isort.

* Black being obnoxious, part 1.

* Black being obnoxious Part 2.

* Flake8

* Black

* Flake8

* Flake8

* Flake8

* isort

* Flake8

* Flake8

* Added alternate test names for duplicates to remove F811

* Added alternative name.

* mypy

* mypy

* mypy

* mypy

* mypy

* mypy

* mypy

* mypy

* mypy

* Cockroach: Now testing latest stable and upcoming release.

* Cockroach: Testing only v22.2

* Cockroach version documentation.

* Engine detection now based on type because CockroachEngine inherits PostgresEngine.

* postgres_only is now engines_only("postgres", "cockroach") because CockroachEngine was always tested as a child class.

* Updated to Cockroach 22.2 beta.

* isort.

* assertAlmostEqual for Decimal test. Docs update.

* Linter

* Cockroach 22.2 Beta 2.

* Added notes about special Column types for Cockroach

* version pin xpresso, so integration tests pass

* add small section about running cockroach in contributing docs

* fix typo

Co-authored-by: Daniel Townsend <dan@dantownsend.co.uk>
@dantownsend
Copy link
Member

OK, this has been merged in now. @gnat Do you want to close this issue, or keep it for tracking future TODOs?

We should do a PR to the CockroachDB docs as mentioned above.

@gnat
Copy link
Member Author

gnat commented Oct 11, 2022

Nice one. Done. I'll create/resolve new tickets as things come up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants