@edgurgel
Member

What kind of change does this PR introduce?

This change reduces the impact that a slow DB setup has on other tenants that land on the same partition and try to connect at the same time.

I've also removed StartCounters, as it's no longer needed: we now start the RateCounters where they are handled.

What is the current behavior?

Connect.init/1 can take minutes, blocking other Connect processes that share the same partition supervisor from starting up.

What is the new behavior?

Change Connect.init/1 to return quickly and defer the DB connection to handle_continue.
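A minimal sketch of this pattern, assuming a hypothetical `ConnectSketch` module and a `setup_fun` option that stands in for the real tenant DB setup (neither is Realtime's actual code): `init/1` returns `{:ok, state, {:continue, :db_connect}}`, so `start_link` (and a supervisor's `start_child`) returns right away, and the slow work runs afterwards in `handle_continue/2`.

```elixir
defmodule ConnectSketch do
  # Hypothetical stand-in for Connect: init/1 stays cheap and the slow DB
  # setup is deferred to handle_continue/2.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Only cheap work here; start_link/start_child returns as soon as init/1 does.
    state = %{setup_fun: Keyword.fetch!(opts, :setup_fun), conn: nil}
    {:ok, state, {:continue, :db_connect}}
  end

  @impl true
  def handle_continue(:db_connect, %{setup_fun: setup_fun} = state) do
    # Slow work runs inside this process, after the supervisor has moved on.
    case setup_fun.() do
      {:ok, conn} -> {:noreply, %{state | conn: conn}}
      {:error, reason} -> {:stop, reason, state}
    end
  end

  @impl true
  def handle_call(:conn, _from, state), do: {:reply, state.conn, state}
end

# Simulated slow tenant DB setup (200 ms).
slow_setup = fn ->
  Process.sleep(200)
  {:ok, :fake_conn}
end

{micros, {:ok, pid}} = :timer.tc(fn -> ConnectSketch.start_link(setup_fun: slow_setup) end)
IO.puts("start_link returned after #{div(micros, 1000)} ms")

# The call is only handled after handle_continue/2 finishes, so it returns :fake_conn.
GenServer.call(pid, :conn)
```

Because a GenServer runs `handle_continue/2` before it processes any other message, callers that need the connection still wait for the setup to finish; only the supervisor, and every other child it would start, is no longer held up.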


@vercel

vercel bot commented Sep 15, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
realtime-demo Ignored Ignored Preview Sep 15, 2025 2:58am

case Database.check_tenant_connection(tenant) do
case Realtime.Database.check_tenant_connection(tenant) do
{:ok, conn} ->
Process.link(conn)
Member Author

@edgurgel edgurgel Sep 15, 2025


No need to link: Database.check_tenant_connection calls Postgrex.start_link, which already links the connection process to the caller.
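To illustrate why the extra `Process.link/1` is redundant, here is a sketch that uses `Agent.start_link/1` as a stand-in for `Postgrex.start_link` (an assumption for illustration; the point is only that a `start_link` call links the new process to the caller). Links form a set, so linking a second time returns `true` but adds nothing.

```elixir
defmodule LinkSketch do
  # Agent.start_link/1 plays the role of Postgrex.start_link here:
  # it links the new process to the caller.
  def start_linked_conn do
    {:ok, conn} = Agent.start_link(fn -> :conn_state end)
    conn
  end

  # How many times `pid` appears in the calling process's link set.
  def link_count(pid) do
    {:links, links} = Process.info(self(), :links)
    Enum.count(links, &(&1 == pid))
  end
end

conn = LinkSketch.start_linked_conn()
LinkSketch.link_count(conn)  # => 1, already linked by start_link
Process.link(conn)           # => true, but nothing new is added
LinkSketch.link_count(conn)  # => 1
```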

case Realtime.Database.check_tenant_connection(tenant) do
{:ok, conn} ->
Process.link(conn)
db_conn_reference = Process.monitor(conn)
Member Author


In fact, this Process.monitor isn't doing much either, given that the processes are linked. Connect most likely won't get a chance to react to the :DOWN message, because the exit signal from the linked process will crash Connect anyway.
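A small sketch of the point above, using plain processes as hypothetical stand-ins for Connect and its DB connection: the Connect-like process both links to and monitors the connection and does not trap exits (the GenServer default). Whichever signal it sees first, it ends up terminated with the connection's exit reason, so the `:DOWN` handler buys nothing.

```elixir
defmodule MonitorSketch do
  # Hypothetical: a "Connect-like" process links to AND monitors its DB
  # connection; run/0 returns the reason the Connect-like process exited with.
  def run do
    parent = self()

    connect =
      spawn(fn ->
        # Not trapping exits, so a crash of a linked process terminates this one too.
        conn =
          spawn_link(fn ->
            receive do
              :crash -> exit(:db_down)
            end
          end)

        ref = Process.monitor(conn)
        send(parent, {:conn, conn})

        receive do
          {:DOWN, ^ref, :process, _pid, _reason} ->
            # Even if this clause runs first, the link's exit signal also
            # arrives and kills the process anyway.
            Process.sleep(:infinity)
        end
      end)

    connect_ref = Process.monitor(connect)

    conn =
      receive do
        {:conn, c} -> c
      end

    send(conn, :crash)

    receive do
      {:DOWN, ^connect_ref, :process, ^connect, reason} -> reason
    end
  end
end

MonitorSketch.run()
# => :db_down
```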

Member

@filipecabaco filipecabaco left a comment


Does this mean that, for some amount of time, users will be able to start connecting and then we will disconnect them?

Part of the reason we had this be more blocking/sync was to actually be sure that the database was fully ready and up before moving forward with accepting socket connections, so I'm wondering what the side effects of making this more async would be, vs. just having more partitions.

@edgurgel
Member Author

edgurgel commented Sep 15, 2025

> Does this mean that, for some amount of time, users will be able to start connecting and then we will disconnect them?
> Part of the reason we had this be more blocking/sync was to actually be sure that the database was fully ready and up before moving forward with accepting socket connections, so I'm wondering what the side effects of making this more async would be, vs. just having more partitions.

We only return a conn once syn has the conn metadata, which happens after the DB connection is set up. This is already how it works today, and this PR doesn't change it.

This PR also makes sure we capture a shutdown that happens while connecting. We no longer block on the Supervisor call, but we still block waiting for the syn message saying that the conn is ready (which we already do today).

Previously, if we had 5 websockets from different nodes calling Connect.lookup_or..:

  • The first one would block calling Supervisor.start_child
  • The other 4 websockets would block waiting for the syn message saying that conn metadata has been set.

With this PR, all 5 websockets wait on syn.
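The "everyone waits on the ready notification" flow can be sketched with a hypothetical `MetaRegistry` GenServer standing in for syn (this is not syn's API): callers block in `await/3` until some process publishes metadata for the key with `put/3`, and late callers get the stored metadata immediately.

```elixir
defmodule MetaRegistry do
  # Hypothetical stand-in for syn: callers block in await/3 until some process
  # publishes metadata for the key with put/3. Not syn's real API.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)
  def await(reg, key, timeout \\ 5_000), do: GenServer.call(reg, {:await, key}, timeout)
  def put(reg, key, meta), do: GenServer.cast(reg, {:put, key, meta})

  @impl true
  def init(:ok), do: {:ok, %{meta: %{}, waiters: %{}}}

  @impl true
  def handle_call({:await, key}, from, state) do
    case Map.fetch(state.meta, key) do
      {:ok, meta} ->
        {:reply, {:ok, meta}, state}

      :error ->
        # Park the caller; it is answered later from handle_cast/2.
        waiters = Map.update(state.waiters, key, [from], &[from | &1])
        {:noreply, %{state | waiters: waiters}}
    end
  end

  @impl true
  def handle_cast({:put, key, meta}, state) do
    {waiting, waiters} = Map.pop(state.waiters, key, [])
    Enum.each(waiting, &GenServer.reply(&1, {:ok, meta}))
    {:noreply, %{state | meta: Map.put(state.meta, key, meta), waiters: waiters}}
  end
end

{:ok, reg} = MetaRegistry.start_link()

# Five "websockets" wait for the tenant's conn metadata instead of blocking
# inside Supervisor.start_child.
waiters = for _ <- 1..5, do: Task.async(fn -> MetaRegistry.await(reg, "tenant-a") end)

# The Connect-like process publishes the metadata once its DB setup is done.
Process.sleep(100)
MetaRegistry.put(reg, "tenant-a", %{conn: :fake_conn})

Task.await_many(waiters)
# => a list of five {:ok, %{conn: :fake_conn}} tuples
```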


@filipecabaco
Member

Makes sense, ship it 👍

@edgurgel edgurgel merged commit 70339c7 into main Sep 15, 2025
5 checks passed
@edgurgel edgurgel deleted the fix/connect-init branch September 15, 2025 21:42
@kiwicopple
Member

🎉 This PR is included in version 2.48.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Fudster added a commit to KBVE/realtime that referenced this pull request Sep 23, 2025
* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Currently the user needs to enable presence from the beginning of the channel. This lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 set the default active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently all text frames are handled only as JSON, which already requires UTF-8.

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact of slow DB setup impacting other tenants
trying to connect at the same time that landed on the same partition

* fix: handle wal bloat (supabase#1528)

Verify that replication connection is able to reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process, just like the PG2
adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time.
* Many calls can land on the dispatcher even though the node might already be gone. If we don't validate the node, it might keep trying to connect until it times out, instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>
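The worker selection described in the gen_rpc PubSub adapter commit (supabase#1529) can be sketched as follows. This is a hypothetical helper, not the adapter's code: it hashes the sending process with `:erlang.phash2/2` so a given sender always maps to the same worker, and it assumes `:broadcast_pool_size` falls back to `:pool_size` when not set.

```elixir
defmodule WorkerPickSketch do
  # Hypothetical sketch of choosing a broadcast worker from the sending process.
  # Assumption: :broadcast_pool_size defaults to :pool_size when absent.
  def broadcast_pool_size(opts) do
    Keyword.get_lazy(opts, :broadcast_pool_size, fn -> Keyword.fetch!(opts, :pool_size) end)
  end

  # `workers` is a non-empty tuple of worker names or pids. Hashing self()
  # maps the same sender to the same worker every time, so one sender's
  # messages all go through a single worker.
  def pick_worker(workers) when is_tuple(workers) and tuple_size(workers) > 0 do
    elem(workers, :erlang.phash2(self(), tuple_size(workers)))
  end
end

workers = {:worker_0, :worker_1, :worker_2, :worker_3}
WorkerPickSketch.broadcast_pool_size(pool_size: 10)                           # => 10
WorkerPickSketch.broadcast_pool_size(pool_size: 10, broadcast_pool_size: 4)   # => 4
WorkerPickSketch.pick_worker(workers) == WorkerPickSketch.pick_worker(workers) # => true
```

Keeping the two sizes separate means the Phoenix.PubSub registry partitions (`:pool_size`) and the number of gen_rpc forwarding workers can be tuned independently.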
Fudster added a commit to KBVE/realtime that referenced this pull request Sep 23, 2025
* 🔄 Sync with upstream changes (#2)

* chore: fix couple of flaky tests (supabase#1517)

* fix: Improve runtime setup logic (supabase#1511)

Clean up runtime.exs logic to be more organized and easier to maintain

* fix: runtime setup error (supabase#1520)

---------

Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Filipe Cabaço <filipe@supabase.io>

* 🔄 Sync with upstream changes (#4)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#6)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>

* 🔄 Sync with upstream changes (#7)


---------

Co-authored-by: Al @h0lybyte <5599058+h0lybyte@users.noreply.github.com>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>
Fudster added a commit to KBVE/realtime that referenced this pull request Sep 25, 2025
Fudster added a commit to KBVE/realtime that referenced this pull request Sep 25, 2025
Fudster added a commit to KBVE/realtime that referenced this pull request Oct 4, 2025
* fix: specify that only private channels are allowed when replaying messages (supabase#1543)

* fix: rate limit connect module (supabase#1541)

On a bad connection, we rate limit the Connect module to prevent abuse and excessive error logging.

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>
Fudster added a commit to KBVE/realtime that referenced this pull request Oct 4, 2025
* 🔄 Sync with upstream changes (#2)

* chore: fix couple of flaky tests (supabase#1517)

* fix: Improve runtime setup logic (supabase#1511)

Cleanup runtime.exs logic to be more organized and easier to mantain

* fix: runtime setup error (supabase#1520)

---------

Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Filipe Cabaço <filipe@supabase.io>

* 🔄 Sync with upstream changes (#4)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#6)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>

* 🔄 Sync with upstream changes (#7)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

currently the user would need to have enabled from the beginning of the channel. this will enable users to enable presence later in the flow by sending a track message which will enable presence messages for them

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 set the default active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently all text frames as handled only with JSON which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact of slow DB setup impacting other tenants
trying to connect at the same time that landed on the same partition

* fix: handle wal bloat (supabase#1528)

Verify that replication connection is able to reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The way to choose the worker to be used is based on the sending process
just like PG2 adapter does

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* Single gen_rpc_dispatcher that can be a bottleneck if the connecting takes some time
* Many calls can land on the dispatcher but the node might be gone already. If we don't validate the node it might keep trying to connect until it times out instead of quickly giving up due to not being an actively connected node.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#9)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time
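The PR describes making `Connect.init/1` quick and moving the connection work to `handle_continue`. The general GenServer pattern can be sketched as follows. `LazyConnect` and its `:connect_fun` option are hypothetical and are not Realtime's Connect module.

```elixir
defmodule LazyConnect do
  # init/1 returns immediately, so the supervisor (or partition) starting
  # this process is not blocked; the slow setup runs in handle_continue/2.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Only cheap bookkeeping here.
    state = %{connect_fun: Keyword.fetch!(opts, :connect_fun), conn: nil}
    {:ok, state, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state) do
    # Hypothetical slow setup, e.g. opening a tenant database connection.
    case state.connect_fun.() do
      {:ok, conn} -> {:noreply, %{state | conn: conn}}
      {:error, reason} -> {:stop, reason, state}
    end
  end

  @impl true
  def handle_call(:conn, _from, state), do: {:reply, state.conn, state}
end
```

The continue runs before the process handles any other message, so a call made right after `start_link/1` waits for the setup instead of seeing a half-initialized state.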

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time
* Many calls can land on the dispatcher after the node is already gone. If we don't validate the node, the dispatcher might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#11)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time
* Many calls can land on the dispatcher after the node is already gone. If we don't validate the node, the dispatcher might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

* fix: specify that only private channels are allowed when replaying messages (supabase#1543)

* fix: rate limit connect module (supabase#1541)

On a bad connection, we rate limit the Connect module to prevent abuse and excessive error logging
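A per-tenant error limiter of this kind can be sketched as a fixed window that counts only failures. The 30-second window comes from the later "increase connect error rate window to 30 seconds" commit. The `max_errors` default and the `ErrorWindow` module are hypothetical, and this sketch is not Realtime's RateCounter.

```elixir
defmodule ErrorWindow do
  # Hypothetical fixed-window limiter: refuse new connect attempts for a
  # tenant once max_errors failures happened inside the current window.
  # Successes are never recorded, so only failing tenants are throttled.
  defstruct window_ms: 30_000, max_errors: 5, errors: %{}

  def new(opts \\ []), do: struct!(__MODULE__, opts)

  @spec allow?(%__MODULE__{}, term(), integer()) :: boolean()
  def allow?(%__MODULE__{window_ms: window, max_errors: max, errors: errors}, tenant, now_ms) do
    case Map.get(errors, tenant) do
      {start, count} when now_ms - start < window -> count < max
      # No errors yet, or the window has expired.
      _ -> true
    end
  end

  @spec record_error(%__MODULE__{}, term(), integer()) :: %__MODULE__{}
  def record_error(%__MODULE__{window_ms: window} = limiter, tenant, now_ms) do
    errors =
      Map.update(limiter.errors, tenant, {now_ms, 1}, fn
        {start, count} when now_ms - start < window -> {start, count + 1}
        # Window expired: start a fresh one.
        _expired -> {now_ms, 1}
      end)

    %{limiter | errors: errors}
  end
end
```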

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

---------

Co-authored-by: Al @h0lybyte <5599058+h0lybyte@users.noreply.github.com>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>
Fudster added a commit to KBVE/realtime that referenced this pull request Oct 7, 2025
* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time
* Many calls can land on the dispatcher after the node is already gone. If we don't validate the node, the dispatcher might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

* fix: specify that only private channels are allowed when replaying messages (supabase#1543)

* fix: rate limit connect module (supabase#1541)

On a bad connection, we rate limit the Connect module to prevent abuse and excessive error logging

* build: automatically cancel old tests/build on new push (supabase#1545)

Currently, whenever you push a commit to your branch, the old builds keep running while a new build starts. Once a new commit is added, the old test results no longer matter, so running them wastes CI resources. Canceling them also reduces confusion from multiple builds running in parallel for the same branch, which can block merges.

With this small change, whenever a new commit is added, the previous build is immediately canceled and only the build for the latest commit runs.

* fix: move message queue data to off-heap for gen_rpc pub sub workers (supabase#1548)

* fix: rate limit Connect.lookup_or_start_connection on error only (supabase#1549)

* fix: increase connect error rate window to 30 seconds (supabase#1550)

* fix: set a lower fullsweep_after flag for GenRpcPubSub workers (supabase#1551)
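The two worker tunings above, off-heap message queues (#1548) and a lower `fullsweep_after` (#1551), are spawn options. A GenServer can receive them through the `:spawn_opt` start option. In this sketch, `TunedWorker` is hypothetical and the value 100 follows the later "fullsweep_after=100" note. Off-heap queues keep a long mailbox out of the heap the GC has to scan. A lower `fullsweep_after` forces full sweeps more often, which typically releases old-heap data sooner.

```elixir
defmodule TunedWorker do
  # Hypothetical worker started with the process flags described above.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok,
      spawn_opt: [
        # Store incoming messages outside the process heap.
        message_queue_data: :off_heap,
        # Do a full-sweep GC after this many generational collections.
        fullsweep_after: Keyword.get(opts, :fullsweep_after, 100)
      ]
    )
  end

  @impl true
  def init(:ok), do: {:ok, nil}
end
```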

* fix: hardcode presence limit (supabase#1552)

* fix: further decrease limit on presence events (supabase#1553)

* fix: bump up realtime (supabase#1554)

* fix: lower rate limit to 100 events per second (supabase#1556)

* fix: move connect rate limit to socket (supabase#1555)

* fix: reduce max_frame_size to 5MB
* fix: fullsweep_after=100 on gen rpc pub sub workers

---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* fix: collect global metrics without tenant tagging (supabase#1557)

* feat: presence payload size (supabase#1559)

* Also tweak the buckets to cover payload sizes up to 3000KB
* Start tagging the payload size metrics with message_type. message_type can be presence, broadcast or postgres_changes

* fix: use GenRpc for Realtime.Latency pings (supabase#1560)

* Fastlane for phoenix presence_diff (supabase#1558)

It uses a fork of Phoenix for the time being

* fix: count presence_diff events on MessageDispatcher
* fix: remove traces from console during development

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Kevin Grüneberg <k.grueneberg1994@gmail.com>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>
Fudster added a commit to KBVE/realtime that referenced this pull request Oct 7, 2025
* 🔄 Sync with upstream changes (#2)

* chore: fix couple of flaky tests (supabase#1517)

* fix: Improve runtime setup logic (supabase#1511)

Clean up the runtime.exs logic to be more organized and easier to maintain

* fix: runtime setup error (supabase#1520)

---------

Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Filipe Cabaço <filipe@supabase.io>

* 🔄 Sync with upstream changes (#4)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#6)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>

* 🔄 Sync with upstream changes (#7)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time
* Many calls can land on the dispatcher after the node is already gone. If we don't validate the node, the dispatcher might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#9)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time
* Many calls can land on the dispatcher after the node is already gone. If we don't validate the node, the dispatcher might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#11)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time
* Many calls can land on the dispatcher after the node is already gone. If we don't validate the node, the dispatcher might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

* fix: specify that only private channels are allowed when replaying messages (supabase#1543)

* fix: rate limit connect module (supabase#1541)

On a bad connection, we rate limit the Connect module to prevent abuse and excessive error logging

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

* 🔄 Sync with upstream changes (#13)

* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)



---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

WEBSOCKET_MAX_HEAP_SIZE can be used to configure it

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time
* Many calls can land on the dispatcher after the node is already gone. If we don't validate the node, the dispatcher might keep trying to connect until it times out instead of quickly giving up because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump long_schedule and long_gc

* fix: make pubsub adapter configurable (supabase#1539)

* fix: specify that only private channels are allowed when replaying messages (supabase#1543)

* fix: rate limit connect module (supabase#1541)

On a bad connection, we rate limit the Connect module to prevent abuse and excessive error logging

* build: automatically cancel old tests/build on new push (supabase#1545)

Currently, whenever you push a commit to your branch, the old builds keep running while a new build starts. Once a new commit is added, the old test results no longer matter, so running them wastes CI resources. Canceling them also reduces confusion from multiple builds running in parallel for the same branch, which can block merges.

With this small change, whenever a new commit is added, the previous build is immediately canceled and only the build for the latest commit runs.

* fix: move message queue data to off-heap for gen_rpc pub sub workers (supabase#1548)

* fix: rate limit Connect.lookup_or_start_connection on error only (supabase#1549)

* fix: increase connect error rate window to 30 seconds (supabase#1550)

* fix: set a lower fullsweep_after flag for GenRpcPubSub workers (supabase#1551)

* fix: hardcode presence limit (supabase#1552)

* fix: further decrease limit on presence events (supabase#1553)

* fix: bump up realtime (supabase#1554)

* fix: lower rate limit to 100 events per second (supabase#1556)

* fix: move connect rate limit to socket (supabase#1555)

* fix: reduce max_frame_size to 5MB
* fix: fullsweep_after=100 on gen rpc pub sub workers

---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* fix: collect global metrics without tenant tagging (supabase#1557)

* feat: presence payload size (supabase#1559)

* Also tweak the buckets to cover payload sizes up to 3000KB
* Start tagging the payload size metrics with message_type. message_type can be presence, broadcast or postgres_changes

* fix: use GenRpc for Realtime.Latency pings (supabase#1560)

* Fastlane for phoenix presence_diff (supabase#1558)

It uses a fork of Phoenix for the time being

* fix: count presence_diff events on MessageDispatcher
* fix: remove traces from console during development

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Kevin Grüneberg <k.grueneberg1994@gmail.com>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

---------

Co-authored-by: Al @h0lybyte <5599058+h0lybyte@users.noreply.github.com>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>
Co-authored-by: Kevin Grüneberg <k.grueneberg1994@gmail.com>
Fudster added a commit to KBVE/realtime that referenced this pull request Oct 12, 2025
* fix: runtime setup error (supabase#1520)

* fix: use primary instead of replica on rename_settings_field (supabase#1521)

* feat: upgrade cowboy & ranch (supabase#1523)

* fix: Fix GenRpc to not try to connect to nodes that are not alive (supabase#1525)

* fix: enable presence on track message (supabase#1527)

Previously, users had to enable presence from the beginning of the channel. This change lets users enable presence later in the flow by sending a track message, which enables presence messages for them.

* fix: set cowboy active_n=100 as cowboy 2.12.0 (supabase#1530)

cowboy 2.13.0 changed the default to active_n=1

* fix: provide error_code metadata on RealtimeChannel.Logging (supabase#1531)

* feat: disable UTF8 validation on websocket frames (supabase#1532)

Currently, all text frames are handled only as JSON, which already requires UTF-8

* fix: move DB setup to happen after Connect.init (supabase#1533)

This change reduces the impact a slow DB setup has on other tenants
that land on the same partition and try to connect at the same time

* fix: handle wal bloat (supabase#1528)

Verify that the replication connection can reconnect when faced with WAL bloat issues

* feat: replay realtime.messages (supabase#1526)

A new index was created on inserted_at DESC, topic WHERE private IS TRUE AND extension = "broadcast"

The hardcoded limit is 25 for now.

* feat: gen_rpc pub sub adapter (supabase#1529)

Add a PubSub adapter that uses gen_rpc to send messages to other nodes.

It uses :gen_rpc.abcast/3 instead of :erlang.send/2

The adapter works very similarly to the PG2 adapter. It consists of
multiple workers that forward to the local node using PubSub.local_broadcast.

The worker is chosen based on the sending process,
just like the PG2 adapter does.

The number of workers is controlled by `:pool_size` or `:broadcast_pool_size`.
This distinction exists because Phoenix.PubSub uses `:pool_size` to
define how many partitions the PubSub registry will use. It's possible
to control them separately by using `:broadcast_pool_size`.

* fix: ensure message id doesn't raise on non-map payloads (supabase#1534)

* fix: match error on Connect (supabase#1536)

---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* feat: websocket max heap size configuration (supabase#1538)

* fix: set max process heap size to 500MB instead of 8GB
* feat: set websocket transport max heap size

The WEBSOCKET_MAX_HEAP_SIZE environment variable can be used to configure it.

* fix: update gen_rpc to fix gen_rpc_dispatcher issues (supabase#1537)

Issues:

* A single gen_rpc_dispatcher can become a bottleneck if connecting takes some time.
* Many calls can land on the dispatcher after the node is already gone. Without validating the node, the dispatcher may keep trying to connect until it times out instead of giving up quickly because the node is not actively connected.

* fix: improve ErlSysMon logging for processes (supabase#1540)

Include initial_call, ancestors, registered_name, message_queue_len and total_heap_size

Also bump the long_schedule and long_gc thresholds.

* fix: make pubsub adapter configurable (supabase#1539)

* fix: specify that only private channels are allowed when replaying messages (supabase#1543)

* fix: rate limit connect module (supabase#1541)

On a bad connection, we rate limit the Connect module to prevent abuse and excessive error logging.

* build: automatically cancel old tests/build on new push (supabase#1545)

Currently, whenever you push a commit to your branch, the old builds keep running while a new build starts. Once a new commit is added, the old test results no longer matter, so those builds just waste CI resources. This change also reduces the confusion of multiple builds running in parallel for the same branch and possibly blocking merges.

With this small change, whenever a new commit is added, the previous build is immediately canceled and only the build for the latest commit runs.

* fix: move message queue data to off-heap for gen_rpc pub sub workers (supabase#1548)

* fix: rate limit Connect.lookup_or_start_connection on error only (supabase#1549)

* fix: increase connect error rate window to 30 seconds (supabase#1550)

* fix: set a lower fullsweep_after flag for GenRpcPubSub workers (supabase#1551)

* fix: hardcode presence limit (supabase#1552)

* fix: further decrease limit on presence events (supabase#1553)

* fix: bump up realtime (supabase#1554)

* fix: lower rate limit to 100 events per second (supabase#1556)

* fix: move connect rate limit to socket (supabase#1555)

* fix: reduce max_frame_size to 5MB
* fix: fullsweep_after=100 on gen rpc pub sub workers

---------

Co-authored-by: Eduardo Gurgel Pinho <eduardo.gurgel@supabase.io>

* fix: collect global metrics without tenant tagging (supabase#1557)

* feat: presence payload size (supabase#1559)

* Also tweak the buckets to cover sizes up to 3000KB
* Start tagging the payload size metrics with message_type. message_type can be presence, broadcast or postgres_changes.

* fix: use GenRpc for Realtime.Latency pings (supabase#1560)

* Fastlane for phoenix presence_diff (supabase#1558)

It uses a fork of Phoenix for the time being.

* fix: count presence_diff events on MessageDispatcher
* fix: remove traces from console during development

* fix: limit db events (supabase#1562)

* chore: split tests and lint workflows (supabase#1564)

Also cache mix _build and deps

* fix: use LiveView stream for status page (supabase#1565)

* fix: use LiveView stream for status page

* fix: need full node name on localhost for tests

* fix: cleanup

* fix: add tests

* fix: bump version

* fix: cleanup syntax

* fix: format

* fix: refine join payload checking (supabase#1567)

* fix: shard user scopes in syn (supabase#1566)

---------

Co-authored-by: Filipe Cabaço <filipe@supabase.io>
Co-authored-by: Eduardo Gurgel <eduardo.gurgel@supabase.io>
Co-authored-by: Kevin Grüneberg <k.grueneberg1994@gmail.com>
Co-authored-by: Chase Granberry <chase@logflare.app>
Co-authored-by: Bradley Haljendi <5642609+Fudster@users.noreply.github.com>

4 participants