Skip to content

Add UserConnTracker to manage user connection cancellations across proxies#5844

Open
ImMohammad20000 wants to merge 15 commits into
XTLS:mainfrom
ImMohammad20000:UserConnTracker
Open

Add UserConnTracker to manage user connection cancellations across proxies#5844
ImMohammad20000 wants to merge 15 commits into
XTLS:mainfrom
ImMohammad20000:UserConnTracker

Conversation

@ImMohammad20000
Copy link
Copy Markdown

@ImMohammad20000 ImMohammad20000 commented Mar 25, 2026

there is an old bug that lead to a big issue in panels that leve connections open when users remove from inbounds via gRPC api the connection to that user dose not close in some cases so i implement a connection tracker that track all connections and close theme when users remove from the inbounds

an example of the issue is user limited to 1GB but becouse connections not closed it used 18GB

image

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Mar 26, 2026

#5401 连接追踪是会加的,不过怎么设计还要再考虑下

我看了下你的代码,为啥要 register unregister,每个 user 内存中自带一个 ctx 然后被 remove 时直接 cancel() 不就行了

应该不用考虑整个入站被移除?那个连端口都被释放了

@ImMohammad20000
Copy link
Copy Markdown
Author

ImMohammad20000 commented Mar 26, 2026

I considered that approach, but I wanted to ensure connections are tracked across all proxies without relying on each inbound’s internal context, especially since the removal comes via gRPC API and the user might be referenced from multiple places. Using a separate tracker makes it explicit and easier to debug. Happy to refactor if you think a simpler ctx-based approach inside each user struct.

@Fangliding
Copy link
Copy Markdown
Member

这不就是个机场计费 helper

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Mar 27, 2026

等新的正式版发布后再讨论一下吧

@M03ED
Copy link
Copy Markdown
Contributor

M03ED commented Mar 27, 2026

这不就是个机场计费 helper

Another pr and again you're trying to make people work looks like useless.
I wonder when this is going to end...

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Mar 29, 2026

这个加了后理论上能按 email 获取到 user 的所有活跃连接吧?突然想到 online map

@ImMohammad20000
Copy link
Copy Markdown
Author

ImMohammad20000 commented Mar 29, 2026

With this added, theoretically, it should be possible to retrieve all active connections for a user by email, right? It suddenly reminded me of online maps.

yes you can do that too by counting the cancel functions

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Mar 29, 2026

在我看来 *ray 缺的一个重要功能是连接级相关的 API #5722 (comment)

*ray 的 GUI 客户端都没有像 Clash/Mihomo 那样有显示哪些连接正在活跃、用了多少流量等 #5401

仅这个 PR 的话也可以接受,但如果一步到位把上述功能给实现了当然更好,@ImMohammad20000 有兴趣吗

@Fangliding
Copy link
Copy Markdown
Member

这不就是个机场计费 helper

Another pr and again you're trying to make people work looks like useless. I wonder when this is going to end...

AI时代代码又不值钱 AI堆垃圾反而负收益 我对这种只是为了售卖节点的功能一直不喜欢 只能说你们一直提交这样的pr

以后真有人来挖掘史山的时候会发现这里有一个看起来很高级的什么 conntrack 模块 结果翻一下调用唯一的作用就是为了在 remove user 的时候触发一个回调

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Mar 29, 2026

只能说“机场后端”的功能需求从 v2ray 时代开始就一直就有吧,只是中国人喜欢弄个闭源机场后端卖钱,外国人还更开源些

堵不如疏吧,把一个仅机场能用的功能扩展为对 Xray 自身整体的完善,这个 PR 如果原样合并的话后续肯定也会被替代的

@ImMohammad20000
Copy link
Copy Markdown
Author

@RPRX I added a new package for connectiontracker also implemented some new apis ListConnections, CloseConnection, GetUserStats, StreamConnections that help track the connectiones and get information from them

@Fangliding
Copy link
Copy Markdown
Member

全是AI写的你自己检查过么。。。

@ImMohammad20000
Copy link
Copy Markdown
Author

Its not all by ai and yes i checked the code before i push it
@Fangliding

@kastov
Copy link
Copy Markdown
Contributor

kastov commented Mar 30, 2026

These changes will definitely cause panics in high-load situations. Is it possible to make such significant changes optional? don't want to risk.

@ImMohammad20000
Copy link
Copy Markdown
Author

ImMohammad20000 commented Mar 30, 2026

These changes will definitely cause panics in high-load situations. Is it possible to make such significant changes optional? don't want to risk.

My goal was a simple tracker that close the open connections And my second commit just added an api to the original disine
I dont think this create any big issue in the app

@wyx2685
Copy link
Copy Markdown
Contributor

wyx2685 commented Mar 30, 2026

这些奇奇怪怪的东西一定要全部糊进core里面吗

@Fangliding
Copy link
Copy Markdown
Member

Its not all by ai and yes i checked the code before i push it @Fangliding

那你可以解释为什么 WrapConn 根本没地方调用吗 RegisterWithMeta 的第二个返回值根本没任何人使用

@ImMohammad20000
Copy link
Copy Markdown
Author

Any plan for this?
@RPRX

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Apr 12, 2026

我个人觉得这个功能是可以有的,但对其他人来说这么多代码目前只实现了这一个功能的话争议过大

Rebase 一下,顺便实现 #5401 做成一整套系统吧,会合并

@Meo597
Copy link
Copy Markdown
Collaborator

Meo597 commented Apr 12, 2026

如果 vibe 含量过高,我觉得还是谨慎点好

@Avaritia-ZOV
Copy link
Copy Markdown

The implementation of connection tracking is rather messy. I agree that this is a necessary feature for a number of use cases. However, xray-core will never be able to fully implement this functionality for the other purposes connection tracking is usually used for. This amount of code, embedded into nearly every part of the codebase, is overkill created for a single task only: terminating user connections. As my dear BRAT kastov said, this functionality can be implemented independently without any particular issues. I will not comment on the quality of the code itself; I think everyone here understands that this is far from the best or most reliable solution.

Nevertheless, since the code is already here, a few words about the specific implementation issues:

  • Broken UDP traffic accounting: WritePacket reads buffer.Len() after calling the inner writer, which consumes the buffer - so the UDP downlink counter is always 0.
  • Lost disconnect events: CancelAll removes entries from the map before the deferred Unregister gets a chance to emit a disconnect event - subscribers to StreamConnections will never learn about the disconnection. Precisely in the very scenario this entire PR was written for.
  • Panic on double Unsubscribe: close(ch) is called unconditionally, so a second call causes a panic.
  • Bypassing the DI system: all of xray-core uses features.Feature + core.RequireFeatures for lifecycle management. The tracker uses package-level globals, breaking the application lifecycle and making proper isolation and cleanup impossible.

And that’s not all I found.

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Apr 12, 2026

我觉得虽然有些功能可以由外部程序调用 Xray API 加一些逻辑来实现,但若是非小众需求且是优雅的一整套是可以加进来的

@ImMohammad20000
Copy link
Copy Markdown
Author

ImMohammad20000 commented Apr 12, 2026

I will push the bug fix for this than i open another pr for #5401 because the changes is geting too much for this pr

nevermind

@ImMohammad20000
Copy link
Copy Markdown
Author

Its done @RPRX

@M03ED
Copy link
Copy Markdown
Contributor

M03ED commented Apr 23, 2026

should we expect to see this in next version ?

@Meo597
Copy link
Copy Markdown
Collaborator

Meo597 commented Apr 23, 2026

我个人的底线是,如果它是默认关闭,并且保证不会对性能造成任何影响

以及仅凭我个人 vibe 经验
ai 写的代码毫无美感,需要逐行调校

显然此 PR 不是这么干的

@M03ED
Copy link
Copy Markdown
Contributor

M03ED commented Apr 23, 2026

我个人的底线是,如果它是默认关闭,并且保证不会对性能造成任何影响

以及仅凭我个人 vibe 经验 ai 写的代码毫无美感,需要逐行调校

显然此 PR 不是这么干的

original pr was much smaller, @RPRX himself asked for api and other options

@M03ED
Copy link
Copy Markdown
Contributor

M03ED commented Apr 25, 2026

既然社区在期待新版 Xray 并且不太信任这个 PR,等发了 stable release 之后再合这个吧,下个版本肯定有

its already done and fixed
just a small conflict with recent changes
also this guy is just a fake account, i know the real imtheted and his in sweet dream right now
Joined 20 minutes ago

@imtheted
Copy link
Copy Markdown

Just a coincidence man, calm down

@RPRX
Copy link
Copy Markdown
Member

RPRX commented Apr 25, 2026

只能说实际上这个功能不急,下个版本前我仔细审一下这个 PR 吧

@iamtheted
Copy link
Copy Markdown

Just a coincidence man, calm down

@imtheted This is a fake account!
You must be so miserable spending time creating fake accounts on GitHub.
Get a life bruh...

@M03ED
Copy link
Copy Markdown
Contributor

M03ED commented Apr 27, 2026

some asked for performance and memory usage of this pr.
this is one of instances have been running for more than 96 hour without any problem
more than 10k user, 2K online (more or less in different periods) and rx/tx ≈ 230 Mbps (on single instance)
photo_2026-04-27_20-34-55

…n management

- Add protobuf definitions for connection tracking commands including ListConnections, CloseConnection, GetUserStats, and StreamConnections.
- Generate gRPC client and server code from the protobuf definitions.
- Implement a connection tracker that maintains active connections, allows forced disconnections, and provides real-time statistics.
- Create API commands for listing connections, closing connections, streaming connection events, and retrieving user statistics.
- Add tests for connection tracking functionality to ensure correctness and thread safety.
@nightcrawler42
Copy link
Copy Markdown

Great work on this — I was working on the same problem independently (#6102, now closed as duplicate of this PR).

One concern after testing in production: CancelAll() only calls entry.Cancel() (context cancellation), but this is not sufficient to kill long-lived streams.

The root cause is that buf.Copy in common/buf/copy.go has no context awareness — it loops on reader.ReadMultiBuffer() indefinitely and only exits on read/write errors. Context cancellation propagates through task.Run, but the goroutines running buf.Copy inside requestDone/responseDone continue to run because they're blocked on raw I/O, not on ctx.Done().

This is particularly visible with tools like Psiphon that maintain persistent tunneled connections over WebSocket. In our production environment (Marzban panel + multiple nodes), we observed users consuming 2+ GB on a 30 MB quota because the connection survived RemoveUser entirely.

The fix that worked for us was storing the underlying net.Conn (io.Closer) alongside the cancel func, and calling conn.Close() in addition to cancel() during CancelAll. Closing the TCP connection forces an immediate read/write error that breaks all in-flight buf.Copy loops:

func (t *Tracker) CancelAll(email string) {
    // ...
    for id, entry := range entries {
        entry.Cancel()
        if entry.Closer != nil {
            entry.Closer.Close()  // force TCP RST
        }
    }
}

Since WrapConn already wraps stat.Connection, this could be integrated by storing the connection reference in ConnEntry and closing it during cancellation.

Would be happy to contribute this addition if helpful.

@MR-MKZ
Copy link
Copy Markdown

MR-MKZ commented May 12, 2026

Thank's Mohammad for your BUG fix pr.

When this PR will merge to the xray-core?

@AliCl0ner
Copy link
Copy Markdown

Hope this PR merge sooner.

@amirhosseinds
Copy link
Copy Markdown

Hi, I rebased this PR locally on top of current main (1bdb488c, v26.5.9) to evaluate it for a production rollout. Sharing findings below in case they're useful.

1. Compile errors after rebase

testing/scenarios/command_test.go no longer compiles against current main because it predates commit 1dbafe62 ("Config: Rename network/address/port in Tunnel inbound and DNS outbound"):

testing/scenarios/command_test.go:559: unknown field Address in struct literal of type dokodemo.Config
testing/scenarios/command_test.go:560: unknown field Port in struct literal of type dokodemo.Config
testing/scenarios/command_test.go:561: unknown field Networks in struct literal of type dokodemo.Config
testing/scenarios/command_test.go:568: unknown field IpsBlocked in struct literal of type freedom.Config
testing/scenarios/command_test.go:568: undefined: freedom.IPRules
testing/scenarios/command_test.go:595-597: (same as 559-561)

Fix is mechanical:

  • AddressRewriteAddress
  • PortRewritePort
  • NetworksAllowedNetworks
  • IpsBlocked: &freedom.IPRules{}FinalRules: []*freedom.FinalRuleConfig{{Action: freedom.RuleAction_Allow}}

The existing 8a07f083 fix: sync with latest changes commit doesn't catch these. Worth one more sync commit before merge.

2. Production safety concerns

After detailed read of app/connectiontracker/{tracker,access,service}.go and the proxy integrations, ordered by severity:

Critical

Connection ID is uint32 and wraps under loadtracker.go:227 uses atomic.AddUint32 for globalNext. At 10k conn/s it wraps in ~5 days; at 100k conn/s in ~12 hours. After wraparound, byID[id] = entry silently overwrites a live entry, breaking accounting and orphaning the previous entry until process restart. Suggest changing to uint64 — trivial change, large risk reduction.

High

ListConnections and GetUserStats hold t.mu during full map iteration + allocation (tracker.go:181-193, tracker.go:317-328). Same mutex is acquired by RegisterWithMeta / Unregister. On a node with 50k+ active connections, a single gRPC call can stall the accept path for ~10ms. Suggest snapshotting byID to a slice under the lock, then building ConnectionInfo outside.

Hot-path overhead is unconditional and double-stacked. Every TCP Read/Write and UDP ReadPacket/WritePacket now executes time.Now().UnixNano() + atomic.AddInt64 + atomic.StoreInt64, plus the access-record layer in access.go:50-61 does the same again, plus the pre-existing CounterConnection. For high-pps UDP/QUIC workloads this adds up. A build tag or per-inbound flag to disable tracking when unused would help adopters with packet-heavy traffic.

Medium

Zero test coverage for the Subscribe/Unsubscribe/emit path (tracker.go:99-134). The 9a8830af fix: Unsubscribe panic commit suggests this path has had real bugs. The single non-blocking-send guard is good, but the swap-with-last + truncate in Unsubscribe interacting with a concurrent emit() (which copies the slice under subMu then iterates outside the lock) deserves an explicit race test. Would also catch:

  • tracker.go:114-120: the swapped-from chan pointer is leaked in the slice's backing array past len but within cap until the slice grows or Manager closes.
  • tracker.go:123-134: emit allocates a fresh snapshot slice per event — at high churn, GC pressure.

Coverage inconsistency. proxy/http/server.go, proxy/dokodemo/dokodemo.go, and proxy/wireguard/server.go install only the access-record wrapper, no RegisterWithMeta. So CloseConnection and ListConnections gRPC calls silently do nothing for those inbounds. Either document this or extend the integration.

UDP NAT lifetime. In shadowsocks_2022/inbound{,_multi,_relay}.go, every NAT mapping registers a ConnEntry and the matching defer Unregister only fires when the packet conn returns (idle timeout). Map growth is bounded by NAT timeout but has no hard cap; misbehaving clients can inflate the tracker map. Suggest a per-user or global cap.

Low

Type-assertion regressions. TrackedConn over stat.Connection and TrackedAccessReader over buf.Reader may hide downstream optimizations that type-assert to specific impls (e.g., splice path, buf.BufferToBytesReader). Worth benchmarking throughput on a vless+xtls path before/after to make sure no zero-copy path silently regressed.

3. What's solid

  • All RegisterWithMeta calls in modified proxies are paired with defer Unregister in the same scope — no leak on normal return, panic, or context cancel.
  • Subscriber sends are non-blocking with default: drop — slow gRPC clients can't block the registration path. Buffer size 64 is reasonable.
  • completeAccessRecord uses atomic.Bool.CompareAndSwap to prevent double-completion.
  • WritePacket correctly captures buffer.Len() before the call consumes the buffer (tracker.go:402) — this was the 4d93af68 fix: Broken UDP traffic accounting fix, well done.
  • The entry != nil check in Unregister (tracker.go:258) correctly prevents the double-emit race against CancelAll.
  • No per-connection or per-subscriber goroutines spawned beyond what the gRPC server already creates.

Suggested minimum changes before merge

  1. Rebase + fix command_test.go compile errors (5 minutes).
  2. Widen connection ID to uint64 (1 line + proto regen).
  3. Move ConnectionInfo allocation outside t.mu in ListConnections / GetUserStats (~10 lines).
  4. Add a concurrency test for Subscribe/Unsubscribe/emit.

Items 1–3 are mechanical. Item 4 is the one I'd be most nervous about shipping without.

Happy to open follow-up PRs for any of these if helpful. Thanks for the feature — the API surface is clean and the integration with the existing stat/counter machinery is well thought out.

@bytecategory
Copy link
Copy Markdown
Contributor

这些奇奇怪怪的东西一定要全部糊进core里面吗

作用完全是让第三方面板更好 比如

func toProto(c connectiontracker.ConnectionInfo) *ConnInfo {
	return &ConnInfo{
		Id:           c.ID,
		Email:        c.Email,
		InboundTag:   c.InboundTag,
		Protocol:     c.Protocol,
		StartTime:    c.StartTime.Unix(),
		LastActivity: c.LastActivity.Unix(),
		Uplink:       c.Uplink,
		Downlink:     c.Downlink,
	}
}

里有这么多信息比如上下行,而且还是实时的,不用轮询.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.