
Optimize event handling performance: move GetEventFields outside lock in QueueEvent #3596

Draft
Copilot wants to merge 3 commits into master from copilot/fix-opc-ua-server-unresponsiveness


Copilot AI commented Mar 6, 2026

Server becomes unresponsive under load (~7000 monitored items) because MonitoredItem.QueueEvent holds m_lock while executing GetEventFields — an expensive node-hierarchy traversal that calls GetAttributeValue per select clause. With many concurrent subscriptions, threads pile up waiting for the lock.

Changes

Libraries/Opc.Ua.Server/Subscription/MonitoredItem/MonitoredItem.cs

Refactored QueueEvent(IFilterTarget, bool) to a double-lock pattern:

// Lock 1: fast checks + filter eval (must stay locked for m_filteredRetainConditionIds)
lock (m_lock)
{
    // null/duplicate/overflow checks, capture filter + build FilterContext
    if (!bypassFilter && !CanSendFilteredAlarm(context, filter, instance)) return;
}

// OUTSIDE lock: expensive node-hierarchy traversal, now concurrent-safe
EventFieldList fields = GetEventFields(context, filter, instance);  // reuses context from above

// Lock 2: re-check null/overflow, enqueue
lock (m_lock)
{
    if (m_eventQueueHandler == null || m_eventQueueHandler.SetQueueOverflowIfFull()) return;
    m_eventQueueHandler.QueueEvent(fields);
    m_readyToPublish = m_readyToTrigger = true;
}
  • CanSendFilteredAlarm stays in the first lock because it mutates m_filteredRetainConditionIds
  • FilterContext is created once inside the first lock (capturing session locales atomically) and reused for GetEventFields
  • Second lock re-validates queue state to guard against changes between the two sections
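
The pattern described above can be sketched in isolation. This is a minimal illustration, not the code from MonitoredItem.cs: TwoPhaseQueueSketch, readFields, Close and CanEnqueue are hypothetical names, the queue holds plain strings instead of an EventFieldList, and the filter evaluation is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a monitored item; not the Opc.Ua.Server class.
public sealed class TwoPhaseQueueSketch
{
    private readonly object m_lock = new object();
    private readonly int m_capacity;
    private Queue<string> m_queue = new Queue<string>(); // null after Close()

    public TwoPhaseQueueSketch(int capacity) { m_capacity = capacity; }

    public bool Overflow { get; private set; }

    public int Count
    {
        get { lock (m_lock) { return m_queue == null ? 0 : m_queue.Count; } }
    }

    public void Close()
    {
        lock (m_lock) { m_queue = null; }
    }

    public void QueueEvent(int eventId, Func<int, string> readFields)
    {
        // Lock 1: cheap state checks only.
        lock (m_lock)
        {
            if (!CanEnqueue()) return;
        }

        // Outside the lock: the expensive read (assumed to touch no shared state).
        string fields = readFields(eventId);

        // Lock 2: the queue may have been closed or filled in the meantime.
        lock (m_lock)
        {
            if (!CanEnqueue()) return;
            m_queue.Enqueue(fields);
        }
    }

    // Caller must hold m_lock.
    private bool CanEnqueue()
    {
        if (m_queue == null) return false;
        if (m_queue.Count >= m_capacity) { Overflow = true; return false; }
        return true;
    }
}
```

The trade-off in this sketch is that a read can be wasted: if the queue is closed or fills up while readFields runs, the second check discards the result. That cost is paid only on the rare race, while every other caller gets a shorter lock hold time.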

Tests/Opc.Ua.Client.Tests/LoadTest.cs

Added ServerEventSubscribeLoadTestAsync (explicit, [Order(120)]) — 10 sessions × 5 subscriptions monitoring ObjectIds.Server, 500 events generated via ReferenceServer.CurrentInstance.ReportEvent(), asserts ≥ 99% notification delivery ratio and measures throughput (events/sec).
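
The two figures the test reports can be computed as in the following sketch. It is not taken from LoadTest.cs: LoadTestMetricsSketch and its method names are hypothetical, and the expected notification count assumes one event monitored item per subscription, which the description above does not state.

```csharp
using System;

// Hypothetical helper; the names are illustrative only.
public static class LoadTestMetricsSketch
{
    // Fraction of expected notifications that actually arrived.
    // Assumes every monitored item should receive every generated event.
    public static double DeliveryRatio(long received, int monitoredItems, int eventsGenerated)
    {
        long expected = (long)monitoredItems * eventsGenerated;
        return expected == 0 ? 0.0 : (double)received / expected;
    }

    // Notifications received per second of wall-clock time.
    public static double EventsPerSecond(long received, TimeSpan elapsed)
    {
        return elapsed.TotalSeconds > 0 ? received / elapsed.TotalSeconds : 0.0;
    }
}
```

Under that assumption, 10 sessions × 5 subscriptions give 50 monitored items, so 500 events should produce 25,000 notifications, and the 99% threshold tolerates at most 250 missing ones.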

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • Enhancement (non-breaking change which adds functionality)
  • Test enhancement (non-breaking change to increase test coverage)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected, requires version increase of Nuget packages)
  • Documentation Update (if none of the other choices apply)

Checklist

  • I have read the CONTRIBUTING doc.
  • I have signed the CLA.
  • I ran tests locally with my changes, all passed.
  • I fixed all failing tests in the CI pipelines.
  • I fixed all introduced issues with CodeQL and LGTM.
  • I have added tests that prove my fix is effective or that my feature works and increased code coverage.
  • I have added necessary documentation (if appropriate).
  • Any dependent changes have been merged and published in downstream modules.

Further comments

The core issue is that GetEventFields scales linearly with (select clauses) × (subscriptions per node). Under load this dominates lock hold time and starves other threads. Moving it outside the lock lets multiple subscriptions read event fields concurrently rather than serially. The CanSendFilteredAlarm / m_filteredRetainConditionIds concern was the main constraint preventing a straightforward unlock-and-read approach.
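
The effect on concurrency can be observed with a small stand-alone experiment. This is a sketch under stated assumptions rather than a measurement of the server: LockHoldSketch and PeakConcurrentReads are hypothetical names, and a 50 ms sleep stands in for the node-hierarchy traversal.

```csharp
using System;
using System.Threading;

// Hypothetical experiment; not part of the OPC UA stack.
public static class LockHoldSketch
{
    // Starts `workers` threads that each perform one simulated read and
    // returns the peak number of reads that were in flight at once.
    public static int PeakConcurrentReads(int workers, bool readInsideLock)
    {
        object gate = new object();
        int inFlight = 0;
        int peak = 0;

        void SimulatedRead()
        {
            int now = Interlocked.Increment(ref inFlight);

            // Raise the recorded peak if this read pushed it higher.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
            }

            Thread.Sleep(50); // stands in for the expensive traversal
            Interlocked.Decrement(ref inFlight);
        }

        var threads = new Thread[workers];
        for (int i = 0; i < workers; i++)
        {
            threads[i] = new Thread(() =>
            {
                if (readInsideLock)
                {
                    lock (gate) { SimulatedRead(); }  // old shape: read under the lock
                }
                else
                {
                    SimulatedRead();                  // new shape: read first
                    lock (gate) { /* enqueue only */ }
                }
            });
            threads[i].Start();
        }

        foreach (Thread t in threads) t.Join();
        return peak;
    }
}
```

With the read inside the lock the peak is always 1, because the lock serializes every read. With the read outside the lock the peak typically approaches the number of workers, although the exact value depends on thread scheduling.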

Original prompt

This section details the original issue you should resolve

<issue_title>opc ua server unresponsive as the number of monitored items in the server increase.</issue_title>
<issue_description>### Type of issue

  • Bug
  • Enhancement
  • Compliance
  • Question
  • Help wanted

Current Behavior

We have built an OPC UA Server which receives data from an external system. The external system raises events with data values at regular intervals. When the data is received, the following code is executed for each parameter:

node.Value = readResult.Value?.Value;
node.StatusCode = (uint)readResult.StatusCode;
node.Timestamp = time;
node.ClearChangeMasks(SystemContext, false);

As the number of parameters being monitored increases (around 7000), the server becomes unresponsive. This is observed consistently on a specific machine and less frequently on others. Preliminary investigation of the dump file shows many threads stuck with the following call stack:

System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
Opc.Ua.Server.MonitoredNode2.OnMonitoredNodeChanged(Opc.Ua.ISystemContext, Opc.Ua.NodeState, Opc.Ua.NodeStateChangeMasks)
Opc.Ua.NodeState.ClearChangeMasks(Opc.Ua.ISystemContext, Boolean)

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS: Win 10
- Environment: 
- Runtime: .NET 6
- Nuget Version: 1.5.372.113
- Component: OPC.Ua.Server
- Server: 
- Client: UA Expert

Anything else?

No response</issue_description>

<agent_instructions>Please extend the existing LoadTest present in the Client Test project with a new Test that verifies event performance.
Then use your Profiler Agent to improve the performance of the Event Handling in the Server</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@romanett: @vinaybr Maybe you can optimize the read process of the EventFields to improve the runtime of the MonitoredNode2.OnMonitoredNodeChanged method.
@marcschier: @romanett Is the async server going to fix this?
@romanett: @marcschier No, not yet. This will only improve method calls in the first place. Extension of the subscription handling needs to be done separately.


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Copilot AI and others added 2 commits March 6, 2026 16:19
…ce load test

Co-authored-by: romanett <7413710+romanett@users.noreply.github.com>
…ublish wait

Co-authored-by: romanett <7413710+romanett@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix OPC UA server unresponsiveness with increased monitored items" to "Optimize event handling performance: move GetEventFields outside lock in QueueEvent" on Mar 6, 2026

Development

Successfully merging this pull request may close these issues.

opc ua server unresponsive as the number of monitored items in the server increase.

3 participants