Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AsyncHooksContextManager: async function called in with is returning wrong context after await #1101

Closed
blumamir opened this issue May 24, 2020 · 4 comments · Fixed by #1099
Assignees
Labels
bug Something isn't working

Comments

@blumamir
Copy link
Member

blumamir commented May 24, 2020

What version of OpenTelemetry are you using?

v0.8.2 (the issue is new from v0.8.0), confirmed to work properly on v0.7.0

What version of Node are you using?

v.13.11.0

What did you do?

Started a new span after await in an http server callback:

import { SimpleSpanProcessor } from '@opentelemetry/tracing';
import { NodeTracerProvider } from '@opentelemetry/node';
import { ZipkinExporter } from '@opentelemetry/exporter-zipkin';

const serviceName = 'ot-await-debug';

const provider = new NodeTracerProvider({
    plugins: {
        http: {
            enabled: true,
            path: '@opentelemetry/plugin-http',    
        },
    }
});

const zipkinExporter = new ZipkinExporter({
    serviceName,
    url: 'http://localhost:9411/api/v2/spans'
});

provider.addSpanProcessor(new SimpleSpanProcessor(zipkinExporter));
provider.register();
const tracer = provider.getTracer(serviceName);

import * as http from 'http';
http.createServer(async (req, res) => {
    await (async () => {})(); // <==============
    const childSpan = tracer.startSpan("child span");
    childSpan.end();
    res.end();
}).listen(8090);

What did you expect to see?

The new span "child span" being child of the http span, and both belong to the same trace (have the same traceId)

What did you see instead?

Two spans which are not linked, with different traceId
image
The code above is a minimal example I wrote to show the issue. The real case has complex express routing on top of the http server, having various async middlewares, which await operations like DB access and outgoing http requests to other services. After upgrading to v0.8.2, I get a pile of root spans with no correlation between them.

Additional context

I traced the issue to new Reference class added to AsyncHooksContextManager.ts in v0.8.0.
Http plugin is calling the context manager with(context, fn) function twice:

  1. when extracting the propagation context from the http headers here. This will create a new Reference for the context.
  2. when starting a new span for http here. This operation will set the context to the same ref created in 1.

Following async operations spawned from the http span will all point to the same Reference which is set correctly to context 2, but once http synchronised callback returns, context 1 is restored here, which will now affect all the ongoing async operations, falsely setting them to the wrong context (empty context in the above example).

An easy fix will be to store the real context for each async execution (same as v0.7.0) instead of setting all of them to the same reference, but I assume the Reference approach was added for a good reason.

I wrote a minimal failing unit test which reproduce the issue:

    it('async function called from nested "with" sync function should return nested context', done => {
      const scope1 = '1' as any;
      const scope2 = '2' as any;

      const asyncFuncCalledDownstreamFromSync = async () => { 
        await ( async () => {})();
        assert.strictEqual(contextManager.active(), scope2);
        return done();
      }

      contextManager.with(scope1, () => {
        contextManager.with(scope2, () => asyncFuncCalledDownstreamFromSync() );
      })
    })

Since asyncFuncCalledDownstreamFromSync is called from within with(scope2, ...), I would expect the active context to be scope2, but got: handledPromiseRejectionWarning: AssertionError [ERR_ASSERTION]: Expected values to be strictly equal: '1' !== '2'

@blumamir blumamir added the bug Something isn't working label May 24, 2020
@blumamir blumamir changed the title AsyncHooksContetxManager returning wrong context after original synchronised function returns AsyncHooksContextManager: async function called in with is returning wrong context after await May 24, 2020
@vmarchaud
Copy link
Member

vmarchaud commented May 26, 2020

Its linked to both #1013 and #752, i'm currently working on a refactor (#1099) that fixes all threes issues. In the mean time, you could rollback to the previous version if it works for you:

import { SimpleSpanProcessor } from '@opentelemetry/tracing';
import { NodeTracerProvider } from '@opentelemetry/node';
import { ZipkinExporter } from '@opentelemetry/exporter-zipkin';
// you need to install in your package.json the fixed version, in your case that should be 0.7.0
// npm install @open-telemetry/context-async-hooks@0.7.0
import { AsyncHooksContextManager } from '@opentelemetry/context-async-hooks'

const serviceName = 'ot-await-debug';

const provider = new NodeTracerProvider({
    plugins: {
        http: {
            enabled: true,
            path: '@opentelemetry/plugin-http',    
        },
    },
});

const zipkinExporter = new ZipkinExporter({
    serviceName,
    url: 'http://localhost:9411/api/v2/spans'
});

provider.addSpanProcessor(new SimpleSpanProcessor(zipkinExporter));
// create the instance and give it to register()
const contextManager = new AsyncHooksContextManager()
contextManager.enable()
provider.register({
  contextManager
});
const tracer = provider.getTracer(serviceName);

import * as http from 'http';
http.createServer(async (req, res) => {
    await (async () => {})(); // <==============
    const childSpan = tracer.startSpan("child span");
    childSpan.end();
    res.end();
}).listen(8090);

@vmarchaud vmarchaud self-assigned this May 26, 2020
@blumamir
Copy link
Member Author

Thanks @vmarchaud for taking the time to add a solution and for working on a fix.
In the mean time I will roll back to v0.7.0 like you suggested

@obecny
Copy link
Member

obecny commented May 28, 2020

@vmarchaud don't get me wrong but it looks now like a stack scope manager, did you remove async hooks ?, I wonder what will happen if you create couple few async operations that will be running at the same time together with setTimeout.
For example

  1. Start ...................................... End
  2. ........ Start ........... End
  3. ...... Start ............................................. End

and each of the operation should have some more async and sync operation as well

Will this work fine then too ?

@vmarchaud
Copy link
Member

@obecny I did not remove async hooks and indeed it looks like the stack scope manager. Each asnyc operation has its context restored (because we use async hooks) before calling the userland code so it should works. I will try to make a test case for that.
PS: I think we should move this dicussion to the PR though

dyladan pushed a commit that referenced this issue May 28, 2020
This addresses #1101 #752, and #1013

- Fix context loss when used in some async cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants