Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace Ids and span parent Ids are getting messed up on concurrent request of around 20 (rest based services) #1646

Open
sureshballa opened this issue Nov 2, 2020 · 15 comments
Labels
bug Something isn't working needs:reproducer This bug/feature is in need of a minimal reproducer

Comments

@sureshballa
Copy link

sureshballa commented Nov 2, 2020

Please answer these questions before submitting a bug report.

What version of OpenTelemetry are you using?

"@opentelemetry/api": "^0.11.0",
"@opentelemetry/exporter-prometheus": "^0.11.0",
"@opentelemetry/metrics": "^0.11.0",
"@opentelemetry/node": "^0.11.0",
"@opentelemetry/plugin-express": "^0.10.0",
"@opentelemetry/plugin-http": "^0.11.0",
"@opentelemetry/plugin-mongodb": "^0.10.0",
"@opentelemetry/semantic-conventions": "^0.11.0",
"@opentelemetry/tracing": "^0.11.0",

What version of Node are you using?

10.21.0

Please provide the code you used to setup the OpenTelemetry SDK

const packageJSON = require("../package.json");

import { LogLevel } from "@opentelemetry/core";
import { NodeTracerProvider } from "@opentelemetry/node";
import { BatchSpanProcessor } from "@opentelemetry/tracing";
import { AzureMonitorTraceExporter } from "@azure/monitor-opentelemetry-exporter";

// Use your existing provider
const provider = new NodeTracerProvider({
    logLevel: LogLevel.DEBUG,
    plugins: {
        http: {
            enabled: true,
            // You may use a package name or absolute path to the file.
            path: "@opentelemetry/plugin-http",
            ignoreUrls: [new RegExp(/login.microsoftonline.com/i), new RegExp(/dc.visualstudio.com/i), new RegExp(/dc.services.visualstudio.com/i)]
        },
        https: {
            enabled: true,
            // You may use a package name or absolute path to the file.
            path: "@opentelemetry/plugin-https",
            ignoreUrls: [new RegExp(/dc.visualstudio.com/i), new RegExp(/login.microsoftonline.com/i), new RegExp(/dc.services.visualstudio.com/i)]
        }
    },
});

// Initialize the provider
provider.register();

// Create an exporter instance
const exporter = new AzureMonitorTraceExporter({
    logger: provider.logger,
    instrumentationKey: process.env.APP_INSIGHTS_INSTRUMENTATION_KEY
});

exporter.addTelemetryProcessor((envelope) => {
    // eslint-disable-next-line no-param-reassign
    envelope.tags["ai.cloud.role"] = packageJSON.name;
    // envelope.tags["ai.cloud.roleInstance"] = "<your role instance>";
});

// Add the exporter to the provider
provider.addSpanProcessor(new BatchSpanProcessor(exporter as any, {
    bufferTimeout: 15000,
    bufferSize: 1000,
}));

//Only for debugging purposes
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));

What did you do?

Our microservices are exposed with REST protocol. We are emitting the trace id as response header for each API request (though trace id is not available as typed property, but I am using this code to get the value - spanContext.traceId.

Based on this response header, I am looking in backend app insights portal to look for all things happening for this specific trace id. This all works good if I send a single http request, and search the app insights with trace id. But if I send concurrent requests of say 20, then things are getting messed up. All the actions related to 2nd request till 20th request are getting related to 1st request trace id. So from app insights it looks like 1st request is handling

To confirm this is not issue with azure app insights exporter, I have enabled ConsoleSpanProcessor and I can see parent id are wrongly linked to other trace id. To be precise, all the span ids for 2nd request to 20th request, the parent ids are set to trace id of the very first request.

Below is the express middleware I am using to send back the trace id so that I can use to trace all things:

export function contextMiddleware(req: Request, res: Response, next: () => void) {
    const currentContext = container.get<IContextManager>(TYPES.CurrentContext);
    let context: Context = new Context(
        req.headers['x-user-email'] as string || `simulated email is ${new UniqueEntityID()}`,
        req.headers['x-tenant-id'] as string || `simulated tenant id is ${new UniqueEntityID()}`,
        req.headers['x-correlation-id'] as string || `simulated correlation id is ${new UniqueEntityID()}`,
        `HTTP ${req.method} ${req.url}`, '', req.headers.authorization);
    
    res.setHeader('x-user-email', context.userEmail);
    res.setHeader('x-tenant-id', context.tenantID);
    res.setHeader('x-correlation-id', context.correlationID);

    const tracer: Tracer = trace.getTracer("hexagonal");
    const httpRequestSpan = tracer.startSpan(`HTTP ${req.method} ${req.url}`);

    tracer.withSpan(httpRequestSpan, () => {
        res.setHeader('x-trace-id', (httpRequestSpan as any).spanContext.traceId);
        currentContext.runInNewContext(() => {
            next();
        }, context);
    });
}
@sureshballa sureshballa added the bug Something isn't working label Nov 2, 2020
@dyladan
Copy link
Member

dyladan commented Nov 2, 2020

I'm not familiar with app insights. I assume tenant id and correlation id are app insights concepts? The code here looks correct at first pass. Are you sure this is not a problem with app insights? The response header use is not standardized. Can you link to the documentation for it?

@sureshballa
Copy link
Author

sureshballa commented Nov 2, 2020

@dyladan please ignore the code around tenant id and correlation id, its in house concepts we have. What I am reporting here is, for all spans corresponding to 2nd HTTP request and till 20th HTTP request, the parent id is linked to the first HTTP request trace id. This is evident from ConsoleSpanExporter where I can see parent ids and span ids etc. So its not problem from azure app insights exporter.

The reason I shared the middleware code is to show how I am grabbing the trace id for each HTTP request I am making. I am spitting it as response header.

@dyladan
Copy link
Member

dyladan commented Nov 2, 2020

So, if you make multiple concurrent requests they all have the same root span?

If you wait for those requests to finish then make a new request, does the root span change to the new request?

@sureshballa
Copy link
Author

@dyladan yes, all spans of 2nd to 20th HTTP request doesn't have corresponding parent ids to their own trace id. Instead parent id is referred to trace id of first HTTP request.

If I do not make concurrent request and do separate HTTP request then all things are good. Correct trace id as parent id for span

@dyladan
Copy link
Member

dyladan commented Nov 2, 2020

hmm this sounds like a context leak to me. @vmarchaud do you have any ideas?

@vmarchaud
Copy link
Member

Yeah it sounds like a leak, could you share the list of dependencies that you use @sureshballa ? Also that would help you if you'd have a mininal code example that reproduce the issue.

@dyladan
Copy link
Member

dyladan commented Nov 4, 2020

Minimal reproduction would be a big help. Maybe also disable all plugins except http/https and see if the issue persists. Then enable 1 by 1 until you find the plugin which causes the issue.

@dyladan
Copy link
Member

dyladan commented Aug 3, 2022

No response in 2 years. I'm going to close this. Please reopen if you can provide more information.

@dyladan dyladan closed this as completed Aug 3, 2022
pichlermarc pushed a commit to dynatrace-oss-contrib/opentelemetry-js that referenced this issue Dec 15, 2023
* chore(deps): update dependency semver to v7.5.4

* chore(deps): align semver versions across repo

* chore(deps): pin semver version v7.5.4
@Nawazsherif
Copy link

Nawazsherif commented Oct 17, 2024

This issue still persists, @dyladan. The dependencies that I'm using are listed below:

"@opentelemetry/api": "^1.9.0",
"@opentelemetry/exporter-trace-otlp-http": "^0.53.0",
"@opentelemetry/instrumentation": "^0.53.0",
"@opentelemetry/instrumentation-http": "^0.53.0",
"@opentelemetry/instrumentation-winston": "^0.40.0",
"@opentelemetry/propagator-b3": "^1.26.0",
"@opentelemetry/sdk-trace-node": "^1.26.0"

The instrumentation code looks like this.

const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { B3Propagator } = require('@opentelemetry/propagator-b3');
const { WinstonInstrumentation } = require('@opentelemetry/instrumentation-winston');
const {
BatchSpanProcessor,
NodeTracerProvider,
} = require('@opentelemetry/sdk-trace-node');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');

const provider = new NodeTracerProvider();
const exporter = new OTLPTraceExporter({
concurrencyLimit: 10,
});
provider.addSpanProcessor(new BatchSpanProcessor(exporter, {
maxQueueSize: 100,
maxExportBatchSize: 10,
scheduledDelayMillis: 500,
exportTimeoutMillis: 30000,
}));

provider.register({
propagator: new B3Propagator(),
});

registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
new WinstonInstrumentation({
disableLogSending: true,
}),
],
});

@Nawazsherif
Copy link

Can you reopen this issue and push a fix for it? When multiple api calls from UI reach our server at almost similar time, this issue is occurring. We are running a react application which make call to this node express server.

@dyladan dyladan reopened this Oct 17, 2024
@dyladan
Copy link
Member

dyladan commented Oct 17, 2024

We still need a reproducer for this

@dyladan dyladan added the needs:reproducer This bug/feature is in need of a minimal reproducer label Oct 17, 2024
@Nawazsherif
Copy link

Hi @dyladan, Thanks for the quick response. Actually, the issue is not with the trace provider library. It was with the incorrect configuration of logging library (logging-winston).

Old code:

registerInstrumentations({
instrumentations: [
new WinstonInstrumentation({
disableLogSending: true,
logHook: (span, record) => {
record['trace'] = projects/${process.env.GOOGLE_PROJECT_ID}/traces/${record.trace_id};
record['spanId'] = record.span_id;
record['traceSampled'] = true;
},
}),
],
});

Corrected code:

registerInstrumentations({
instrumentations: [
new WinstonInstrumentation({
disableLogSending: true,
logHook: (span, record) => {
record['logging.googleapis.com/trace'] = projects/${process.env.GOOGLE_PROJECT_ID}/traces/${record.trace_id};
record['logging.googleapis.com/spanId'] = record.span_id;
record['logging.googleapis.com/trace_sampled'] = true;
},
}),
],
});

@Nawazsherif
Copy link

Nawazsherif commented Oct 17, 2024

In the below log added, the trace id added by instrumentation in metadata is different from the actual trace id.

Old log entry:
jsonPayload: {
message: "Successfully fetched current status"
metadata: {
metadata: {
correlationId: "faeb8ac00156444582bdbaacf7b7f6d5"
spanId: "db4664488f2cde14"
span_id: "db4664488f2cde14"
timestamp: "2024-10-17T14:33:43.466Z"
traceSampled: true
trace_flags: "01"
trace_id: "4b1bba3e9dac6eb39b100c9b8be92f1a"
}
}
}
timestamp: "2024-10-17T14:33:43.733000040Z"
trace: "projects//traces/f2416c124adcec7791c04852a63d54fa"
traceSampled: true

@dyladan
Copy link
Member

dyladan commented Oct 17, 2024

Does that mean this is resolved?

@Nawazsherif
Copy link

Yes, this is resolved. Thanks a lot !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs:reproducer This bug/feature is in need of a minimal reproducer
Projects
None yet
Development

No branches or pull requests

4 participants