Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(observability): fix bugs found from product review + negative cases #2158

Conversation

odeke-em
Copy link
Contributor

This change adds recording of retry span annotations, catching cases in which exceptions where thrown but spans were not ended while testing out and visually confirming the results.

@odeke-em odeke-em requested review from a team as code owners October 12, 2024 09:25
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: spanner Issues related to the googleapis/nodejs-spanner API. labels Oct 12, 2024
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 62bf905 to 1b7d8b6 Compare October 12, 2024 09:26
@product-auto-label product-auto-label bot added size: l Pull request size is large. size: xl Pull request size is extra large. and removed size: m Pull request size is medium. size: l Pull request size is large. labels Oct 12, 2024
src/transaction.ts Outdated Show resolved Hide resolved
src/database.ts Outdated
if (err) {
setSpanError(span, err!);
}
await runner.run().then(release, async err => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are already doing .then() on runner.run(), why do we then need to await here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@odeke-em it will work even if we don't await on runner.run(). please check and remove the await

src/database.ts Outdated Show resolved Hide resolved
@@ -754,9 +755,11 @@ export class Snapshot extends EventEmitter {
(err as grpc.ServiceError).code === grpc.status.ABORTED
)
) {
span.addEvent('Transaction Aborted, retrying begin', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code block is for the non Aborted errors. Change Transaction threw error, retrying with explicit begin

src/database.ts Outdated
@@ -3222,14 +3223,14 @@ class Database extends common.GrpcServiceObject {
span.addEvent('No session available', {
'session.id': session?.id,
});
this.runTransaction(options, runFn!);
await this.runTransaction(options, runFn!);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have we added await on this.runTransaction. This does not return a promise nor its a async method.

src/database.ts Outdated
span.end();
return;
}

if (err) {
await runFn!(err as grpc.ServiceError);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here ?

src/database.ts Outdated

if (isSessionNotFoundError(err)) {
span.addEvent('No session available', {
'session.id': session?.id,
});
release();
this.runTransaction(options, runFn!);
await this.runTransaction(options, runFn!);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again why async here?

src/database.ts Outdated

try {
const result = await runner.run();
span.end();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span.end() should be done in finally block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, on an exception this code retries in the while loop above so that's not correct to end it in the finally block. It should be ended only on success.

src/transaction.ts Show resolved Hide resolved
@@ -754,9 +764,16 @@ export class Snapshot extends EventEmitter {
(err as grpc.ServiceError).code === grpc.status.ABORTED
)
) {
this.begin();
span.addEvent('Stream broken. Safe to retry', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these two spans for stream broken. We are not retrying broken stream here , that is happening in partial-result-stream class. https://github.com/googleapis/nodejs-spanner/blob/main/src/partial-result-stream.ts#L558-L586
Maybe lets skip these two spans for now and move on.

src/transaction.ts Outdated Show resolved Hide resolved
@@ -1330,8 +1364,16 @@ export class Snapshot extends EventEmitter {
(err as grpc.ServiceError).code === grpc.status.ABORTED
)
) {
this.begin();
span.addEvent('Stream broken. Safe to retry', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. remove these events

callback(err, resp);
});
}, callback);
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no change is required here. As we are doing ".then" on begin call.

@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 91c5281 to 508243f Compare October 18, 2024 06:06
@alkatrivedi alkatrivedi added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Oct 18, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 18, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 18, 2024
@alkatrivedi alkatrivedi added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Oct 18, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 18, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 18, 2024
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 7bdc173 to a3589e5 Compare October 20, 2024 03:28
odeke-em added a commit to odeke-em/nodejs-spanner that referenced this pull request Oct 21, 2024
Extracted out of PR googleapis#2158, this change traces
Database.runTransactionAsync. However, testing isn't effective
because of bugs such as googleapis#2166.
odeke-em added a commit to odeke-em/nodejs-spanner that referenced this pull request Oct 21, 2024
Extracted out of PR googleapis#2158, this change traces
Database.runTransactionAsync. However, testing isn't effective
because of bugs such as googleapis#2166.
odeke-em added a commit to odeke-em/nodejs-spanner that referenced this pull request Oct 21, 2024
Extracted out of PR googleapis#2158, this change traces
Database.runTransactionAsync.

Updates googleapis#2079
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from e0e31f5 to 13bc9f5 Compare October 21, 2024 10:52
gcf-merge-on-green bot pushed a commit that referenced this pull request Oct 21, 2024
Extracted out of PR #2158, this change traces
Database.runTransactionAsync.

Updates #207
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 3fb5164 to 03a6298 Compare October 21, 2024 13:15
@alkatrivedi alkatrivedi added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Oct 24, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 24, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 24, 2024
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from ab6ae5d to 9b0a458 Compare October 24, 2024 06:38
src/database.ts Outdated
Comment on lines 3272 to 3275
.catch(e => {
setSpanErrorAndException(span, e as Error);
throw e;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.catch(e => {
setSpanErrorAndException(span, e as Error);
throw e;
});
.catch(e => {
setSpanErrorAndException(span, e as Error);
span.end();
throw e;
});

@surbhigarg92 surbhigarg92 added the kokoro:run Add this label to force Kokoro to re-run the tests. label Oct 24, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Oct 24, 2024
@surbhigarg92 surbhigarg92 added the kokoro:run Add this label to force Kokoro to re-run the tests. label Oct 24, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Oct 24, 2024
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 5bf4980 to f0b8745 Compare October 24, 2024 07:50
@surbhigarg92 surbhigarg92 added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Oct 24, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 24, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 24, 2024
@surbhigarg92 surbhigarg92 merged commit cbc86fa into googleapis:main Oct 24, 2024
16 checks passed
@odeke-em odeke-em deleted the trace-fix-bugs-from-product-review+negative-cases branch October 24, 2024 18:42
surbhigarg92 pushed a commit to surbhigarg92/nodejs-spanner that referenced this pull request Oct 25, 2024
Extracted out of PR googleapis#2158, this change traces
Database.runTransactionAsync.

Updates googleapis#2079
gcf-merge-on-green bot pushed a commit that referenced this pull request Oct 30, 2024
🤖 I have created a release *beep* *boop*
---


## [7.15.0](https://togithub.com/googleapis/nodejs-spanner/compare/v7.14.0...v7.15.0) (2024-10-30)


### Features

* (observability, samples): add tracing end-to-end sample ([#2130](https://togithub.com/googleapis/nodejs-spanner/issues/2130)) ([66d99e8](https://togithub.com/googleapis/nodejs-spanner/commit/66d99e836cd2bfbb3b0f78980ec2b499f9e5e563))
* (observability) add spans for BatchTransaction and Table ([#2115](https://togithub.com/googleapis/nodejs-spanner/issues/2115)) ([d51aae9](https://togithub.com/googleapis/nodejs-spanner/commit/d51aae9c9c3c0e6319d81c2809573ae54675acf3)), closes [#2114](https://togithub.com/googleapis/nodejs-spanner/issues/2114)
* (observability) Add support for OpenTelemetry traces and allow observability options to be passed.  ([#2131](https://togithub.com/googleapis/nodejs-spanner/issues/2131)) ([5237e11](https://togithub.com/googleapis/nodejs-spanner/commit/5237e118befb4b7fe4aea76a80a91e822d7a22e4)), closes [#2079](https://togithub.com/googleapis/nodejs-spanner/issues/2079)
* (observability) propagate database name for every span generated to aid in quick debugging ([#2155](https://togithub.com/googleapis/nodejs-spanner/issues/2155)) ([0342e74](https://togithub.com/googleapis/nodejs-spanner/commit/0342e74721a0684d8195a6299c3a634eefc2b522))
* (observability) trace Database.batchCreateSessions + SessionPool.createSessions ([#2145](https://togithub.com/googleapis/nodejs-spanner/issues/2145)) ([f489c94](https://togithub.com/googleapis/nodejs-spanner/commit/f489c9479fa5402f0c960cf896fd3be0e946f182))
* (observability): trace Database.runPartitionedUpdate ([#2176](https://togithub.com/googleapis/nodejs-spanner/issues/2176)) ([701e226](https://togithub.com/googleapis/nodejs-spanner/commit/701e22660d5ac9f0b3e940ad656b9ca6c479251d)), closes [#2079](https://togithub.com/googleapis/nodejs-spanner/issues/2079)
* (observability): trace Database.runTransactionAsync ([#2167](https://togithub.com/googleapis/nodejs-spanner/issues/2167)) ([d0fe178](https://togithub.com/googleapis/nodejs-spanner/commit/d0fe178623c1c48245d11bcea97fcd340b6615af)), closes [#207](https://togithub.com/googleapis/nodejs-spanner/issues/207)
* Allow multiple KMS keys to create CMEK database/backup ([#2099](https://togithub.com/googleapis/nodejs-spanner/issues/2099)) ([51bc8a7](https://togithub.com/googleapis/nodejs-spanner/commit/51bc8a7445ab8b3d2239493b69d9c271c1086dde))
* **observability:** Fix bugs found from product review + negative cases ([#2158](https://togithub.com/googleapis/nodejs-spanner/issues/2158)) ([cbc86fa](https://togithub.com/googleapis/nodejs-spanner/commit/cbc86fa80498af6bd745eebb9443612936e26d4e))
* **observability:** Trace Database methods ([#2119](https://togithub.com/googleapis/nodejs-spanner/issues/2119)) ([1f06871](https://togithub.com/googleapis/nodejs-spanner/commit/1f06871f7aca386756e8691013602b069697bb87)), closes [#2114](https://togithub.com/googleapis/nodejs-spanner/issues/2114)
* **observability:** Trace Database.batchWriteAtLeastOnce ([#2157](https://togithub.com/googleapis/nodejs-spanner/issues/2157)) ([2a19ef1](https://togithub.com/googleapis/nodejs-spanner/commit/2a19ef1af4f6fd1b81d08afc15db76007859a0b9)), closes [#2079](https://togithub.com/googleapis/nodejs-spanner/issues/2079)
* **observability:** Trace Transaction ([#2122](https://togithub.com/googleapis/nodejs-spanner/issues/2122)) ([a464bdb](https://togithub.com/googleapis/nodejs-spanner/commit/a464bdb5cbb7856b7a08dac3ff48132948b65792)), closes [#2114](https://togithub.com/googleapis/nodejs-spanner/issues/2114)


### Bug Fixes

* Exact staleness timebound ([#2143](https://togithub.com/googleapis/nodejs-spanner/issues/2143)) ([f01516e](https://togithub.com/googleapis/nodejs-spanner/commit/f01516ec6ba44730622cfb050c52cd93f30bba7a)), closes [#2129](https://togithub.com/googleapis/nodejs-spanner/issues/2129)
* GetMetadata for Session ([#2124](https://togithub.com/googleapis/nodejs-spanner/issues/2124)) ([2fd63ac](https://togithub.com/googleapis/nodejs-spanner/commit/2fd63acb87ce06a02d7fdfa78d836dbd7ad59a26)), closes [#2123](https://togithub.com/googleapis/nodejs-spanner/issues/2123)

---
This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/nodejs-spanner API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants