Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error 4: Deadline for Transaction exceeded / Transaction outcome unknown #216

Closed
walshie4 opened this issue Jun 10, 2018 · 16 comments
Closed
Assignees
Labels
api: spanner Issues related to the googleapis/nodejs-spanner API. needs more info This issue needs more information from the customer to proceed. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@walshie4
Copy link

Environment details

  • OS: CentOS
  • Node.js version: v8.9.4
  • npm version: v5.10.0
  • @google-cloud/spanner version: v1.4.3

Steps to reproduce

  1. ???
  2. ???

Attempting an insert in the same way as described here: #202 (comment)

Code causing the issue for us:

// spanner is a Spanner database object ready to be used for operations
// tableName is a String of the desired table's name
// rowData is an object of data to write
export default (spanner, tableName, rowData) => {
  return new Promise((resolve, reject) => {
    // eslint-disable-next-line consistent-return
    spanner.runTransaction((errTrx, dbTrx) => {
      if (errTrx) {
        honeyLogger.error('V3 encountered error inserting', errTrx);
        return reject(errTrx);
      }

      const addedInfo = { storeShardId: getSpannerShardId(rowData.storeId) };
      const insertColumns = Object.assign({}, addedInfo, rowData);

      dbTrx.insert(tableName, insertColumns);
      dbTrx.commit((err) => {
        if (err) {
          dbTrx.end();
          return reject(err);
        }
        return resolve();
      });
    });
  })
  .then(() => rowData);
};

I've added better logging to see if I can get a stack but so far this is all I have on the error.

code: 4    
   details: "Transaction outcome unknown."    
   message: "Deadline for Transaction exceeded."    
   metadata: {
    _internal_repr: {
    }
   }
   note: "Exception occurred in retry method that was not classified as transient"    
@JustinBeckwith JustinBeckwith added triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Jun 10, 2018
@JustinBeckwith JustinBeckwith removed the triage me I really want to be triaged. label Jun 11, 2018
@crwilcox crwilcox changed the title Error 4: Transaction outcome unknown Error 4: Deadline for Transaction exceeded / Transaction outcome unknown Jun 11, 2018
@crwilcox
Copy link
Contributor

How large is this transaction? Is this a single row update?

@walshie4
Copy link
Author

Yes it is a single row.

@crwilcox
Copy link
Contributor

Alright. I can't imagine a single row being so big that it would exceed deadlines. Just to check assumptions I may have made, the intent of your code is the following:

  • Take a row { storeId: 'storeidval', field1: 'field1val', ..., fieldn: 'fieldnval' }
  • commit with additional column: { storeId: 'storeidval', field1: 'field1val', ..., fieldn: 'fieldnval', storeShardId: 'storeidval'}

Could you measure how long getSpannerShardId takes? That appears to be the only things in this method I don't fully understand the runtime of. If that takes a while it could be causing this update to take an unusually long time.

@walshie4
Copy link
Author

getSpannerShardId is a synchronous function. Your understanding is of the flow is correct. The failing write is for a small row on a table interleaved 2 levels down.

top table
|-- secondary table
|--|-- table receiving the failing write

When the original fix for this issue was released this issue morphed from what you see described in #202 to the current issue. It now appears I get this error as the error param of my spanner.runTransaction call.

Per some conversation I saw around the past ticket what I suspect is happening is the same as the previous issue, however now that it is retried internally this error occurs when retrying takes too long (I think I remember someone mentioned y'all will throw Deadline exceeded in those cases).

@crwilcox
Copy link
Contributor

@walshie4 , your understanding about Deadline exceeded is correct. If we retry a satisfactory number of times we will stop and throw. This error to me indicates that something in the original request is unable to be satisfied. The error we handle is a 500 (Internal Server Error) which likely means the backend doesn't understand the request. I still haven't been able to reproduce this on my end so there is something about the payload that is more unique.

@walshie4
Copy link
Author

I'll add some more logging when I get a chance to see if I can gather some more detailed info on the insert in question causing the issue.

@JustinBeckwith JustinBeckwith added needs more info This issue needs more information from the customer to proceed. 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jun 12, 2018
@JustinBeckwith JustinBeckwith added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jun 22, 2018
@crwilcox
Copy link
Contributor

@walshie4 , any luck at finding anything?

@walshie4
Copy link
Author

Nothing of value, I have examples of the rows attempting to be inserted but there is nothing special about them. Here's an example (field names have been replaced with types)

{
    int64NotNull: 0     
    int64NotNull: 0     
    string(max): "stuff"     
    string(max): ""     
    int64: null     
    int64NotNull: 1530034497371     
    int64NotNull: 1530034497372     
    int64: "1530293697456"     
    string(64): null     
    string(64)NotNull: "12334"     
    int64NotNull: 0     
    nullableArrayOfStrings(64): null     
   }

The only weirdness I see is the int64 values are sometimes strings, sometimes JS ints. Not sure how this would cause an issue like we've been seeing though.

@eric4545
Copy link
Contributor

Hi there, we also encounter the same error randomly. We got message Deadline for Transaction exceeded. and Aborted due to transient fault. But according to our APM our request did not take longer than 10s, so didn't look like it exceed the deadline.

Our current logging didn;t log it in detail, we will add better logging for this error.

@JustinBeckwith JustinBeckwith added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jul 2, 2018
@Duncan00
Copy link

Is this issue also related the retry mechanism?
https://issuetracker.google.com/issues/111151134
https://cloud.google.com/spanner/docs/transactions#rw_transaction_interface

We got the error quite frequently. It is impossible to use transaction feature with this error.

@JustinBeckwith JustinBeckwith added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jul 17, 2018
@eric4545
Copy link
Contributor

eric4545 commented Jul 26, 2018

Trying reproduce the error in high throughput, spanner server seems didnt response any RETRY_INFO_KEY 1 and the client always return DeadlineError when cannot retry 2.

image
First one is when Deadline for Transaction exceeded.
Last one in normal retry.

@crwilcox
Copy link
Contributor

@eric4545 , the logic for retrying a transaction requires the following be true

  1. the error code was either a ABORTED or UNKNOWN
  2. the returned error contained a retry info key.
  3. we have not exceeded the timeout allowed

If any of those aren't true, the transaction will end. In the cases where it was a retryable error code but it exceed the allowable time, the returned error is modified to be a DEADLINE_EXCEEDED error.

@crwilcox
Copy link
Contributor

@Duncan00 the retry mechanism only retries on ABORTED and UNKNOWN because we consider those retry-able error codes. Some of the error codes likely wouldn't benefit from being retried.

If you wanted to retry more aggressively you could always modify the transaction's isRetryableErrorCode_(errCode) to include more error conditions on which retry is allowed.

@hans-lizihan
Copy link

hans-lizihan commented Sep 14, 2018

@crwilcox the following code could easily reproduce the deadline transaction exceeded error, please help to check (it may need to take a long time to encounter the error, just keep it running for 1 hour):

note that even the simplest transaction.run without querying any table occasionally encounter this.

That means doing any transaction.run inside the transaction context is potentially affected. (we could record an error rate ~ 0.2% - 0.01%)

'use strict';

const {Spanner} = require('@google-cloud/spanner');
const readline = require('readline');

const spanner = new Spanner({
	projectId: '<your project>'
});

const database = spanner
	.instance('<your instance>')
	.database('<your db>', {
		keepAlive: 5,
	});

let total = 0;
let errors = [];

async function main() {
	while (true) {
		try {
			await test();
		} catch (e) {
			e.time = new Date();
			console.error(e)
			errors.push(e);
		} finally {
			total += 1;
			readline.clearLine(process.stdout, 0)
			readline.cursorTo(process.stdout, 0, null);
			process.stdout.write(`Total: ${total}, Errors: ${errors.length}, Error rate: ${((errors.length / total) * 100).toFixed(6) + '%'}`)
		}
	}
}

main().catch(console.error);

async function test() {
	return new Promise((res, rej) => {
		database.runTransaction(async (err, transaction) => {
			if (err) return rej(err);
			try {
				await transaction.run({sql: 'select 1 as ok'})
				await transaction.commit();
				return res(true)
			} catch (e) {
				return rej(e);
			}
		});
	})
}

@eric4545
Copy link
Contributor

@crwilcox When this error throw there is no retry info key in the metadata, i tried the same flow using golang, so far no error occurs.

https://gist.github.com/eric4545/c514128a625172cf5c6969ae74cd3f83

@hans-lizihan
Copy link

hans-lizihan commented Sep 16, 2018

same script won't trigger any errors in ruby sdk as well. (correct me if me usage was wrong)

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project_id: 'aftership-test'

db = spanner.client "aftership-test-1", "test_core_aftershipapi_com"

total = 0
errors = 0

while true do
  begin
    db.transaction do |tx|
      results = tx.execute "SELECT 1 as OK"

      total += results.rows.count
    end
    print "Total query: #{total},  errors: #{errors}\r"
    $stdout.flush
  rescue Interrupt => e
    print_exception(e, true)
  rescue Exception => e
    errors += 1
    print_exception(e, false)
  end
end

@ghost ghost removed the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Oct 1, 2018
@google-cloud-label-sync google-cloud-label-sync bot added the api: spanner Issues related to the googleapis/nodejs-spanner API. label Jan 31, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Apr 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/nodejs-spanner API. needs more info This issue needs more information from the customer to proceed. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

7 participants