Backup error #133527

Open
navidhaghighi opened this issue Oct 27, 2024 · 5 comments
Labels: A-disaster-recovery; A-jobs; C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.); O-community (Originated from the community); T-jobs; X-blathers-triaged (blathers was able to find an owner)

Comments

navidhaghighi commented Oct 27, 2024

Hello, we have an issue in our production CockroachDB deployment.
When I try to back up a table from our database, I run into this error:
Node liveness error
This is the command that I use:
BACKUP public.message INTO 'nodelocal://1/1' AS OF SYSTEM TIME '-10s';

BTW, our daily backups fail as well. This is the command we use to schedule them:

CREATE SCHEDULE bkp_schedule FOR BACKUP INTO 'nodelocal://1/dailybackups' RECURRING '0 7 * * *'
FULL BACKUP ALWAYS
WITH SCHEDULE OPTIONS first_run = 'now';
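
For readers unfamiliar with crontab syntax: the expression '0 7 * * *' in the schedule above means minute 0, hour 7, every day, so one full backup per day at 07:00. As far as I know, CockroachDB evaluates schedule crontabs in UTC. The Python sketch below only illustrates that timing; the function name `next_daily_run` is hypothetical and nothing here calls CockroachDB.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=7, minute=0):
    """Return the next occurrence of hour:minute strictly after `now`.

    Illustrates the crontab '0 7 * * *' (daily at 07:00).
    Assumes `now` is a timezone-aware UTC datetime.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # 07:00 has already passed today, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Example: the issue was opened on Oct 27, 2024; the time of day is made up.
example_now = datetime(2024, 10, 27, 12, 0, tzinfo=timezone.utc)
print(next_daily_run(example_now).isoformat())
```

Note that `first_run = 'now'` in the schedule also triggers one run immediately at creation, independent of the crontab.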

Our setup:
CPU: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz

Base speed: 2.30 GHz
Sockets: 18
Virtual processors: 36
Virtual machine: Yes
L1 cache: N/A

Utilization: 21%
Speed: 2.30 GHz
Up time: 5:05:47:36
Processes: 138
Threads: 13931
Handles: 135384

Memory: 160 GB
Windows: 10, 64-bit architecture
CockroachDB version: 23.1.8
Running command:
%~dp0cockroach.exe start-single-node --insecure --cache=.25 --max-sql-memory=.25 && (echo 'didn't fail') || (start cockroach.bat)
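
A note on the memory flags in the start command above: a decimal value such as `.25` for `--cache` or `--max-sql-memory` is interpreted as a fraction of system memory. A minimal sketch of the resulting budget, assuming the full 160 GB reported above is visible to the process (the function name `memory_budget` is hypothetical):

```python
def memory_budget(total_gb, cache_fraction, sql_fraction):
    """Split total memory into cache, SQL memory, and what is left over.

    Assumption: both flags are fractions of the same total, and the
    whole total is available to the cockroach process.
    """
    cache_gb = total_gb * cache_fraction
    sql_gb = total_gb * sql_fraction
    return {
        "cache_gb": cache_gb,
        "sql_gb": sql_gb,
        "remaining_gb": total_gb - cache_gb - sql_gb,
    }


# Values taken from the report: 160 GB RAM, --cache=.25, --max-sql-memory=.25
budget = memory_budget(160, 0.25, 0.25)
print(budget)
```

With these inputs the cache and the SQL memory pool each come to 40 GB, leaving 80 GB for everything else on the machine.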


We use CockroachDB as the database for Nakama's chat and messaging system.
Our backups started running into this issue recently.
I can't give specific steps to reproduce it, because we have over 1 million users and millions of rows in the messages table of our database, but I can provide logs if necessary.
Thank you in advance.

Jira issue: CRDB-43662

navidhaghighi added the C-bug label Oct 27, 2024

blathers-crl bot commented Oct 27, 2024

Hi @navidhaghighi, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Oct 27, 2024

Hello, I am Blathers. I am here to help you get the issue triaged.

It looks like you have not filled out the issue in the format of any of our templates. To best assist you, we advise you to use one of these templates.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/disaster-recovery (found keywords: backup, found keywords: CREATE SCHEDULE)
  • @cockroachdb/kv (found keywords: liveness)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot added the A-disaster-recovery, A-jobs, O-community, X-blathers-triaged, T-disaster-recovery, and T-jobs labels Oct 27, 2024

blathers-crl bot commented Oct 27, 2024

cc @cockroachdb/disaster-recovery

@alicia-l2

Hello, do you mind sharing your cluster config? You may be running into a node liveness issue due to insufficient CPU for running a backup job, or other reasons -- see here for more information on how to diagnose the node liveness issue.
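
As a first check before digging into CPU or disk, it can help to confirm what the node itself reports about liveness. The Python sketch below parses CSV text such as the output of `cockroach node status --insecure --format=csv` and lists nodes not reported as live. The column names `id` and `is_live` are assumptions about that output and should be checked against your version; the sample row is made up for illustration and is not taken from this cluster.

```python
import csv
import io


def nodes_not_live(status_csv):
    """Return the ids of nodes whose is_live column is not 'true'.

    Assumes the CSV has 'id' and 'is_live' columns (an assumption
    about the `cockroach node status` output, not verified here).
    """
    reader = csv.DictReader(io.StringIO(status_csv))
    return [
        row["id"]
        for row in reader
        if row.get("is_live", "").strip().lower() != "true"
    ]


# Hypothetical sample output for a single-node cluster.
sample = (
    "id,address,is_available,is_live\n"
    "1,localhost:26257,false,false\n"
)
print(nodes_not_live(sample))
```

This only shows whether the node is currently considered live; it does not explain why liveness heartbeats are failing.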

@navidhaghighi

@alicia-l2
I have attached my cluster settings, which I got with this command: SHOW CLUSTER SETTINGS ALL;
If this is not the file you asked for, please let me know.
After a shutdown followed by an immediate backup the issue went away, but then it started happening again.
Cluster setting.txt
