Open
Description
Describe the bug
Deployed using https://github.com/rudderlabs/rudderstack-helm
rudderstack backend keeps crashing and restarting with the error

Defaulted container "rudderstack-backend" out of: rudderstack-backend, rudderstack-telegraf-sidecar
Executing docker entrypoint script
Computed db host to namaste-rudderstack-rudderstack-postgresql-0.namaste-rudderstack-rudderstack-postgresql-headless
2024/04/18 06:57:30 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
INFO: No .env file found.
2024-04-18T06:57:30.671Z INFO rudder-server/main.go:27 Setting memory limit to {"limit": 1073741824}
2024-04-18T06:57:30.673Z INFO stats stats/statsd.go:44 StatsD client setup succeeded.
2024-04-18T06:57:30.673Z INFO stats stats/statsd.go:89 Stats started successfully in mode "StatsD" with address "localhost:8125"
2024-04-18T06:57:30.675Z INFO app app/app.go:55 Open source version of rudder-server
2024-04-18T06:57:30.681Z INFO backend-config backend-config/single_workspace.go:60 Setup backend config complete {"workspaceId": ""}
2024-04-18T06:57:33.131Z INFO runner runner/runner.go:344 warehousemode embedded
2024-04-18T06:57:33.133Z INFO db.recovery db/embeddedRecovery.go:42 Starting in normal mode
2024-04-18T06:57:33.479Z INFO backend-config backend-config/backend-config.go:215 Workspace Config changed{"workspaces": 1, "sources": 6}
2024-04-18T06:57:36.430Z INFO runner.warehouse.notifier notifier/notifier.go:165 Initializing Notifier...
2024-04-18T06:57:37.141Z INFO runner runner/runner.go:344 warehousemode embedded
2024-04-18T06:57:37.142Z INFO admin admin/admin.go:167 Serving on admin interface @ /data/rudderstack/rudder-server.sock
2024-04-18T06:57:37.142Z INFO apphandlers.EMBEDDED apphandlers/embeddedAppHandler.go:87 Embedded mode: Starting Rudder Core
2024-04-18T06:57:37.143Z INFO apphandlers.EMBEDDED apphandlers/embeddedAppHandler.go:95 Configured deployment type: "DEDICATED"
2024-04-18T06:57:37.143Z INFO apphandlers.EMBEDDED apphandlers/embeddedAppHandler.go:105 Clearing DB false
2024-04-18T06:57:37.143Z INFO debugger.transformation cache/cache.go:29 Using in-memory cache
2024-04-18T06:57:37.144Z INFO debugger.destination cache/cache.go:29 Using in-memory cache
2024-04-18T06:57:37.145Z INFO debugger.source cache/cache.go:29 Using in-memory cache
2024-04-18T06:57:37.146Z INFO runner.warehouse warehouse/app.go:310 Starting Warehouse service...
2024-04-18T06:57:37.146Z INFO runner.warehouse warehouse/app.go:375 Starting warehouse slave...
2024-04-18T06:57:37.146Z INFO runner.warehouse warehouse/app.go:391 [WH]: Starting warehouse master...
2024-04-18T06:57:37.147Z INFO runner.warehouse.api api/http.go:157 waiting for BackendConfig before starting on 8082
2024-04-18T06:57:37.147Z INFO runner.warehouse.api api/http.go:142 Starting warehouse master service on8082
2024-04-18T06:57:37.148Z INFO runner.warehouse.notifier notifier/notifier.go:312 Deleting all jobs for workspace: default::f597437212cb10a56115df53cd0f6881
2024-04-18T06:57:37.150Z INFO runner.warehouse warehouse/app.go:475 Starting a new Warehouse Destination Router: RS
2024-04-18T06:57:37.150Z INFO runner.warehouse.router.RS router/router.go:147 WH: Warehouse Router started: RS
2024-04-18T06:57:38.035Z INFO runner.warehouse.source source/source.go:143 starting source jobs processing
2024-04-18T06:57:38.240Z INFO runner.warehouse.router.RS router/router.go:254 Received updated workspace config
2024-04-18T06:57:38.635Z INFO transformer-features transformer/features_impl.go:62 Fetching transformer features from http://namaste-rudderstack-transformer:9090
2024-04-18T06:57:39.030Z INFO jobsdb.gw jobsdb/jobsdb.go:840 Connected to gw DB
2024-04-18T06:57:50.231Z INFO jobsdb.gw jobsdb/jobsdb.go:840 Connected to gw DB
2024-04-18T06:57:56.737Z INFO jobsdb.rt jobsdb/jobsdb.go:840 Connected to rt DB
2024-04-18T06:57:59.233Z INFO jobsdb.batch_rt jobsdb/jobsdb.go:840 Connected to batch_rt DB
2024-04-18T06:58:54.035Z INFO jobsdb.proc_error jobsdb/jobsdb.go:840 Connected to proc_error DB
2024-04-18T06:58:55.035Z INFO jobsdb.proc_error jobsdb/jobsdb.go:840 Connected to proc_error DB
2024-04-18T06:58:57.340Z INFO jobsdb.esch jobsdb/jobsdb.go:840 Connected to esch DB
2024-04-18T06:58:58.930Z INFO jobsdb.arc jobsdb/jobsdb.go:840 Connected to arc DB
2024-04-18T06:59:00.036Z INFO apphandlers.EMBEDDED apphandlers/setup.go:114 using Static Cluster Manager
2024-04-18T06:59:00.237Z INFO enterprise.suppress-user suppress-user/factory.go:48 Suppress User feature is enterprise only
2024-04-18T06:59:00.238Z WARN rsources rsources/handler.go:770 shared database is not configured, skipping logical replication monitoring
2024-04-18T06:59:00.239Z INFO gateway gateway/handle_lifecycle.go:363 WebHandler waiting for BackendConfig before starting on 8080
2024-04-18T06:59:00.239Z INFO cluster cluster/dynamic.go:100 Got trigger to change the mode, new mode: NORMAL, old mode: DEGRADED
2024-04-18T06:59:00.239Z INFO cluster cluster/dynamic.go:218 Transiting the server from DegradedMode to NormalMode
2024-04-18T06:59:00.239Z INFO cluster cluster/dynamic.go:123 Starting the server
2024-04-18T06:59:00.242Z INFO gateway gateway/handle_lifecycle.go:195 BackendConfig initialised {"sources": 6}
2024-04-18T06:59:00.242Z INFO gateway gateway/handle_lifecycle.go:365 WebHandler Starting on 8080
2024-04-18T06:59:00.530Z INFO jobsdb.proc_error jobsdb/backup.go:36 backupDSLoop is running
2024-04-18T06:59:01.737Z INFO jobsdb.gw jobsdb/backup.go:36 backupDSLoop is running
2024-04-18T06:59:02.237Z INFO jobsdb.esch jobsdb/backup.go:36 backupDSLoop is running
2024-04-18T06:59:03.536Z INFO jobsdb.arc jobsdb/backup.go:36 backupDSLoop is running
2024-04-18T06:59:04.738Z INFO jobsdb.rt jobsdb/backup.go:36 backupDSLoop is running
2024-04-18T06:59:13.138Z INFO jobsdb.batch_rt jobsdb/backup.go:36 backupDSLoop is running
2024-04-18T07:08:49.633Z INFO archiver archiver/archiver.go:99 Starting archiver
2024-04-18T07:08:49.635Z INFO processor processor/processor.go:628 Starting processor in isolation mode: source
2024-04-18T07:08:49.637Z INFO processor processor/processor.go:659 Starting pinger loop
2024-04-18T07:08:49.637Z INFO processor processor/processor.go:661 Backend config received
2024-04-18T07:08:49.637Z INFO processor processor/processor.go:664 Waiting for async init group
2024-04-18T07:08:49.637Z INFO processor processor/processor.go:671 Async init group done
2024-04-18T07:08:49.637Z INFO processor processor/processor.go:674 Waiting for transformer features
2024-04-18T07:08:49.637Z INFO processor processor/processor.go:681 Transformer features received
2024-04-18T07:08:49.830Z INFO processor.stash stash/stash.go:296 Processor errors stash loop started
2024-04-18T07:09:43.545Z ERROR utils.misc misc/misc.go:926 Panic detected. Application will crash.{"stack": "goroutine 5537 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x5e\ngithub.com/rudderlabs/rudder-server/utils/misc.BugsnagNotify.func1.1()\n\t/rudder-server/utils/misc/misc.go:927 +0x37f\nsync.(*Once).doSlow(0x4778d2?, 0xc002607d78?)\n\t/usr/local/go/src/sync/once.go:74 +0xc2\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/rudderlabs/rudder-server/utils/misc.BugsnagNotify.func1()\n\t/rudder-server/utils/misc/misc.go:916 +0xbb\npanic({0x34155c0?, 0xc006a39540?})\n\t/usr/local/go/src/runtime/panic.go:770 +0x132\ngithub.com/rudderlabs/rudder-server/jobsdb.(*Handle).migrateDSLoop(0xc000cdd188, {0x4377970, 0xc0025f6ff0})\n\t/rudder-server/jobsdb/migration.go:54 +0x1ff\ngithub.com/rudderlabs/rudder-server/jobsdb.(*Handle).startMigrateDSLoop.func1()\n\t/rudder-server/jobsdb/migration.go:28 +0x1f\ngithub.com/rudderlabs/rudder-server/jobsdb.(*Handle).startMigrateDSLoop.WithBugsnag.func2()\n\t/rudder-server/utils/misc/misc.go:945 +0x5c\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 2162\n\t/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96\n", "panic": "failed to migrate ds: pq: canceling statement due to user request; driver: bad connection", "team": "Core", "goRoutines": 685, "version": "1.23.0", "releaseStage": "development"}
2024-04-18T07:09:43.545Z ERROR utils.misc misc/misc.go:926 goroutine 5537 [running]:
github.com/rudderlabs/rudder-go-kit/logger.(*logger).Fatalw(0xc00128f9e0, {0x3a85163?, 0x4c6?}, {0xc0011ae600?, 0x1?, 0x34155c0?})
/go/pkg/mod/github.com/rudderlabs/rudder-go-kit@v0.23.3/logger/logger.go:332 +0xa5
github.com/rudderlabs/rudder-server/utils/misc.BugsnagNotify.func1.1()
/rudder-server/utils/misc/misc.go:926 +0x5b3
sync.(*Once).doSlow(0x4778d2?, 0xc002607d78?)
/usr/local/go/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
/usr/local/go/src/sync/once.go:65
github.com/rudderlabs/rudder-server/utils/misc.BugsnagNotify.func1()
/rudder-server/utils/misc/misc.go:916 +0xbb
panic({0x34155c0?, 0xc006a39540?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/rudderlabs/rudder-server/jobsdb.(*Handle).migrateDSLoop(0xc000cdd188, {0x4377970, 0xc0025f6ff0})
/rudder-server/jobsdb/migration.go:54 +0x1ff
github.com/rudderlabs/rudder-server/jobsdb.(*Handle).startMigrateDSLoop.func1()
/rudder-server/jobsdb/migration.go:28 +0x1f
github.com/rudderlabs/rudder-server/jobsdb.(*Handle).startMigrateDSLoop.WithBugsnag.func2()
/rudder-server/utils/misc/misc.go:945 +0x5c
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 2162
/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96
2024/04/18 07:09:43 notifying bugsnag: failed to migrate ds: pq: canceling statement due to user request; driver: bad connection
panic: failed to migrate ds: pq: canceling statement due to user request; driver: bad connection [recovered]
panic: failed to migrate ds: pq: canceling statement due to user request; driver: bad connection [recovered]
panic: failed to migrate ds: pq: canceling statement due to user request; driver: bad connection
goroutine 5537 [running]:
github.com/bugsnag/bugsnag-go/v2.AutoNotify({0xc00740dac0, 0x3, 0xc001969500?})
/go/pkg/mod/github.com/bugsnag/bugsnag-go/v2@v2.3.1/bugsnag.go:114 +0x3ef
panic({0x34155c0?, 0xc006a39540?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/rudderlabs/rudder-server/utils/misc.BugsnagNotify.func1.1()
/rudder-server/utils/misc/misc.go:935 +0x5d4
sync.(*Once).doSlow(0x4778d2?, 0xc002607d78?)
/usr/local/go/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
/usr/local/go/src/sync/once.go:65
github.com/rudderlabs/rudder-server/utils/misc.BugsnagNotify.func1()
/rudder-server/utils/misc/misc.go:916 +0xbb
panic({0x34155c0?, 0xc006a39540?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/rudderlabs/rudder-server/jobsdb.(*Handle).migrateDSLoop(0xc000cdd188, {0x4377970, 0xc0025f6ff0})
/rudder-server/jobsdb/migration.go:54 +0x1ff
github.com/rudderlabs/rudder-server/jobsdb.(*Handle).startMigrateDSLoop.func1()
/rudder-server/jobsdb/migration.go:28 +0x1f
github.com/rudderlabs/rudder-server/jobsdb.(*Handle).startMigrateDSLoop.WithBugsnag.func2()
/rudder-server/utils/misc/misc.go:945 +0x5c
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 2162
/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96
While at the same time postgress machine logs the following error
2024-04-18 07:57:09.839 UTC [16135] STATEMENT: DELETE FROM "batch_rt_job_status_435_2"
WHERE NOT id = ANY(
SELECT DISTINCT ON (job_id) id from "batch_rt_job_status_435_2" ORDER BY job_id ASC, id DESC
)
2024-04-18 07:57:10.730 UTC [16111] LOG: unexpected EOF on client connection with an open transaction
2024-04-18 07:57:11.328 UTC [18493] LOG: could not receive data from client: Connection reset by peer
2024-04-18 07:57:11.347 UTC [18494] LOG: could not receive data from client: Connection reset by peer
2024-04-18 07:57:11.429 UTC [18490] LOG: could not receive data from client: Connection reset by peer
2024-04-18 07:57:11.441 UTC [18488] LOG: could not send data to client: Broken pipe
2024-04-18 07:57:11.453 UTC [18488] FATAL: connection to client lost
2024-04-18 07:57:11.538 UTC [18491] LOG: could not receive data from client: Connection reset by peer
2024-04-18 07:57:11.543 UTC [18435] LOG: could not send data to client: Connection reset by peer
2024-04-18 07:57:11.543 UTC [18435] STATEMENT: SELECT * FROM (SELECT
jobs.job_id, jobs.uuid, jobs.user_id, jobs.parameters, jobs.custom_val, jobs.event_payload, jobs.event_count,
jobs.created_at, jobs.expire_at, jobs.workspace_id,
pg_column_size(jobs.event_payload) as payload_size,
sum(jobs.event_count) over (order by jobs.job_id asc) as running_event_counts,
sum(pg_column_size(jobs.event_payload)) over (order by jobs.job_id) as running_payload_size,
job_latest_state.job_state, job_latest_state.attempt,
job_latest_state.exec_time, job_latest_state.retry_time,
job_latest_state.error_code, job_latest_state.error_response, job_latest_state.parameters
FROM
"gw_jobs_545" AS jobs
LEFT JOIN "gw_job_status_545" job_latest_state ON jobs.job_id=job_latest_state.job_id
WHERE ((job_latest_state.job_id IS NULL)) AND ((jobs.custom_val='GW')) AND (jobs.parameters->>'source_id'='2dPZSbNyzCt11TNn88HyTi3hF7N')
ORDER BY jobs.job_id LIMIT 772 ) t WHERE running_event_counts - t.event_count <= $1 AND running_payload_size - t.payload_size <= $2
2024-04-18 07:57:11.543 UTC [18435] FATAL: connection to client lost
2024-04-18 07:57:11.543 UTC [18435] STATEMENT: SELECT * FROM (SELECT
jobs.job_id, jobs.uuid, jobs.user_id, jobs.parameters, jobs.custom_val, jobs.event_payload, jobs.event_count,
jobs.created_at, jobs.expire_at, jobs.workspace_id,
pg_column_size(jobs.event_payload) as payload_size,
sum(jobs.event_count) over (order by jobs.job_id asc) as running_event_counts,
sum(pg_column_size(jobs.event_payload)) over (order by jobs.job_id) as running_payload_size,
job_latest_state.job_state, job_latest_state.attempt,
job_latest_state.exec_time, job_latest_state.retry_time,
job_latest_state.error_code, job_latest_state.error_response, job_latest_state.parameters
FROM
"gw_jobs_545" AS jobs
LEFT JOIN "gw_job_status_545" job_latest_state ON jobs.job_id=job_latest_state.job_id
WHERE ((job_latest_state.job_id IS NULL)) AND ((jobs.custom_val='GW')) AND (jobs.parameters->>'source_id'='2dPZSbNyzCt11TNn88HyTi3hF7N')
ORDER BY jobs.job_id LIMIT 772 ) t WHERE running_event_counts - t.event_count <= $1 AND running_payload_size - t.payload_size <= $2
2024-04-18 07:57:11.932 UTC [18470] LOG: unexpected EOF on client connection with an open transaction
2024-04-18 07:57:12.031 UTC [18499] LOG: incomplete startup packet
2024-04-18 07:57:17.140 UTC [18514] ERROR: relation "rsources_failed_keys_v2" already exists
2024-04-18 07:57:17.140 UTC [18514] STATEMENT: create table "rsources_failed_keys_v2" (
id VARCHAR(27) COLLATE "C",
db_name text not null default 'namaste-rudderstack-rudderstack-postgresql-0.namaste-rudderstack-rudderstack-postgresql-headless',
job_run_id text not null,
task_run_id text not null,
source_id text not null,
destination_id text not null,
primary key (id),
unique (job_run_id, task_run_id, source_id, destination_id, db_name)
)
2024-04-18 07:57:17.143 UTC [18514] ERROR: relation "rsources_failed_keys_v2_records" already exists
2024-04-18 07:57:17.143 UTC [18514] STATEMENT: create table "rsources_failed_keys_v2_records" (
id VARCHAR(27) COLLATE "C",
record_id text not null,
ts timestamp not null default NOW(),
primary key (id, record_id)
)
2024-04-18 07:57:17.338 UTC [18514] ERROR: relation "rsources_stats" already exists
2024-04-18 07:57:17.338 UTC [18514] STATEMENT: create table "rsources_stats" (
db_name text not null default 'namaste-rudderstack-rudderstack-postgresql-0.namaste-rudderstack-rudderstack-postgresql-headless',
job_run_id text not null,
task_run_id text not null,
source_id text not null,
destination_id text not null,
in_count integer not null default 0,
out_count integer not null default 0,
failed_count integer not null default 0,
ts timestamp not null default NOW(),
primary key (db_name, job_run_id, task_run_id, source_id, destination_id)
)
2024-04-18 07:57:20.533 UTC [18425] ERROR: canceling autovacuum task
2024-04-18 07:57:20.533 UTC [18425] CONTEXT: automatic vacuum of table "jobsdb.public.gw_job_status_511_1"
2024-04-18 07:57:21.128 UTC [18148] LOG: unexpected EOF on client connection with an open transaction
While there are no errors on transformer
➜ rudderstack-helm git:(master) ✗ kubectl logs --tail=10000 -f namaste-rudderstack-transformer-f45745c8f-8psss
> rudder-transformer@1.62.1 start
> cd dist;node ./src/index.js;cd ..
setting up statsd client
Stack Depth set: 64
Interval Bytes set: 524288
Transformer: Swagger route loading
Transformer: Swagger route loaded
{"extraData":{},"level":"info","message":"Using new routes","timestamp":"2024-04-18 06:48:30"}
Master (pid: 19) has started
Worker (pid: 26) is online
setting up statsd client
Stack Depth set: 64
Interval Bytes set: 524288
Transformer: Swagger route loading
Transformer: Swagger route loaded
{"extraData":{},"level":"info","message":"Using new routes","timestamp":"2024-04-18 06:48:33"}
Worker (pid: 26) has started
App started. Listening on port: 9090