feat: increase shm size for postgres in docker compose #5541

Merged: ktgowtham merged 1 commit into rudderlabs:master from srgykuz:increase-db-shm-size on Feb 25, 2025
@srgykuz (Contributor) commented Feb 24, 2025

Description

The self-hosted Docker version of RudderStack may occasionally crash with the following stack trace:

rudderstack-backend-1  | 2025-02-03T10:17:00.400Z	ERROR	processor	processor/processor.go:3518	Failed to get unprocessed jobs from DB. Error: pq: could not resize shared memory segment "/PostgreSQL.1265551680" to 8388608 bytes: No space left on device
rudderstack-backend-1  | 2025-02-03T10:17:00.400Z	ERROR	runner.panic	crash/logger.go:33	Panic detected. Application will crash.	{"stack": "goroutine 1119 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/rudderlabs/rudder-server/utils/crash.(*panicLogger).Notify.func1.1()\n\t/rudder-server/utils/crash/logger.go:34 +0x58\nsync.(*Once).doSlow(0x43bb25?, 0xc001409880?)\n\t/usr/local/go/src/sync/once.go:76 +0xb4\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:67\ngithub.com/rudderlabs/rudder-server/utils/crash.(*panicLogger).Notify.func1()\n\t/rudder-server/utils/crash/logger.go:32 +0x9c\npanic({0x41c7000?, 0xc00237e6c0?})\n\t/usr/local/go/src/runtime/panic.go:785 +0x132\ngithub.com/rudderlabs/rudder-server/processor.(*Handle).getJobs(0xc000c90008, {0xc00522a6c0, 0x1b})\n\t/rudder-server/processor/processor.go:3519 +0x9ca\ngithub.com/rudderlabs/rudder-server/processor.(*worker).Work(0xc003655300)\n\t/rudder-server/processor/worker.go:141 +0xc9\ngithub.com/rudderlabs/rudder-server/utils/workerpool.(*internalWorker).start.func1()\n\t/rudder-server/utils/workerpool/internal_worker.go:64 +0x2a7\ngithub.com/rudderlabs/rudder-server/rruntime.Go.func1()\n\t/rudder-server/rruntime/goroutine-factory.go:23 +0x57\ncreated by github.com/rudderlabs/rudder-server/rruntime.Go in goroutine 929\n\t/rudder-server/rruntime/goroutine-factory.go:21 +0x4f\n", "panic": "pq: could not resize shared memory segment \"/PostgreSQL.1265551680\" to 8388608 bytes: No space left on device", "team": "Core", "goRoutines": 834, "version": "1.41.0", "releaseStage": "development", "appType": "rudder-server-EMBEDDED"}
rudderstack-backend-1  | 2025-02-03T10:17:00.400Z	ERROR	runner.panic	crash/logger.go:33	goroutine 1119 [running]:
rudderstack-backend-1  | github.com/rudderlabs/rudder-go-kit/logger.(*logger).Fataln(0xc000aa8ea0, {0x48ddf45, 0x27}, {0xc0000e2008, 0x7, 0xc00182e300?})
rudderstack-backend-1  | 	/go/pkg/mod/github.com/rudderlabs/rudder-go-kit@v0.46.1/logger/logger.go:381 +0x314
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/utils/crash.(*panicLogger).Notify.func1.1()
rudderstack-backend-1  | 	/rudder-server/utils/crash/logger.go:33 +0x604
rudderstack-backend-1  | sync.(*Once).doSlow(0x43bb25?, 0xc001409880?)
rudderstack-backend-1  | 	/usr/local/go/src/sync/once.go:76 +0xb4
rudderstack-backend-1  | sync.(*Once).Do(...)
rudderstack-backend-1  | 	/usr/local/go/src/sync/once.go:67
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/utils/crash.(*panicLogger).Notify.func1()
rudderstack-backend-1  | 	/rudder-server/utils/crash/logger.go:32 +0x9c
rudderstack-backend-1  | panic({0x41c7000?, 0xc00237e6c0?})
rudderstack-backend-1  | 	/usr/local/go/src/runtime/panic.go:785 +0x132
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/processor.(*Handle).getJobs(0xc000c90008, {0xc00522a6c0, 0x1b})
rudderstack-backend-1  | 	/rudder-server/processor/processor.go:3519 +0x9ca
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/processor.(*worker).Work(0xc003655300)
rudderstack-backend-1  | 	/rudder-server/processor/worker.go:141 +0xc9
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/utils/workerpool.(*internalWorker).start.func1()
rudderstack-backend-1  | 	/rudder-server/utils/workerpool/internal_worker.go:64 +0x2a7
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/rruntime.Go.func1()
rudderstack-backend-1  | 	/rudder-server/rruntime/goroutine-factory.go:23 +0x57
rudderstack-backend-1  | created by github.com/rudderlabs/rudder-server/rruntime.Go in goroutine 929
rudderstack-backend-1  | 	/rudder-server/rruntime/goroutine-factory.go:21 +0x4f
rudderstack-backend-1  | 	{"stack": "goroutine 1119 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/rudderlabs/rudder-server/utils/crash.(*panicLogger).Notify.func1.1()\n\t/rudder-server/utils/crash/logger.go:34 +0x58\nsync.(*Once).doSlow(0x43bb25?, 0xc001409880?)\n\t/usr/local/go/src/sync/once.go:76 +0xb4\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:67\ngithub.com/rudderlabs/rudder-server/utils/crash.(*panicLogger).Notify.func1()\n\t/rudder-server/utils/crash/logger.go:32 +0x9c\npanic({0x41c7000?, 0xc00237e6c0?})\n\t/usr/local/go/src/runtime/panic.go:785 +0x132\ngithub.com/rudderlabs/rudder-server/processor.(*Handle).getJobs(0xc000c90008, {0xc00522a6c0, 0x1b})\n\t/rudder-server/processor/processor.go:3519 +0x9ca\ngithub.com/rudderlabs/rudder-server/processor.(*worker).Work(0xc003655300)\n\t/rudder-server/processor/worker.go:141 +0xc9\ngithub.com/rudderlabs/rudder-server/utils/workerpool.(*internalWorker).start.func1()\n\t/rudder-server/utils/workerpool/internal_worker.go:64 +0x2a7\ngithub.com/rudderlabs/rudder-server/rruntime.Go.func1()\n\t/rudder-server/rruntime/goroutine-factory.go:23 +0x57\ncreated by github.com/rudderlabs/rudder-server/rruntime.Go in goroutine 929\n\t/rudder-server/rruntime/goroutine-factory.go:21 +0x4f\n", "panic": "pq: could not resize shared memory segment \"/PostgreSQL.1265551680\" to 8388608 bytes: No space left on device", "team": "Core", "goRoutines": 834, "version": "1.41.0", "releaseStage": "development", "appType": "rudder-server-EMBEDDED"}
rudderstack-backend-1  | panic: pq: could not resize shared memory segment "/PostgreSQL.1265551680" to 8388608 bytes: No space left on device [recovered]
rudderstack-backend-1  | 	panic: pq: could not resize shared memory segment "/PostgreSQL.1265551680" to 8388608 bytes: No space left on device
rudderstack-backend-1  | 
rudderstack-backend-1  | goroutine 1119 [running]:
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/utils/crash.(*panicLogger).Notify.func1()
rudderstack-backend-1  | 	/rudder-server/utils/crash/logger.go:43 +0x85
rudderstack-backend-1  | panic({0x41c7000?, 0xc00237e6c0?})
rudderstack-backend-1  | 	/usr/local/go/src/runtime/panic.go:785 +0x132
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/processor.(*Handle).getJobs(0xc000c90008, {0xc00522a6c0, 0x1b})
rudderstack-backend-1  | 	/rudder-server/processor/processor.go:3519 +0x9ca
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/processor.(*worker).Work(0xc003655300)
rudderstack-backend-1  | 	/rudder-server/processor/worker.go:141 +0xc9
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/utils/workerpool.(*internalWorker).start.func1()
rudderstack-backend-1  | 	/rudder-server/utils/workerpool/internal_worker.go:64 +0x2a7
rudderstack-backend-1  | github.com/rudderlabs/rudder-server/rruntime.Go.func1()
rudderstack-backend-1  | 	/rudder-server/rruntime/goroutine-factory.go:23 +0x57
rudderstack-backend-1  | created by github.com/rudderlabs/rudder-server/rruntime.Go in goroutine 929
rudderstack-backend-1  | 	/rudder-server/rruntime/goroutine-factory.go:21 +0x4f
rudderstack-backend-1 exited with code 0

The key error is "pq: could not resize shared memory segment ... No space left on device". It occurs when /dev/shm, which Postgres uses for shared memory, has no space left.

By default, Docker sets the size of /dev/shm to 64 MB (configurable with --shm-size), which is quite low for modern systems or high-load environments. For example, the Compose example in the official Postgres Docker image documentation sets shm_size to 128 MB.

This PR increases the /dev/shm size of the Postgres container in the production Docker Compose config (rudder-docker.yml) from 64 MB to 128 MB. Setting it explicitly also shows that the value can be increased, which may be necessary for users who expect high load. All users who set up their self-hosted version using this documentation will get the explicitly set shm_size.

This PR also increases the default shm_size in docker-compose.yml to match the production config. I didn't touch the test configurations, though: warehouse/integrations/postgres/testdata/docker-compose.postgres.yml, warehouse/integrations/postgres/testdata/docker-compose.replication.yml, warehouse/integrations/testdata/docker-compose.jobsdb.yml. Let me know if they should be tweaked as well.
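
For illustration, here is a minimal sketch of the kind of Compose change described above. The service name `db` and the image tag are assumptions made for the example, not copied from rudder-docker.yml or docker-compose.yml; only the `shm_size` attribute reflects what this PR sets.

```yaml
# Hypothetical excerpt: the service name and image tag are illustrative
# assumptions, not taken from the repository's Compose files.
services:
  db:
    image: postgres:15-alpine
    # Size of /dev/shm inside the container.
    # Docker defaults to 64 MB when this is not set.
    shm_size: 128mb
```

Users who expect high load could raise the value further. Assuming the service is named `db` as in the sketch, running `docker compose exec db df -h /dev/shm` should show the size that is actually in effect.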

Linear Ticket

There is no Linear ticket, but here is a related GitHub discussion: #5468

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@contributor-support

Thank you @Amaimersion for contributing this PR.
Please sign the Contributor License Agreement (CLA) before merging.

@atzoum changed the title from "feat: increase default Docker's /dev/shm size for Postgres in production Docker Compose config" to "feat: increase shm size for postgres in docker compose" Feb 25, 2025
@ktgowtham merged commit 58e5a17 into rudderlabs:master Feb 25, 2025
@srgykuz deleted the increase-db-shm-size branch February 25, 2025 12:20
This was referenced Mar 3, 2025
itsmihir pushed a commit that referenced this pull request Mar 3, 2025
🤖 I have created a release *beep* *boop*
---


## [1.44.0-rc.1](v1.43.0...v1.44.0-rc.1) (2025-03-03)


### Features

* add OAuth authentication support for Databricks destination
([#5554](#5554))
([67775ab](67775ab))
* gRPC API to expire the warehouse schema of a destination
([#5508](#5508))
([f365c7c](f365c7c))
* increase shm size for postgres in docker compose
([#5541](#5541))
([58e5a17](58e5a17))
* isolate server ut communication
([#5430](#5430))
([63505f6](63505f6))
* make json library configurable and introduce sonnet as an option
([#5513](#5513))
([f3e5d1a](f3e5d1a))


### Bug Fixes

* error handling for async destinations
([#5542](#5542))
([5e63145](5e63145))
* handle schema change for unsafe quotes
([#5519](#5519))
([4527b82](4527b82))
* refreshing of datalake expired schemas causing inifinite loop
([#5530](#5530))
([0f11362](0f11362))
* retry sending reporting metrics for 4xx status code
([#5537](#5537))
([2cbda67](2cbda67))
* snowpipe backoff missing for validation error
([#5504](#5504))
([e889a4f](e889a4f))
* unmarshaller json configuration not respected
([#5526](#5526))
([9f8d6ad](9f8d6ad))


### Miscellaneous

* add stats for warehouse process API
([#5543](#5543))
([80890cf](80890cf))
* additional logs for schema fetching and deltalake
([#5551](#5551))
([49b5238](49b5238))
* **deps:** bump github.com/docker/docker from 27.5.1+incompatible to
28.0.0+incompatible
([#5536](#5536))
([277ec25](277ec25))
* **deps:** bump github.com/go-jose/go-jose/v4 from 4.0.4 to 4.0.5 in
the go_modules group
([#5544](#5544))
([adfb33c](adfb33c))
* **deps:** bump the go-deps group across 1 directory with 2 updates
([#5531](#5531))
([98abb11](98abb11))
* **deps:** bump the go-deps group across 1 directory with 2 updates
([#5553](#5553))
([99167ca](99167ca))
* **deps:** bump the go-deps group with 2 updates
([#5540](#5540))
([63ae5ac](63ae5ac))
* drop unused columns
([#5546](#5546))
([9acd978](9acd978))
* mssql and azure synapse cleanup and enable integration tests
([#5556](#5556))
([a1b19ba](a1b19ba))
* skip previously failed tables
([#5533](#5533))
([08e2936](08e2936))
* update event delivery time buckets
([#5548](#5548))
([85b457a](85b457a))
* varchar handling for mssql and azure synapse
([#5557](#5557))
([4309aa9](4309aa9))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
itsmihir pushed a commit that referenced this pull request Mar 4, 2025
🤖 I have created a release *beep* *boop*
---


## [1.44.0](v1.43.0...v1.44.0) (2025-03-03)


### Features

* add OAuth authentication support for Databricks destination
([#5554](#5554))
([67775ab](67775ab))
* gRPC API to expire the warehouse schema of a destination
([#5508](#5508))
([f365c7c](f365c7c))
* increase shm size for postgres in docker compose
([#5541](#5541))
([58e5a17](58e5a17))
* isolate server ut communication
([#5430](#5430))
([63505f6](63505f6))
* make json library configurable and introduce sonnet as an option
([#5513](#5513))
([f3e5d1a](f3e5d1a))


### Bug Fixes

* error handling for async destinations
([#5542](#5542))
([5e63145](5e63145))
* handle schema change for unsafe quotes
([#5519](#5519))
([4527b82](4527b82))
* refreshing of datalake expired schemas causing inifinite loop
([#5530](#5530))
([0f11362](0f11362))
* retry sending reporting metrics for 4xx status code
([#5537](#5537))
([2cbda67](2cbda67))
* snowpipe backoff missing for validation error
([#5504](#5504))
([e889a4f](e889a4f))
* unmarshaller json configuration not respected
([#5526](#5526))
([9f8d6ad](9f8d6ad))


### Miscellaneous

* add stats for warehouse process API
([#5543](#5543))
([80890cf](80890cf))
* additional logs for schema fetching and deltalake
([#5551](#5551))
([49b5238](49b5238))
* **deps:** bump github.com/docker/docker from 27.5.1+incompatible to
28.0.0+incompatible
([#5536](#5536))
([277ec25](277ec25))
* **deps:** bump github.com/go-jose/go-jose/v4 from 4.0.4 to 4.0.5 in
the go_modules group
([#5544](#5544))
([adfb33c](adfb33c))
* **deps:** bump the go-deps group across 1 directory with 2 updates
([#5531](#5531))
([98abb11](98abb11))
* **deps:** bump the go-deps group across 1 directory with 2 updates
([#5553](#5553))
([99167ca](99167ca))
* **deps:** bump the go-deps group with 2 updates
([#5540](#5540))
([63ae5ac](63ae5ac))
* drop unused columns
([#5546](#5546))
([9acd978](9acd978))
* mssql and azure synapse cleanup and enable integration tests
([#5556](#5556))
([a1b19ba](a1b19ba))
* skip previously failed tables
([#5533](#5533))
([08e2936](08e2936))
* update event delivery time buckets
([#5548](#5548))
([85b457a](85b457a))
* varchar handling for mssql and azure synapse
([#5557](#5557))
([4309aa9](4309aa9))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).