
Conversation

@capcom6 (Member) commented Sep 10, 2025

Summary by CodeRabbit

  • New Features

    • Pluggable cache backend (memory + Redis), cache factory, online-status service with lifecycle, metrics and persistence.
  • Configuration

    • Top-level cache config added (default memory://; Redis supported); example config updated.
  • Behavior

    • Per-item TTL/options, safe concurrent cache operations, Cleanup/Drain, hashed token invalidation, batched last-seen persistence, online status persistence loop.
  • Build & CI

    • Go toolchain and builder images bumped to 1.24; new Go workflow added; test job removed from publish pipeline; old lint workflow deleted.
  • Dependencies

    • Multiple dependency upgrades and new Redis/core dependencies.
  • Tests

    • Extensive unit, concurrency, benchmark, and memory-profile tests for cache.
  • Monitoring

    • Grafana dashboard and Prometheus alerts for online metrics.


coderabbitai bot commented Sep 10, 2025

Walkthrough

Adds a pluggable cache subsystem (memory/Redis) with API, implementations, tests/benchmarks, DI wiring and metrics; introduces an online service that batches/persists device last-seen via the cache; updates auth/devices to use online/cache; bumps Go/Docker build images and CI workflows; adds DB index and observability assets.
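
For orientation, here is a minimal sketch of the public cache contract as it can be inferred from the tests and review comments below; the exact signatures and doc comments in pkg/cache/cache.go may differ.

```go
// Hedged sketch of the pkg/cache API inferred from this review, not the literal source.
package cache

import (
	"context"
	"time"
)

// Option configures per-item behavior such as expiry (see WithTTL / WithValidUntil).
type Option func(*options)

type options struct {
	validUntil time.Time
}

// Cache is the pluggable backend contract implemented by the in-memory and Redis backends.
type Cache interface {
	Set(ctx context.Context, key, value string, opts ...Option) error
	SetOrFail(ctx context.Context, key, value string, opts ...Option) error // ErrKeyExists if the key is present
	Get(ctx context.Context, key string) (string, error)                    // ErrKeyNotFound / ErrKeyExpired
	GetAndDelete(ctx context.Context, key string) (string, error)
	Delete(ctx context.Context, key string) error
	Drain(ctx context.Context) (map[string]string, error) // return and clear all live entries
	Cleanup(ctx context.Context) error                     // drop expired entries
}
```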

Changes

  • Build images (build/package/Dockerfile, build/package/Dockerfile.dev): Bumped the build base image from golang:1.23-alpine to golang:1.24-alpine.
  • Go module (go.mod): Upgraded Go toolchain to go 1.24.1; added/updated direct and indirect dependencies (core, redis/go-redis/v9, fx, zap, validator, and multiple indirect adjustments).
  • Config & DI (internal/config/config.go, internal/config/module.go): Added top-level Cache config (URL) with default memory:// and an fx provider exposing cache.Config.
  • App wiring (internal/sms-gateway/app.go): Registered cache.Module() and online.Module() in the sms-gateway fx module.
  • Gateway cache module (internal/sms-gateway/cache/*: config.go, factory.go, module.go): New cache factory and fx module: Config type, URL parsing, Factory creating a memory or Redis backend (prefixing keys), and fx wiring.
  • Cache API & backends (pkg/cache/*: cache.go, errors.go, options.go, memory.go, redis.go): Added the public Cache interface, error vars, Option helpers (WithTTL/WithValidUntil), a concurrency-safe in-memory implementation, and a Redis-backed implementation.
  • Cache tests & benches (pkg/cache/*_test.go, pkg/cache/memory_bench_test.go): Added extensive unit tests (edge, concurrency, profiling) and many benchmarks for the memory cache.
  • Online service & metrics (internal/sms-gateway/online/*: module.go, service.go, metrics.go): New online fx module and Service: per-module cache instance, periodic drain-and-persist to the devices service, lifecycle hook, and Prometheus metrics.
  • Auth & devices changes (internal/sms-gateway/modules/auth/service.go, internal/sms-gateway/modules/devices/*): Auth now injects online.Service and calls SetOnline asynchronously; the devices service introduces batch SetLastSeen(ctx, batch), the repository adds a context-aware SetLastSeen with parameterized queries, and token cache key hashing is added.
  • DB migration & model (internal/sms-gateway/models/migrations/..._add_last_seen_index.sql, internal/sms-gateway/models/models.go): Added a migration creating idx_devices_last_seen and an index tag on Device.LastSeen.
  • CI workflows (.github/workflows/docker-publish.yml, .github/workflows/go.yml, deleted .github/workflows/golangci-lint.yml): Removed the test job from docker-publish, added a new go.yml workflow (lint/test/benchmark), deleted the previous golangci-lint workflow, and updated the e2e setup to cache dependencies.
  • Example config (configs/config.example.yml): Added cache.url: memory:// example under fcm.
  • Observability (deployments/grafana/dashboards/online.json, deployments/prometheus/alerts/online-alerts.yml): Added a Grafana dashboard and Prometheus alert rules for online/cache/persistence metrics.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as fx.App
  participant Cfg as Config
  participant CacheMod as cache.Module
  participant Factory as cache.Factory
  participant Mem as MemoryBackend
  participant Redis as RedisBackend
  participant OnlineMod as online.Module
  participant OnlineSvc as online.Service
  participant Devices as devices.Service

  App->>Cfg: provide Config (cache.url)
  App->>CacheMod: init cache.Module()
  CacheMod->>Factory: fx.Provide(NewFactory(cfg.Cache))
  Factory->>Factory: parse URL & choose backend
  alt scheme == memory
    Factory->>Mem: NewMemory(...)
  else scheme == redis
    Factory->>Redis: redis.NewClient(...)
  end

  App->>OnlineMod: init online.Module()
  OnlineMod->>Factory: provide cache instance (f.New("online"))
  OnlineMod->>OnlineSvc: fx.Provide(New)
  App->>OnlineSvc: OnStart -> Run(ctx) (goroutine, ticker)
  OnlineSvc->>Factory: Drain() periodically -> batch
  OnlineSvc->>Devices: SetLastSeen(ctx, batch)
  App->>OnlineSvc: OnStop -> cancel ctx -> Run exits
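
The SetOnline write path that feeds this cache is not quoted in this summary; below is a hedged sketch of its likely shape, given that the persist loop (quoted later in this review) drains the cache and parses each value as an RFC3339 timestamp keyed by device ID.

```go
// Hypothetical sketch only; field names (cache, logger) follow the service struct
// quoted later in this review and may not match the actual implementation.
func (s *service) SetOnline(ctx context.Context, deviceID string) {
	// Record "seen now"; the periodic persist loop later drains these entries
	// into devices.Service.SetLastSeen as a single batch.
	value := time.Now().UTC().Format(time.RFC3339)
	if err := s.cache.Set(ctx, deviceID, value); err != nil {
		s.logger.Error("can't mark device online", zap.String("device_id", deviceID), zap.Error(err))
	}
}
```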

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 46.15%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title succinctly and accurately describes the primary intent—optimizing device last-seen time updates—and matches the changes that introduce an online caching layer, batch SetLastSeen behavior, repository/migration updates, and auth wiring to debounce/persist timestamps; it is concise, focused, and scoped to the devices area.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/debounce-last-seen-time-updates



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

🧹 Nitpick comments (18)
build/package/Dockerfile (1)

2-2: Pin Go base image to a patch version (reproducibility).

To avoid unexpected toolchain changes, pin to 1.24.1 (matches go.mod) or a digest.

-FROM golang:1.24-alpine AS build
+FROM golang:1.24.1-alpine AS build
build/package/Dockerfile.dev (1)

1-1: Pin dev base image to a patch version.

Same rationale as the prod builder: keep the toolchain stable.

-FROM golang:1.24-alpine
+FROM golang:1.24.1-alpine
pkg/cache/errors.go (1)

5-9: Add GoDoc for exported errors.

Public identifiers should have comments for lint/docs.

 var (
-	ErrKeyNotFound = errors.New("key not found")
-	ErrKeyExpired  = errors.New("key expired")
-	ErrKeyExists   = errors.New("key already exists")
+	// ErrKeyNotFound indicates no value exists for the given key.
+	ErrKeyNotFound = errors.New("key not found")
+	// ErrKeyExpired indicates a value exists but has expired.
+	ErrKeyExpired = errors.New("key expired")
+	// ErrKeyExists indicates a conflicting set when the key already exists.
+	ErrKeyExists = errors.New("key already exists")
 )
pkg/cache/options.go (3)

5-5: Document the exported Option type.

Exports should have a brief GoDoc.

-type Option func(*options)
+// Option configures per-item cache behavior (e.g., expiry).
+type Option func(*options)

21-25: Define TTL<=0 semantics; normalize time to UTC.

Current code sets validUntil=Now+ttl. With ttl<=0 this becomes “expires now/past”, which may conflict with the package-level default (“<=0 means never expire” per PR context). Either document that 0 expires immediately or treat <=0 as no expiry. Also consider using UTC consistently.

-func WithTTL(ttl time.Duration) Option {
-	return func(o *options) {
-		o.validUntil = time.Now().Add(ttl)
-	}
-}
+func WithTTL(ttl time.Duration) Option {
+	return func(o *options) {
+		if ttl <= 0 {
+			// No per-item expiry; rely on cache default TTL policy.
+			o.validUntil = time.Time{}
+			return
+		}
+		o.validUntil = time.Now().UTC().Add(ttl)
+	}
+}

If you prefer “0 means expire immediately”, please add that to the comment of WithTTL.


29-33: Normalize explicit timestamps to UTC (optional).

Reduces ambiguity if other components compare against UTC times.

 func WithValidUntil(validUntil time.Time) Option {
 	return func(o *options) {
-		o.validUntil = validUntil
+		o.validUntil = validUntil.UTC()
 	}
 }
internal/sms-gateway/cache/config.go (1)

3-5: Add a short GoDoc for Config.

Improves discoverability in IDEs/docs.

-type Config struct {
+// Config controls the cache backend via a URL (e.g., "memory://", "redis://...").
+type Config struct {
 	URL string
 }
internal/config/config.go (1)

17-17: Leverage existing cache URL validation or add early config validation
The cache factory already parses and checks the URL scheme (internal/sms-gateway/cache/factory.go:24–29). To fail fast on misconfiguration, optionally introduce a Validate() method on the Config struct in internal/config/config.go that invokes url.Parse on Cache.URL.
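
A minimal sketch of such a Validate method, assuming the Config struct and its Cache field from internal/config/config.go; the accepted scheme list is illustrative.

```go
package config

import (
	"fmt"
	"net/url"
)

// Validate is a hedged sketch: reject a malformed or unsupported cache URL at
// config-load time instead of deferring the failure to the cache factory.
func (c Config) Validate() error {
	u, err := url.Parse(c.Cache.URL)
	if err != nil {
		return fmt.Errorf("invalid cache url: %w", err)
	}
	switch u.Scheme {
	case "memory", "redis", "rediss": // illustrative; mirror whatever the factory accepts
		return nil
	default:
		return fmt.Errorf("unsupported cache scheme %q", u.Scheme)
	}
}
```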

pkg/cache/memory.go (2)

44-48: Cleanup method has an empty callback - consider simplifying.

The empty callback pattern seems unnecessary here.

 // Cleanup implements Cache.
 func (m *memoryCache) Cleanup(_ context.Context) error {
-  m.cleanup(func() {})
+  now := time.Now()
+  m.mux.Lock()
+  for key, item := range m.items {
+    if item.isExpired(now) {
+      delete(m.items, key)
+    }
+  }
+  m.mux.Unlock()
   return nil
 }

134-146: Consider implementing lazy expiration for better performance.

The current implementation checks expiration on every read. For better performance at scale, consider:

  1. Lazy deletion during reads (current approach) combined with
  2. Periodic background cleanup to prevent memory bloat

This is particularly important for high-traffic scenarios where expired entries might accumulate if they're not frequently accessed.
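
A hedged sketch of the background half, using only the Cleanup method from the public interface; the interval, logging, and wiring are assumptions.

```go
// Hedged sketch: sweep expired entries on an interval so items that are never
// read again don't accumulate; stop when the context is cancelled.
func runJanitor(ctx context.Context, c cache.Cache, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := c.Cleanup(ctx); err != nil {
				log.Printf("cache cleanup failed: %v", err) // keep the loop alive on failure
			}
		}
	}
}
```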

pkg/cache/memory_profile_test.go (3)

62-108: Fix the error message threshold mismatch.

The error message states "1MB" but the actual threshold is 2MB.

Apply this diff to fix the error message:

-if allocDiff > 2*1024*1024 { // 2MB
-  t.Errorf("Expected less than 1MB memory growth after cleanup, got %d bytes", allocDiff)
+if allocDiff > 2*1024*1024 { // 2MB
+  t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", allocDiff)

167-214: Fix variable name typo in comment.

There's a typo in the comment on line 168.

Apply this diff:

-// This test verifies c behavior under frequent GC cycles
+// This test verifies cache behavior under frequent GC cycles

269-297: Add error handling in benchmark operations.

The benchmark should handle potential errors from cache operations.

Apply this diff:

 // Set and get
-cache.Set(ctx, key, value)
-cache.Get(ctx, key)
+if err := cache.Set(ctx, key, value); err != nil {
+  b.Fatal(err)
+}
+if _, err := cache.Get(ctx, key); err != nil {
+  b.Fatal(err)
+}

 // Delete
-cache.Delete(ctx, key)
+if err := cache.Delete(ctx, key); err != nil {
+  b.Fatal(err)
+}
pkg/cache/memory_concurrency_test.go (2)

98-142: Consider using atomic counters for operation tracking.

While the test correctly handles concurrent read/write operations, tracking successful operations with atomic counters would provide better insights into the test's behavior and help detect potential issues like excessive contention.

Apply this enhancement to add operation tracking:

 func TestMemoryCache_ConcurrentReadWrite(t *testing.T) {
 	c := cache.NewMemory(0)
 
 	ctx := context.Background()
 	const numOperations = 1000
 	const numReaders = 8
 	const numWriters = 2
 	var wg sync.WaitGroup
+	var readCount, writeCount atomic.Int64
 
 	// Launch concurrent readers
 	for i := 0; i < numReaders; i++ {
 		wg.Add(1)
 		go func() {
 			defer wg.Done()
 
 			for j := 0; j < numOperations/numReaders; j++ {
 				key := "shared-key"
 				_, err := c.Get(ctx, key)
 				if err != nil && err != cache.ErrKeyNotFound {
 					t.Errorf("Get failed: %v", err)
+				} else if err == nil {
+					readCount.Add(1)
 				}
 			}
 		}()
 	}
 
 	// Launch concurrent writers
 	for i := 0; i < numWriters; i++ {
 		wg.Add(1)
 		go func() {
 			defer wg.Done()
 
 			for j := 0; j < numOperations/numWriters; j++ {
 				key := "shared-key"
 				value := "value-" + strconv.Itoa(j)
 
 				err := c.Set(ctx, key, value)
 				if err != nil {
 					t.Errorf("Set failed: %v", err)
+				} else {
+					writeCount.Add(1)
 				}
 			}
 		}()
 	}
 
 	wg.Wait()
+	t.Logf("Completed %d successful reads and %d writes", readCount.Load(), writeCount.Load())
 }

239-281: Consider validating drain operation results.

While the test correctly verifies that the cache is empty after draining, it doesn't validate that the drain operations actually returned the expected items. Only one drain operation should return items, while the others should return empty results.

Apply this enhancement to validate drain results:

 func TestMemoryCache_ConcurrentDrain(t *testing.T) {
 	c := cache.NewMemory(0)
 
 	ctx := context.Background()
 	const numItems = 100
 	const numGoroutines = 5
 	var wg sync.WaitGroup
+	var drainResults sync.Map
 
 	// Pre-populate cache with items
 	for i := 0; i < numItems; i++ {
 		key := "item-" + strconv.Itoa(i)
 		value := "value-" + strconv.Itoa(i)
 
 		err := c.Set(ctx, key, value)
 		if err != nil {
 			t.Fatalf("Set failed for item %d: %v", i, err)
 		}
 	}
 
 	// Launch concurrent drain operations
 	for i := 0; i < numGoroutines; i++ {
 		wg.Add(1)
-		go func() {
+		go func(id int) {
 			defer wg.Done()
 
-			_, err := c.Drain(ctx)
+			items, err := c.Drain(ctx)
 			if err != nil {
 				t.Errorf("Drain failed: %v", err)
 			}
-		}()
+			drainResults.Store(id, items)
+		}(i)
 	}
 
 	wg.Wait()
+
+	// Verify that items were drained (at least one goroutine should have gotten items)
+	totalDrained := 0
+	drainResults.Range(func(key, value interface{}) bool {
+		items := value.(map[string]interface{})
+		totalDrained += len(items)
+		return true
+	})
+	
+	if totalDrained != numItems {
+		t.Errorf("Expected %d total items drained, got %d", numItems, totalDrained)
+	}
 
 	// Cache should be empty after all drain operations
 	for i := 0; i < numItems; i++ {
 		key := "item-" + strconv.Itoa(i)
 		_, err := c.Get(ctx, key)
 		if err != cache.ErrKeyNotFound {
 			t.Errorf("Expected ErrKeyNotFound for key %s after drain, got %v", key, err)
 		}
 	}
 }
pkg/cache/memory_edge_test.go (3)

220-257: Improve performance measurement accuracy.

The test includes a performance measurement but the threshold check at Line 254 seems arbitrary. The variable duration is set to 100ms but never used in the actual operations.

Either remove the unused duration variable or clarify the performance expectations:

 func TestMemoryCache_RapidOperations(t *testing.T) {
 	// Test rapid c operations
 	c := cache.NewMemory(0)
 	ctx := context.Background()
 
 	const numOperations = 1000
-	const duration = 100 * time.Millisecond
 
 	start := time.Now()
 	opsCompleted := 0
 
 	for i := 0; i < numOperations; i++ {
 		// Alternate between set and get
 		if i%2 == 0 {
 			key := "rapid-key-" + strconv.Itoa(i)
 			value := "rapid-value-" + strconv.Itoa(i)
 			err := c.Set(ctx, key, value)
 			if err != nil {
 				t.Errorf("Set failed: %v", err)
 			}
 		} else {
 			key := "rapid-key-" + strconv.Itoa(i-1)
 			_, err := c.Get(ctx, key)
 			if err != nil && err != cache.ErrKeyNotFound {
 				t.Errorf("Get failed: %v", err)
 			}
 		}
 		opsCompleted++
 	}
 
 	durationTaken := time.Since(start)
 	t.Logf("Completed %d operations in %v (%.2f ops/ms)", opsCompleted, durationTaken, float64(opsCompleted)/float64(durationTaken.Milliseconds()))
-
-	// Verify operations completed within reasonable time
-	if durationTaken > 2*duration {
-		t.Errorf("Operations took too long: %v", durationTaken)
-	}
 }

329-351: Potential issue with zero-byte key.

The test creates a 1KB key using make([]byte, 1024) which results in a string of 1024 null bytes. While this tests the cache's ability to handle long keys, it doesn't test with actual character data. Consider using a more realistic long key.

Generate a more realistic long key:

 func TestMemoryCache_ExtremeKeyLength(t *testing.T) {
 	// Test with very long keys
 	cache := cache.NewMemory(0)
 	ctx := context.Background()
 
 	// Create a very long key (1KB)
-	longKey := string(make([]byte, 1024))
+	longKey := strings.Repeat("a", 1024)
 	value := "extreme-key-value"
 
 	err := cache.Set(ctx, longKey, value)
 	if err != nil {
 		t.Fatalf("Set with long key failed: %v", err)
 	}
 
 	retrieved, err := cache.Get(ctx, longKey)
 	if err != nil {
 		t.Fatalf("Get with long key failed: %v", err)
 	}
 
 	if retrieved != value {
 		t.Errorf("Expected %s, got %s", value, retrieved)
 	}
 }

Note: You'll need to add "strings" to the imports at the top of the file.


353-389: Potential race in expiration timing test.

The test sets items to expire in 10ms and then has goroutines sleep for exactly 10ms before attempting access. This creates a race where some goroutines might access the item just before expiration while others access it just after. While this tests race conditions, the timing is very tight and might lead to flaky tests.

Consider adding a small random jitter to make the test more robust:

 func TestMemoryCache_RaceConditionWithExpiration(t *testing.T) {
 	// Test race conditions between expiration and access
 	c := cache.NewMemory(0)
 	ctx := context.Background()
 
 	key := "race-expire-key"
 	value := "race-expire-value"
 	ttl := 10 * time.Millisecond
 
 	// Set item with short TTL
 	err := c.Set(ctx, key, value, cache.WithTTL(ttl))
 	if err != nil {
 		t.Fatalf("Set failed: %v", err)
 	}
 
 	const numGoroutines = 50
 	var wg sync.WaitGroup
 
 	// Launch goroutines that try to access the key while it's expiring
 	for i := 0; i < numGoroutines; i++ {
 		wg.Add(1)
-		go func() {
+		go func(id int) {
 			defer wg.Done()
 
-			// Wait for the item to be close to expiration
-			time.Sleep(ttl)
+			// Wait for the item to be close to expiration with some jitter
+			jitter := time.Duration(id%3) * time.Millisecond
+			time.Sleep(ttl - 2*time.Millisecond + jitter)
 
 			// Try to get the item
 			_, err := c.Get(ctx, key)
 			if err != nil && err != cache.ErrKeyExpired && err != cache.ErrKeyNotFound {
 				t.Errorf("Get failed: %v", err)
 			}
-		}()
+		}(i)
 	}
 
 	wg.Wait()
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ea8734 and f947f5a.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (19)
  • build/package/Dockerfile (1 hunks)
  • build/package/Dockerfile.dev (1 hunks)
  • go.mod (4 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/module.go (2 hunks)
  • internal/sms-gateway/app.go (2 hunks)
  • internal/sms-gateway/cache/config.go (1 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/cache/module.go (1 hunks)
  • pkg/cache/cache.go (1 hunks)
  • pkg/cache/errors.go (1 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_concurrency_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/memory_test.go (1 hunks)
  • pkg/cache/options.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (12)
internal/sms-gateway/cache/module.go (3)
internal/config/module.go (1)
  • Module (21-124)
internal/sms-gateway/app.go (1)
  • Module (34-56)
internal/sms-gateway/cache/factory.go (1)
  • NewFactory (19-49)
pkg/cache/cache.go (1)
pkg/cache/options.go (1)
  • Option (5-5)
pkg/cache/redis.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (6-6)
  • ErrKeyExists (8-8)
pkg/cache/options.go (1)
  • Option (5-5)
pkg/cache/memory_bench_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (21-25)
internal/sms-gateway/cache/factory.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/redis.go (1)
  • NewRedis (24-36)
pkg/cache/memory_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (2)
  • WithTTL (21-25)
  • WithValidUntil (29-33)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (6-6)
  • ErrKeyExists (8-8)
internal/sms-gateway/app.go (2)
internal/sms-gateway/cache/module.go (1)
  • Module (8-16)
internal/config/module.go (1)
  • Module (21-124)
pkg/cache/memory_concurrency_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/errors.go (3)
  • ErrKeyNotFound (6-6)
  • ErrKeyExists (8-8)
  • ErrKeyExpired (7-7)
pkg/cache/options.go (1)
  • WithTTL (21-25)
pkg/cache/memory_profile_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/errors.go (1)
  • ErrKeyNotFound (6-6)
pkg/cache/memory.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/options.go (1)
  • Option (5-5)
pkg/cache/errors.go (3)
  • ErrKeyExists (8-8)
  • ErrKeyNotFound (6-6)
  • ErrKeyExpired (7-7)
pkg/cache/memory_edge_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (21-25)
pkg/cache/errors.go (2)
  • ErrKeyExpired (7-7)
  • ErrKeyNotFound (6-6)
internal/config/module.go (2)
internal/config/config.go (2)
  • Config (10-18)
  • Cache (74-76)
internal/sms-gateway/cache/config.go (1)
  • Config (3-5)
🪛 GitHub Check: lint
pkg/cache/memory_bench_test.go

[failure] 154-154:
Error return value of cache.Drain is not checked (errcheck)


[failure] 134-134:
Error return value of cache.Cleanup is not checked (errcheck)


[failure] 114-114:
Error return value of cache.Delete is not checked (errcheck)


[failure] 95-95:
Error return value of cache.GetAndDelete is not checked (errcheck)


[failure] 76-76:
Error return value of cache.SetOrFail is not checked (errcheck)


[failure] 61-61:
Error return value of cache.Get is not checked (errcheck)


[failure] 60-60:
Error return value of cache.Set is not checked (errcheck)


[failure] 42-42:
Error return value of cache.Get is not checked (errcheck)


[failure] 37-37:
Error return value of cache.Set is not checked (errcheck)


[failure] 24-24:
Error return value of cache.Set is not checked (errcheck)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test
  • GitHub Check: E2E
  • GitHub Check: Analyze (go)
🔇 Additional comments (49)
go.mod (1)

3-3: Ensure CI/build images match Go 1.24.1.

go.mod targets 1.24.1; please align Dockerfiles (suggested) and verify CI uses the same toolchain to avoid mismatched stdlib behavior.

internal/config/module.go (1)

7-7: LGTM! Clean integration of cache configuration into the DI graph.

The cache configuration provider follows the established pattern in the module and correctly wires the cache URL from the main config into the cache subsystem.

Also applies to: 119-123

internal/sms-gateway/cache/factory.go (1)

19-49: Verify Redis client initialization and document scheme

  • Ensure the Redis client performs a connectivity check (e.g. Ping) on creation or handle failures early.
  • Confirm its default connection‐pool settings satisfy production load and adjust/configure pooling if needed.
  • Document the redis:// URL scheme in the Config.URL behavior and expectations.
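
A hedged sketch of the connectivity check from the first point, using the go-redis v9 client already present in go.mod; the timeout and error wrapping are assumptions.

```go
// Hedged sketch: parse the redis:// URL, build the client, and fail fast if the
// server is unreachable instead of surfacing errors on first cache use.
func newRedisClient(ctx context.Context, rawURL string) (*redis.Client, error) {
	opts, err := redis.ParseURL(rawURL)
	if err != nil {
		return nil, fmt.Errorf("invalid redis url: %w", err)
	}
	client := redis.NewClient(opts)

	pingCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	if err := client.Ping(pingCtx).Err(); err != nil {
		_ = client.Close()
		return nil, fmt.Errorf("redis ping failed: %w", err)
	}
	return client, nil
}
```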
pkg/cache/cache.go (1)

5-36: Well-designed cache interface with clear semantics.

The interface is clean, follows Go conventions, and provides a good balance of functionality. The error semantics are clearly documented.

pkg/cache/memory_test.go (19)

11-33: LGTM! Well-structured basic set/get test.

The test properly verifies the basic Set/Get operations with clear setup and assertions.


35-58: LGTM! Good coverage of TTL functionality.

The test properly validates the WithTTL option for setting values with explicit TTL.


60-83: LGTM! ValidUntil option properly tested.

The test correctly validates the WithValidUntil option for setting explicit expiration times.


85-108: LGTM! Default TTL behavior validated.

The test verifies that items use the default TTL when no explicit TTL is provided.


110-120: LGTM! Error signaling correctly tested.

The test properly verifies that ErrKeyNotFound is returned for non-existent keys.


122-144: LGTM! SetOrFail for new keys works correctly.

The test validates that SetOrFail successfully sets new keys and returns the expected values.


146-175: LGTM! SetOrFail correctly prevents overwrites.

The test properly verifies that SetOrFail returns ErrKeyExists when trying to overwrite existing keys and preserves the original value.


177-201: LGTM! Delete operation properly tested.

The test validates successful deletion and confirms the key is removed.


203-214: LGTM! Graceful handling of non-existent key deletion.

The test correctly verifies that deleting a non-existent key doesn't return an error.


216-244: LGTM! GetAndDelete atomic operation validated.

The test properly verifies that GetAndDelete returns the value and removes the key in a single operation.


246-257: LGTM! GetAndDelete error handling verified.

The test correctly validates that GetAndDelete returns ErrKeyNotFound for non-existent keys.


259-306: LGTM! Comprehensive drain functionality test.

The test thoroughly validates that Drain returns all items and clears the cache.


308-322: LGTM! Empty cache drain properly handled.

The test correctly verifies that draining an empty cache returns an empty map without errors.


324-358: LGTM! Cleanup removes expired items correctly.

The test properly validates that Cleanup removes expired items while preserving valid ones.


360-388: LGTM! Cleanup preserves non-expired items.

The test correctly verifies that cleanup doesn't affect items without expiration.


390-419: LGTM! Overwrite functionality validated.

The test properly verifies that Set can overwrite existing values.


421-443: LGTM! Empty string values handled correctly.

The test validates that empty strings can be stored and retrieved properly.


445-467: LGTM! Special characters properly supported.

The test verifies that keys and values with special characters are handled correctly.


469-491: LGTM! Large values handled properly.

The test validates that the cache can store and retrieve large values (1MB) accurately.

pkg/cache/memory_bench_test.go (10)

159-189: LGTM! Concurrent read benchmark well-structured.

The benchmark properly tests read performance with varying levels of concurrency.


191-221: LGTM! Concurrent write benchmark properly implemented.

The benchmark effectively measures write performance under different concurrency levels with unique keys.


223-262: LGTM! Mixed workload benchmark provides valuable insights.

The benchmark effectively simulates realistic cache usage patterns with configurable read/write ratios.


264-301: LGTM! Scaling benchmark comprehensively tests load patterns.

The benchmark provides good coverage of various load scenarios from small to very large.


303-333: LGTM! TTL overhead benchmark is well-designed.

The benchmark effectively measures the performance impact of TTL operations.


335-367: LGTM! Large value benchmark covers important use cases.

The benchmark properly tests performance with various payload sizes from 1KB to 1MB.


369-393: LGTM! Memory growth benchmark provides useful allocation insights.

The benchmark effectively measures memory allocation patterns with different cache sizes.


395-417: LGTM! Random access benchmark simulates realistic access patterns.

The benchmark properly tests performance with random key access over a pre-populated cache.


419-435: LGTM! Hot key benchmark tests an important edge case.

The benchmark effectively measures performance when a single key is accessed frequently.


437-459: LGTM! Cold key benchmark complements the hot key test.

The benchmark properly tests performance with distributed access patterns across many keys.

pkg/cache/memory_profile_test.go (2)

12-60: LGTM! Memory allocation profiling test is well-implemented.

The test effectively analyzes memory allocation patterns with appropriate thresholds and detailed logging.


110-165: LGTM! Memory pressure test simulates realistic scenarios.

The test effectively validates cache behavior under memory pressure with multiple cache instances.

pkg/cache/memory_concurrency_test.go (6)

14-49: LGTM! Well-structured concurrent read test.

The test properly validates thread-safe concurrent reads with appropriate synchronization and error handling.


51-96: LGTM! Comprehensive concurrent write test with good verification.

The test effectively validates concurrent writes with non-overlapping keys and proper post-write verification.


144-190: LGTM! Comprehensive test of mixed cache operations.

The test effectively validates the complete lifecycle of cache entries with concurrent Set, Get, and Delete operations.


192-237: LGTM! Excellent test for SetOrFail atomicity.

The test properly validates that only one goroutine succeeds in setting a key when using SetOrFail, with accurate counting of success and failure cases.


336-382: LGTM! Well-designed GetAndDelete concurrency test.

The test effectively validates concurrent GetAndDelete operations with proper pre-population and post-verification.


384-422: LGTM! Excellent race condition detection test.

The test is well-designed to trigger race conditions with time-based operation selection and concurrent access patterns. This will be valuable when running with the race detector enabled.

pkg/cache/memory_edge_test.go (8)

13-37: LGTM! Proper test for zero TTL behavior.

The test correctly validates that items with zero TTL do not expire.


39-60: LGTM! Good edge case test for immediate expiration.

The test properly validates TTL expiration behavior with appropriate timing margins.


62-81: LGTM! Important nil context edge case test.

The test validates that the cache handles nil context gracefully, which is important for robustness.


83-103: LGTM! Good validation of empty key handling.

The test ensures the cache properly handles empty string keys, which is an important edge case.


105-145: LGTM! Comprehensive TTL overwrite test.

The test thoroughly validates that overwriting a key updates its TTL correctly.


147-218: LGTM! Excellent mixed TTL scenario test.

The test comprehensively validates different TTL behaviors with proper timing and verification of expiration states.


259-286: LGTM! Good edge case test for empty cache cleanup.

The test validates that cleanup operations work correctly on an empty cache and don't affect subsequent operations.


288-327: LGTM! Comprehensive drain test with expired items.

The test properly validates that drain operations exclude expired items and only return valid entries.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

♻️ Duplicate comments (2)
pkg/cache/memory_profile_test.go (1)

243-245: Handle Drain error (matches prior feedback).

This call drops the error and can hide failures.

-		tempCache.Drain(ctx)
+		if _, err := tempCache.Drain(ctx); err != nil {
+			t.Errorf("Drain failed: %v", err)
+		}
internal/sms-gateway/app.go (1)

47-47: Drop cache.Module() logger decoration (global logger name pollution).

Including cache.Module() while it decorates *zap.Logger to "cache" renames subsequent loggers across the app. Please remove that decoration in internal/sms-gateway/cache/module.go. This was raised earlier and still applies.

Apply in cache/module.go:

 func Module() fx.Option {
   return fx.Module(
     "cache",
-    fx.Decorate(func(log *zap.Logger) *zap.Logger {
-      return log.Named("cache")
-    }),
     fx.Provide(NewFactory),
   )
 }
🧹 Nitpick comments (8)
pkg/cache/memory_profile_test.go (2)

275-297: Benchmark memory accounting is racy; measure outside RunParallel and use ReportAllocs.

Reading MemStats per goroutine races with other goroutines and divides by b.N incorrectly. Measure once around the whole run.

-	b.ResetTimer()
-	b.RunParallel(func(pb *testing.PB) {
-		var m1, m2 runtime.MemStats
-		runtime.ReadMemStats(&m1)
+	b.ReportAllocs()
+	runtime.GC()
+	var m1, m2 runtime.MemStats
+	runtime.ReadMemStats(&m1)
+	b.ResetTimer()
+	b.RunParallel(func(pb *testing.PB) {
 		i := 0
 		for pb.Next() {
 			key := "bench-key-" + strconv.Itoa(i)
 			value := "bench-value-" + strconv.Itoa(i)
 			// Set and get
 			cache.Set(ctx, key, value)
 			cache.Get(ctx, key)
 			// Delete
 			cache.Delete(ctx, key)
 			i++
 		}
-		runtime.ReadMemStats(&m2)
-		b.Logf("Memory usage per iteration: %.2f bytes", float64(m2.TotalAlloc-m1.TotalAlloc)/float64(b.N))
 	})
+	runtime.ReadMemStats(&m2)
+	b.Logf("TotalAlloc per op: %.2f bytes/op", float64(m2.TotalAlloc-m1.TotalAlloc)/float64(b.N))

56-60: Reduce flakiness of hard-coded memory thresholds.

Per-item and absolute heap thresholds vary by Go version, arch, GC settings. Consider widening margins, gating under testing.Short, or converting strict asserts to logs/benchmarks.

Also applies to: 106-108, 160-165, 265-267

pkg/cache/memory_concurrency_test.go (1)

284-288: Avoid panic on type assertion in results aggregation.

If Drain’s return type changes, this panics. Use a checked assertion.

-	drainResults.Range(func(key, value any) bool {
-		items := value.(map[string]string)
+	drainResults.Range(func(key, value any) bool {
+		items, ok := value.(map[string]string)
+		if !ok {
+			t.Errorf("unexpected Drain result type: %T", value)
+			return true
+		}
pkg/cache/memory_edge_test.go (2)

54-60: Be tolerant to ErrKeyNotFound if Get removes expired entries.

Depending on Get’s semantics, expired keys may be evicted and reported as not found.

-	_, err = c.Get(ctx, key)
-	if err != cache.ErrKeyExpired {
-		t.Errorf("Expected ErrKeyExpired, got %v", err)
-	}
+	_, err = c.Get(ctx, key)
+	if err != cache.ErrKeyExpired && err != cache.ErrKeyNotFound {
+		t.Errorf("Expected ErrKeyExpired or ErrKeyNotFound, got %v", err)
+	}

221-258: Make RapidOperations test less time-sensitive.

CI variance can cause false negatives. Gate under -short or relax the SLA.

 func TestMemoryCache_RapidOperations(t *testing.T) {
+	if testing.Short() {
+		t.Skip("skipping time-sensitive test in -short mode")
+	}
internal/sms-gateway/online/service.go (1)

31-40: Optional: add small jitter to the ticker to avoid sync across replicas.

Helps reduce thundering herd if many instances start together.

Example:

 func (s *Service) Run(ctx context.Context) {
-  ticker := time.NewTicker(1 * time.Minute)
+  interval := time.Minute
+  jitter := time.Duration(rand.Int63n(int64(interval / 10)))
+  timer := time.NewTimer(jitter)
+  defer timer.Stop()
+  select {
+  case <-ctx.Done():
+    return
+  case <-timer.C:
+  }
+  ticker := time.NewTicker(interval)
   defer ticker.Stop()

(Remember to import math/rand.)

pkg/cache/redis.go (2)

49-52: Cleanup no-op: acceptable if per-key TTL is used; document behavior.

If you keep the hash approach, implement selective expiry removal; otherwise, add a short doc noting Redis TTL handles expiry.


35-47: Nit: rename the struct field key to prefix/hashKey for clarity.

Improves readability and reduces confusion with method param key.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f947f5a and cb1b73f.

📒 Files selected for processing (12)
  • internal/sms-gateway/app.go (4 hunks)
  • internal/sms-gateway/cache/config.go (1 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/errors.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_concurrency_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/options.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/cache/errors.go
  • pkg/cache/options.go
  • internal/sms-gateway/cache/factory.go
  • internal/sms-gateway/cache/config.go
  • pkg/cache/memory_bench_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-10T11:11:43.452Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.452Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_concurrency_test.go
  • pkg/cache/memory_edge_test.go
  • pkg/cache/redis.go
  • pkg/cache/memory_profile_test.go
🧬 Code graph analysis (7)
internal/sms-gateway/online/service.go (2)
internal/sms-gateway/cache/factory.go (1)
  • Cache (11-11)
internal/sms-gateway/modules/events/service.go (2)
  • NewService (26-38)
  • deviceSvc (13-24)
internal/sms-gateway/online/module.go (4)
internal/sms-gateway/app.go (1)
  • Module (35-58)
internal/sms-gateway/cache/factory.go (2)
  • Factory (13-15)
  • Cache (11-11)
internal/sms-gateway/online/service.go (2)
  • New (20-28)
  • Service (12-18)
internal/sms-gateway/openapi/module.go (1)
  • Module (8-16)
pkg/cache/memory_concurrency_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (7-7)
  • ErrKeyExists (11-11)
pkg/cache/options.go (1)
  • WithTTL (22-30)
internal/sms-gateway/app.go (3)
internal/sms-gateway/online/module.go (1)
  • Module (11-35)
internal/sms-gateway/cache/module.go (1)
  • Module (8-16)
internal/config/module.go (1)
  • Module (21-124)
pkg/cache/memory_edge_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
pkg/cache/errors.go (2)
  • ErrKeyExpired (9-9)
  • ErrKeyNotFound (7-7)
pkg/cache/redis.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (7-7)
  • ErrKeyExists (11-11)
pkg/cache/options.go (1)
  • Option (6-6)
pkg/cache/memory_profile_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/errors.go (1)
  • ErrKeyNotFound (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E
  • GitHub Check: lint
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
pkg/cache/memory_concurrency_test.go (2)

323-349: Timing fix for concurrent cleanup looks good.

Waiting for TTL to elapse before cleanup avoids the prior race. LGTM.


109-116: No action needed: go.mod specifies Go 1.24.1, which supports integer ranges (Go 1.22+).

pkg/cache/memory_edge_test.go (1)

289-328: Drain behavior assertion matches implementation semantics.

Verifying that Drain returns only non-expired items and purges expired ones aligns with the cache’s cleanup-first contract.

internal/sms-gateway/online/module.go (1)

21-33: Lifecycle wiring LGTM.

Start/stop with cancelable context is correct.

internal/sms-gateway/online/service.go (1)

7-9: Cache type alias exists; no action required
internal/sms-gateway/cache/factory.go already declares type Cache = cache.Cache, so the return type from online.Module and the cache.Cache import in service.go are identical.

pkg/cache/redis.go (2)

92-104: Atomic GetAndDelete: good.

Atomicity addressed (Lua previously; with per-key refactor, prefer GETDEL).


118-121: HExpireAt supported in go-redis v9.6+: no changes needed.
The HExpireAt method was introduced in go-redis/v9.6.x alongside Redis ≥7.4’s hash‐field expiration commands, so the current code is valid.

Likely an incorrect or invalid review comment.

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from 8ad2c4d to 8ad3f09 on September 11, 2025 04:17

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/sms-gateway/modules/devices/service.go (1)

123-125: Token-cache invalidation bug: key mismatch

GetByToken caches by SHA-256(hex) of the token, but Delete uses the raw token, so entries are never invalidated.

Apply:

-	if err := s.tokensCache.Delete(device.AuthToken); err != nil {
+	hash := sha256.Sum256([]byte(device.AuthToken))
+	cacheKey := hex.EncodeToString(hash[:])
+	if err := s.tokensCache.Delete(cacheKey); err != nil {
 		s.logger.Error("can't invalidate token cache", zap.Error(err))
 	}
♻️ Duplicate comments (5)
pkg/cache/memory_profile_test.go (4)

45-55: Use HeapAlloc delta, not TotalAlloc, for per-item growth.

TotalAlloc is cumulative; HeapAlloc reflects heap-in-use.

- allocDiff := m2.TotalAlloc - m1.TotalAlloc
- allocPerItem := float64(allocDiff) / float64(numItems)
+ heapDiff := m2.HeapAlloc - m1.HeapAlloc
+ allocPerItem := float64(heapDiff) / float64(numItems)
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)

95-109: MemoryCleanup: compare heap-in-use, not cumulative allocation.

- allocDiff := m2.TotalAlloc - m1.TotalAlloc
+ heapDiff := m2.HeapAlloc - m1.HeapAlloc
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
@@
- if allocDiff > 2*1024*1024 { // 2MB
-   t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", allocDiff)
+ if heapDiff > 2*1024*1024 { // 2MB
+   t.Errorf("Expected <2MB heap growth after cleanup, got %d bytes", heapDiff)

150-165: MemoryPressure: switch to HeapAlloc delta to avoid false positives.

- allocDiff := m2.TotalAlloc - m1.TotalAlloc
+ heapDiff := m2.HeapAlloc - m1.HeapAlloc
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
@@
- if allocDiff > expectedMaxGrowth {
-   t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, allocDiff)
+ if heapDiff > expectedMaxGrowth {
+   t.Errorf("Expected < %d bytes heap growth under pressure, got %d bytes", expectedMaxGrowth, heapDiff)

243-245: Handle Drain error during leak detection loop.

-        // Clear the cache
-        tempCache.Drain(ctx)
+        // Clear the cache
+        if _, err := tempCache.Drain(ctx); err != nil {
+            t.Errorf("Drain failed: %v", err)
+        }
internal/sms-gateway/online/service.go (1)

27-35: Name the logger for this module.

Create a child logger to keep logs consistently scoped.

 func New(devicesSvc *devices.Service, cache cache.Cache, logger *zap.Logger) Service {
   return &service{
     devicesSvc: devicesSvc,
-    cache:      cache,
-    logger:     logger,
+    cache:  cache,
+    logger: logger.Named("online"),
   }
 }
🧹 Nitpick comments (17)
pkg/cache/memory_profile_test.go (3)

191-198: GCStress: ErrKeyNotFound shouldn’t occur for existing keys (TTL=0). Verify and tighten.

These 100 keys were inserted and shouldn’t expire/evict; treat any error as a failure.

-           _, err := c.Get(ctx, key)
-           if err != nil && err != cache.ErrKeyNotFound {
-               t.Errorf("Get failed during GC stress test: %v", err)
-           }
+           if _, err := c.Get(ctx, key); err != nil {
+               t.Errorf("Get failed during GC stress test: %v", err)
+           }

If TTL=0 implies “no expiry” in pkg/cache/memory.go (expected), this is safe. If there’s eviction logic, please confirm and adjust test accordingly.


1-1: Scope errcheck suppression to benchmark; drop file-wide //nolint.

Keep lints active for tests; ignore only benchmark calls.

-//nolint:errcheck
-            // Set and get
-            cache.Set(ctx, key, value)
-            cache.Get(ctx, key)
+            // Set and get
+            cache.Set(ctx, key, value) //nolint:errcheck // benchmark: ignore errors for perf
+            cache.Get(ctx, key)        //nolint:errcheck // benchmark: ignore errors for perf
@@
-            // Delete
-            cache.Delete(ctx, key)
+            // Delete
+            cache.Delete(ctx, key)     //nolint:errcheck // benchmark: ignore errors for perf

Also applies to: 289-295


15-16: Nit: avoid shadowing the imported package name with a local variable.

Renaming improves readability and avoids confusion between package and value.

- cache := cache.NewMemory(0)
+ c := cache.NewMemory(0)
@@
- err := cache.Set(ctx, key, value)
+ err := c.Set(ctx, key, value)
@@
- _, err = cache.Get(ctx, key)
+ _, err = c.Get(ctx, key)
- cache := cache.NewMemory(0)
+ c := cache.NewMemory(0)
@@
- err := cache.Set(ctx, key, value)
+ err := c.Set(ctx, key, value)
@@
- _, err := cache.Drain(ctx)
+ _, err := c.Drain(ctx)
- cache := cache.NewMemory(0)
+ c := cache.NewMemory(0)

Also applies to: 29-36, 63-66, 79-86, 273-276

configs/config.example.yml (1)

27-28: Document Redis URL auth/TLS and multi-replica guidance

Add concrete Redis examples (auth, DB index, TLS) and a note that memory is per-process and unsuitable for multi-replica deployments.

 cache: # cache config
-  url: memory:// # cache url (memory:// or redis://) [CACHE__URL]
+  url: memory:// # cache url (memory:// or redis://) [CACHE__URL]
+  # Examples:
+  # - url: redis://localhost:6379?db=0
+  # - url: redis://:password@localhost:6379?db=0
+  # - url: rediss://redis.example.com:6380?db=0  # TLS-enabled
+  # Note: memory:// is in-process only; use Redis in multi-replica or horizontally scaled setups.
.github/workflows/go.yml (6)

31-35: Pin Go version via go.mod and enable module cache

go-version: stable can drift. Use the version in go.mod and cache modules for speed.

Apply here (repeat similarly in Test and Benchmark jobs):

-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: stable
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+          cache: true
+          cache-dependency-path: go.sum

36-41: Avoid “latest” golangci-lint to reduce CI flakiness

Pin a specific linter version to avoid surprise breakages.

For example:

       - name: Run golangci-lint
         uses: golangci/golangci-lint-action@v8
         with:
-          version: latest
+          version: v1.60.3
           args: --timeout=5m

6-20: Ensure CI runs when the workflow changes

Edits to this file won’t trigger tests/linters because of the path filters. Include the workflow file itself.

       paths:
         - "**.go"
         - "go.mod"
         - "go.sum"
+        - ".github/workflows/go.yml"

Apply to both push and pull_request sections.


62-64: Improve test robustness and coverage accounting

Shuffle tests to catch order dependencies and include all packages in coverage.

-        run: go test -race -coverprofile=coverage.out -covermode=atomic ./...
+        run: go test -race -shuffle=on -covermode=atomic -coverpkg=./... -coverprofile=coverage.out ./...

Optional: add a preceding go vet ./... step.


79-87: Minor: ensure deps are pre-fetched for benchmarks

Warm modules to reduce variance and flakiness in benchmark timings.

       - name: Run benchmarks
-        run: go test -bench=. -benchmem ./... | tee benchmark.txt
+        run: |
+          go mod download
+          go test -bench=. -benchmem ./... | tee benchmark.txt

Also consider timeout-minutes: 15 on the job.


1-2: Optional: add concurrency to cancel superseded runs

Saves CI minutes on rapid pushes.

 name: Go

+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
internal/sms-gateway/modules/devices/service.go (1)

103-111: Batch handler: honor context, log a summary, and count failures

Processing continues on canceled contexts and silently returns nil regardless of failures.

Apply:

 func (s *Service) SetLastSeen(ctx context.Context, batch map[string]time.Time) error {
-	for deviceId, lastSeen := range batch {
-		if err := s.devices.SetLastSeen(ctx, deviceId, lastSeen); err != nil {
-			s.logger.Error("can't set last seen", zap.String("device_id", deviceId), zap.Error(err))
-		}
-	}
-
-	return nil
+	if len(batch) == 0 {
+		return nil
+	}
+	var updated, failed int
+	for deviceId, lastSeen := range batch {
+		if ctx.Err() != nil {
+			break
+		}
+		if err := s.devices.SetLastSeen(ctx, deviceId, lastSeen); err != nil {
+			failed++
+			s.logger.Error("can't set last seen", zap.String("device_id", deviceId), zap.Error(err))
+			continue
+		}
+		updated++
+	}
+	s.logger.Debug("processed last_seen batch", zap.Int("updated", updated), zap.Int("failed", failed))
+	return ctx.Err()
 }
internal/sms-gateway/modules/auth/service.go (1)

147-150: Use defer cancel inside goroutine

Slightly safer and idiomatic; guarantees cancellation on early returns/panics.

Apply:

-	go func(id string) {
-		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-		s.onlineSvc.SetOnline(ctx, id)
-		cancel()
-	}(device.ID)
+	go func(id string) {
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		defer cancel()
+		s.onlineSvc.SetOnline(ctx, id)
+	}(device.ID)
internal/sms-gateway/online/service.go (5)

37-54: Reduce periodic log noise and flush on shutdown.

Downgrade the per-minute infos to debug and attempt a best-effort flush on exit.

 func (s *service) Run(ctx context.Context) {
   ticker := time.NewTicker(1 * time.Minute)
   defer ticker.Stop()
 
   for {
     select {
     case <-ctx.Done():
-      return
+      s.logger.Info("Shutting down: flushing online status")
+      flushCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+      defer cancel()
+      if err := s.persist(flushCtx); err != nil {
+        s.logger.Warn("Flush on shutdown failed", zap.Error(err))
+      }
+      return
     case <-ticker.C:
-      s.logger.Info("Persisting online status")
+      s.logger.Debug("Persisting online status")
       if err := s.persist(ctx); err != nil {
         s.logger.Error("Can't persist online status", zap.Error(err))
-      } else {
-        s.logger.Info("Online status persisted")
       }
     }
   }
 }

69-96: Short-circuit when nothing to persist.

Skip parsing/DB call when drain is empty and keep these logs at debug.

 func (s *service) persist(ctx context.Context) error {
   items, err := s.cache.Drain(ctx)
   if err != nil {
     return fmt.Errorf("can't drain cache: %w", err)
   }
 
-  s.logger.Info("Drained cache", zap.Int("count", len(items)))
+  if len(items) == 0 {
+    s.logger.Debug("No online statuses to persist")
+    return nil
+  }
+  s.logger.Debug("Drained cache", zap.Int("count", len(items)))
 
-  timestamps := maps.MapValues(items, func(v string) time.Time {
-    t, err := time.Parse(time.RFC3339, v)
-    if err != nil {
-      s.logger.Error("Can't parse last seen", zap.String("last_seen", v), zap.Error(err))
-      return time.Now().UTC()
-    }
-
-    return t
-  })
+  timestamps := make(map[string]time.Time, len(items))
+  for id, v := range items {
+    t, err := time.Parse(time.RFC3339, v)
+    if err != nil {
+      s.logger.Error("Can't parse last seen", zap.String("device_id", id), zap.String("last_seen", v), zap.Error(err))
+      t = time.Now().UTC()
+    }
+    timestamps[id] = t
+  }
 
-  s.logger.Info("Parsed last seen timestamps", zap.Int("count", len(timestamps)))
+  s.logger.Debug("Parsed last seen timestamps", zap.Int("count", len(timestamps)))
 
   if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
     return fmt.Errorf("can't set last seen: %w", err)
   }
 
   s.logger.Info("Set last seen")
 
   return nil
 }

37-54: Make interval configurable and add jitter.

Avoid minute-boundary thundering herd and allow tuning via config; add small random jitter to the ticker.

Also applies to: 69-96


89-91: Consider batching to limit write size and latency.

If timestamps can be large, chunk the map into fixed-size batches for SetLastSeen to reduce DB pressure and timeouts.
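
A hedged sketch of that chunking, reusing the batch SetLastSeen signature introduced in this PR; the batch size and helper name are illustrative.

```go
// Hedged sketch: persist the drained timestamps in fixed-size chunks so one
// oversized write can't monopolize the DB connection or hit the timeout.
func persistInChunks(ctx context.Context, svc *devices.Service, all map[string]time.Time, size int) error {
	chunk := make(map[string]time.Time, size)
	flush := func() error {
		if len(chunk) == 0 {
			return nil
		}
		if err := svc.SetLastSeen(ctx, chunk); err != nil {
			return err
		}
		chunk = make(map[string]time.Time, size)
		return nil
	}
	for id, ts := range all {
		chunk[id] = ts
		if len(chunk) >= size {
			if err := flush(); err != nil {
				return err
			}
		}
	}
	return flush()
}
```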


69-96: Add metrics for observability.

Expose counters/timers: drained_count, parse_errors, persist_duration, last_success_timestamp. Helps alerting and SLOs.
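
One possible shape for those instruments with prometheus/client_golang; the PR summary says an online metrics file already exists, so the names below are illustrative rather than the actual ones.

```go
// Hedged sketch of the suggested metrics; names and buckets are illustrative.
var (
	drainedTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "online_drained_total",
		Help: "Entries drained from the online cache.",
	})
	parseErrorsTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "online_parse_errors_total",
		Help: "Last-seen values that failed RFC3339 parsing.",
	})
	persistDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "online_persist_duration_seconds",
		Help:    "Duration of a persist cycle.",
		Buckets: prometheus.DefBuckets,
	})
	lastSuccess = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "online_persist_last_success_timestamp_seconds",
		Help: "Unix time of the last successful persist.",
	})
)
```

In persist, this would be used as persistDuration.Observe(time.Since(start).Seconds()) after each cycle and lastSuccess.SetToCurrentTime() on success.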

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c7fa7a and 8ad3f09.

📒 Files selected for processing (10)
  • .github/workflows/docker-publish.yml (0 hunks)
  • .github/workflows/go.yml (1 hunks)
  • .github/workflows/golangci-lint.yml (0 hunks)
  • configs/config.example.yml (1 hunks)
  • internal/sms-gateway/modules/auth/service.go (5 hunks)
  • internal/sms-gateway/modules/devices/repository.go (1 hunks)
  • internal/sms-gateway/modules/devices/service.go (1 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
💤 Files with no reviewable changes (2)
  • .github/workflows/golangci-lint.yml
  • .github/workflows/docker-publish.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/sms-gateway/online/module.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-10T11:11:43.452Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.452Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_profile_test.go
🧬 Code graph analysis (4)
internal/sms-gateway/modules/devices/repository.go (2)
internal/sms-gateway/models/models.go (1)
  • Device (25-36)
internal/sms-gateway/modules/messages/repository.go (3)
  • r (141-164)
  • r (198-205)
  • r (166-191)
internal/sms-gateway/modules/auth/service.go (2)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (29-38)
internal/sms-gateway/online/service.go (1)
  • Service (14-17)
internal/sms-gateway/online/service.go (3)
internal/sms-gateway/modules/auth/service.go (1)
  • Service (39-52)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (29-38)
internal/sms-gateway/cache/factory.go (1)
  • Cache (11-11)
pkg/cache/memory_profile_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/errors.go (1)
  • ErrKeyNotFound (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (8)
pkg/cache/memory_profile_test.go (1)

85-90: Nice: Drain error is handled.

Good catch adding error handling for Drain here.

configs/config.example.yml (2)

27-28: Defaulting/backcompat confirmed — Cache.URL defaults to "memory://"

Verified: internal/config/config.go sets the default Cache.URL = "memory://" (internal/config/config.go:104); the loader uses defaultConfig via config.LoadConfig(&defaultConfig) (internal/config/module.go:25); NewFactory also falls back to "memory://" when URL is empty (internal/sms-gateway/cache/factory.go:22–23). No change required.


27-28: Redact cache URL in logs — avoid leaking credentials

No repo code was found that logs the cache URL, but NewFactory passes the full URL into redis.New (internal/sms-gateway/cache/factory.go). Ensure neither our code nor the redis client logs the full URL; if logging is necessary, sanitize it first (parse the URL and clear u.User, or replace the credentials with a placeholder) or log only scheme+host (u.Scheme + "://" + u.Host). A minimal helper is sketched after the list below.

  • Check internal/sms-gateway/cache/factory.go — avoid logging or passing an unhashed URL; sanitize before logging or before handing to clients that may log it.
  • Check configs/config.example.yml — do not include credentials in examples; document that credentials must be kept secret.
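
A minimal sketch of such sanitization, assuming net/url parsing; sanitizeURL is a hypothetical helper, not part of the PR:

import "net/url"

// sanitizeURL strips credentials so the value is safe to log.
func sanitizeURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "<invalid cache url>" // never echo an unparseable value verbatim
	}
	u.User = nil // drop user:password from e.g. redis://user:pass@host:6379/0
	return u.String()
}

Usage would look like logger.Info("cache configured", zap.String("url", sanitizeURL(cfg.URL))).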
internal/sms-gateway/modules/devices/service.go (1)

103-111: No stale UpdateLastSeen usages found

rg found no matches for UpdateLastSeen; callers use SetLastSeen (e.g. internal/sms-gateway/online/service.go:89); implementation at internal/sms-gateway/modules/devices/service.go:103 and internal/sms-gateway/modules/devices/repository.go:74.

internal/sms-gateway/modules/auth/service.go (4)

14-14: Online service wiring import looks good


34-35: DI: added OnlineSvc to Params


47-48: Service field: onlineSvc injection


61-63: Constructor wires onlineSvc and logger

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from 8ad3f09 to 5abcc8d on September 11, 2025 at 04:39

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/sms-gateway/modules/devices/service.go (1)

131-135: Bug: token cache invalidation uses the wrong key; entry won’t be evicted.

GetByToken caches by sha256(token) hex key, but Remove deletes by raw device.AuthToken. This leaves stale cache entries.

- if err := s.tokensCache.Delete(device.AuthToken); err != nil {
+ hash := sha256.Sum256([]byte(device.AuthToken))
+ if err := s.tokensCache.Delete(hex.EncodeToString(hash[:])); err != nil {
   s.logger.Error("can't invalidate token cache", zap.Error(err))
 }
♻️ Duplicate comments (5)
pkg/cache/memory_profile_test.go (5)

25-25: Fix invalid ranges over integers (won’t compile).

Replace range-over-int with classic for-loops.

- for i := range numItems {
+ for i := 0; i < numItems; i++ {
- for i := range numItems {
+ for i := 0; i < numItems; i++ {
- for i := range numItems {
+ for i := 0; i < numItems; i++ {
- for range numGCs {
+ for i := 0; i < numGCs; i++ {
- for j := range 100 {
+ for j := 0; j < 100; j++ {
- for i := range numItems {
+ for i := 0; i < numItems; i++ {
- for i := range numCaches {
+ for i := 0; i < numCaches; i++ {
- for j := range 10 {
+ for j := 0; j < 10; j++ {

Also applies to: 75-75, 175-175, 187-187, 192-192, 202-202, 228-228, 233-233


45-55: Use HeapAlloc delta, not TotalAlloc, for per-item growth.

TotalAlloc is cumulative and misleads tests.

- allocDiff := m2.TotalAlloc - m1.TotalAlloc
- allocPerItem := float64(allocDiff) / float64(numItems)
+ heapDiff := m2.HeapAlloc - m1.HeapAlloc
+ perItem := float64(heapDiff) / float64(numItems)
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
- t.Logf("  Allocations per item: %.2f bytes", allocPerItem)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
+ t.Logf("  Heap delta per item: %.2f bytes", perItem)
@@
- if allocPerItem > 300 {
-   t.Errorf("Expected less than 300 bytes per item, got %.2f bytes", allocPerItem)
+ if perItem > 300 {
+   t.Errorf("Expected less than 300 bytes per item, got %.2f bytes", perItem)

Also applies to: 58-60


95-109: MemoryCleanup: compare heap-in-use, not cumulative allocation.

- allocDiff := m2.TotalAlloc - m1.TotalAlloc
+ heapDiff := m2.HeapAlloc - m1.HeapAlloc
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
@@
- if allocDiff > 2*1024*1024 { // 2MB
-   t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", allocDiff)
+ if heapDiff > 2*1024*1024 { // 2MB
+   t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", heapDiff)

150-166: MemoryPressure: switch to HeapAlloc delta to avoid false positives.

- allocDiff := m2.TotalAlloc - m1.TotalAlloc
+ heapDiff := m2.HeapAlloc - m1.HeapAlloc
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
@@
- if allocDiff > expectedMaxGrowth {
-   t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, allocDiff)
+ if heapDiff > expectedMaxGrowth {
+   t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, heapDiff)

243-245: Check Drain error.

Avoid silently ignoring failures.

- // Clear the cache
- tempCache.Drain(ctx)
+ // Clear the cache
+ if _, err := tempCache.Drain(ctx); err != nil {
+   t.Errorf("Drain failed: %v", err)
+ }
🧹 Nitpick comments (3)
internal/sms-gateway/modules/devices/service.go (1)

103-119: Don’t swallow per-item failures; aggregate and return them (and include more context in logs).

Right now, device-level errors are only logged and the function returns ctx.Err(), which is usually nil, masking failed updates. Aggregate per-item errors (and combine with ctx.Err()) and add last_seen to logs. Also prefer deviceID for Go initialism.

 func (s *Service) SetLastSeen(ctx context.Context, batch map[string]time.Time) error {
   if len(batch) == 0 {
     return nil
   }

-  for deviceId, lastSeen := range batch {
-    if ctx.Err() != nil {
-      break
-    }
-
-    if err := s.devices.SetLastSeen(ctx, deviceId, lastSeen); err != nil {
-      s.logger.Error("can't set last seen", zap.String("device_id", deviceId), zap.Error(err))
-    }
-  }
-
-  return ctx.Err()
+  var multiErr error
+  for deviceID, lastSeen := range batch {
+    if err := ctx.Err(); err != nil {
+      return errors.Join(err, multiErr)
+    }
+    if err := s.devices.SetLastSeen(ctx, deviceID, lastSeen); err != nil {
+      multiErr = errors.Join(multiErr, fmt.Errorf("device %s: %w", deviceID, err))
+      s.logger.Error("can't set last seen",
+        zap.String("device_id", deviceID),
+        zap.Time("last_seen", lastSeen),
+        zap.Error(err),
+      )
+    }
+  }
+  return multiErr
 }

Add missing import:

// add to imports
"errors"

If your intent is best-effort with no upstream retries, keep return nil on partial failures but at least count and emit a single warning with failure count. Want a bounded-concurrency variant (errgroup + worker pool) for large batches? I can provide one.

pkg/cache/memory_profile_test.go (2)

1-1: Scope errcheck suppression to benchmark calls; remove file-wide nolint.

Prefer local suppressions to avoid masking test issues.

-//nolint:errcheck
- cache.Set(ctx, key, value)
+ cache.Set(ctx, key, value) //nolint:errcheck
- cache.Get(ctx, key)
+ cache.Get(ctx, key) //nolint:errcheck
@@
- cache.Delete(ctx, key)
+ cache.Delete(ctx, key) //nolint:errcheck

Also applies to: 290-295


15-16: Nit: avoid shadowing the imported package name.

Rename local variable cache to c (applies similarly elsewhere) to reduce confusion.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5abcc8d and d1bd95b.

📒 Files selected for processing (7)
  • .github/workflows/go.yml (1 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/modules/auth/service.go (5 hunks)
  • internal/sms-gateway/modules/devices/repository.go (1 hunks)
  • internal/sms-gateway/modules/devices/service.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • .github/workflows/go.yml
  • internal/sms-gateway/cache/factory.go
  • internal/sms-gateway/online/service.go
  • internal/sms-gateway/modules/auth/service.go
  • internal/sms-gateway/modules/devices/repository.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-10T11:11:43.452Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.452Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_profile_test.go
🧬 Code graph analysis (1)
pkg/cache/memory_profile_test.go (1)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
internal/sms-gateway/modules/devices/service.go (1)

103-119: Resolved — SetLastSeen enforces monotonic updates

repository.SetLastSeen uses Where("id = ? AND last_seen < ?", id, lastSeen) before Update("last_seen", lastSeen) in internal/sms-gateway/modules/devices/repository.go, so older timestamps are ignored; no change required.

pkg/cache/memory_profile_test.go (2)

254-271: LGTM: correct leak check using HeapAlloc delta.

Good use of int64 to handle possible negative deltas and clear logging.


56-61: Thresholds validated — no CI flakiness observed

Ran the targeted memory tests 25×; allocations per item were consistently ~225.98 bytes (<300) and other memory metrics (allocation diffs, heap delta) remained stable; all targeted tests passed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/sms-gateway/modules/devices/service.go (2)

136-141: Evict-by-hash is good; enrich log context (and downgrade on cache-miss if applicable)

Add device/cache identifiers to the log to aid debugging. If Delete returns a “not found” sentinel, log at debug to avoid noise.

Apply this diff:

- if err := s.tokensCache.Delete(cacheKey); err != nil {
-   s.logger.Error("can't invalidate token cache", zap.Error(err))
- }
+ if err := s.tokensCache.Delete(cacheKey); err != nil {
+   // If the cache lib exposes a not-found error, consider logging at Debug here.
+   s.logger.Error("can't invalidate token cache",
+     zap.String("device_id", device.ID),
+     zap.String("cache_key", cacheKey),
+     zap.Error(err),
+   )
+ }

If you confirm the cache package returns a specific not-found error, I can propose a small guard to downgrade the log level accordingly.


104-124: Add empty-device-id guard and normalize lastSeen to UTC; preserve zero-time behavior

  • Add a guard for empty device IDs and call lastSeen = lastSeen.UTC() in internal/sms-gateway/modules/devices/service.go before invoking s.devices.SetLastSeen to avoid DB timezone mismatches (a sketch follows this list).
  • Do not treat zero timestamps as errors — repository already ignores zero timestamps (internal/sms-gateway/modules/devices/repository.go:74–76).
  • Verified no UpdateLastSeen usages remain; SetLastSeen is called from internal/sms-gateway/online/service.go:91.
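
A minimal sketch of that guard inside the batch loop, with names taken from the snippets above rather than the PR's exact code:

for deviceID, lastSeen := range batch {
	if deviceID == "" {
		s.logger.Warn("skipping empty device id in last-seen batch")
		continue
	}
	lastSeen = lastSeen.UTC() // normalize before comparing/storing in the DB
	if err := s.devices.SetLastSeen(ctx, deviceID, lastSeen); err != nil {
		s.logger.Error("can't set last seen", zap.String("device_id", deviceID), zap.Error(err))
	}
}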
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1bd95b and 4e2d5e5.

📒 Files selected for processing (1)
  • internal/sms-gateway/modules/devices/service.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Benchmark
  • GitHub Check: E2E
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
internal/sms-gateway/modules/devices/service.go (1)

7-7: errors.Join import is appropriate here

Used correctly for multi-error accumulation; errors.Join requires Go 1.20+, so it's covered by the toolchain bump to 1.24.

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from dd32085 to 1438a30 on September 17, 2025 at 23:53

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (7)
.github/workflows/go.yml (2)

70-73: Codecov upload will fail on forked PRs.

PRs from forks don't have access to CODECOV_TOKEN, causing the upload step to fail. Either guard the step or remove the token for public repositories.

Apply this fix to guard the upload:

-      - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
+      - name: Upload coverage to Codecov
+        if: ${{ secrets.CODECOV_TOKEN != '' }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false

100-110: Benchmark alerts ineffective without baseline persistence.

With auto-push: false, no historical data is persisted, making comparisons and alerts meaningless. Either enable persistence or disable alerting to avoid false failures.

Option A (recommended): Enable persistence by adding contents: write permission and setting auto-push: true:

   benchmark:
     name: Benchmark
     runs-on: ubuntu-latest
     permissions:
-      contents: read
+      contents: write
       pull-requests: write
-          fail-on-alert: true
+          auto-push: true
+          fail-on-alert: true

Option B: Disable alerting if persistence cannot be enabled:

-          fail-on-alert: true
+          fail-on-alert: false
+          comment-on-alert: false
pkg/cache/memory_bench_test.go (5)

1-13: File-level error suppression conflicts with past review feedback.

The file-level //nolint:errcheck directive suppresses error checking for all benchmarks, but past review comments specifically requested adding explicit error handling to benchmark loops to ensure failures don't go unnoticed.


15-28: Missing error handling in Set benchmark.

The benchmark ignores errors from cache.Set, which could lead to misleading performance results if operations fail silently.


30-46: Missing error handling for Get operations.

Similar to the Set benchmark, both the pre-population and the benchmark loop ignore errors from cache operations.


48-65: Missing error handling for SetAndGet operations.

The benchmark should handle errors from both Set and Get operations to ensure meaningful performance measurements.


67-80: SetOrFail benchmark needs proper error handling.

SetOrFail is expected to return ErrKeyExists on subsequent calls to the same key. The benchmark should handle this specific error case appropriately.

🧹 Nitpick comments (7)
internal/sms-gateway/modules/devices/repository.go (1)

74-87: Good fix: safe, monotonic last_seen update with proper GORM WHERE.

  • Guards zero timestamp, uses placeholders, and prevents regressions via last_seen < ?. Looks solid.

Normalize time precision/timezone to match DB TIMESTAMP(3).

  • Recommend UTC + millisecond truncation to avoid no‑ops from sub‑ms precision mismatches.

Apply:

 func (r *repository) SetLastSeen(ctx context.Context, id string, lastSeen time.Time) error {
   if lastSeen.IsZero() {
     return nil // ignore zero timestamps
   }
+  // Normalize to DB precision/timezone (TIMESTAMP(3) in UTC)
+  lastSeen = lastSeen.UTC().Truncate(time.Millisecond)

   res := r.db.WithContext(ctx).
     Model(&models.Device{}).
     Where("id = ? AND last_seen < ?", id, lastSeen).
-    Update("last_seen", lastSeen)
+    Update("last_seen", lastSeen)
   if res.Error != nil {
     return res.Error
   }
 
   // RowsAffected==0 => not found or stale timestamp; treat as no-op.
   return nil
 }

Confirm UpdatedAt semantics.

  • If bumping updated_at on last-seen writes is undesirable (affects removeUnused which keys off updated_at), switch to UpdateColumn to avoid auto-updating timestamps.

Option:

-    Update("last_seen", lastSeen)
+    UpdateColumn("last_seen", lastSeen)
pkg/cache/memory_edge_test.go (6)

242-247: Tighten the assertion; allowing ErrKeyNotFound hides a bug.

This test sets the key right before the Get (no TTL). Getting ErrKeyNotFound should fail the test.

Apply:

-            _, err := c.Get(ctx, key)
-            if err != nil && err != cache.ErrKeyNotFound {
-                t.Errorf("Get failed: %v", err)
-            }
+            _, err := c.Get(ctx, key)
+            if err != nil {
+                t.Errorf("Get failed: %v", err)
+            }

221-228: Gate timing-sensitive test in -short to reduce CI flakiness.

This test measures wall time; skip under -short.

Apply:

 func TestMemoryCache_RapidOperations(t *testing.T) {
   // Test rapid c operations
   c := cache.NewMemory(0)
   ctx := context.Background()
+  if testing.Short() {
+    t.Skip("skipping timing-sensitive test in -short mode")
+  }

47-56: Reduce timing flakiness for immediate-expiration test.

1 ms TTL with 2× sleep can flake on busy CI. Give a bit more slack.

Apply:

- ttl := 1 * time.Millisecond
+ ttl := 5 * time.Millisecond
 ...
- time.Sleep(2 * ttl)
+ time.Sleep(3 * ttl)

378-381: Guard against negative sleep and tighten jitter logic.

Defensive programming keeps the test stable if TTL changes later.

Apply:

-            jitter := time.Duration(id%3) * time.Millisecond
-            time.Sleep(ttl - 2*time.Millisecond + jitter)
+            jitter := time.Duration(id%3) * time.Millisecond
+            d := ttl - 2*time.Millisecond + jitter
+            if d < 0 {
+                d = 0
+            }
+            time.Sleep(d)

63-82: Nil context contract: confirm API guarantees.

You intentionally allow nil context (suppressed via nolint). Confirm the Cache interface docs commit to accepting nil; otherwise this test bakes in an undesirable API promise.


84-104: Empty key semantics: confirm cross-backend behavior.

If empty keys must be supported, ensure Redis and any future backends behave identically; if not, this test should be dropped or feature-gated.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd32085 and 1438a30.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (28)
  • .github/workflows/docker-publish.yml (0 hunks)
  • .github/workflows/go.yml (1 hunks)
  • .github/workflows/golangci-lint.yml (0 hunks)
  • build/package/Dockerfile (1 hunks)
  • build/package/Dockerfile.dev (1 hunks)
  • configs/config.example.yml (1 hunks)
  • go.mod (4 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/module.go (2 hunks)
  • internal/sms-gateway/app.go (4 hunks)
  • internal/sms-gateway/cache/config.go (1 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/cache/module.go (1 hunks)
  • internal/sms-gateway/modules/auth/service.go (5 hunks)
  • internal/sms-gateway/modules/devices/repository.go (1 hunks)
  • internal/sms-gateway/modules/devices/service.go (3 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/cache.go (1 hunks)
  • pkg/cache/errors.go (1 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_concurrency_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/memory_test.go (1 hunks)
  • pkg/cache/options.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
💤 Files with no reviewable changes (2)
  • .github/workflows/docker-publish.yml
  • .github/workflows/golangci-lint.yml
🚧 Files skipped from review as they are similar to previous changes (20)
  • internal/sms-gateway/cache/module.go
  • internal/sms-gateway/cache/config.go
  • pkg/cache/errors.go
  • pkg/cache/cache.go
  • internal/config/module.go
  • pkg/cache/options.go
  • build/package/Dockerfile.dev
  • configs/config.example.yml
  • build/package/Dockerfile
  • internal/sms-gateway/modules/devices/service.go
  • internal/sms-gateway/cache/factory.go
  • pkg/cache/redis.go
  • internal/sms-gateway/online/module.go
  • internal/sms-gateway/modules/auth/service.go
  • internal/config/config.go
  • pkg/cache/memory_profile_test.go
  • pkg/cache/memory_concurrency_test.go
  • pkg/cache/memory.go
  • internal/sms-gateway/app.go
  • internal/sms-gateway/online/service.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-10T23:53:42.006Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:63-75
Timestamp: 2025-09-10T23:53:42.006Z
Learning: In pkg/cache/redis.go, the Redis cache implementation uses Redis's native field-level TTL via HExpireAt, meaning expired fields are automatically removed by Redis itself. This differs from the memory implementation where the application tracks expiration times. As a result, operations like Drain that call HGetAll will only return non-expired items since expired fields are already removed by Redis.

Applied to files:

  • pkg/cache/memory_edge_test.go
📚 Learning: 2025-09-10T11:11:43.462Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.462Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_edge_test.go
  • pkg/cache/memory_bench_test.go
🧬 Code graph analysis (4)
internal/sms-gateway/modules/devices/repository.go (1)
internal/sms-gateway/models/models.go (1)
  • Device (25-36)
pkg/cache/memory_edge_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
pkg/cache/errors.go (2)
  • ErrKeyExpired (9-9)
  • ErrKeyNotFound (7-7)
pkg/cache/memory_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (2)
  • WithTTL (22-30)
  • WithValidUntil (34-38)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (7-7)
  • ErrKeyExists (11-11)
pkg/cache/memory_bench_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (11)
.github/workflows/go.yml (1)

78-80: Benchmark permissions are correctly configured.

The job-level permissions correctly grant pull-requests: write permission for posting benchmark comments while maintaining read-only access for other operations. This addresses the workflow-wide contents: read limitation.

pkg/cache/memory_test.go (4)

1-9: Clean test package organization.

The package structure and imports are well organized for comprehensive cache testing.


11-33: LGTM for basic Set/Get functionality.

The test correctly verifies basic cache operations with appropriate error handling and value validation.


324-358: Good TTL expiration and cleanup validation.

This test properly exercises the cleanup functionality by setting items with short TTL, waiting for expiration, and verifying cleanup removes expired items.


469-491: Appropriate large value testing.

Testing with 1MB values helps validate memory handling and ensures the cache can handle reasonably large payloads without issues.

pkg/cache/memory_bench_test.go (3)

224-263: Comprehensive mixed workload benchmark design.

The benchmark design with different read/write ratios and randomized access patterns provides valuable performance insights for realistic usage scenarios.


304-334: Good TTL overhead measurement approach.

Comparing performance with and without TTL provides valuable insights into the cost of TTL features. The benchmark design is sound.


396-418: Excellent random access pattern benchmark.

Pre-populating with 1000 keys and testing random access patterns provides realistic performance data for cache lookup scenarios.

go.mod (1)

3-3: Pin Go 1.24.1 in CI/Docker and vet new deps.

  • No CI workflows or Dockerfiles pin Go (search found no actions/setup-go or FROM golang:); update GitHub Actions, Dockerfiles, and dev toolchain to Go 1.24.1.
  • go.mod shows go 1.24.1 (line 3) and new/updated deps — validator v10.26.0 (ln 13), redis v9.9.0 (21), fx v1.24.0 (23), zap v1.27.0 (24); indirect: go-rendezvous, mimetype (45–46). Run full build + vuln checks.
  • Run: go mod tidy && go build ./... && go vet ./... && govulncheck ./...
  • Optional (reproducibility): add toolchain pin:
 go 1.24.1
+toolchain go1.24.1
pkg/cache/memory_edge_test.go (2)

309-327: Drain semantics assertion looks correct (memory backend).

Per retrieved learnings, memory.Drain cleans expired items first, so expecting ErrKeyNotFound post-Drain is right. Consider mirroring this as an integration test for the Redis backend to ensure parity since Redis removes expired fields natively via HExpireAt.

Would you like a follow-up test template for the Redis cache variant?


1-12: Approve — great coverage; run flakiness + race locally (sandbox couldn't run -race)

Solid edge-case matrix for TTL, drain, and concurrency. Nice use of package cache_test to assert public API.

Sandbox test run failed with environment errors:

  • go: -race requires cgo; enable cgo by setting CGO_ENABLED=1
  • failed to start telemetry sidecar: os.Executable: readlink /proc/self/exe: no such file or directory

Run locally or in CI with one of these and attach output if flakes/failures appear:

# Run race locally with CGO enabled
CGO_ENABLED=1 go test ./pkg/cache -run 'TestMemoryCache_' -race -count=100 -timeout=5m -v

# If CGO cannot be enabled in your environment, run flakiness sweep without -race
go test ./pkg/cache -run 'TestMemoryCache_' -count=100 -timeout=5m -v


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/sms-gateway/modules/devices/repository.go (1)

70-72: Fix invalid GORM WHERE usage — use a parameterized predicate

Where("id", id) is not a valid predicate and can cause a full-table update or SQL error.
Location: internal/sms-gateway/modules/devices/repository.go:71

Apply:

- return r.db.Model(&models.Device{}).Where("id", id).Update("push_token", token).Error
+ return r.db.Model(&models.Device{}).Where("id = ?", id).Update("push_token", token).Error
♻️ Duplicate comments (1)
internal/sms-gateway/modules/devices/repository.go (1)

74-87: Monotonic last_seen update + proper WHERE — resolved

Good fix: guards zero time, uses id = ? AND last_seen < ?, and UpdateColumn with context. This addresses the earlier feedback to prevent regressions and to use placeholders.

🧹 Nitpick comments (2)
internal/sms-gateway/modules/devices/repository.go (2)

74-87: Normalize to UTC to avoid TZ comparison edge cases

If callers pass non-UTC times, DB/session TZ differences can cause subtle misorders. Normalize before updating.

 func (r *repository) SetLastSeen(ctx context.Context, id string, lastSeen time.Time) error {
   if lastSeen.IsZero() {
     return nil // ignore zero timestamps
   }
+  lastSeen = lastSeen.UTC()
   res := r.db.WithContext(ctx).
     Model(&models.Device{}).
     Where("id = ? AND last_seen < ?", id, lastSeen).
     UpdateColumn("last_seen", lastSeen)

Confirm the column type is TIMESTAMP/TIMESTAMPTZ with consistent TZ handling in your DB.


99-106: Pruning by last_seen — confirm policy and index for performance

Switching to last_seen changes retention semantics (vs. updated_at). Confirm this aligns with product policy. Also ensure an index on last_seen to avoid deletes scanning the whole table.

If not already indexed, consider adding a GORM index tag/migration for devices.last_seen.
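
A sketch of the tag-based variant (field shape assumed from models.Device; existing installations would still need a SQL migration):

// Sketch: declare the index on the model so schema tooling picks it up.
type Device struct {
	// ...existing fields elided...
	LastSeen time.Time `gorm:"index:idx_devices_last_seen"`
}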

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1438a30 and ca5d2d5.

📒 Files selected for processing (2)
  • internal/sms-gateway/modules/devices/repository.go (2 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/cache/memory_edge_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/sms-gateway/modules/devices/repository.go (1)
internal/sms-gateway/models/models.go (1)
  • Device (25-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build / Docker image (linux/amd64)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from ca5d2d5 to ba4ce53 on September 18, 2025 at 07:55

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/sms-gateway/modules/devices/repository.go (1)

99-106: Add an index on last_seen and switch removeUnused to batched deletes

No index found on LastSeen (internal/sms-gateway/models/models.go) and the migration only adds the column (internal/sms-gateway/models/migrations/mysql/20240214195939_device_last_seen.sql) — deleting by last_seen can cause full-table scans and long locks. Actions:

  • Add an index for last_seen (e.g., new migration: CREATE INDEX idx_devices_last_seen ON devices(last_seen) or add gorm:"index:idx_devices_last_seen" to LastSeen and add a migration).
  • Change removeUnused to delete in small batches (loop DELETE ... LIMIT or delete by PK chunks) to avoid long locks; implement in internal/sms-gateway/modules/devices/repository.go (removeUnused). A sketch follows this list.
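
A minimal sketch of PK-chunked deletes, assuming the repository's existing r.db field; removeUnusedBatched, batchSize, and the cutoff parameter are illustrative:

// Sketch: delete stale devices in bounded chunks to keep lock times short.
func (r *repository) removeUnusedBatched(ctx context.Context, cutoff time.Time) error {
	const batchSize = 1000
	for {
		var ids []string
		if err := r.db.WithContext(ctx).
			Model(&models.Device{}).
			Where("last_seen < ?", cutoff).
			Limit(batchSize).
			Pluck("id", &ids).Error; err != nil {
			return err
		}
		if len(ids) == 0 {
			return nil
		}
		if err := r.db.WithContext(ctx).
			Where("id IN ?", ids).
			Delete(&models.Device{}).Error; err != nil {
			return err
		}
		if len(ids) < batchSize {
			return nil
		}
	}
}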
♻️ Duplicate comments (6)
pkg/cache/memory_profile_test.go (3)

46-53: Use HeapAlloc deltas; TotalAlloc is cumulative and misleading.

Switch to HeapAlloc for growth checks/logs to avoid false positives.

-allocDiff := m2.TotalAlloc - m1.TotalAlloc
-allocPerItem := float64(allocDiff) / float64(numItems)
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+heapDiff := m2.HeapAlloc - m1.HeapAlloc
+allocPerItem := float64(heapDiff) / float64(numItems)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
-allocDiff := m2.TotalAlloc - m1.TotalAlloc
+heapDiff := m2.HeapAlloc - m1.HeapAlloc
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
@@
- if allocDiff > 2*1024*1024 {
-   t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", allocDiff)
+ if heapDiff > 2*1024*1024 {
+   t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", heapDiff)
-allocDiff := m2.TotalAlloc - m1.TotalAlloc
+heapDiff := m2.HeapAlloc - m1.HeapAlloc
@@
- t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
- t.Logf("  Allocation difference: %d bytes", allocDiff)
+ t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+ t.Logf("  Heap delta: %d bytes", heapDiff)
@@
- if allocDiff > expectedMaxGrowth {
-   t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, allocDiff)
+ if heapDiff > expectedMaxGrowth {
+   t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, heapDiff)
-b.Logf("TotalAlloc per op: %.2f bytes/op", float64(m2.TotalAlloc-m1.TotalAlloc)/float64(b.N))
+b.Logf("HeapAlloc per op: %.2f bytes/op", float64(m2.HeapAlloc-m1.HeapAlloc)/float64(b.N))

Also applies to: 96-103, 151-157, 300-301


244-244: Handle Drain error.

Don’t ignore the error from Drain.

- tempCache.Drain(ctx)
+ if _, err := tempCache.Drain(ctx); err != nil {
+   t.Errorf("Drain failed: %v", err)
+ }

25-25: Fix: invalid range over integers (compile errors).

Replace all “range” loops over ints with classic for-loops.

- for i := range numItems {
+ for i := 0; i < numItems; i++ {
 }
- for i := range numItems {
+ for i := 0; i < numItems; i++ {
 }
- for i := range numItems {
+ for i := 0; i < numItems; i++ {
 }
- for range numGCs {
+ for i := 0; i < numGCs; i++ {
 }
- for j := range 100 {
+ for j := 0; j < 100; j++ {
 }
- for i := range numItems {
+ for i := 0; i < numItems; i++ {
 }
- for i := range numCaches {
+ for i := 0; i < numCaches; i++ {
 }
- for j := range 10 {
+ for j := 0; j < 10; j++ {
 }

Also applies to: 75-75, 175-175, 187-187, 192-192, 202-202, 228-228, 233-233

pkg/cache/redis.go (1)

127-151: Missing field in HExpireAt call (TTL not applied).

SetOrFail doesn’t pass the field to HExpireAt, so the TTL isn’t set.

Apply:

-	if !options.validUntil.IsZero() {
-		if err := r.client.HExpireAt(ctx, r.key, options.validUntil).Err(); err != nil {
+	if !options.validUntil.IsZero() {
+		if err := r.client.HExpireAt(ctx, r.key, options.validUntil, key).Err(); err != nil {
 			return fmt.Errorf("can't set cache item ttl: %w", err)
 		}
 	}

Also verify all other HExpire* usages repo-wide:

#!/bin/bash
rg -nP 'HExpire(?:At|)\s*\(' -C2 --type=go
internal/sms-gateway/online/service.go (2)

27-35: Name the logger for this service.

Scope logs under a stable component name.

Apply:

 func New(devicesSvc *devices.Service, cache cache.Cache, logger *zap.Logger) Service {
 	return &service{
 		devicesSvc: devicesSvc,
 		cache:      cache,
-		logger:     logger,
+		logger:     logger.Named("online"),
 	}
 }

54-65: Validate deviceID and reduce log verbosity.

Prevent empty-key writes and avoid per-call info‑level noise.

Apply:

 func (s *service) SetOnline(ctx context.Context, deviceID string) {
 	dt := time.Now().UTC().Format(time.RFC3339)

-	s.logger.Info("Setting online status", zap.String("device_id", deviceID), zap.String("last_seen", dt))
+	if strings.TrimSpace(deviceID) == "" {
+		s.logger.Warn("Empty device_id; skipping SetOnline")
+		return
+	}
+	s.logger.Debug("Setting online status", zap.String("device_id", deviceID), zap.String("last_seen", dt))

 	if err := s.cache.Set(ctx, deviceID, dt); err != nil {
 		s.logger.Error("Can't set online status", zap.String("device_id", deviceID), zap.Error(err))
 		return
 	}

-	s.logger.Info("Online status set", zap.String("device_id", deviceID))
+	s.logger.Debug("Online status set", zap.String("device_id", deviceID))
 }

Add missing import:

 import (
 	"context"
 	"fmt"
+	"strings"
 	"time"
🧹 Nitpick comments (11)
pkg/cache/memory.go (3)

16-23: Redundant mutex initialization.

sync.RWMutex has a valid zero value; remove the explicit init.

 func NewMemory(ttl time.Duration) Cache {
 	return &memoryCache{
 		items: make(map[string]*memoryItem),
 		ttl:   ttl,
-
-		mux: sync.RWMutex{},
 	}
 }

157-169: Callback runs under write lock—document constraint.

cb executes while holding m.mux; ensure it stays O(1) (as in Drain) to avoid blocking writers/readers. Consider a short comment to prevent future misuse.
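
Something along these lines would do (wording only; names taken from the review context):

// cleanup removes expired items and then invokes cb while m.mux is still held
// for writing. Keep cb O(1) and non-blocking (as Drain does); anything slow
// here stalls every reader and writer of the cache.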


77-85: Optional: lazily delete expired on Get.

You could delete-on-read when expired to reduce stale entries, though it trades a read lock for a write lock. Fine to keep as-is if Cleanup runs periodically.
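
If that trade-off is ever wanted, a sketch of delete-on-read, assuming the item fields (value, validUntil) used elsewhere in this review; note the expiry is re-checked after swapping the read lock for the write lock:

func (m *memoryCache) Get(ctx context.Context, key string) (string, error) {
	m.mux.RLock()
	item, ok := m.items[key]
	m.mux.RUnlock()
	if !ok {
		return "", ErrKeyNotFound
	}
	if !item.validUntil.IsZero() && time.Now().After(item.validUntil) {
		m.mux.Lock()
		// Re-check under the write lock: the entry may have been replaced meanwhile.
		if cur, ok := m.items[key]; ok && cur == item {
			delete(m.items, key)
		}
		m.mux.Unlock()
		return "", ErrKeyExpired
	}
	return item.value, nil
}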

pkg/cache/memory_profile_test.go (1)

15-15: Avoid shadowing the imported cache package name.

Rename local vars (e.g., c := cache.NewMemory(0)) to reduce confusion.

Also applies to: 65-65, 275-276

pkg/cache/redis.go (2)

49-52: Cleanup no-op: document behavior.

Given Redis handles per-field TTL removal, this is fine—add a comment to clarify that Cleanup is intentionally a no-op on Redis.


35-47: Add a startup capability check for HEXPIRE (defensive).

Fail fast if the Redis server doesn’t support hash-field TTL to avoid runtime “unknown command” errors.

Example helper (place in this file) and call from NewRedis:

func ensureHashFieldTTL(ctx context.Context, c *redis.Client) error {
	cmd := c.Do(ctx, "COMMAND", "INFO", "HEXPIRE")
	if err := cmd.Err(); err != nil {
		return err
	}
	if v, _ := cmd.Result().([]interface{}); len(v) == 0 {
		return fmt.Errorf("Redis server lacks HEXPIRE; requires Redis 7.4+")
	}
	return nil
}

Then in NewRedis, before return:

_ = ensureHashFieldTTL(context.Background(), client) // log/handle as desired
internal/sms-gateway/online/service.go (4)

37-52: Flush on startup and on shutdown (bounded).

Avoid losing recent writes on process stop; also persist immediately after start.

Apply:

 func (s *service) Run(ctx context.Context) {
 	ticker := time.NewTicker(1 * time.Minute)
 	defer ticker.Stop()

+	// eager flush on start
+	if err := s.persist(ctx); err != nil {
+		s.logger.Error("Can't persist online status (startup)", zap.Error(err))
+	}
+
 	for {
 		select {
 		case <-ctx.Done():
-			return
+			// best-effort final flush with timeout
+			flushCtx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+			if err := s.persist(flushCtx); err != nil {
+				s.logger.Error("Can't persist online status (shutdown)", zap.Error(err))
+			}
+			cancel()
+			return
 		case <-ticker.C:
 			s.logger.Debug("Persisting online status")
 			if err := s.persist(ctx); err != nil {
 				s.logger.Error("Can't persist online status", zap.Error(err))
 			}
 		}
 	}
 }

79-87: Don’t advance timestamps on parse errors; skip invalid entries.

Advancing to now() can falsely move last_seen forward. Safer to drop invalids and log.

Apply:

-	timestamps := maps.MapValues(items, func(v string) time.Time {
-		t, err := time.Parse(time.RFC3339, v)
-		if err != nil {
-			s.logger.Error("Can't parse last seen", zap.String("last_seen", v), zap.Error(err))
-			return time.Now().UTC()
-		}
-
-		return t
-	})
+	timestamps := make(map[string]time.Time, len(items))
+	invalid := 0
+	for id, v := range items {
+		t, err := time.Parse(time.RFC3339, v)
+		if err != nil {
+			invalid++
+			s.logger.Warn("Skipping invalid last_seen", zap.String("device_id", id), zap.String("last_seen", v), zap.Error(err))
+			continue
+		}
+		timestamps[id] = t
+	}
 
-	s.logger.Debug("Parsed last seen timestamps", zap.Int("count", len(timestamps)))
+	s.logger.Debug("Parsed last seen timestamps", zap.Int("count", len(timestamps)), zap.Int("invalid", invalid))

And remove the unused helper import:

-	"github.com/capcom6/go-helpers/maps"

Also applies to: 89-89


95-97: Lower success log to debug and include count.

Keep logs quiet on healthy paths and add useful signal.

Apply:

-	s.logger.Info("Set last seen")
+	s.logger.Debug("Set last seen", zap.Int("count", len(timestamps)))

19-25: Depend on a narrow interface, not concrete service.

Improves testability and reduces coupling.

Define locally:

type lastSeenWriter interface {
	SetLastSeen(ctx context.Context, batch map[string]time.Time) error
}

Then change:

-	devicesSvc *devices.Service
+	devicesSvc lastSeenWriter
internal/sms-gateway/modules/devices/repository.go (1)

74-88: Forward‑only update is correct; add cheap guards.

Consider early‑return on empty id to avoid full‑table scans and keep using UpdateColumn to skip updated_at.

Apply:

 func (r *repository) SetLastSeen(ctx context.Context, id string, lastSeen time.Time) error {
-	if lastSeen.IsZero() {
+	if id == "" || lastSeen.IsZero() {
 		return nil // ignore zero timestamps
 	}
 	res := r.db.WithContext(ctx).
 		Model(&models.Device{}).
 		Where("id = ? AND last_seen < ?", id, lastSeen).
 		UpdateColumn("last_seen", lastSeen)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca5d2d5 and ba4ce53.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (28)
  • .github/workflows/docker-publish.yml (0 hunks)
  • .github/workflows/go.yml (1 hunks)
  • .github/workflows/golangci-lint.yml (0 hunks)
  • build/package/Dockerfile (1 hunks)
  • build/package/Dockerfile.dev (1 hunks)
  • configs/config.example.yml (1 hunks)
  • go.mod (4 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/module.go (2 hunks)
  • internal/sms-gateway/app.go (4 hunks)
  • internal/sms-gateway/cache/config.go (1 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/cache/module.go (1 hunks)
  • internal/sms-gateway/modules/auth/service.go (5 hunks)
  • internal/sms-gateway/modules/devices/repository.go (2 hunks)
  • internal/sms-gateway/modules/devices/service.go (3 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/cache.go (1 hunks)
  • pkg/cache/errors.go (1 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_concurrency_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/memory_test.go (1 hunks)
  • pkg/cache/options.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
💤 Files with no reviewable changes (2)
  • .github/workflows/golangci-lint.yml
  • .github/workflows/docker-publish.yml
🚧 Files skipped from review as they are similar to previous changes (20)
  • internal/config/config.go
  • internal/sms-gateway/cache/config.go
  • build/package/Dockerfile
  • internal/config/module.go
  • internal/sms-gateway/cache/factory.go
  • pkg/cache/errors.go
  • internal/sms-gateway/app.go
  • internal/sms-gateway/cache/module.go
  • internal/sms-gateway/online/module.go
  • pkg/cache/memory_test.go
  • .github/workflows/go.yml
  • configs/config.example.yml
  • pkg/cache/memory_edge_test.go
  • pkg/cache/memory_concurrency_test.go
  • build/package/Dockerfile.dev
  • internal/sms-gateway/modules/auth/service.go
  • pkg/cache/memory_bench_test.go
  • pkg/cache/options.go
  • pkg/cache/cache.go
  • internal/sms-gateway/modules/devices/service.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-10T11:11:43.462Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.462Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory.go
  • pkg/cache/redis.go
  • pkg/cache/memory_profile_test.go
📚 Learning: 2025-09-10T23:53:42.006Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:63-75
Timestamp: 2025-09-10T23:53:42.006Z
Learning: In pkg/cache/redis.go, the Redis cache implementation uses Redis's native field-level TTL via HExpireAt, meaning expired fields are automatically removed by Redis itself. This differs from the memory implementation where the application tracks expiration times. As a result, operations like Drain that call HGetAll will only return non-expired items since expired fields are already removed by Redis.

Applied to files:

  • pkg/cache/memory.go
  • pkg/cache/redis.go
📚 Learning: 2025-09-10T23:54:16.828Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:118-121
Timestamp: 2025-09-10T23:54:16.828Z
Learning: Redis 7.4.0 introduced HEXPIRE, HEXPIREAT, and related commands that support per-field TTL on hash data structures. go-redis v9.6.0+ includes support for these commands with methods like HExpireAt.

Applied to files:

  • pkg/cache/redis.go
🧬 Code graph analysis (5)
internal/sms-gateway/online/service.go (3)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (30-39)
internal/sms-gateway/modules/auth/service.go (2)
  • Service (39-52)
  • New (54-68)
internal/sms-gateway/cache/factory.go (1)
  • Cache (15-15)
pkg/cache/memory.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/options.go (1)
  • Option (6-6)
pkg/cache/errors.go (3)
  • ErrKeyExists (11-11)
  • ErrKeyNotFound (7-7)
  • ErrKeyExpired (9-9)
pkg/cache/redis.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (7-7)
  • ErrKeyExists (11-11)
pkg/cache/options.go (1)
  • Option (6-6)
pkg/cache/memory_profile_test.go (1)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
internal/sms-gateway/modules/devices/repository.go (1)
internal/sms-gateway/models/models.go (1)
  • Device (25-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Build / Docker image (linux/amd64)
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
pkg/cache/memory.go (1)

59-74: Drain logic looks correct and atomic.

Expired items are purged before swapping maps; snapshot is built outside the lock. LGTM.

pkg/cache/redis.go (2)

93-105: Atomic GetAndDelete via Lua looks good.


106-125: HExpireAt call is correct — no change required.
go-redis v9 signature is HExpireAt(ctx, key string, tm time.Time, fields ...string), so r.client.HExpireAt(ctx, r.key, options.validUntil, key) uses the correct argument order.

Likely an incorrect or invalid review comment.

go.mod (2)

21-21: HEXPIRE/HEXPIREAT require Redis ≥ 7.4 — confirm server + client usage

  • go-redis v9 HExpireAt signature: HExpireAt(ctx context.Context, key string, tm time.Time, fields ...string) — pass expiry time before the field names.
  • HEXPIRE/HEXPIREAT introduced in Redis Open Source 7.4.0 — verify production Redis is >= 7.4.0.
  • Add a startup capability check (COMMAND INFO HEXPIRE / HEXPIREAT) and fail fast if unsupported.

3-3: Pin CI and container images to Go 1.24.1.

go.mod already contains "go 1.24.1" — update CI and container references to match:

  • .github/workflows/go.yml — actions/setup-go: go-version: stable -> 1.24.1
  • .github/workflows/docker-publish.yml — actions/setup-go: go-version: stable -> 1.24.1
  • build/package/Dockerfile and build/package/Dockerfile.dev — FROM golang:1.24-alpine -> FROM golang:1.24.1-alpine (or FROM golang:1.24.1)
    Confirm any developer docs mention the same version.
internal/sms-gateway/modules/devices/repository.go (1)

71-71: Good fix: parameterized WHERE.

Prevents invalid GORM usage and ensures correct binding.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
internal/sms-gateway/online/service.go (3)

27-35: Name the logger for this module.

Scope logs with a child logger to aid filtering.

 func New(devicesSvc *devices.Service, cache cache.Cache, logger *zap.Logger) Service {
   return &service{
     devicesSvc: devicesSvc,
     cache:      cache,
-    logger:     logger,
+    logger:     logger.Named("online"),
   }
 }

54-65: Validate deviceID and reduce log verbosity.

Avoid empty-key writes; SetOnline is high‑QPS, so use Debug not Info.

 func (s *service) SetOnline(ctx context.Context, deviceID string) {
-  dt := time.Now().UTC().Format(time.RFC3339)
-
-  s.logger.Info("Setting online status", zap.String("device_id", deviceID), zap.String("last_seen", dt))
-
-  if err := s.cache.Set(ctx, deviceID, dt); err != nil {
-    s.logger.Error("Can't set online status", zap.String("device_id", deviceID), zap.Error(err))
+  id := strings.TrimSpace(deviceID)
+  if id == "" {
+    s.logger.Warn("Empty device_id; skipping SetOnline")
+    return
+  }
+  dt := time.Now().UTC().Format(time.RFC3339)
+
+  s.logger.Debug("Setting online status", zap.String("device_id", id), zap.String("last_seen", dt))
+
+  if err := s.cache.Set(ctx, id, dt); err != nil {
+    s.logger.Error("Can't set online status", zap.String("device_id", id), zap.Error(err))
     return
   }
 
-  s.logger.Info("Online status set", zap.String("device_id", deviceID))
+  s.logger.Debug("Online status set", zap.String("device_id", id))
 }

Additional change required outside this hunk:

// add to imports
import "strings"

59-60: Ensure a non-zero TTL is applied to online entries.

Without TTL, keys can persist indefinitely if persist stalls. Verify factory wiring or apply per‑write TTL.

#!/bin/bash
# Verify TTL for the "online" cache and any WithTTL usage.
rg -n --type=go 'factory\.New\("online"\)|WithTTL\(|DefaultTTL|TTL' \
  internal/sms-gateway internal/config pkg/cache || true

# Inspect memory/redis cache constructors for ttl args.
rg -n --type=go 'New(Memory|Redis)\(' pkg/cache || true
🧹 Nitpick comments (5)
internal/sms-gateway/online/service.go (5)

37-52: Flush once on start and on shutdown to avoid dropping buffered updates.

Persist immediately after start and do a best‑effort flush on ctx cancel.

 func (s *service) Run(ctx context.Context) {
   ticker := time.NewTicker(1 * time.Minute)
   defer ticker.Stop()
 
+  // best-effort initial persist
+  if err := s.persist(ctx); err != nil {
+    s.logger.Warn("Initial persist failed", zap.Error(err))
+  }
+
   for {
     select {
-    case <-ctx.Done():
-      return
+    case <-ctx.Done():
+      // best-effort final persist
+      flushCtx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+      defer cancel()
+      if err := s.persist(flushCtx); err != nil {
+        s.logger.Warn("Final persist failed", zap.Error(err))
+      }
+      return
     case <-ticker.C:
       s.logger.Debug("Persisting online status")
       if err := s.persist(ctx); err != nil {
         s.logger.Error("Can't persist online status", zap.Error(err))
       }
     }
   }
 }

38-38: Make the persist interval configurable (with sane default).

Improves ops flexibility; reduces need for code changes.
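
A sketch of the knob, assuming a hypothetical module-level config struct; field name and default are illustrative:

// Sketch: configurable persist interval with a sane default.
type Config struct {
	PersistInterval time.Duration `yaml:"persist_interval"`
}

func (c Config) persistInterval() time.Duration {
	if c.PersistInterval <= 0 {
		return time.Minute
	}
	return c.PersistInterval
}

Run would then use time.NewTicker(cfg.persistInterval()) in place of the hard-coded 1 * time.Minute.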


55-55: Prefer RFC3339Nano for higher precision (optional).

- dt := time.Now().UTC().Format(time.RFC3339)
+ dt := time.Now().UTC().Format(time.RFC3339Nano)

79-87: Include device_id in parse errors and avoid per-item time.Now().

Refactor to a simple loop; log the offending device; compute now once.

- timestamps := maps.MapValues(items, func(v string) time.Time {
-   t, err := time.Parse(time.RFC3339, v)
-   if err != nil {
-     s.logger.Error("Can't parse last seen", zap.String("last_seen", v), zap.Error(err))
-     return time.Now().UTC()
-   }
-
-   return t
- })
+ now := time.Now().UTC()
+ timestamps := make(map[string]time.Time, len(items))
+ for id, v := range items {
+   t, err := time.Parse(time.RFC3339, v)
+   if err != nil {
+     s.logger.Warn("Invalid last_seen in cache; using fallback 'now'",
+       zap.String("device_id", id), zap.String("last_seen", v), zap.Error(err))
+     timestamps[id] = now
+     continue
+   }
+   timestamps[id] = t
+ }

Additional change required outside this hunk:

// remove unused helper import
// - "github.com/capcom6/go-helpers/maps"

19-25: *Depend on a narrow interface, not devices.Service.

For testability and decoupling, inject an interface with just SetLastSeen(ctx, map[string]time.Time) error.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba4ce53 and d5f3c2c.

📒 Files selected for processing (3)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cache/memory.go
  • pkg/cache/redis.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/sms-gateway/online/service.go (2)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (30-39)
internal/sms-gateway/cache/factory.go (1)
  • Cache (15-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
internal/sms-gateway/online/service.go (2)

14-17: Service API shape looks good.

Clear, minimal surface: Run + SetOnline. No changes requested.


67-71: Confirm Drain() is atomic across backends.

  • Redis: Drain calls r.client.Eval(ctx, hgetallAndDeleteScript, []string{r.key}) — EVAL is atomic on the Redis server, so Redis backend is atomic.
  • Memory: pkg/cache/memory.go: Drain(...) — verify it acquires the cache mutex (or performs an atomic map-swap) while copying/clearing entries; if it doesn't, concurrent Set() during Drain can be lost or duplicated.

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from d5f3c2c to e4dce05 on September 18, 2025 at 12:24

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/cache/memory_bench_test.go (1)

1-461: Fix ignored returns — remove file-level //nolint:errcheck and handle or narrowly suppress per-call

rg found 31 cache.* callsites in pkg/cache/memory_bench_test.go that ignore return values; file-level //nolint:errcheck is still present in:

  • pkg/cache/memory_bench_test.go (lines: 25,38,43,61,62,77,95,96,114,115,129,135,149,155,168,185,217,250,257,287,296,362,363,383,389,406,415,428,433,448,457)
  • pkg/cache/memory_concurrency_test.go:1
  • pkg/cache/memory_profile_test.go:1

Actions:

  • Remove the top-level //nolint:errcheck and either handle errors at each callsite, assert them in tests/benchmarks, or apply per-call //nolint:errcheck with a one-line justification.
  • Re-run the grep to verify no remaining ignored returns.
♻️ Duplicate comments (5)
pkg/cache/redis.go (3)

95-107: Get(): Redis-backed cache returns ErrKeyNotFound for expired fields — violates Cache.Get contract

  • Evidence: cache.go documents Get must return ErrKeyExpired on expiry; memory implementation returns ErrKeyExpired (pkg/cache/memory.go) and tests assert ErrKeyExpired (pkg/cache/memory_edge_test.go). pkg/cache/redis.go's Get returns ErrKeyNotFound when HGet yields redis.Nil (Redis removes expired fields), so expired vs missing are indistinguishable.
  • Required action (choose one and apply consistently): (A) update Cache contract/docs/tests to treat expired as not-found for Redis-backed caches, or (B) change Redis implementation to track expiry metadata (parallel hash/sorted-set or store expiry timestamps) and return ErrKeyExpired.
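
If option (B) is chosen, one possible shape is a parallel hash of expiry timestamps consulted when the data field is gone; the :exp suffix, the helper name, and the ErrKeyExpired sentinel below are illustrative assumptions, not the package's API:

func (r *redisCache) getOrExpired(ctx context.Context, key string) (string, error) {
  val, err := r.client.HGet(ctx, r.key, key).Result()
  if err == nil {
    return val, nil
  }
  if !errors.Is(err, redis.Nil) {
    return "", fmt.Errorf("can't get cache item: %w", err)
  }
  // The data field is gone; the expiry hash tells us whether it ever existed.
  expStr, expErr := r.client.HGet(ctx, r.key+":exp", key).Result()
  if expErr != nil {
    return "", ErrKeyNotFound // no expiry record either, so the key never existed
  }
  if exp, parseErr := strconv.ParseInt(expStr, 10, 64); parseErr == nil && time.Now().Unix() >= exp {
    return "", ErrKeyExpired
  }
  return "", ErrKeyNotFound
}

Set/SetOrFail would also have to write the matching timestamp into the :exp hash and Delete/Drain would have to clear it, which is the bookkeeping cost of keeping the ErrKeyExpired contract on Redis.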

145-168: Bug: HExpireAt missing field in SetOrFail → runtime “wrong number of arguments”.

Pass the field name; also clean up the inserted value if TTL setting fails to avoid immortal entries.

 if !val {
   return ErrKeyExists
 }
@@
 if !options.validUntil.IsZero() {
-  if err := r.client.HExpireAt(ctx, r.key, options.validUntil).Err(); err != nil {
-    return fmt.Errorf("can't set cache item ttl: %w", err)
-  }
+  if err := r.client.HExpireAt(ctx, r.key, options.validUntil, key).Err(); err != nil {
+    // Avoid leaving a non-expiring entry if TTL application fails.
+    _ = r.client.HDel(ctx, r.key, key).Err()
+    return fmt.Errorf("can't set cache item ttl: %w", err)
+  }
 }

9-10: Fix HExpireAt usage and verify Redis ≥7.4 in CI

  • go.mod shows github.com/redis/go-redis/v9 v9.9.0 — OK (>= v9.6).
  • pkg/cache/redis.go:136 — currently r.client.HExpireAt(ctx, r.key, options.validUntil, key); this already matches go-redis v9's HExpireAt(ctx, key, tm, fields...) signature, so no reorder is needed there.
  • pkg/cache/redis.go:162 — currently r.client.HExpireAt(ctx, r.key, options.validUntil); the field argument is missing. Change to r.client.HExpireAt(ctx, r.key, options.validUntil, key) (or the correct field name).
  • REDIS_URL not set locally; server version couldn't be checked — ensure CI/environments run Redis 7.4+.
pkg/cache/memory_bench_test.go (2)

4-13: Import errors and treat ErrKeyExists as expected in SetOrFail.

Use errors.Is or direct compare against cache.ErrKeyExists; fail on any other error.

 import (
   "context"
+  "errors"
   "fmt"
   "math/rand"
   "strconv"
   "testing"
   "time"
@@
   b.RunParallel(func(pb *testing.PB) {
-    for pb.Next() {
-      cache.SetOrFail(ctx, key, value)
-    }
+    for pb.Next() {
+      if err := cache.SetOrFail(ctx, key, value); err != nil && !errors.Is(err, cache.ErrKeyExists) {
+        b.Fatal(err)
+      }
+    }
   })

Also applies to: 74-80


1-1: Stop suppressing errcheck; handle errors in benchmarks.

Dropping errors risks benchmarking failure paths and hiding regressions. Remove //nolint:errcheck and check/handle returned errors. Apply the pattern below and replicate across the file.

-//nolint:errcheck
 package cache_test
@@
 func BenchmarkMemoryCache_Set(b *testing.B) {
-  cache := cache.NewMemory(0)
+  cache := cache.NewMemory(0)
   ctx := context.Background()
   key := "benchmark-key"
   value := "benchmark-value"
@@
   b.RunParallel(func(pb *testing.PB) {
     for pb.Next() {
-      cache.Set(ctx, key, value)
+      if err := cache.Set(ctx, key, value); err != nil {
+        b.Fatal(err)
+      }
     }
   })
 }
@@
-// Pre-populate the cache
-cache.Set(ctx, key, value)
+// Pre-populate the cache
+if err := cache.Set(ctx, key, value); err != nil {
+  b.Fatal(err)
+}
@@
   b.RunParallel(func(pb *testing.PB) {
     for pb.Next() {
-      cache.Get(ctx, key)
+      if _, err := cache.Get(ctx, key); err != nil {
+        b.Fatal(err)
+      }
     }
   })
 }
@@
   b.RunParallel(func(pb *testing.PB) {
     i := 0
     for pb.Next() {
       key := "key-" + strconv.Itoa(i)
       value := "value-" + strconv.Itoa(i)
       i++
-      cache.Set(ctx, key, value)
-      cache.Get(ctx, key)
+      if err := cache.Set(ctx, key, value); err != nil {
+        b.Fatal(err)
+      }
+      if _, err := cache.Get(ctx, key); err != nil {
+        b.Fatal(err)
+      }
     }
   })
 }
@@
   b.RunParallel(func(pb *testing.PB) {
-    for pb.Next() {
-      cache.SetOrFail(ctx, key, value)
-    }
+    i := 0
+    for pb.Next() {
+      // Use distinct keys OR ignore expected ErrKeyExists.
+      testKey := key + "-" + strconv.Itoa(i)
+      if err := cache.SetOrFail(ctx, testKey, value); err != nil && err != cache.ErrKeyExists {
+        b.Fatal(err)
+      }
+      i++
+    }
   })
 }
@@
   b.RunParallel(func(pb *testing.PB) {
     i := 0
     for pb.Next() {
       key := "key-" + strconv.Itoa(i)
       value := "value-" + strconv.Itoa(i)
       i++
-      cache.Set(ctx, key, value)
-      cache.GetAndDelete(ctx, key)
+      if err := cache.Set(ctx, key, value); err != nil {
+        b.Fatal(err)
+      }
+      if got, err := cache.GetAndDelete(ctx, key); err != nil {
+        b.Fatal(err)
+      } else if got != value {
+        b.Fatalf("GetAndDelete: got %q, want %q", got, value)
+      }
     }
   })
 }
@@
   b.RunParallel(func(pb *testing.PB) {
     i := 0
     for pb.Next() {
       key := "key-" + strconv.Itoa(i)
       value := "value-" + strconv.Itoa(i)
       i++
-      cache.Set(ctx, key, value)
-      cache.Delete(ctx, key)
+      if err := cache.Set(ctx, key, value); err != nil {
+        b.Fatal(err)
+      }
+      if err := cache.Delete(ctx, key); err != nil {
+        b.Fatal(err)
+      }
     }
   })
 }
@@
   for i := 0; i < 1000; i++ {
     key := "item-" + strconv.Itoa(i)
     value := "value-" + strconv.Itoa(i)
-    cache.Set(ctx, key, value)
+    if err := cache.Set(ctx, key, value); err != nil {
+      b.Fatal(err)
+    }
   }
@@
   b.RunParallel(func(pb *testing.PB) {
     for pb.Next() {
-      cache.Cleanup(ctx)
+      if err := cache.Cleanup(ctx); err != nil {
+        b.Fatal(err)
+      }
     }
   })
 }
@@
   for i := 0; i < 1000; i++ {
     key := "item-" + strconv.Itoa(i)
     value := "value-" + strconv.Itoa(i)
-    cache.Set(ctx, key, value)
+    if err := cache.Set(ctx, key, value); err != nil {
+      b.Fatal(err)
+    }
   }
@@
   b.RunParallel(func(pb *testing.PB) {
     for pb.Next() {
-      cache.Drain(ctx)
+      if _, err := cache.Drain(ctx); err != nil {
+        b.Fatal(err)
+      }
     }
   })
 }
@@
-// Pre-populate the cache
-cache.Set(ctx, key, value)
+// Pre-populate the cache
+if err := cache.Set(ctx, key, value); err != nil {
+  b.Fatal(err)
+}
@@
-          cache.Get(ctx, key)
+          if _, err := cache.Get(ctx, key); err != nil {
+            b.Fatal(err)
+          }
@@
-          cache.Set(ctx, key, value)
+          if err := cache.Set(ctx, key, value); err != nil {
+            b.Fatal(err)
+          }
@@
-        if bm.withTTL {
-          c.Set(ctx, key, value, cache.WithTTL(ttl))
-        } else {
-          c.Set(ctx, key, value)
-        }
+        if bm.withTTL {
+          if err := c.Set(ctx, key, value, cache.WithTTL(ttl)); err != nil {
+            b.Fatal(err)
+          }
+        } else {
+          if err := c.Set(ctx, key, value); err != nil {
+            b.Fatal(err)
+          }
+        }
@@
-          cache.Set(ctx, key, string(value))
-          cache.Get(ctx, key)
+          if err := cache.Set(ctx, key, string(value)); err != nil {
+            b.Fatal(err)
+          }
+          if _, err := cache.Get(ctx, key); err != nil {
+            b.Fatal(err)
+          }
@@
-        cache.Drain(ctx)
+        if _, err := cache.Drain(ctx); err != nil {
+          b.Fatal(err)
+        }
@@
-        cache.Set(ctx, key, value)
+        if err := cache.Set(ctx, key, value); err != nil {
+          b.Fatal(err)
+        }
@@
-    cache.Set(ctx, key, value)
+    if err := cache.Set(ctx, key, value); err != nil {
+      b.Fatal(err)
+    }
@@
-      cache.Get(ctx, key)
+      if _, err := cache.Get(ctx, key); err != nil {
+        b.Fatal(err)
+      }
@@
-// Pre-populate the hot key
-cache.Set(ctx, hotKey, value)
+// Pre-populate the hot key
+if err := cache.Set(ctx, hotKey, value); err != nil {
+  b.Fatal(err)
+}
@@
-      cache.Get(ctx, hotKey)
+      if _, err := cache.Get(ctx, hotKey); err != nil {
+        b.Fatal(err)
+      }
@@
-  cache.Set(ctx, key, value)
+  if err := cache.Set(ctx, key, value); err != nil {
+    b.Fatal(err)
+  }
@@
-      cache.Get(ctx, key)
+      if _, err := cache.Get(ctx, key); err != nil {
+        b.Fatal(err)
+      }

Also applies to: 22-27, 37-46, 61-63, 74-80, 95-97, 114-116, 125-131, 134-136, 145-151, 154-156, 168-169, 183-186, 213-218, 247-258, 287-288, 296-297, 323-330, 362-364, 383-390, 403-407, 414-416, 428-429, 433-434, 446-449, 456-458

🧹 Nitpick comments (15)
internal/sms-gateway/modules/devices/repository.go (3)

71-71: LGTM on parameterized UPDATE for push_token. Consider nullable clear semantics.

Good fix switching to Where("id = ?", id). If you ever need to clear the token, consider supporting NULL (column is *string) via a companion ClearPushToken(id) or by accepting *string and passing nil when clearing.
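
A minimal sketch of the companion helper, reusing the repository's gorm handle; the method name and context parameter are illustrative:

func (r *repository) ClearPushToken(ctx context.Context, id string) error {
  return r.db.WithContext(ctx).
    Model(&models.Device{}).
    Where("id = ?", id).
    UpdateColumn("push_token", nil). // writes SQL NULL, which the *string column accepts
    Error
}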


74-87: Monotonic last_seen update looks correct. Normalize time to avoid precision churn.

To align with DB precision (models default CURRENT_TIMESTAMP(3)), normalize to UTC and truncate to milliseconds before the update to reduce redundant no-op comparisons due to sub-ms/nanos.

Apply:

 func (r *repository) SetLastSeen(ctx context.Context, id string, lastSeen time.Time) error {
   if lastSeen.IsZero() {
     return nil // ignore zero timestamps
   }
+  // Normalize to DB precision (ms) and UTC to avoid precision mismatches.
+  lastSeen = lastSeen.UTC().Truncate(time.Millisecond)

   res := r.db.WithContext(ctx).
     Model(&models.Device{}).
     Where("id = ? AND last_seen < ?", id, lastSeen).
     UpdateColumn("last_seen", lastSeen)
   if res.Error != nil {
     return res.Error
   }

   // RowsAffected==0 => not found or stale timestamp; treat as no-op.
   return nil
 }

102-103: Cleanup by last_seen: add an index to keep this fast at scale.

This path will scan without an index on last_seen. Consider adding a DB index (e.g., GORM tag index:idx_devices_last_seen) to models.Device.LastSeen to keep periodic cleanup O(log n).

(Outside this diff) Update models:

LastSeen time.Time `gorm:"not null;autocreatetime:false;default:CURRENT_TIMESTAMP(3);index:idx_devices_last_seen"`
internal/sms-gateway/modules/devices/service.go (2)

104-123: Batch SetLastSeen: skip zero timestamps; consider bounded parallelism for large batches.

Minor: avoid pointless DB calls for zero times; Optional: use a worker pool (errgroup with limited concurrency) if batches can be large.

Apply the zero-time guard:

   for deviceID, lastSeen := range batch {
     if err := ctx.Err(); err != nil {
       return errors.Join(err, multiErr)
     }
+    if lastSeen.IsZero() {
+      continue
+    }
     if err := s.devices.SetLastSeen(ctx, deviceID, lastSeen); err != nil {
       multiErr = errors.Join(multiErr, fmt.Errorf("device %s: %w", deviceID, err))
       s.logger.Error("can't set last seen",
         zap.String("device_id", deviceID),
         zap.Time("last_seen", lastSeen),
         zap.Error(err),
       )
     }
   }

If desired later, I can sketch a 8–16 worker pool variant.
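
For reference, a bounded-parallelism variant built on golang.org/x/sync/errgroup; the 8-worker limit is an assumption, and unlike the errors.Join loop above it stops on the first failure because errgroup cancels the shared context:

func (s *Service) setLastSeenParallel(ctx context.Context, batch map[string]time.Time) error {
  g, ctx := errgroup.WithContext(ctx)
  g.SetLimit(8) // keeps DB connection pressure bounded

  for deviceID, lastSeen := range batch {
    if lastSeen.IsZero() {
      continue // same zero-time guard as above
    }
    g.Go(func() error {
      // per-iteration loop variables (Go 1.22+) make this capture safe
      return s.devices.SetLastSeen(ctx, deviceID, lastSeen)
    })
  }

  return g.Wait() // first error cancels the remaining workers via ctx
}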


136-144: Token cache invalidation: OK. Consider down-level logging for not-found.

If Delete returns a not-found sentinel, consider downgrading that case to debug to avoid noisy logs.
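
Roughly, assuming the cache package's ErrKeyNotFound sentinel and placeholder field names:

if err := s.tokensCache.Delete(ctx, key); err != nil {
  if errors.Is(err, cache.ErrKeyNotFound) {
    s.logger.Debug("token already absent from cache", zap.String("key", key))
  } else {
    s.logger.Error("can't invalidate token cache", zap.String("key", key), zap.Error(err))
  }
}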

internal/sms-gateway/modules/auth/service.go (2)

61-63: Name the component logger.

Use a named logger for clearer attribution in logs.

-        logger:     params.Logger,
+        logger:     params.Logger.Named("auth"),

146-150: Async SetOnline: make timeout configurable and guard against missing DI.

  • Replace hardcoded 5s with a constant/config.
  • Guard s.onlineSvc before launching the goroutine and log if it's not configured.

Note: device.LastSeen is set immediately after the goroutine (device.LastSeen = time.Now()), and SetOnline is implemented in internal/sms-gateway/online/service.go — changing the timeout / adding a nil-check is safe.

-  go func(id string) {
-    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-    defer cancel()
-    s.onlineSvc.SetOnline(ctx, id)
-  }(device.ID)
+  if s.onlineSvc != nil {
+    go func(id string) {
+      ctx, cancel := context.WithTimeout(context.Background(), onlineSetTimeout)
+      defer cancel()
+      s.onlineSvc.SetOnline(ctx, id)
+    }(device.ID)
+  } else {
+    s.logger.Debug("online service not configured; skipping SetOnline")
+  }

Add (package scope):

const onlineSetTimeout = 2 * time.Second
pkg/cache/redis.go (4)

45-57: Document Redis 7.4+ requirement (HEXPIRE) and behavior upfront.

Add a brief doc comment so callers know this depends on Redis 7.4+ and go-redis v9.6+ and that per-field TTL is used.

-func NewRedis(client *redis.Client, prefix string, ttl time.Duration) Cache {
+// NewRedis creates a Redis-backed cache using a single hash with per-field TTL (HEXPIRE/HEXPIREAT).
+// Requires Redis 7.4+ and go-redis v9.6+.
+func NewRedis(client *redis.Client, prefix string, ttl time.Duration) Cache {

59-62: Clarify why Cleanup is a no-op.

Make the no-op explicit since Redis evicts expired fields itself.

-// Cleanup implements Cache.
+// Cleanup implements Cache.
+// Redis automatically evicts expired hash fields; nothing to do here.
 func (r *redisCache) Cleanup(_ context.Context) error {

26-34: Drain Lua loops HDEL per field — switch to UNLINK (or DEL) to avoid O(n) work.

Atomicity is good, but looping HDEL can block Redis on large hashes. UNLINK frees asynchronously; fall back to DEL if unavailable.

-	hgetallAndDeleteScript = `
-local items = redis.call('HGETALL', KEYS[1])
-if #items > 0 then
-  for i = 1, #items, 2 do
-    redis.call('HDEL', KEYS[1], items[i])
-  end
-end
-return items
-`
+	hgetallAndDeleteScript = `
+local items = redis.call('HGETALL', KEYS[1])
+if #items > 0 then
+  local ok = pcall(redis.call, 'UNLINK', KEYS[1])
+  if not ok then redis.call('DEL', KEYS[1]) end
+end
+return items
+`

123-142: Set: make HSET + HEXPIREAT a single round-trip.

Pipeline to reduce race surface and network chatter. If TTL setting fails today, value may remain immortal.

- if err := r.client.HSet(ctx, r.key, key, value).Err(); err != nil {
-   return fmt.Errorf("can't set cache item: %w", err)
- }
-
- if !options.validUntil.IsZero() {
-   if err := r.client.HExpireAt(ctx, r.key, options.validUntil, key).Err(); err != nil {
-     return fmt.Errorf("can't set cache item ttl: %w", err)
-   }
- }
+ _, err := r.client.Pipelined(ctx, func(p redis.Pipeliner) error {
+   p.HSet(ctx, r.key, key, value)
+   if !options.validUntil.IsZero() {
+     p.HExpireAt(ctx, r.key, options.validUntil, key)
+   }
+   return nil
+ })
+ if err != nil {
+   return fmt.Errorf("can't set cache item: %w", err)
+ }
pkg/cache/memory_bench_test.go (4)

239-247: MixedWorkload: pre-populate working set to avoid measuring miss/path mostly.

Reads target keys 0..99 but nothing seeds them; you’re benchmarking ErrKeyNotFound. Seed before ResetTimer.

   for _, bm := range benchmarks {
     b.Run(bm.name, func(b *testing.B) {
+      // Seed working set for read hits
+      for i := 0; i < 100; i++ {
+        if err := cache.Set(ctx, fmt.Sprintf("key-%d", i), "seed"); err != nil {
+          b.Fatal(err)
+        }
+      }
       b.ResetTimer()
       b.RunParallel(func(pb *testing.PB) {

354-357: LargeValues: avoid per-iteration []byte→string conversion overhead.

Convert once to focus the benchmark on cache behavior, not conversion cost.

       value := make([]byte, size.size)
       for i := range value {
         value[i] = byte(i % 256)
       }
+      valueStr := string(value)
@@
       b.RunParallel(func(pb *testing.PB) {
         for pb.Next() {
-          cache.Set(ctx, key, string(value))
-          cache.Get(ctx, key)
+          if err := cache.Set(ctx, key, valueStr); err != nil {
+            b.Fatal(err)
+          }
+          if _, err := cache.Get(ctx, key); err != nil {
+            b.Fatal(err)
+          }
         }
       })

Also applies to: 360-364


372-374: MemoryGrowth: check Drain/Set errors and consider reporting allocs.

Add error checks and call b.ReportAllocs to make the benchmark more informative.

 func BenchmarkMemoryCache_MemoryGrowth(b *testing.B) {
   cache := cache.NewMemory(0)
   ctx := context.Background()
+  b.ReportAllocs()
@@
       // Clear cache
-      cache.Drain(ctx)
+      if _, err := cache.Drain(ctx); err != nil {
+        b.Fatal(err)
+      }
@@
         key := "key-" + strconv.Itoa(j)
         value := "value-" + strconv.Itoa(j)
-        cache.Set(ctx, key, value)
+        if err := cache.Set(ctx, key, value); err != nil {
+          b.Fatal(err)
+        }

Also applies to: 381-391


162-169: Nit: avoid shadowing the imported package name with local cache variables.

Prefer c or mem to reduce confusion when also referencing package identifiers (e.g., cache.WithTTL).

Also applies to: 398-407, 420-429, 445-449

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5f3c2c and e4dce05.

📒 Files selected for processing (17)
  • .github/workflows/docker-publish.yml (0 hunks)
  • .github/workflows/go.yml (1 hunks)
  • .github/workflows/golangci-lint.yml (0 hunks)
  • configs/config.example.yml (1 hunks)
  • internal/sms-gateway/app.go (4 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/modules/auth/service.go (5 hunks)
  • internal/sms-gateway/modules/devices/repository.go (2 hunks)
  • internal/sms-gateway/modules/devices/service.go (3 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_concurrency_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
💤 Files with no reviewable changes (2)
  • .github/workflows/docker-publish.yml
  • .github/workflows/golangci-lint.yml
✅ Files skipped from review due to trivial changes (1)
  • configs/config.example.yml
🚧 Files skipped from review as they are similar to previous changes (9)
  • pkg/cache/memory_concurrency_test.go
  • internal/sms-gateway/online/module.go
  • internal/sms-gateway/online/service.go
  • internal/sms-gateway/app.go
  • .github/workflows/go.yml
  • pkg/cache/memory_profile_test.go
  • internal/sms-gateway/cache/factory.go
  • pkg/cache/memory_edge_test.go
  • pkg/cache/memory.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-10T11:11:43.462Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.462Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_bench_test.go
  • pkg/cache/redis.go
📚 Learning: 2025-09-10T23:53:42.006Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:63-75
Timestamp: 2025-09-10T23:53:42.006Z
Learning: In pkg/cache/redis.go, the Redis cache implementation uses Redis's native field-level TTL via HExpireAt, meaning expired fields are automatically removed by Redis itself. This differs from the memory implementation where the application tracks expiration times. As a result, operations like Drain that call HGetAll will only return non-expired items since expired fields are already removed by Redis.

Applied to files:

  • pkg/cache/redis.go
📚 Learning: 2025-09-10T23:54:16.828Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:118-121
Timestamp: 2025-09-10T23:54:16.828Z
Learning: Redis 7.4.0 introduced HEXPIRE, HEXPIREAT, and related commands that support per-field TTL on hash data structures. go-redis v9.6.0+ includes support for these commands with methods like HExpireAt.

Applied to files:

  • pkg/cache/redis.go
🧬 Code graph analysis (5)
internal/sms-gateway/modules/auth/service.go (2)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (30-39)
internal/sms-gateway/online/service.go (1)
  • Service (14-17)
pkg/cache/memory_bench_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
internal/sms-gateway/modules/devices/service.go (3)
internal/sms-gateway/modules/auth/service.go (1)
  • Service (39-52)
internal/sms-gateway/online/service.go (1)
  • Service (14-17)
internal/sms-gateway/modules/webhooks/service.go (1)
  • Service (28-37)
pkg/cache/redis.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (7-7)
  • ErrKeyExists (11-11)
pkg/cache/options.go (1)
  • Option (6-6)
internal/sms-gateway/modules/devices/repository.go (1)
internal/sms-gateway/models/models.go (1)
  • Device (25-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
internal/sms-gateway/modules/devices/service.go (1)

7-7: errors.Join usage OK — toolchain ≥ Go 1.20 verified.
go.mod declares go 1.24.1 and Dockerfiles use golang:1.24-alpine, so errors.Join in internal/sms-gateway/modules/devices/service.go (lines 112, 115) is supported — no change needed.

internal/sms-gateway/modules/auth/service.go (3)

34-34: Params addition looks good.

Interface-based DI for OnlineSvc is appropriate.


47-47: Service field addition is fine.

Keeps dependency local and testable.


14-14: DI: online.Module is wired in the app; online.Service is provided — no app startup panic expected.

internal/sms-gateway/app.go registers online.Module(); in internal/sms-gateway/online/module.go, fx.Provide(New) and fx.Invoke bind the Service interface (defined in internal/sms-gateway/online/service.go); the auth module depends on online.Service (internal/sms-gateway/modules/auth/service.go). No _test.go references to online.Module/online.Service were found — add online.Module to any tests that construct auth.Module in isolation.

pkg/cache/redis.go (3)

64-71: Delete path matches interface semantics.

HDel is idempotent and returns nil if missing; good.


73-93: Atomic drain via Lua looks correct.

Combined with the UNLINK/DEL script above, this should be both atomic and efficient.


109-121: Atomic GetAndDelete via Lua looks good.

Correctly returns ErrKeyNotFound when absent.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql (2)

1-4: Make the Up migration idempotent and less blocking.

As written, CREATE INDEX will fail if the index already exists and doesn’t enforce LOCK=NONE. Prefer conditional creation and ALTER TABLE ... ALGORITHM=INPLACE, LOCK=NONE for MySQL 8 to minimize lock time.

Apply:

--- a/internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql
+++ b/internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql
 -- +goose Up
 -- +goose StatementBegin
-CREATE INDEX `idx_devices_last_seen` ON `devices`(`last_seen`);
+SET @exists := (
+  SELECT COUNT(1)
+  FROM INFORMATION_SCHEMA.STATISTICS
+  WHERE TABLE_SCHEMA = DATABASE()
+    AND TABLE_NAME = 'devices'
+    AND INDEX_NAME = 'idx_devices_last_seen'
+);
+SET @sql := IF(@exists = 0,
+  'ALTER TABLE `devices` ADD INDEX `idx_devices_last_seen` (`last_seen`), ALGORITHM=INPLACE, LOCK=NONE',
+  'SELECT 1');
+PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
 -- +goose StatementEnd

6-9: Guard the Down migration.

Dropping an index that’s already gone will error. Make it conditional.

Apply:

--- a/internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql
+++ b/internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql
 -- +goose Down
 -- +goose StatementBegin
-DROP INDEX `idx_devices_last_seen` ON `devices`;
+SET @exists := (
+  SELECT COUNT(1)
+  FROM INFORMATION_SCHEMA.STATISTICS
+  WHERE TABLE_SCHEMA = DATABASE()
+    AND TABLE_NAME = 'devices'
+    AND INDEX_NAME = 'idx_devices_last_seen'
+);
+SET @sql := IF(@exists = 1,
+  'DROP INDEX `idx_devices_last_seen` ON `devices`',
+  'SELECT 1');
+PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
 -- +goose StatementEnd
internal/sms-gateway/models/models.go (1)

31-31: Add composite index (user_id, last_seen) for per-user last-seen queries.

modules/devices/repository_filter.go applies both user_id and last_seen filters — add a migration creating an index on (user_id, last_seen); keep the existing single-column last_seen index for global scans/operations (e.g. removeUnused, SetLastSeen uses id+last_seen).
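
On the model side this could look like the tags below (a goose migration would still create the index); the UserID field name and the composite index name are assumptions about models.Device:

UserID   string    `gorm:"index:idx_devices_user_id_last_seen,priority:1"`
LastSeen time.Time `gorm:"not null;autocreatetime:false;default:CURRENT_TIMESTAMP(3);index:idx_devices_last_seen;index:idx_devices_user_id_last_seen,priority:2"`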

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4dce05 and e87a0be.

📒 Files selected for processing (4)
  • internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql (1 hunks)
  • internal/sms-gateway/models/models.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cache/memory_bench_test.go
  • pkg/cache/redis.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Build / Docker image (linux/amd64)
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
internal/sms-gateway/online/service.go (4)

56-61: Validate deviceID before writing to cache.

Guard against empty/whitespace IDs to avoid empty-key writes and noisy logs.

 func (s *service) SetOnline(ctx context.Context, deviceID string) {
+  if strings.TrimSpace(deviceID) == "" {
+    s.logger.Warn("Empty device_id; skipping SetOnline")
+    s.metrics.IncrementCacheOperation(operationSet, statusError)
+    s.metrics.IncrementStatusSet(false)
+    return
+  }
   dt := time.Now().UTC().Format(time.RFC3339)

Add import:

 import (
   "context"
   "fmt"
   "time"
+  "strings"

34-35: Name the logger for this module.

Scopes logs consistently to "online".

-    logger:  logger,
+    logger:  logger.Named("online"),

22-23: Confirm cache TTL and Drain atomicity.

Ensure the "online" cache uses a sensible TTL (e.g., 10–30m) and Drain is atomic w.r.t. concurrent Set to avoid dropping in-flight updates.

#!/bin/bash
# Inspect Cache interface and implementations for Drain semantics and default TTL.
rg -nUP -C3 --type=go 'type\s+Cache\b[\s\S]*?Drain\('
rg -nP -C3 --type=go 'New\w*Cache|NewMemory|NewRedis|WithTTL\('
rg -nP -C3 --type=go 'factory\.New\("online"\)|Named\("online"\)'

Also applies to: 56-76, 79-126


108-115: Avoid data loss: re-queue drained items if persistence fails.

On SetLastSeen error, drained entries are dropped. Reinsert best-effort to the cache.

-    if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
-      persistErr = fmt.Errorf("can't set last seen: %w", err)
-      s.metrics.IncrementPersistenceError(operationPersist)
-      return
-    }
+    if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
+      persistErr = fmt.Errorf("can't set last seen: %w", err)
+      s.metrics.IncrementPersistenceError(operationPersist)
+      // Best-effort requeue drained items
+      for id, v := range items {
+        if setErr := s.cache.Set(ctx, id, v); setErr != nil {
+          s.logger.Warn("Failed to requeue last_seen", zap.String("device_id", id), zap.Error(setErr))
+        }
+      }
+      return
+    }
🧹 Nitpick comments (6)
internal/sms-gateway/online/metrics.go (2)

31-33: Use plain Histogram instead of HistogramVec with zero labels.

Slightly simpler and avoids WithLabelValues() calls; behavior is identical here.

 type metrics struct {
   statusSetCounter   *prometheus.CounterVec
   cacheOperations    *prometheus.CounterVec
-  cacheLatency       *prometheus.HistogramVec
-  persistenceLatency *prometheus.HistogramVec
+  cacheLatency       prometheus.Histogram
+  persistenceLatency prometheus.Histogram
   persistenceErrors  *prometheus.CounterVec
 }

@@
-    cacheLatency: promauto.NewHistogramVec(prometheus.HistogramOpts{
+    cacheLatency: promauto.NewHistogram(prometheus.HistogramOpts{
       Namespace: "sms",
       Subsystem: "online",
       Name:      metricCacheLatency,
       Help:      "Cache operation latency in seconds",
       Buckets:   []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1},
-    }, []string{}),
+    }),
@@
-    persistenceLatency: promauto.NewHistogramVec(prometheus.HistogramOpts{
+    persistenceLatency: promauto.NewHistogram(prometheus.HistogramOpts{
       Namespace: "sms",
       Subsystem: "online",
       Name:      metricPersistenceLatency,
       Help:      "Persistence operation latency in seconds",
       Buckets:   []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1},
-    }, []string{}),
+    }),
@@
 func (m *metrics) ObserveCacheLatency(f func()) {
-  timer := prometheus.NewTimer(m.cacheLatency.WithLabelValues())
+  timer := prometheus.NewTimer(m.cacheLatency)
   f()
   timer.ObserveDuration()
 }
@@
 func (m *metrics) ObservePersistenceLatency(f func()) {
-  timer := prometheus.NewTimer(m.persistenceLatency.WithLabelValues())
+  timer := prometheus.NewTimer(m.persistenceLatency)
   f()
   timer.ObserveDuration()
 }

Also applies to: 53-60, 61-68, 92-97, 99-104


36-76: Ensure metrics are single-registered or accept a prometheus.Registerer

  • Verified: newMetrics is provided via fx.Provide in these modules: internal/sms-gateway/online/module.go, internal/sms-gateway/modules/push/module.go, internal/sms-gateway/modules/events/module.go, internal/sms-gateway/modules/sse/module.go. Each package registers metrics with promauto.New using distinct Namespace+Subsystem combinations — no immediate duplicate metric-name registration found.
  • Action: to avoid panics in tests or future duplicate construction, accept a prometheus.Registerer (use promauto.With(reg)) or ensure newMetrics is only constructed once (singleton wiring).
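
A sketch of the Registerer-injection option; only one collector is shown, and production wiring would pass prometheus.DefaultRegisterer while tests pass a fresh prometheus.NewRegistry():

func newMetrics(reg prometheus.Registerer) *metrics {
  return &metrics{
    statusSetCounter: promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
      Namespace: "sms",
      Subsystem: "online",
      Name:      "status_set_total",
      Help:      "Total number of online status set operations",
    }, []string{"status"}),
    // remaining collectors are built the same way via promauto.With(reg)
  }
}
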
deployments/prometheus/alerts/online-alerts.yml (1)

10-12: Replace placeholder runbook URLs.

example.com placeholders reduce SRE usefulness during pages.

I can draft concise runbooks if you share the intended remediation steps.

Also applies to: 20-22, 30-32, 40-42

internal/sms-gateway/online/service.go (3)

97-101: Demote per-item parse errors to warn (or sample) to avoid log spam.

Parsing issues on individual timestamps can flood logs; warn once per item is fine, but consider sampling or counting.

-        s.logger.Error("Can't parse last seen", zap.String("last_seen", v), zap.Error(err))
+        s.logger.Warn("Can't parse last seen", zap.String("last_seen", v), zap.Error(err))

40-48: Optional: add jitter to the tick to avoid herd effects across replicas.

Small per-instance jitter (e.g., ±10–20s) prevents synchronized drains.

I can propose a jittered ticker helper if desired.
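
A tiny helper along those lines; the bounds are illustrative, and base must stay larger than maxJitter so the period never goes non-positive:

// jitteredTicker offsets the period by a random amount in [-maxJitter, +maxJitter)
// so replicas don't drain in lockstep. rand.Int63n comes from math/rand.
func jitteredTicker(base, maxJitter time.Duration) *time.Ticker {
  offset := time.Duration(rand.Int63n(int64(2*maxJitter))) - maxJitter
  return time.NewTicker(base + offset)
}

// e.g. ticker := jitteredTicker(1*time.Minute, 15*time.Second)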


20-26: Depend on an interface for devices service to improve testability.

Field is a concrete *devices.Service; prefer a narrow interface defined here.

Example:

  • Define: type lastSeenWriter interface { SetLastSeen(context.Context, map[string]time.Time) error }
  • Store lastSeenWriter in service; adapt New to accept that.

Also applies to: 28-37

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e513b7 and 0cba817.

📒 Files selected for processing (5)
  • deployments/grafana/dashboards/online.json (1 hunks)
  • deployments/prometheus/alerts/online-alerts.yml (1 hunks)
  • internal/sms-gateway/online/metrics.go (1 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • deployments/grafana/dashboards/online.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/sms-gateway/online/module.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/sms-gateway/online/service.go (3)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (30-39)
internal/sms-gateway/modules/auth/service.go (2)
  • Service (39-52)
  • New (54-68)
internal/sms-gateway/cache/factory.go (1)
  • Cache (15-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
deployments/prometheus/alerts/online-alerts.yml (1)

1-38: Nice fix: counters use increase/rate; latency uses histogram_quantile over _bucket.

This addresses the prior alerting pitfalls and looks correct.

🧹 Nitpick comments (1)
deployments/prometheus/alerts/online-alerts.yml (1)

13-21: Optional: surface operation as a static alert label for routing.

Prometheus will carry the operation label from the expr, but adding it to labels can simplify Alertmanager routing/grouping.

       labels:
         severity: warning
+        operation: '{{ $labels.operation }}'
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0cba817 and c0f1836.

📒 Files selected for processing (3)
  • deployments/prometheus/alerts/online-alerts.yml (1 hunks)
  • internal/sms-gateway/online/metrics.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/sms-gateway/online/metrics.go
  • internal/sms-gateway/online/service.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Benchmark
  • GitHub Check: E2E
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
deployments/prometheus/alerts/online-alerts.yml (1)

31-38: Verify histogram label dimensions for HighPersistenceLatency alert

rg returned no matches for sms_online_persistence_latency_seconds_bucket; confirm the metric name and its label set and either scope the aggregation by those labels or create per-dimension alerts. Examples:

  • Per-label: histogram_quantile(0.95, sum by (le, operation) (rate(sms_online_persistence_latency_seconds_bucket[5m]))) > 0.5
  • Global (only if no extra labels): histogram_quantile(0.95, sum by (le) (rate(sms_online_persistence_latency_seconds_bucket[5m]))) > 0.5

Verify by searching the repo without a type filter: rg -n -C3 'sms_online_persistence_latency_seconds(_bucket)?' or inspect the instrumentation that registers the metric.

File: deployments/prometheus/alerts/online-alerts.yml:31-38


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
internal/sms-gateway/online/service.go (5)

28-36: Scope the logger with a module name.

Name the logger to keep logs consistently scoped to this service.

 func New(devicesSvc *devices.Service, cache cache.Cache, logger *zap.Logger, metrics *metrics) Service {
 	return &service{
 		devicesSvc: devicesSvc,
 
 		cache: cache,
 
-		logger:  logger,
+		logger:  logger.Named("online"),
 		metrics: metrics,
 	}
 }

3-12: Import strings for input validation.

Needed for the deviceID guard.

 import (
 	"context"
 	"fmt"
 	"time"
+	"strings"
 
 	"github.com/android-sms-gateway/server/internal/sms-gateway/modules/devices"
 	"github.com/android-sms-gateway/server/pkg/cache"
 	"github.com/capcom6/go-helpers/maps"
 	"go.uber.org/zap"
 )

56-61: Guard against empty/whitespace deviceID to avoid empty-key writes.

Skip cache writes when deviceID is blank.

 func (s *service) SetOnline(ctx context.Context, deviceID string) {
+	if strings.TrimSpace(deviceID) == "" {
+		s.logger.Warn("Empty device_id; skipping SetOnline")
+		return
+	}
 	dt := time.Now().UTC().Format(time.RFC3339)

82-91: Confirm cache Drain semantics (atomicity and namespacing/TTL).

Ensure Drain is atomic across workers and that the “online” cache has a non-zero TTL to expire entries if persisting stalls.

#!/bin/bash
# Inspect cache interface/impl for Drain atomicity and TTL for "online"
rg -nUP -C3 --type=go 'type\s+Cache\b|interface\s*\{[^}]*Drain' pkg/cache internal/sms-gateway/cache || true
rg -nP -C5 --type=go 'New\("online"\)|WithTTL|TTL:' internal/sms-gateway || true

110-114: Re-queue drained items when persistence fails to prevent data loss.

If SetLastSeen fails, put items back into the cache.

-		if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
-			persistErr = fmt.Errorf("can't set last seen: %w", err)
-			s.metrics.IncrementPersistenceError()
-			return
-		}
+		if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
+			persistErr = fmt.Errorf("can't set last seen: %w", err)
+			s.metrics.IncrementPersistenceError()
+			for id, v := range items {
+				if setErr := s.cache.Set(ctx, id, v); setErr != nil {
+					s.logger.Warn("Failed to requeue last_seen", zap.String("device_id", id), zap.Error(setErr))
+				}
+			}
+			return
+		}
🧹 Nitpick comments (11)
deployments/grafana/dashboards/online.json (8)

491-501: Fix mismatch: title says 5m, query uses $__rate_interval

Use a fixed 5m window to match the title.

-          "expr": "increase(sms_online_status_set_total{status=\"error\", instance=~\"$instance\"}[$__rate_interval])",
+          "expr": "increase(sms_online_status_set_total{status=\"error\", instance=~\"$instance\"}[5m])",

78-79: Use ops/s unit for rate panels

These queries are rates; switch unit to ops/s.

-          "unit": "ops"
+          "unit": "ops/s"

173-174: Use ops/s unit for rate panels

Same here; this is a per‑second rate.

-          "unit": "ops"
+          "unit": "ops/s"

22-25: Avoid hard‑coded datasource UID; parameterize for portability

Hard‑coded UID will break across environments. Use a templated UID and remove per‑target overrides.

-      "datasource": {
-        "type": "prometheus",
-        "uid": "edqp0a73uh2bka"
-      },
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${DS_PROMETHEUS}"
+      },

And for variables:

-        "datasource": {
-          "type": "prometheus",
-          "uid": "edqp0a73uh2bka"
-        },
+        "datasource": {
+          "type": "prometheus",
+          "uid": "${DS_PROMETHEUS}"
+        },

After applying, verify import works in your Grafana via Provisioning or UI import.

Also applies to: 117-120, 212-215, 311-314, 410-413, 504-507, 615-617


105-106: Remove per‑target datasource to prevent conflicts

Targets inherit the panel datasource; the mixed string “Prometheus” can cause DS resolution issues on import.

-          "datasource": "Prometheus",

Also applies to: 200-201, 299-300, 398-399, 492-493, 586-587


18-18: Drop static dashboard id and reset version

"id" is instance-specific and causes churn; version can start at 1 (Grafana will manage).

-  "id": 19,
...
-  "version": 6
+  "version": 1

Also applies to: 660-661


414-466: Set unit for error counts (optional clarity)

These panels show counts over a window. Adding a unit improves readability.

         "thresholds": {
           "mode": "absolute",
           "steps": [
             { "color": "green", "value": 0 },
             { "color": "red", "value": 0 }
           ]
-        }
+        },
+        "unit": "ops"

508-560: Set unit for error counts (optional clarity)

Same for persistence errors.

         "thresholds": {
           "mode": "absolute",
           "steps": [
             { "color": "green", "value": 0 },
             { "color": "red", "value": 0 }
           ]
-        }
+        },
+        "unit": "ops"
internal/sms-gateway/online/service.go (3)

98-106: Don’t stamp “now” on parse failures; skip or requeue invalid entries.

Using current time can distort last-seen. Prefer to skip and requeue parse failures.

-		timestamps := maps.MapValues(items, func(v string) time.Time {
-			t, err := time.Parse(time.RFC3339, v)
-			if err != nil {
-				s.logger.Warn("Can't parse last seen", zap.String("last_seen", v), zap.Error(err))
-				return time.Now().UTC()
-			}
-
-			return t
-		})
+		timestamps := make(map[string]time.Time, len(items))
+		for id, v := range items {
+			t, err := time.Parse(time.RFC3339, v)
+			if err != nil {
+				s.logger.Warn("Can't parse last seen; requeuing", zap.String("device_id", id), zap.String("last_seen", v), zap.Error(err))
+				if setErr := s.cache.Set(ctx, id, v); setErr != nil {
+					s.logger.Warn("Failed to requeue unparsable last_seen", zap.String("device_id", id), zap.Error(setErr))
+				}
+				continue
+			}
+			timestamps[id] = t
+		}

40-41: Make persist interval configurable and add small jitter to avoid herd effects.

Helps multi-node deployments.

-	ticker := time.NewTicker(1 * time.Minute)
+	interval := 1 * time.Minute // TODO: inject from config
+	jitter := time.Duration(rand.Int63n(int64(5 * time.Second))) // import math/rand
+	ticker := time.NewTicker(interval + jitter)

56-58: Consider RFC3339Nano for higher precision timestamps.

Only if consumers can parse it; otherwise keep as-is.

-	dt := time.Now().UTC().Format(time.RFC3339)
+	dt := time.Now().UTC().Format(time.RFC3339Nano)
-			t, err := time.Parse(time.RFC3339, v)
+			t, err := time.Parse(time.RFC3339Nano, v)

Also applies to: 99-101

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0f1836 and 9d371f2.

📒 Files selected for processing (3)
  • deployments/grafana/dashboards/online.json (1 hunks)
  • internal/sms-gateway/online/metrics.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/sms-gateway/online/metrics.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/sms-gateway/online/service.go (3)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (30-39)
internal/sms-gateway/modules/auth/service.go (2)
  • Service (39-52)
  • New (54-68)
internal/sms-gateway/cache/factory.go (1)
  • Cache (15-15)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (8)
deployments/grafana/dashboards/online.json (8)

22-26: Unify datasource references; avoid hardcoded UID where possible

Panels mix object UID refs and plain string "Prometheus". Pick one convention. If this dashboard is shared across envs, prefer a named datasource or a dashboard input over a hardcoded UID to avoid import failures.

Also applies to: 200-203


199-209: Disambiguate legend or aggregate by status to avoid duplicate series names

Multiple instances will render identical legends ("success"/"error") and overlay lines. Either aggregate or include instance in legend.

Apply one of the following:

Option A (aggregate across instances):

- "expr": "rate(sms_online_status_set_total{instance=~\"$instance\"}[$__rate_interval])",
- "legendFormat": "{{status}}",
+ "expr": "sum by (status) (rate(sms_online_status_set_total{instance=~\"$instance\"}[$__rate_interval]))",
+ "legendFormat": "{{status}}",

Option B (keep per-instance):

- "legendFormat": "{{status}}",
+ "legendFormat": "{{instance}} · {{status}}",

295-304: Cache ops: add aggregation to match legend and reduce series cardinality

Without aggregation, legend duplicates per instance. Sum by (operation,status) or add instance to legend.

- "expr": "rate(sms_online_cache_operations_total{instance=~\"$instance\", operation=~\"$operation\"}[$__rate_interval])",
- "legendFormat": "{{operation}} - {{status}}",
+ "expr": "sum by (operation, status) (rate(sms_online_cache_operations_total{instance=~\"$instance\", operation=~\"$operation\"}[$__rate_interval]))",
+ "legendFormat": "{{operation}} - {{status}}",

Or:

- "legendFormat": "{{operation}} - {{status}}",
+ "legendFormat": "{{instance}} · {{operation}} - {{status}}",

390-408: Latency quantiles: consider grouping by instance (or operation) for per-node SLOs

Current queries aggregate all selected instances; useful for fleet view, but hides per-instance outliers. If you want per-instance SLOs, keep instance in the sum and adjust legend.

- "expr": "histogram_quantile(0.95, sum(rate(sms_online_cache_latency_seconds_bucket{instance=~\"$instance\"}[$__rate_interval])) by (le))",
- "legendFormat": "95th Percentile",
+ "expr": "histogram_quantile(0.95, sum by (le, instance) (rate(sms_online_cache_latency_seconds_bucket{instance=~\"$instance\"}[$__rate_interval])))",
+ "legendFormat": "{{instance}} p95",

(Apply similarly for persistence latency.)

Also applies to: 498-516


565-576: Thresholds define green@0 and red@0 — everything may appear red

Set a non-zero red threshold so low error rates aren’t flagged.

- { "color": "green", "value": 0 },
- { "color": "red",   "value": 0 }
+ { "color": "green", "value": 0 },
+ { "color": "red",   "value": 1 }

Tune value to your alerting tolerance.

Also applies to: 659-670


741-760: Add datasource to the “operation” variable for consistency

The first variable specifies the Prometheus datasource; this one doesn’t. Missing datasource can break provisioning when no default is set.

 {
   "allValue": ".*",
   "allowCustomValue": false,
   "current": { "text": "All", "value": "$__all" },
+  "datasource": { "type": "prometheus", "uid": "edqp0a73uh2bka" },
   "definition": "label_values(sms_online_cache_operations_total,operation)",
   "includeAll": true,
   "name": "operation",

(Or switch both variables to a named datasource/input to avoid hardcoded UID.)


710-711: Provisioning hygiene: explicit refresh, avoid pinned id/version

  • Set a concrete refresh interval (e.g., "30s"); "auto" may be ambiguous across Grafana versions.
  • Omit fixed dashboard id and version in provisioned JSON to avoid collisions and churn.
- "refresh": "auto",
+ "refresh": "30s",

And drop:

- "id": 19,
- "version": 9

Also applies to: 18-19, 769-772


109-116: Use a clearer legend for Batch Size

“__auto” is vague. Show instance explicitly.

- "legendFormat": "__auto",
+ "legendFormat": "{{instance}}",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d371f2 and 47c33e7.

📒 Files selected for processing (1)
  • deployments/grafana/dashboards/online.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (go)
  • GitHub Check: Benchmark


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
deployments/grafana/dashboards/online.json (2)

24-24: Consider using templated datasource UIDs for portability

The dashboard uses hardcoded datasource UID "edqp0a73uh2bka" in all panels. For better portability across environments, consider using templated datasource variables or consistent custom UIDs when provisioning datasources.

Apply this approach by either:

Option 1: Use templated datasource (recommended)

             "datasource": {
                 "type": "prometheus",
-                "uid": "edqp0a73uh2bka"
+                "uid": "${datasource}"
             },

And add a datasource template variable to the templating.list section.

Option 2: Use consistent custom UID in provisioning
Ensure your Prometheus datasource is provisioned with a consistent custom UID like prometheus-main across all environments.

Also applies to: 121-121, 216-216, 311-311, 420-420, 531-531, 625-625


361-363: Consider adjusting latency thresholds

The latency thresholds appear quite aggressive:

  • Cache latency: 0.3s warning threshold
  • Persistence latency: 0.3s warning, 0.5s critical

For a batch-oriented service processing device status updates, these thresholds might generate false alarms.

Consider adjusting based on your SLAs:

                         "steps": [
                             {
                                 "color": "green",
                                 "value": 0
                             },
                             {
                                 "color": "red",
-                                "value": 0.3
+                                "value": 1.0
                             }
                         ]

Also applies to: 473-475

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47c33e7 and f9e0953.

📒 Files selected for processing (1)
  • deployments/grafana/dashboards/online.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Benchmark
  • GitHub Check: E2E
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
deployments/grafana/dashboards/online.json (5)

81-86: Consistent grid positioning supports good dashboard layout

The panels are well-positioned using Grafana's 24-column grid system with appropriate heights (8 units) and logical arrangement. The grid positioning follows Grafana best practices with proper width (w) and height (h) values.

Also applies to: 179-184, 274-279, 370-375, 482-487, 589-594, 683-688


757-769: Template variables enhance dashboard flexibility

The template variables for instance and operation filtering are well-configured, allowing users to scope the dashboard to specific instances or cache operations. This follows Grafana dashboard best practices for creating flexible, reusable dashboards.


778-780: Dashboard metadata follows conventions

The dashboard has appropriate metadata with descriptive title "SMS Online Metrics", consistent UID online_metrics, and relevant tags. This supports version control and dashboard organization best practices.


106-106: Verify Prometheus metric names and exposure

  • Confirm the metrics in deployments/grafana/dashboards/online.json (lines 106, 204, 299, 399, 407, 511, 519, 614, 708) are exported by the online service and that each metric's type matches its name — counters must end with _total (OpenMetrics/Prometheus).
  • If any counter is missing the _total suffix, rename it and add/verify a HELP string; if a non-counter uses magic suffixes (_total, _count, _sum, _bucket, _created), rename to avoid collisions and high-cardinality labels.

1-781: LGTM — dashboard metrics verified

Dashboard metrics in deployments/grafana/dashboards/online.json match the Prometheus metrics defined in internal/sms-gateway/online/metrics.go (status_set_total, batch_size, cache_operations_total, cache_latency_seconds, persistence_latency_seconds, persistence_errors_total); no changes required.

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from f9e0953 to 6f1dc1f Compare September 20, 2025 12:50

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (12)
pkg/cache/memory_profile_test.go (5)

45-61: Use HeapAlloc delta (not TotalAlloc) and guard underflow.

TotalAlloc is cumulative; use heap-in-use and signed delta.

-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
-	allocPerItem := float64(allocDiff) / float64(numItems)
+	heapDiff := int64(m2.HeapAlloc) - int64(m1.HeapAlloc)
+	allocPerItem := float64(heapDiff) / float64(numItems)
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", heapDiff)
@@
-	if allocPerItem > 300 {
+	if allocPerItem > 300 {
 		t.Errorf("Expected less than 300 bytes per item, got %.2f bytes", allocPerItem)
 	}

95-109: Cleanup check: compare heap-in-use, not TotalAlloc.

Align metric and logs; avoid unsigned wrap.

-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
+	heapDiff := int64(m2.HeapAlloc) - int64(m1.HeapAlloc)
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", heapDiff)
@@
-	if allocDiff > 2*1024*1024 { // 2MB
-		t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", allocDiff)
+	if heapDiff > 2*1024*1024 { // 2MB
+		t.Errorf("Expected less than 2MB heap growth after cleanup, got %d bytes", heapDiff)

150-166: Pressure test: compare HeapAlloc delta.

Prevents false positives from cumulative allocation.

-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
+	heapDiff := int64(m2.HeapAlloc) - int64(m1.HeapAlloc)
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", heapDiff)
@@
-	if allocDiff > expectedMaxGrowth {
-		t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, allocDiff)
+	if heapDiff > int64(expectedMaxGrowth) {
+		t.Errorf("Expected < %d bytes heap growth under pressure, got %d bytes", expectedMaxGrowth, heapDiff)

243-245: Handle Drain error in leak test.

Don’t drop possible errors.

-		tempCache.Drain(ctx)
+		if _, err := tempCache.Drain(ctx); err != nil {
+			t.Errorf("Drain failed: %v", err)
+		}

25-39: Fix: invalid range over integers (won’t compile).

Convert all range-over-int loops to classic indexed loops.

-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for range numGCs {
+	for i := 0; i < numGCs; i++ {
-		for j := range 100 {
+		for j := 0; j < 100; j++ {
-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for i := range numCaches {
+	for i := 0; i < numCaches; i++ {
-		for j := range 10 {
+		for j := 0; j < 10; j++ {

Also applies to: 75-84, 175-184, 187-199, 201-215, 228-248

pkg/cache/memory_edge_test.go (1)

232-249: Fix: invalid range over integers (won’t compile).

Replace range-over-int with classic for-loops.

-	for i := range numOperations {
+	for i := 0; i < numOperations; i++ {
-	for i := range numGoroutines {
+	for i := 0; i < numGoroutines; i++ {

Also applies to: 373-389

.github/workflows/go.yml (2)

70-74: Codecov on forks: guard missing secret or switch to tokenless.

Unconditional use of CODECOV_TOKEN will fail for forked PRs.

Option A — guard by secret and event:

-      - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
+      - name: Upload coverage to Codecov
+        if: >
+          ${{
+            (github.event_name != 'pull_request' || github.event.pull_request.head.repo.fork == false)
+            && secrets.CODECOV_TOKEN != ''
+          }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false

Option B — tokenless (public repos):

-      - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          fail_ci_if_error: false

75-81: Benchmark baselines not persisted; fail-on-alert risks flakes.

You’re reading an external JSON but never persisting updates. Also commenting/push needs proper perms.

Option A — persist to gh-pages (recommended):

   benchmark:
     name: Benchmark
     runs-on: ubuntu-latest
     permissions:
-      contents: read
+      contents: write
       pull-requests: write
@@
       - name: Upload benchmark results
         uses: benchmark-action/github-action-benchmark@v1
         with:
           tool: "go"
           output-file-path: benchmark.txt
           external-data-json-path: ./cache/benchmark-data.json
-          fail-on-alert: true
+          auto-push: true
+          comment-on-alert: true
+          fail-on-alert: true

Option B — no persistence: disable alerting to avoid false failures:

           external-data-json-path: ./cache/benchmark-data.json
-          fail-on-alert: true
+          comment-on-alert: false
+          fail-on-alert: false

Also ensure the external path exists:

       - name: Install all Go dependencies
         run: go mod download
+
+      - name: Ensure benchmark cache dir exists
+        run: mkdir -p cache

Also applies to: 100-111

internal/sms-gateway/online/service.go (4)

28-36: Name the logger for this module.

Consistent scoping helps log filtering.

 func New(devicesSvc *devices.Service, cache cache.Cache, logger *zap.Logger, metrics *metrics) Service {
 	return &service{
 		devicesSvc: devicesSvc,
 
 		cache: cache,
 
-		logger:  logger,
+		logger:  logger.Named("online"),
 		metrics: metrics,
 	}
 }

39-46: Flush once on shutdown to avoid dropping buffered updates.

Do a best-effort final persist with a short timeout.

 	for {
 		select {
 		case <-ctx.Done():
-			return
+			s.logger.Debug("Shutting down; flushing online status")
+			flushCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+			defer cancel()
+			if err := s.persist(flushCtx); err != nil {
+				s.logger.Warn("Final persist failed", zap.Error(err))
+			}
+			return

56-67: Guard against empty deviceID to avoid empty-key writes.

Skip and warn early.

 func (s *service) SetOnline(ctx context.Context, deviceID string) {
+	if deviceID == "" {
+		s.logger.Warn("Empty device_id; skipping SetOnline")
+		return
+	}
 	dt := time.Now().UTC().Format(time.RFC3339)

82-117: Re-queue drained items on persist failure to prevent data loss.

Drain clears the buffer; on DB error, best-effort reinsert.

 		items, err := s.cache.Drain(ctx)
@@
-		if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
+		if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
 			persistErr = fmt.Errorf("can't set last seen: %w", err)
 			s.metrics.IncrementPersistenceError()
+			// Best-effort requeue
+			requeueCtx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+			defer cancel()
+			for id, v := range items {
+				if setErr := s.cache.Set(requeueCtx, id, v); setErr != nil {
+					s.logger.Warn("Failed to requeue last_seen", zap.String("device_id", id), zap.Error(setErr))
+				}
+			}
 			return
 		}
🧹 Nitpick comments (13)
pkg/cache/memory_edge_test.go (3)

47-56: Harden expiry timing to reduce flakiness.

1ms TTL with 2ms sleep is tight on CI; bump margins.

-	ttl := 1 * time.Millisecond
+	ttl := 10 * time.Millisecond
@@
-	time.Sleep(2 * ttl)
+	time.Sleep(5 * ttl)

254-257: Loosen duration bound (avoid timing flakes).

Increase tolerance for RapidOperations or this will flap on busy runners.

-	if durationTaken > 2*duration {
+	if durationTaken > 5*duration {
 		t.Errorf("Operations took too long: %v", durationTaken)
 	}

16-37: Avoid shadowing the imported package name cache.

Renaming local vars improves readability and prevents accidental package masking.

-	cache := cache.NewMemory(0)
+	c := cache.NewMemory(0)
@@
-	err := cache.Set(ctx, key, value)
+	err := c.Set(ctx, key, value)
@@
-	retrieved, err := cache.Get(ctx, key)
+	retrieved, err := c.Get(ctx, key)

(Apply similarly in the other functions in this file.)

Also applies to: 63-82, 84-104, 260-287, 330-352

pkg/cache/memory_profile_test.go (2)

15-17: Avoid shadowing the imported package cache.

Rename locals for clarity.

-	cache := cache.NewMemory(0)
+	c := cache.NewMemory(0)
@@
-	err := cache.Set(ctx, key, value)
+	err := c.Set(ctx, key, value)
@@
-	_, err = cache.Get(ctx, key)
+	_, err = c.Get(ctx, key)

(Apply similarly where the local is named cache.)

Also applies to: 65-67, 170-172, 275-277


58-60: Thresholds may be too strict across Go/GC versions.

Consider widening or logging-only to avoid flaky failures.

Also applies to: 162-165
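One way to keep the signal without hard failures is an opt-in strict mode; a sketch assuming a hypothetical STRICT_MEM_CHECKS environment variable (not part of the current suite):

package cache_test

import (
	"fmt"
	"os"
	"testing"
)

// checkHeapGrowth logs when a memory threshold is exceeded and only fails the
// test when strict mode is explicitly requested via STRICT_MEM_CHECKS.
func checkHeapGrowth(t *testing.T, heapDiff, limit int64) {
	t.Helper()
	if heapDiff <= limit {
		return
	}
	msg := fmt.Sprintf("heap growth %d bytes exceeds soft limit %d bytes", heapDiff, limit)
	if os.Getenv("STRICT_MEM_CHECKS") != "" {
		t.Error(msg)
		return
	}
	t.Log(msg) // informational only on regular CI runs
}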

.github/workflows/go.yml (2)

35-39: Enable Go module caching to speed CI.

setup-go v5 supports built-in caching.

       - name: Set up Go
         uses: actions/setup-go@v5
         with:
           go-version: stable
+          cache: true

(Apply in Lint, Test, and Benchmark jobs.)

Also applies to: 56-60, 87-91


41-45: Pin golangci-lint version; avoid “latest” for reproducibility.

Floating linter versions can break CI unexpectedly.

Example:

       - name: Run golangci-lint
         uses: golangci/golangci-lint-action@v8
         with:
-          version: latest
+          version: v1.59.1   # pin to a known-good version
           args: --timeout=5m
internal/sms-gateway/cache/factory.go (3)

3-9: Default TTL for "online" cache + minor robustness (lowercase scheme).

Avoid non-expiring “online” entries and normalize scheme. Apply:

 import (
 	"fmt"
 	"net/url"
+	"strings"
+	"time"

 	"github.com/android-sms-gateway/core/redis"
 	"github.com/android-sms-gateway/server/pkg/cache"
 )
 
 const (
 	keyPrefix = "sms-gateway:"
+	onlineDefaultTTL = 15 * time.Minute
 )
 
@@
-	switch u.Scheme {
+	switch strings.ToLower(u.Scheme) {
 	case "memory":
 		return &factory{
 			new: func(name string) (cache.Cache, error) {
-				return cache.NewMemory(0), nil
+				ttl := time.Duration(0)
+				if name == keyPrefix+"online" || strings.HasSuffix(name, ":online") {
+					ttl = onlineDefaultTTL
+				}
+				return cache.NewMemory(ttl), nil
 			},
 		}, nil
 	case "redis":
 		client, err := redis.New(redis.Config{URL: config.URL})
@@
 		return &factory{
 			new: func(name string) (cache.Cache, error) {
-				return cache.NewRedis(client, name, 0), nil
+				ttl := time.Duration(0)
+				if name == keyPrefix+"online" || strings.HasSuffix(name, ":online") {
+					ttl = onlineDefaultTTL
+				}
+				return cache.NewRedis(client, name, ttl), nil
 			},
 		}, nil

Ensure the “online” cache is instantiated via factory.New("online") (it is in module wiring). If other names should expire too, extend the predicate accordingly.

Also applies to: 11-13, 35-51, 57-60


15-23: Use the local alias consistently.

Prefer the exported alias to reduce verbosity.

-type Cache = cache.Cache
+type Cache = cache.Cache
 
 type Factory interface {
-	New(name string) (cache.Cache, error)
+	New(name string) (Cache, error)
 }
 
 type factory struct {
-	new func(name string) (cache.Cache, error)
+	new func(name string) (Cache, error)
 }

35-51: Optional: memoize memory caches by name.

Multiple New("online") calls return separate in‑memory stores; consider memoizing per name for parity with Redis namespacing.

If desired, I can draft a small sync.Map-based memoization inside factory.
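A minimal sketch of that memoization, assuming the Factory and Cache types from this package; a mutex-guarded map is used rather than sync.Map so two concurrent New calls for the same name cannot create two stores:

package cache

import "sync"

// memoizingFactory returns the same Cache instance for a given name, so
// repeated New("online") calls share one in-memory store.
type memoizingFactory struct {
	inner Factory

	mu     sync.Mutex
	caches map[string]Cache
}

func newMemoizingFactory(inner Factory) *memoizingFactory {
	return &memoizingFactory{inner: inner, caches: make(map[string]Cache)}
}

func (f *memoizingFactory) New(name string) (Cache, error) {
	f.mu.Lock()
	defer f.mu.Unlock()

	if c, ok := f.caches[name]; ok {
		return c, nil
	}
	c, err := f.inner.New(name)
	if err != nil {
		return nil, err
	}
	f.caches[name] = c
	return c, nil
}

Wrapping the existing factory keeps the Redis path untouched, since Redis-backed instances already share state through the key prefix.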

internal/sms-gateway/modules/devices/service.go (1)

104-124: Batch to DB to cut round-trips; keep per-item fallback as a guard.

Current loop issues one query per device. If the repository can handle bulk updates, prefer a single batch call.

-func (s *Service) SetLastSeen(ctx context.Context, batch map[string]time.Time) error {
+func (s *Service) SetLastSeen(ctx context.Context, batch map[string]time.Time) error {
 	if len(batch) == 0 {
 		return nil
 	}
 
-	var multiErr error
-	for deviceID, lastSeen := range batch {
-		if err := ctx.Err(); err != nil {
-			return errors.Join(err, multiErr)
-		}
-		if err := s.devices.SetLastSeen(ctx, deviceID, lastSeen); err != nil {
-			multiErr = errors.Join(multiErr, fmt.Errorf("device %s: %w", deviceID, err))
-			s.logger.Error("can't set last seen",
-				zap.String("device_id", deviceID),
-				zap.Time("last_seen", lastSeen),
-				zap.Error(err),
-			)
-		}
-	}
-	return multiErr
+	// Prefer a repository-level bulk update when available.
+	if bulkErr := s.devices.SetLastSeenBatch(ctx, batch); bulkErr == nil {
+		return nil
+	}
+	// Fallback: per-item updates.
+	var multiErr error
+	for deviceID, lastSeen := range batch {
+		if err := ctx.Err(); err != nil {
+			return errors.Join(multiErr, err)
+		}
+		if err := s.devices.SetLastSeen(ctx, deviceID, lastSeen); err != nil {
+			multiErr = errors.Join(multiErr, fmt.Errorf("device %s: %w", deviceID, err))
+			s.logger.Error("can't set last seen", zap.String("device_id", deviceID), zap.Time("last_seen", lastSeen), zap.Error(err))
+		}
+	}
+	return multiErr
 }

If SetLastSeenBatch doesn’t exist yet, I can sketch the repo SQL for your DB dialect.
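A rough sketch of what that could look like for MySQL, assuming the repository keeps its *gorm.DB as r.db (imports: context, strings, time); the statement only moves last_seen forward, mirroring the per-row guard:

// SetLastSeenBatch updates many devices in one statement; sketch only.
func (r *repository) SetLastSeenBatch(ctx context.Context, batch map[string]time.Time) error {
	if len(batch) == 0 {
		return nil
	}

	var (
		whens strings.Builder
		pairs []any
		ids   []string
	)
	for id, ts := range batch {
		whens.WriteString(" WHEN ? THEN ?")
		pairs = append(pairs, id, ts)
		ids = append(ids, id)
	}

	// Only move last_seen forward, matching the single-row guard.
	query := "UPDATE devices SET last_seen = CASE id" + whens.String() + " ELSE last_seen END" +
		" WHERE id IN ? AND last_seen < CASE id" + whens.String() + " ELSE last_seen END"

	args := make([]any, 0, 2*len(pairs)+1)
	args = append(args, pairs...)
	args = append(args, ids)
	args = append(args, pairs...)

	return r.db.WithContext(ctx).Exec(query, args...).Error
}

Note that raw SQL bypasses GORM's soft-delete scoping, so a deleted_at IS NULL condition may be wanted if that matters here.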

internal/sms-gateway/online/service.go (2)

40-41: Make the persist interval configurable.

Hard-coding 1m limits ops flexibility; read from module config with a sane default.

I can wire a Config{PersistInterval time.Duration} into fx if you want.
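A sketch of the shape this could take; the Config type, field name, and one-minute default are illustrative, not the project's actual configuration surface:

package online

import "time"

// Config carries tunables for the online service.
type Config struct {
	PersistInterval time.Duration
}

func (c Config) persistInterval() time.Duration {
	if c.PersistInterval <= 0 {
		return time.Minute // sane default when unset
	}
	return c.PersistInterval
}

The persistence loop would then build its ticker from persistInterval() instead of a literal time.Minute, with the Config value provided through fx from the application config.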


56-65: Set a short TTL on cache writes (use cache.WithTTL)

pkg/cache exposes WithTTL — set a short TTL (e.g., 15*time.Minute) when calling Set to self‑heal on crashes. Example replacement:
if err = s.cache.Set(ctx, deviceID, dt, cache.WithTTL(15*time.Minute)); err != nil { ... }

Location: internal/sms-gateway/online/service.go (around lines 56–65)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9e0953 and 6f1dc1f.

📒 Files selected for processing (22)
  • .github/workflows/docker-publish.yml (0 hunks)
  • .github/workflows/go.yml (1 hunks)
  • .github/workflows/golangci-lint.yml (0 hunks)
  • configs/config.example.yml (1 hunks)
  • deployments/grafana/dashboards/online.json (1 hunks)
  • deployments/prometheus/alerts/online-alerts.yml (1 hunks)
  • internal/sms-gateway/app.go (4 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql (1 hunks)
  • internal/sms-gateway/models/models.go (1 hunks)
  • internal/sms-gateway/modules/auth/service.go (5 hunks)
  • internal/sms-gateway/modules/devices/repository.go (2 hunks)
  • internal/sms-gateway/modules/devices/service.go (3 hunks)
  • internal/sms-gateway/online/metrics.go (1 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_concurrency_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
💤 Files with no reviewable changes (2)
  • .github/workflows/docker-publish.yml
  • .github/workflows/golangci-lint.yml
🚧 Files skipped from review as they are similar to previous changes (14)
  • internal/sms-gateway/app.go
  • pkg/cache/memory_concurrency_test.go
  • internal/sms-gateway/online/module.go
  • pkg/cache/redis.go
  • deployments/prometheus/alerts/online-alerts.yml
  • internal/sms-gateway/modules/auth/service.go
  • internal/sms-gateway/modules/devices/repository.go
  • internal/sms-gateway/models/models.go
  • deployments/grafana/dashboards/online.json
  • pkg/cache/memory.go
  • internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql
  • pkg/cache/memory_bench_test.go
  • configs/config.example.yml
  • internal/sms-gateway/online/metrics.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-10T11:11:43.462Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.462Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_profile_test.go
  • pkg/cache/memory_edge_test.go
📚 Learning: 2025-09-10T23:53:42.006Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:63-75
Timestamp: 2025-09-10T23:53:42.006Z
Learning: In pkg/cache/redis.go, the Redis cache implementation uses Redis's native field-level TTL via HExpireAt, meaning expired fields are automatically removed by Redis itself. This differs from the memory implementation where the application tracks expiration times. As a result, operations like Drain that call HGetAll will only return non-expired items since expired fields are already removed by Redis.

Applied to files:

  • pkg/cache/memory_edge_test.go
🧬 Code graph analysis (5)
internal/sms-gateway/online/service.go (3)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (30-39)
internal/sms-gateway/modules/auth/service.go (2)
  • Service (39-52)
  • New (54-68)
internal/sms-gateway/cache/factory.go (1)
  • Cache (15-15)
pkg/cache/memory_profile_test.go (1)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
internal/sms-gateway/modules/devices/service.go (2)
internal/sms-gateway/modules/events/service.go (3)
  • s (71-116)
  • deviceSvc (13-24)
  • s (40-57)
internal/sms-gateway/handlers/mobile.go (1)
  • h (149-165)
internal/sms-gateway/cache/factory.go (5)
pkg/cache/cache.go (1)
  • Cache (5-36)
internal/config/config.go (2)
  • Cache (74-76)
  • Config (10-18)
internal/sms-gateway/cache/config.go (1)
  • Config (4-6)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/redis.go (1)
  • NewRedis (44-56)
pkg/cache/memory_edge_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
pkg/cache/errors.go (2)
  • ErrKeyExpired (9-9)
  • ErrKeyNotFound (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Benchmark
  • GitHub Check: E2E
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
pkg/cache/memory_edge_test.go (1)

289-329: LGTM: Drain semantics validated correctly.

Test matches memory implementation: expired items are removed prior to Drain callback, so only non‑expired entries are returned and expired keys become “not found”.

.github/workflows/go.yml (1)

10-19: Fix glob: "**.go" won’t match files in subdirs; use "**/*.go".

Current patterns may skip CI on most Go changes.

   push:
     branches: [master]
     paths:
-      - "**.go"
+      - "**/*.go"
       - "go.mod"
       - "go.sum"
   pull_request:
     branches: [master]
     paths:
-      - "**.go"
+      - "**/*.go"
       - "go.mod"
       - "go.sum"

Likely an incorrect or invalid review comment.

internal/sms-gateway/modules/devices/service.go (1)

136-144: Token cache invalidation now matches GetByToken hashing. LGTM.

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from 02145a2 to e149f8e Compare September 21, 2025 13:35
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (13)
go.mod (1)

3-3: Pin CI to Go 1.24.1 via go-version-file to avoid drift.

Your workflow uses setup-go: stable. Safer to bind CI to the version declared here (1.24.1) to prevent unexpected toolchain changes.

Apply in .github/workflows/go.yml for each setup-go step:

-        with:
-          go-version: stable
+        with:
+          go-version-file: go.mod
.github/workflows/go.yml (5)

11-13: Fix path filters to match all Go files.

Use "/*.go" instead of ".go" for recursive matches.

-      - "**.go"
+      - "**/*.go"

Also applies to: 17-19


35-39: Bind Go toolchain to go.mod.

Use go-version-file to ensure CI uses Go 1.24.1 as in go.mod.

-        with:
-          go-version: stable
+        with:
+          go-version-file: go.mod

56-60: Same: bind toolchain to go.mod in test job.

-        with:
-          go-version: stable
+        with:
+          go-version-file: go.mod

87-91: Same: bind toolchain to go.mod in benchmark job.

-        with:
-          go-version: stable
+        with:
+          go-version-file: go.mod

97-99: Run only benchmarks (skip tests) for speed/stability.

-        run: go test -bench=. -benchmem ./... | tee benchmark.txt
+        run: go test -run='^$$' -bench=. -benchmem ./... | tee benchmark.txt
internal/sms-gateway/modules/devices/repository.go (1)

74-88: last_seen update guards look correct; consider optional NULL-tolerant condition.

If schema ever allows NULLs, add “OR last_seen IS NULL” for robustness. Current model sets NOT NULL, so this is optional.

-        Where("id = ? AND last_seen < ?", id, lastSeen).
+        Where("id = ? AND (last_seen IS NULL OR last_seen < ?)", id, lastSeen).
pkg/cache/redis.go (2)

58-61: Document Cleanup as a no-op for Redis (TTL handled server-side).

Add a brief comment to avoid confusion vs memory impl that actively prunes.

 // Cleanup implements Cache.
+// Redis handles field expiration natively; nothing to do here.
 func (r *redisCache) Cleanup(_ context.Context) error {

130-137: Optional: use TxPipeline for fewer RTTs (no behavior change).

Tiny improvement; current Pipelined is fine.

-    _, err := r.client.Pipelined(ctx, func(p redis.Pipeliner) error {
+    p := r.client.TxPipeline()
+    {
         p.HSet(ctx, r.key, key, value)
         if !options.validUntil.IsZero() {
             p.HExpireAt(ctx, r.key, options.validUntil, key)
         }
-        return nil
-    })
-    if err != nil {
+    }
+    if _, err := p.Exec(ctx); err != nil {
         return fmt.Errorf("can't set cache item: %w", err)
     }
pkg/cache/memory_edge_test.go (1)

221-258: Time-based perf assertion may be flaky in CI.

Either increase the threshold substantially or convert to a benchmark. Current 2x of 100ms is tight on shared runners.

internal/sms-gateway/cache/factory.go (2)

3-9: Normalize URL scheme; accept mixed‑case and future-proof imports.

Switch to strings.ToLower(u.Scheme) and add the import. Prevents failures on "REDIS://..." etc.

 import (
 	"fmt"
 	"net/url"
+	"strings"

 	"github.com/android-sms-gateway/core/redis"
 	"github.com/android-sms-gateway/server/pkg/cache"
 )
@@
-	switch u.Scheme {
+	switch strings.ToLower(u.Scheme) {
 	case "memory":
 		return &factory{
 			new: func(name string) (Cache, error) {
 				return cache.NewMemory(0), nil
 			},
 		}, nil
 	case "redis":

Also applies to: 35-54


58-60: Guard against empty cache names.

Avoid creating a root-scoped prefix ("sms-gateway:") if callers pass an empty name.

 func (f *factory) New(name string) (Cache, error) {
-	return f.new(keyPrefix + name)
+	n := keyPrefix + name
+	if strings.TrimSpace(name) == "" {
+		return nil, fmt.Errorf("empty cache name")
+	}
+	return f.new(n)
 }
pkg/cache/memory_profile_test.go (1)

1-1: Drop //nolint:errcheck; fix missed checks instead.

The file already checks most errors; the pragma hides real misses (e.g., Drain above). Remove it once fixes are applied.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02145a2 and e149f8e.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (33)
  • .github/workflows/docker-publish.yml (0 hunks)
  • .github/workflows/go.yml (1 hunks)
  • .github/workflows/golangci-lint.yml (0 hunks)
  • build/package/Dockerfile (1 hunks)
  • build/package/Dockerfile.dev (1 hunks)
  • configs/config.example.yml (1 hunks)
  • deployments/grafana/dashboards/online.json (1 hunks)
  • deployments/prometheus/alerts/online-alerts.yml (1 hunks)
  • go.mod (4 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/module.go (2 hunks)
  • internal/sms-gateway/app.go (4 hunks)
  • internal/sms-gateway/cache/config.go (1 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/cache/module.go (1 hunks)
  • internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql (1 hunks)
  • internal/sms-gateway/models/models.go (1 hunks)
  • internal/sms-gateway/modules/auth/service.go (5 hunks)
  • internal/sms-gateway/modules/devices/repository.go (2 hunks)
  • internal/sms-gateway/modules/devices/service.go (3 hunks)
  • internal/sms-gateway/online/metrics.go (1 hunks)
  • internal/sms-gateway/online/module.go (1 hunks)
  • internal/sms-gateway/online/service.go (1 hunks)
  • pkg/cache/cache.go (1 hunks)
  • pkg/cache/errors.go (1 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_concurrency_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/memory_test.go (1 hunks)
  • pkg/cache/options.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
💤 Files with no reviewable changes (2)
  • .github/workflows/golangci-lint.yml
  • .github/workflows/docker-publish.yml
🚧 Files skipped from review as they are similar to previous changes (22)
  • internal/config/config.go
  • internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql
  • build/package/Dockerfile.dev
  • internal/sms-gateway/online/module.go
  • pkg/cache/memory.go
  • pkg/cache/cache.go
  • configs/config.example.yml
  • internal/sms-gateway/modules/devices/service.go
  • pkg/cache/memory_concurrency_test.go
  • internal/sms-gateway/online/metrics.go
  • pkg/cache/errors.go
  • internal/sms-gateway/cache/module.go
  • build/package/Dockerfile
  • internal/sms-gateway/modules/auth/service.go
  • internal/sms-gateway/app.go
  • internal/config/module.go
  • deployments/grafana/dashboards/online.json
  • pkg/cache/options.go
  • internal/sms-gateway/cache/config.go
  • pkg/cache/memory_test.go
  • deployments/prometheus/alerts/online-alerts.yml
  • internal/sms-gateway/models/models.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-10T23:53:42.006Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:63-75
Timestamp: 2025-09-10T23:53:42.006Z
Learning: In pkg/cache/redis.go, the Redis cache implementation uses Redis's native field-level TTL via HExpireAt, meaning expired fields are automatically removed by Redis itself. This differs from the memory implementation where the application tracks expiration times. As a result, operations like Drain that call HGetAll will only return non-expired items since expired fields are already removed by Redis.

Applied to files:

  • pkg/cache/memory_edge_test.go
  • pkg/cache/redis.go
📚 Learning: 2025-09-10T11:11:43.462Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.462Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_edge_test.go
  • pkg/cache/redis.go
  • pkg/cache/memory_profile_test.go
  • pkg/cache/memory_bench_test.go
📚 Learning: 2025-09-10T23:54:16.828Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:118-121
Timestamp: 2025-09-10T23:54:16.828Z
Learning: Redis 7.4.0 introduced HEXPIRE, HEXPIREAT, and related commands that support per-field TTL on hash data structures. go-redis v9.6.0+ includes support for these commands with methods like HExpireAt.

Applied to files:

  • pkg/cache/redis.go
🧬 Code graph analysis (7)
pkg/cache/memory_edge_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
pkg/cache/errors.go (2)
  • ErrKeyExpired (9-9)
  • ErrKeyNotFound (7-7)
internal/sms-gateway/modules/devices/repository.go (2)
internal/sms-gateway/models/models.go (1)
  • Device (25-36)
internal/sms-gateway/handlers/mobile.go (1)
  • h (149-165)
internal/sms-gateway/online/service.go (3)
internal/sms-gateway/modules/devices/service.go (1)
  • Service (30-39)
internal/sms-gateway/modules/auth/service.go (2)
  • Service (39-52)
  • New (54-68)
internal/sms-gateway/cache/factory.go (1)
  • Cache (15-15)
pkg/cache/redis.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (7-7)
  • ErrKeyExists (11-11)
pkg/cache/options.go (1)
  • Option (6-6)
pkg/cache/memory_profile_test.go (1)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/memory_bench_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
internal/sms-gateway/cache/factory.go (5)
internal/config/config.go (2)
  • Cache (74-76)
  • Config (10-18)
pkg/cache/cache.go (1)
  • Cache (5-36)
internal/sms-gateway/cache/config.go (1)
  • Config (4-6)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/redis.go (1)
  • NewRedis (44-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (18)
go.mod (1)

21-21: Redis client OK; ensure server >= 7.4 for HEXPIRE/HEXPIREAT.

Code relies on HExpireAt. Please verify runtime Redis is 7.4+ and document this requirement (README/helm/chart).
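One way to surface this early is a startup probe; a minimal sketch assuming a go-redis v9 client, with the helper name and placement being illustrative:

package cache

import (
	"context"
	"fmt"
	"strconv"
	"strings"

	"github.com/redis/go-redis/v9"
)

// ensureHashFieldTTLSupport fails fast when the connected server predates
// Redis 7.4 and therefore lacks HEXPIRE/HEXPIREAT.
func ensureHashFieldTTLSupport(ctx context.Context, client *redis.Client) error {
	info, err := client.Info(ctx, "server").Result()
	if err != nil {
		return fmt.Errorf("can't read server info: %w", err)
	}
	for _, line := range strings.Split(info, "\n") {
		v, ok := strings.CutPrefix(strings.TrimSpace(line), "redis_version:")
		if !ok {
			continue
		}
		parts := strings.SplitN(v, ".", 3)
		if len(parts) < 2 {
			return fmt.Errorf("can't parse redis version %q", v)
		}
		major, _ := strconv.Atoi(parts[0])
		minor, _ := strconv.Atoi(parts[1])
		if major > 7 || (major == 7 && minor >= 4) {
			return nil
		}
		return fmt.Errorf("redis %s does not support hash-field TTL (need >= 7.4)", v)
	}
	return fmt.Errorf("redis_version not found in INFO output")
}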

.github/workflows/go.yml (1)

70-74: Guard Codecov upload on forks.

-      - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
+      - name: Upload coverage to Codecov
+        if: ${{ secrets.CODECOV_TOKEN != '' }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false
internal/sms-gateway/modules/devices/repository.go (2)

70-72: Good fix: parameterized WHERE for push_token.


99-106: Confirm soft-delete intent and index usage.

Delete respects GORM soft delete if SoftDeletableModel is present—confirm that’s desired. Also ensure an index on last_seen exists (models shows it), since this is a range delete.

pkg/cache/redis.go (3)

72-92: Drain is atomic via Lua — LGTM.


145-168: Bug: HExpireAt called without field — wrong arity at runtime.

This sets no per-field TTL and will error. Pass the field name.

-    if !options.validUntil.IsZero() {
-        if err := r.client.HExpireAt(ctx, r.key, options.validUntil).Err(); err != nil {
+    if !options.validUntil.IsZero() {
+        if err := r.client.HExpireAt(ctx, r.key, options.validUntil, key).Err(); err != nil {
             return fmt.Errorf("can't set cache item ttl: %w", err)
         }
     }

94-106: Critical: redis.Get violates Cache.Get contract — returns ErrKeyNotFound for expired keys (should return ErrKeyExpired).

Redis removes expired hash fields (field-level TTL / HExpireAt), so HGet cannot distinguish "expired" vs "never existed". Either:

  • Update Cache interface/docs/tests to accept ErrKeyNotFound for expired items on the Redis backend; or
  • Add expiry metadata (side hash or zset with expires_at) and change pkg/cache/redis.go Get to consult it and return ErrKeyExpired when appropriate.

Location: pkg/cache/redis.go (Get).
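For option B, a sketch of the metadata lookup after an HGet miss; the companion "<hash>:expiry" key and RFC 3339 encoding are assumptions of this sketch, not the current implementation (Set would have to populate the companion hash alongside the value):

// classifyMiss distinguishes "expired" from "never existed" after an HGet
// miss by consulting a companion hash that stores each item's deadline.
// Assumes a go-redis v9 client and the package's error sentinels.
func classifyMiss(ctx context.Context, client *redis.Client, hashKey, field string) error {
	deadline, err := client.HGet(ctx, hashKey+":expiry", field).Result()
	if err == redis.Nil {
		return ErrKeyNotFound
	}
	if err != nil {
		return fmt.Errorf("can't read expiry metadata: %w", err)
	}
	t, err := time.Parse(time.RFC3339, deadline)
	if err != nil {
		return fmt.Errorf("can't parse expiry metadata: %w", err)
	}
	if time.Now().After(t) {
		_ = client.HDel(ctx, hashKey+":expiry", field).Err() // best-effort cleanup
		return ErrKeyExpired
	}
	return ErrKeyNotFound
}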

pkg/cache/memory_edge_test.go (2)

373-389: Same compile error in goroutine launcher.

-    for i := range numGoroutines {
+    for i := 0; i < numGoroutines; i++ {

232-249: Fix compile error: range over int.

-    for i := range numOperations {
+    for i := 0; i < numOperations; i++ {
internal/sms-gateway/online/service.go (5)

28-36: Name the logger ("online").

 func New(devicesSvc *devices.Service, cache cache.Cache, logger *zap.Logger, metrics *metrics) Service {
 	return &service{
 		devicesSvc: devicesSvc,
 
 		cache: cache,
 
-		logger:  logger,
+		logger:  logger.Named("online"),
 		metrics: metrics,
 	}
 }

3-13: Validate deviceID early; drop empty/whitespace IDs.

 import (
 	"context"
 	"fmt"
 	"time"
+	"strings"
@@
 func (s *service) SetOnline(ctx context.Context, deviceID string) {
+	if strings.TrimSpace(deviceID) == "" {
+		s.logger.Warn("empty device_id; skipping SetOnline")
+		return
+	}
 	dt := time.Now().UTC().Format(time.RFC3339)

Also applies to: 56-57


61-69: Set per-item TTL for “online” updates (prevents unbounded retention).

 	s.metrics.ObserveCacheLatency(func() {
-		if err = s.cache.Set(ctx, deviceID, dt); err != nil {
+		if err = s.cache.Set(ctx, deviceID, dt, cache.WithTTL(15*time.Minute)); err != nil {
 			s.metrics.IncrementCacheOperation(operationSet, statusError)
 			s.logger.Error("Can't set online status", zap.String("device_id", deviceID), zap.Error(err))
 			s.metrics.IncrementStatusSet(false)
 		}
 	})

Note: ensure pkg/cache/options.go WithTTL semantics are correct (set validUntil only when ttl > 0). See separate comment.


39-54: Flush once on shutdown to avoid losing in-memory updates.

 	for {
 		select {
 		case <-ctx.Done():
-			return
+			s.logger.Debug("Shutting down; flushing online status")
+			flushCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+			defer cancel()
+			if err := s.persist(flushCtx); err != nil {
+				s.logger.Warn("Final persist failed", zap.Error(err))
+			}
+			return

110-116: Re-queue drained items on persist failure (best-effort).

-		if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
-			persistErr = fmt.Errorf("can't set last seen: %w", err)
-			s.metrics.IncrementPersistenceError()
-			return
-		}
+		if err := s.devicesSvc.SetLastSeen(ctx, timestamps); err != nil {
+			persistErr = fmt.Errorf("can't set last seen: %w", err)
+			s.metrics.IncrementPersistenceError()
+			// best-effort requeue
+			for id, v := range items {
+				if setErr := s.cache.Set(ctx, id, v); setErr != nil {
+					s.logger.Warn("Failed to requeue last_seen", zap.String("device_id", id), zap.Error(setErr))
+				}
+			}
+			return
+		}
pkg/cache/memory_profile_test.go (2)

45-55: Use HeapAlloc delta, not TotalAlloc (avoids false positives).

-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
-	allocPerItem := float64(allocDiff) / float64(numItems)
+	allocDiff := m2.HeapAlloc - m1.HeapAlloc
+	allocPerItem := float64(allocDiff) / float64(numItems)
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", allocDiff)
@@
-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
+	allocDiff := m2.HeapAlloc - m1.HeapAlloc
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", allocDiff)
@@
-	if allocDiff > expectedMaxGrowth {
-		t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, allocDiff)
+	if allocDiff > expectedMaxGrowth {
+		t.Errorf("Expected < %d bytes heap growth under pressure, got %d bytes", expectedMaxGrowth, allocDiff)

Also applies to: 95-109, 150-166


244-245: Handle Drain error (don’t ignore).

-	// Clear the cache
-	tempCache.Drain(ctx)
+	// Clear the cache
+	if _, err := tempCache.Drain(ctx); err != nil {
+		t.Errorf("Drain failed: %v", err)
+	}
pkg/cache/memory_bench_test.go (2)

1-1: Stop silencing errors; add minimal checks in hot paths.

  • Remove //nolint:errcheck and check errors for Set/Get/Delete/Cleanup/Drain in each benchmark loop. Fail fast with b.Fatal/b.Fatalf on unexpected errors. For SetOrFail, ignore cache.ErrKeyExists only.

If helpful, I can push a patch covering all benches consistently.

Also applies to: 22-27, 37-45, 55-65, 74-80, 95-99, 106-118, 132-137, 152-157, 181-189, 208-221, 246-263, 286-292, 299-304, 324-336, 364-375, 413-417, 438-445, 455-460, 465-470
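The shape such checks could take, sketched against the existing memory cache API (one branch per operation, failing fast with b.Fatalf):

package cache_test

import (
	"context"
	"strconv"
	"testing"

	"github.com/android-sms-gateway/server/pkg/cache"
)

// BenchmarkMemorySetChecked keeps the hot loop cheap while still surfacing
// unexpected errors instead of silencing them with //nolint:errcheck.
func BenchmarkMemorySetChecked(b *testing.B) {
	ctx := context.Background()
	c := cache.NewMemory(0)

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		key := "key-" + strconv.Itoa(i)
		if err := c.Set(ctx, key, "value"); err != nil {
			b.Fatalf("Set failed: %v", err)
		}
	}
}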


316-334: Verify WithTTL semantics before benchmarking TTL overhead.

pkg/cache/options.go shows WithTTL setting validUntil even when ttl <= 0. Ensure it is:

  • ttl > 0: o.validUntil = time.Now().Add(ttl)
  • else: o.validUntil = time.Time{}

Also applies to: 12-13
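For reference, the guarded form could look like this; it assumes Option is a setter over the package's internal options struct with a validUntil field (drop-in for pkg/cache/options.go, which already uses time):

// WithTTL, guarded: only positive TTLs produce a deadline; zero or negative
// values clear it so the item never expires.
func WithTTL(ttl time.Duration) Option {
	return func(o *options) {
		if ttl > 0 {
			o.validUntil = time.Now().Add(ttl)
			return
		}
		o.validUntil = time.Time{}
	}
}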

@capcom6 capcom6 force-pushed the feature/debounce-last-seen-time-updates branch from 5d76e09 to 2da81b9 Compare September 23, 2025 10:38
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
pkg/cache/memory_edge_test.go (2)

54-61: Reduce timing flakiness in TTL tests.

Sleep-based expiry checks can be flaky on contended runners. Consider polling until deadline or adding slack (e.g., wait TTL + jitter and assert within a time-bound loop).

Also applies to: 182-189
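A polling helper along these lines trades the fixed sleep for a bounded wait; the helper name is illustrative, and it accepts either expiry error so it works for both backends:

package cache_test

import (
	"context"
	"errors"
	"testing"
	"time"

	"github.com/android-sms-gateway/server/pkg/cache"
)

// waitForExpiry polls Get until the key reports expiration or the deadline
// passes, avoiding a single fixed sleep that can flake on slow runners.
func waitForExpiry(t *testing.T, c cache.Cache, key string, deadline time.Duration) {
	t.Helper()
	ctx := context.Background()
	timeout := time.After(deadline)
	for {
		_, err := c.Get(ctx, key)
		if errors.Is(err, cache.ErrKeyExpired) || errors.Is(err, cache.ErrKeyNotFound) {
			return
		}
		select {
		case <-timeout:
			t.Fatalf("key %q did not expire within %v (last err: %v)", key, deadline, err)
		case <-time.After(2 * time.Millisecond):
		}
	}
}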


16-17: Avoid shadowing the imported package name.

Local var named “cache” shadows the imported package alias “cache”, which can confuse readers. Prefer “c” for the instance.

-	cache := cache.NewMemory(0)
+	c := cache.NewMemory(0)
pkg/cache/redis.go (1)

58-61: Document no-op Cleanup for Redis.

Clarify that Redis enforces TTL itself.

Apply this diff:

-// Cleanup implements Cache.
+// Cleanup implements Cache.
+// No-op for Redis: field-level TTL is enforced by Redis.
internal/sms-gateway/cache/factory.go (1)

35-54: Support rediss scheme (TLS).

Allow secure Redis URLs.

Apply this diff:

-	switch u.Scheme {
+	switch u.Scheme {
 	case "memory":
 		...
-	case "redis":
+	case "redis", "rediss":
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d76e09 and 2da81b9.

📒 Files selected for processing (13)
  • .github/workflows/docker-publish.yml (0 hunks)
  • .github/workflows/go.yml (1 hunks)
  • .github/workflows/golangci-lint.yml (0 hunks)
  • internal/sms-gateway/cache/factory.go (1 hunks)
  • internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql (1 hunks)
  • internal/sms-gateway/models/models.go (1 hunks)
  • internal/sms-gateway/modules/devices/repository.go (2 hunks)
  • internal/sms-gateway/modules/devices/service.go (3 hunks)
  • pkg/cache/memory.go (1 hunks)
  • pkg/cache/memory_bench_test.go (1 hunks)
  • pkg/cache/memory_edge_test.go (1 hunks)
  • pkg/cache/memory_profile_test.go (1 hunks)
  • pkg/cache/redis.go (1 hunks)
💤 Files with no reviewable changes (2)
  • .github/workflows/docker-publish.yml
  • .github/workflows/golangci-lint.yml
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/sms-gateway/models/models.go
  • pkg/cache/memory.go
  • .github/workflows/go.yml
  • internal/sms-gateway/modules/devices/repository.go
  • internal/sms-gateway/models/migrations/mysql/20250918231606_add_last_seen_index.sql
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-10T11:11:43.462Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/memory.go:59-74
Timestamp: 2025-09-10T11:11:43.462Z
Learning: In pkg/cache/memory.go, the cleanup method removes expired items first, then executes the provided callback function. This means methods like Drain that use cleanup don't need additional expiration filtering since expired items are already removed before the callback runs.

Applied to files:

  • pkg/cache/memory_profile_test.go
  • pkg/cache/redis.go
  • pkg/cache/memory_bench_test.go
  • pkg/cache/memory_edge_test.go
📚 Learning: 2025-09-10T23:53:42.006Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:63-75
Timestamp: 2025-09-10T23:53:42.006Z
Learning: In pkg/cache/redis.go, the Redis cache implementation uses Redis's native field-level TTL via HExpireAt, meaning expired fields are automatically removed by Redis itself. This differs from the memory implementation where the application tracks expiration times. As a result, operations like Drain that call HGetAll will only return non-expired items since expired fields are already removed by Redis.

Applied to files:

  • pkg/cache/redis.go
  • pkg/cache/memory_edge_test.go
📚 Learning: 2025-09-10T23:54:16.828Z
Learnt from: capcom6
PR: android-sms-gateway/server#178
File: pkg/cache/redis.go:118-121
Timestamp: 2025-09-10T23:54:16.828Z
Learning: Redis 7.4.0 introduced HEXPIRE, HEXPIREAT, and related commands that support per-field TTL on hash data structures. go-redis v9.6.0+ includes support for these commands with methods like HExpireAt.

Applied to files:

  • pkg/cache/redis.go
🧬 Code graph analysis (5)
pkg/cache/memory_profile_test.go (1)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
internal/sms-gateway/cache/factory.go (5)
internal/config/config.go (1)
  • Cache (74-76)
pkg/cache/cache.go (1)
  • Cache (5-36)
internal/sms-gateway/online/service.go (1)
  • New (28-37)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/redis.go (1)
  • NewRedis (44-56)
pkg/cache/redis.go (3)
pkg/cache/cache.go (1)
  • Cache (5-36)
pkg/cache/errors.go (2)
  • ErrKeyNotFound (7-7)
  • ErrKeyExists (11-11)
pkg/cache/options.go (1)
  • Option (6-6)
pkg/cache/memory_bench_test.go (2)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
pkg/cache/memory_edge_test.go (3)
pkg/cache/memory.go (1)
  • NewMemory (16-23)
pkg/cache/options.go (1)
  • WithTTL (22-30)
pkg/cache/errors.go (2)
  • ErrKeyExpired (9-9)
  • ErrKeyNotFound (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build / Docker image (linux/amd64)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Benchmark
  • GitHub Check: Analyze (go)
🔇 Additional comments (21)
pkg/cache/memory_edge_test.go (3)

289-328: Drain behavior assertion is correct (expired items removed).

Good alignment with memory cleanup semantics: expired entries are removed before callback, so Drain returns only valid items and subsequent Get yields ErrKeyNotFound.


232-249: Fix: invalid range over int (won’t compile).

Replace the range loop with a classic indexed for-loop.

-	for i := range numOperations {
+	for i := 0; i < numOperations; i++ {

373-389: Fix: invalid range over int in goroutine launcher (won’t compile).

Use an indexed for-loop.

-	for i := range numGoroutines {
+	for i := 0; i < numGoroutines; i++ {
pkg/cache/memory_profile_test.go (5)

46-55: Measure heap-in-use deltas, not cumulative TotalAlloc.

TotalAlloc is monotonic; use HeapAlloc deltas for per-item growth checks.

-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
-	allocPerItem := float64(allocDiff) / float64(numItems)
+	heapDiff := int64(m2.HeapAlloc) - int64(m1.HeapAlloc)
+	heapPerItem := float64(heapDiff) / float64(numItems)
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
-	t.Logf("  Allocations per item: %.2f bytes", allocPerItem)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", heapDiff)
+	t.Logf("  Heap delta per item: %.2f bytes", heapPerItem)
@@
-	if allocPerItem > 300 {
-		t.Errorf("Expected less than 300 bytes per item, got %.2f bytes", allocPerItem)
+	if heapPerItem > 300 {
+		t.Errorf("Expected less than 300 bytes per item, got %.2f bytes", heapPerItem)

95-109: MemoryCleanup: compare HeapAlloc delta.

Avoid TotalAlloc for post-cleanup growth checks.

-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
+	heapDiff := int64(m2.HeapAlloc) - int64(m1.HeapAlloc)
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", heapDiff)
@@
-	if allocDiff > 2*1024*1024 { // 2MB
-		t.Errorf("Expected less than 2MB memory growth after cleanup, got %d bytes", allocDiff)
+	if heapDiff > 2*1024*1024 { // 2MB
+		t.Errorf("Expected less than 2MB heap growth after cleanup, got %d bytes", heapDiff)

150-165: MemoryPressure: use HeapAlloc delta and update threshold check.

-	allocDiff := m2.TotalAlloc - m1.TotalAlloc
+	heapDiff := int64(m2.HeapAlloc) - int64(m1.HeapAlloc)
@@
-	t.Logf("  Total allocated: %d bytes", m2.TotalAlloc)
-	t.Logf("  Allocation difference: %d bytes", allocDiff)
+	t.Logf("  Heap alloc: %d bytes", m2.HeapAlloc)
+	t.Logf("  Heap delta: %d bytes", heapDiff)
@@
-	if allocDiff > expectedMaxGrowth {
-		t.Errorf("Expected less than %d bytes memory growth under pressure, got %d bytes", expectedMaxGrowth, allocDiff)
+	if heapDiff > int64(expectedMaxGrowth) {
+		t.Errorf("Expected < %d bytes heap growth under pressure, got %d bytes", expectedMaxGrowth, heapDiff)

243-245: Check Drain error in leak detection test.

Surface failures if Drain ever returns an error.

-		// Clear the cache
-		tempCache.Drain(ctx)
+		// Clear the cache
+		if _, err := tempCache.Drain(ctx); err != nil {
+			t.Errorf("Drain failed: %v", err)
+		}

25-39: Fix: invalid range over integers (won’t compile).

Replace all “range over int” loops with indexed loops; also add an index for the GC loop and the fixed-iteration inner loops.

-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for range numGCs {
+	for i := 0; i < numGCs; i++ {
-		for j := range 100 {
+		for j := 0; j < 100; j++ {
-	for i := range numItems {
+	for i := 0; i < numItems; i++ {
-	for i := range numCaches {
+	for i := 0; i < numCaches; i++ {
-		for j := range 10 {
+		for j := 0; j < 10; j++ {

Also applies to: 75-84, 175-183, 187-199, 201-215, 228-248, 233-241

pkg/cache/memory_bench_test.go (3)

356-376: LGTM: large-value benchmark includes error checks.

Good to see Set/Get errors handled here while measuring large values.


15-28: Benchmarks ignore errors due to //nolint:errcheck; consider minimal checks in hot loops.

To avoid masking regressions, add lightweight error checks (or validate outside timing windows). If keeping //nolint for perf, at least pre-populate and validate keys where reads are expected to succeed.

Also applies to: 30-46, 48-66, 67-81, 82-100, 101-119, 120-139, 140-159, 160-191, 193-224, 226-266, 268-306, 308-338, 340-377, 379-405, 406-429, 430-471


391-401: Fix: non-existent testing.B API and invalid range over int (won’t compile).

Use the standard b.N loop and an indexed loop over size; also handle Drain/Set errors to avoid silent failures.

-			for b.Loop() {
+			for i := 0; i < b.N; i++ {
 				// Clear cache
-				cache.Drain(ctx)
+				if _, err := cache.Drain(ctx); err != nil {
+					b.Fatal(err)
+				}
 
 				// Add new items
-				for j := range size {
+				for j := 0; j < size; j++ {
 					key := "key-" + strconv.Itoa(j)
 					value := "value-" + strconv.Itoa(j)
-					cache.Set(ctx, key, value)
+					if err := cache.Set(ctx, key, value); err != nil {
+						b.Fatal(err)
+					}
 				}
 			}
internal/sms-gateway/modules/devices/service.go (3)

104-124: LGTM: batch SetLastSeen with ctx cancellation and multi-error aggregation

Behavior is correct: empty batch short‑circuits, ctx cancellation is respected, per‑item errors are aggregated and logged.


7-7: Confirmed — Go >= 1.20 present (no action required)

go.mod declares "go 1.24.1" and Dockerfiles use "golang:1.24-alpine", so errors.Join is supported.


136-145: Don't use cache.ErrNotFound — the cache package doesn't export it; change approach

  • capcom6/go-helpers/cache does not expose a sentinel ErrNotFound, so the proposed errors.Is(err, cache.ErrNotFound) will not compile.
  • Options:
    • Preferable: add/export a sentinel (cache.ErrNotFound) in the cache package and have Delete return it for missing keys; then use errors.Is(err, cache.ErrNotFound) and keep the suggested logging diff.
    • Alternative: if you can't change the dependency, avoid noisy logs by lowering the log level for Delete errors (Debug), making Delete idempotent in a local wrapper (treat not-found as nil), or (last resort, fragile) detect a not-found error by message.
  • Also extract tokenCacheKey(token string) to DRY the SHA‑256 + hex logic used here and in GetByToken.

Likely an incorrect or invalid review comment.

pkg/cache/redis.go (6)

15-24: Good: Atomic GetAndDelete via Lua.

The script correctly ensures atomicity for get-and-delete.


94-103: Expired vs missing are indistinguishable — interface requires ErrKeyExpired.

Current Redis impl returns ErrKeyNotFound for expired fields. Decide: either change interface/docs or store expiry metadata to return ErrKeyExpired.

Would you like to (A) align the interface to treat expired as not found across backends, or (B) add metadata (e.g., write validUntil alongside value) to support ErrKeyExpired?


108-120: LGTM: Atomic GetAndDelete implementation.

Matches the Lua script behavior and maps not-found correctly.


26-33: Drain deletes concurrent writes — delete only returned fields.

UNLINK/DEL of the whole hash will drop fields written after HGETALL but before deletion. Make the script HDEL only the fields returned.

Apply this diff:

-	hgetallAndDeleteScript = `
-local items = redis.call('HGETALL', KEYS[1])
-if #items > 0 then
-  local ok = pcall(redis.call, 'UNLINK', KEYS[1])
-  if not ok then redis.call('DEL', KEYS[1]) end
-end
-return items
-`
+	hgetallAndDeleteScript = `
+local items = redis.call('HGETALL', KEYS[1])
+if #items > 0 then
+  for i = 1, #items, 2 do
+    redis.call('HDEL', KEYS[1], items[i])
+  end
+end
+return items
+`

130-134: HExpireAt arg order is correct — no change needed

go-redis v9 signature is HExpireAt(ctx context.Context, key string, tm time.Time, fields ...string); the existing p.HExpireAt(ctx, r.key, options.validUntil, key) call is correct.

Likely an incorrect or invalid review comment.


161-164: Bug: HExpireAt missing field argument (won’t compile / wrong TTL target).

You must pass the field name.

Apply this diff:

-	if !options.validUntil.IsZero() {
-		if err := r.client.HExpireAt(ctx, r.key, options.validUntil).Err(); err != nil {
-			return fmt.Errorf("can't set cache item ttl: %w", err)
-		}
-	}
+	if !options.validUntil.IsZero() {
+		if err := r.client.HExpireAt(ctx, r.key, options.validUntil, key).Err(); err != nil {
+			return fmt.Errorf("can't set cache item ttl: %w", err)
+		}
+	}
internal/sms-gateway/cache/factory.go (1)

43-51: Ensure runtime Redis supports HEXPIRE (Redis ≥ 7.4).

go.mod pins github.com/redis/go-redis/v9 v9.9.0 (>= v9.6) so the client supports HExpireAt; the Redis server itself must be ≥ 7.4 to provide HEXPIRE/HEXPIREAT — otherwise Set may partially succeed (value set, per-field TTL fails). Verify production/runtime Redis version or add a fallback (feature-detect HEXPIRE or use per-key TTL) before relying on per-field TTL behavior.

@capcom6
Copy link
Member Author

capcom6 commented Sep 23, 2025

Deployed...

@capcom6 capcom6 marked this pull request as ready for review September 24, 2025 00:07
@capcom6 capcom6 merged commit 6fad3c1 into master Sep 24, 2025
11 checks passed
@capcom6 capcom6 deleted the feature/debounce-last-seen-time-updates branch September 24, 2025 06:47