Template caches ignore schemaVersion, breaking documented behavior for dynamic data models #10383

@michaelquery

Description

Problem

The documentation for `schema_version` states:

> `schema_version` can be used to tell Cube that the data model should be recompiled in case it depends on dynamic definitions fetched from some external database or API.
>
> If the returned string is different, the data model will be recompiled.

However, when `schemaVersion` changes, the template caches (`compiledYamlCache`, `compiledJinjaCache`, `compiledScriptCache`) continue to serve stale output because their cache key is only `MD5(file.content)`; the schema version is not part of the key.

This means Jinja templates that "depend on dynamic definitions fetched from some external database or API" (the exact use case schema_version is documented for) return cached output from a previous version even after schemaVersion changes and recompilation is triggered.

Minimal Reproduction

Setup: Template with timestamp + schema_version that changes every request

```yaml
# model/cubes/test_recompile.yml.jinja
cubes:
  - name: test_recompile
    sql: "SELECT 1 as id, '{{ get_compile_timestamp() }}' as compiled_at"
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
      - name: compiled_at
        sql: compiled_at
        type: string
```
```python
# cube.py
from cube import config, TemplateContext
from datetime import datetime
import random

template = TemplateContext()

@config("schema_version")
def schema_version(ctx: dict) -> str:
    # Random value = every request triggers recompilation
    return str(random.random())

@template.function("get_compile_timestamp")
def get_compile_timestamp() -> str:
    # Returns current time when template is rendered
    return datetime.now().isoformat()
```

Expected behavior

Every request returns a different schema_version, so every request should recompile the schema. The compiled_at dimension should show a new timestamp on every request.

Actual behavior

The compiled_at timestamp never changes after the first request. Despite schema_version changing and logs showing "Recompiling schema", the template function get_compile_timestamp() is never re-executed.

What happens internally

  1. A request arrives; `schema_version()` returns "0.123" (random)
  2. `CompilerApi.getCompilers()` detects that the version differs from the last request
  3. It logs "Recompiling schema" and calls `DataSchemaCompiler.transpileJinjaFile()`
  4. The cache key is `MD5(file.content)`, so the lookup is a cache hit (file unchanged)
  5. The cached template output from the first-ever compilation is returned
  6. `get_compile_timestamp()` is never re-executed; the timestamp is stale forever
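
The stale read in steps 4 to 6 can be reproduced with a toy content-only cache. This is a minimal sketch; `compile_template` and `render` are illustrative stand-ins, not Cube's actual internals:

```python
import hashlib

cache = {}
render_count = 0

def render(content: str) -> str:
    # Stand-in for Jinja rendering; counts how often it actually runs
    global render_count
    render_count += 1
    return f"{content} rendered #{render_count}"

def compile_template(content: str) -> str:
    # Cache key derived from file content only -- the schema version is
    # not part of the key, mirroring the reported behavior
    key = hashlib.md5(content.encode()).hexdigest()
    if key not in cache:
        cache[key] = render(content)  # only runs on the first request
    return cache[key]

# Two "requests" with different schema versions but identical file content
first = compile_template("cubes: ...")
second = compile_template("cubes: ...")
assert first == second   # stale: the second request got the first output
assert render_count == 1  # render() never re-executed
```

Even though a new schema version would trigger a recompile pass, the pass resolves to the same content hash and never reaches `render()` again.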

Production Use Case

In real deployments, the template function would fetch dynamic configuration from a database or API:

```python
# Production example: template function fetches tenant-specific segments
import requests
from cube import TemplateContext

template = TemplateContext()

@template.function("load_tenant_segments")
def load_tenant_segments(tenant_id: int) -> list:
    """Called during template rendering to generate dynamic dimensions."""
    response = requests.get(f"http://app/api/tenants/{tenant_id}/segments")
    return response.json()  # [{"name": "vip_users", "sql": "..."}, ...]
```

```jinja
# Production example: dynamically generate dimensions from API data
{% for segment in load_tenant_segments(COMPILE_CONTEXT.securityContext.tenant_id) %}
  - name: {{ segment.name }}
    sql: "{{ segment.sql }}"
    type: string
{% endfor %}
```

When segment definitions change in the database, schema_version returns a new value to trigger recompilation—but the template cache prevents load_tenant_segments() from being called with fresh data.

Root Cause

In packages/cubejs-schema-compiler/src/compiler/DataSchemaCompiler.ts, all three template caches use content-only keys:

```typescript
// Line 727 (YAML), 776 (Jinja), 854 (JS) - all three use the identical pattern
const cacheKey = crypto.createHash('md5').update(JSON.stringify(file.content)).digest('hex');
```

The schemaVersion is not included in the cache key, so templates are served from cache even when the version changes and recompilation is triggered.
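
One possible fix is to mix the schema version into the digest alongside the content. The sketch below (in Python for brevity; hypothetical code, not an actual Cube patch) shows why this invalidates the cache on a version change while still deduplicating identical content within a version:

```python
import hashlib
import json

def cache_key(content: str, schema_version: str) -> str:
    # Feeding both the content and the version into the hash means a
    # schema_version change produces a new key -> cache miss -> re-render
    h = hashlib.md5()
    h.update(json.dumps(content).encode())
    h.update(schema_version.encode())
    return h.hexdigest()

k1 = cache_key("cubes: ...", "v1")
k2 = cache_key("cubes: ...", "v2")
assert k1 != k2                            # same content, new version: miss
assert k1 == cache_key("cubes: ...", "v1")  # same content, same version: hit
```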

Current Workaround (problematic)

The only workaround is to include schema-affecting config in app_id:

```python
@config("context_to_app_id")
def context_to_app_id(ctx: dict) -> str:
    tenant_id = ctx["securityContext"]["tenant_id"]
    tenant_config = get_tenant_config(tenant_id)
    return f"APP_{tenant_id}_{tenant_config['segments_version']}"

This causes memory bloat and resource duplication because each app_id creates a new CompilerApi with its own resources. The multitenancy docs explicitly warn about this overhead.
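
The cost of this workaround is easy to see with a toy model of per-`app_id` compiler instances (illustrative only; the dict stands in for Cube's per-app resources):

```python
compilers = {}

def get_compiler(app_id: str) -> dict:
    # Each distinct app_id allocates its own compiler state, so embedding
    # a version suffix in app_id creates a new instance per version bump
    if app_id not in compilers:
        compilers[app_id] = {"app_id": app_id, "caches": {}}
    return compilers[app_id]

for version in ("v1", "v2", "v3"):
    get_compiler(f"APP_42_{version}")
assert len(compilers) == 3  # one full compiler per version bump
```

With `schema_version` working as documented, the same three version bumps would reuse a single compiler and only invalidate its template cache.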

Environment

  • Cube.js version: latest (2025-02)
  • Using: Python SDK with Jinja templates
  • Deployment: Multi-tenant SaaS with per-tenant dynamic configuration
