Skip to content

Conversation

@egparedes
Copy link
Contributor

Description

Requirements

  • All fixes and/or new features come with corresponding tests.
  • Important design decisions have been documented in the appropriate ADR inside the docs/development/ADRs/ folder.

If this PR contains code authored by new contributors please make sure:

  • The PR contains an updated version of the AUTHORS.md file adding the names of all the new contributors.

@egparedes egparedes changed the title Include workflow state (if any) in CachedStep hash computations. fix[next]: fix for missing state info in cache computations Jun 26, 2025
@egparedes egparedes changed the title fix[next]: fix for missing state info in cache computations fix[next]: fix for missing state info in the toolchain cache computations Jun 26, 2025
@egparedes egparedes force-pushed the fix-caching-issues-for-backend-switch branch from 5a0a1eb to a1cf05e Compare June 26, 2025 12:11
@egparedes egparedes requested a review from Copilot June 26, 2025 13:24
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances caching by including workflow-internal state in cache key computations, ensuring cache invalidation when relevant settings change.

  • Add a workflow_state_id property to GTFNTranslationStep and other workflow classes.
  • Update CachedStep to use a composite cache_key (input hash + workflow state).
  • Modify tests to use cache_key instead of the old fingerprint function.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tests/next_tests/.../test_gtfn_module.py Replaced stages.fingerprint_compilable_program with cache_key.
src/gt4py/next/program_processors/codegens/gtfn/gtfn_module.py Imported utils and added workflow_state_id to GTFNTranslationStep.
src/gt4py/next/otf/workflow.py Introduced workflow_state_id in multiple workflow classes and updated imports.
Comments suppressed due to low confidence (3)

src/gt4py/next/program_processors/codegens/gtfn/gtfn_module.py:57

  • Consider adding a docstring for workflow_state_id to clarify which internal settings are included in the hash and how it impacts caching behavior.
    @functools.cached_property

tests/next_tests/unit_tests/program_processor_tests/codegens_tests/gtfn_tests/test_gtfn_module.py:150

  • It would be helpful to add a test verifying that changing the translation step's internal state (e.g., device_type or language_settings) results in a different cache_key, ensuring cache invalidation works correctly.
    cache_key = cached_gtfn_translation_step.cache_key(compilable_program)

src/gt4py/next/otf/workflow.py:174

  • The implementation of workflow_state_id is duplicated across multiple classes (ReplaceEnabledWorkflowMixin, MultiWorkflow, CachedStep, SkippableStep). Consider extracting this into a shared mixin or base class to reduce code duplication.
    @functools.cached_property

@havogt
Copy link
Contributor

havogt commented Jul 3, 2025

before we forget:
I was able to trigger the problem with this icon4py test

pytest -sv --benchmark-disable --datatest-only --backend=gtfn_cpu --level=integration model/driver/tests/driver_tests/test_timeloop.py::test_run_timeloop_single_step -k False-mch_ch_r04b09_dsl-1-2-1-2 -s

by filling the cache with either gtfn_cpu or gtfn_gpu and then switching to the other. With the run for the other backend translation cache created the program for the first backend (can be seen in the hash of the build directory).

@havogt
Copy link
Contributor

havogt commented Jul 7, 2025

Additionally next.config might influence the toolchain...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants