Merged
…ception context for invalid tokens
…ml and uv.lock
- Added pre-commit 4.5.1 to the development dependencies.
- Added new packages to the lock file: cfgv 3.5.0, distlib 0.4.0, identify 2.6.16, and nodeenv 1.10.0.
- Updated filelock to 3.24.2 in the lock file.
…on in GitHub Actions workflow
…ation for consistency
…nctions from `shared.py` to clean up the codebase.
chore: adopt Ruff and standardize linting workflow across the SDK
- Added `pytest-cov` to the development dependencies for coverage reporting.
- Configured coverage settings in `pyproject.toml` to specify the source and report options.
- Updated the GitHub Actions workflow to run tests with coverage.
- Added tests for deterministic behavior and sequence validation in the `fake_token_ids` and `fake_prompt_token_ids` functions.
- Extended the `RolloutCompletionTracker` tests to verify clearing and recording.
- Introduced `SimpleAgentLoop`, `FailingAgentLoop`, and `SlowAgentLoop` test stubs in `conftest.py`.
- Added `make_rollout_payload` and `mock_llm_client` utility functions to streamline test setup.
- Updated test files to use the new agent loop stubs and utility functions, improving coverage and maintainability.
- Refactored existing tests to use parameterized inputs.
- Deleted `test_litellm_provider.py`; its contents were moved into `test_rubric_eval.py` and `test_utils.py`.
- Added tests for prompt building, JSON parsing, and LiteLLM evaluation in `test_rubric_eval.py`.
- Added tests for utility functions in `test_utils.py`, covering decorator validation and type checking.
- Expanded existing tests for the rollout context and the external LLM client.
- Added custom markers for integration and slow tests in `pytest.ini`.
- Refactored test fixtures in `conftest.py` to load sample data from JSON files.
- Removed unused mock agent loop classes to streamline the test setup.
- Updated tests to use the new fixture structure.
…r testing
- Added a strategy matrix to the `pytest` job in `tests.yml` to run tests on Python 3.10 and 3.12.
- Changed the Python setup step to use the Python version from the matrix.
…ol in tests
- Fixed the test command in CONTRIBUTING.md to point to the correct test file.
- Improved the concurrency control test in `test_app.py` with a `ConcurrencyTrackingAgent` that tracks peak concurrent executions.
- Refactored `test_load_agent_loop_does_not_duplicate_cwd_in_sys_path` to restore the original `sys.path` after the test.
Refactor and expand unit tests with coverage enforcement and multi-version CI
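The concurrency-control test change relies on counting how many agent runs overlap at once. A minimal sketch of that idea with `asyncio`, assuming the app caps concurrency with a semaphore; this `ConcurrencyTrackingAgent` is a hypothetical reconstruction, and `run_with_limit` is a hypothetical stand-in for the SDK's real concurrency control:

```python
import asyncio


class ConcurrencyTrackingAgent:
    """Hypothetical test double that records how many runs overlap."""

    def __init__(self, delay: float = 0.01) -> None:
        self.delay = delay
        self.current = 0  # runs in flight right now
        self.peak = 0     # highest value `current` has reached

    async def run(self) -> None:
        self.current += 1
        self.peak = max(self.peak, self.current)
        try:
            await asyncio.sleep(self.delay)  # yield so other runs can overlap
        finally:
            self.current -= 1


async def run_with_limit(agent: ConcurrencyTrackingAgent, n: int, limit: int) -> None:
    # Hypothetical concurrency control under test: at most `limit` runs at once.
    sem = asyncio.Semaphore(limit)

    async def guarded() -> None:
        async with sem:
            await agent.run()

    await asyncio.gather(*(guarded() for _ in range(n)))


agent = ConcurrencyTrackingAgent()
asyncio.run(run_with_limit(agent, n=10, limit=3))
# The test asserts the limit was reached but never exceeded.
assert agent.peak == 3
```

Because asyncio runs every task on one thread and `current` only changes between `await` points, no lock is needed; an agent driven by threads would need a `threading.Lock` around the counter.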
…y ruff hook
- Added hooks for trailing whitespace, end-of-file fixing, YAML and TOML checks, merge conflict detection, and private key detection.
- Updated the Ruff hook to use `ruff-check` instead of `ruff`, with the `--fix` argument.
- Replaced the Ruff installation step with `astral-sh/ruff-action` for linting.
- Expanded the test job to Python 3.10, 3.11, 3.12, and 3.13.
- Changed the test steps to use `uv` for running tests and building the package.
- Added a build job that checks the package with `uvx` and `twine`.
Introduce dedicated Pyright and mypy checks in CI and align type annotations and configuration across the rollout and evaluation modules. This catches type regressions earlier and improves typing reliability.
- Renamed "Setup" to "Quick Start" for clarity.
- Consolidated installation instructions for `uv` and `pip`.
- Improved the commands reference section.
- Clarified type checking requirements for Pyright and mypy.
- Clarified the pre-commit hook instructions.
- Added type stubs for `PyYAML` and `requests` in `pyproject.toml` and `uv.lock` to improve type checking in development.
- Added type hints in `cli.py`, `platform_client.py`, and several other modules.
- Refactored return types and variable annotations for consistency across the codebase.
[BREAKING][misc] feat: reorganize Local/Remote rollout docs and switch dataset support to parquet/jsonl/csv
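This change switches dataset support to Parquet, JSONL, and CSV. One common way to accept several formats is to dispatch on the file extension; the following is a minimal sketch assuming that approach. `load_dataset_rows` is a hypothetical name, not the SDK's loader, and the Parquet branch assumes pandas with a Parquet engine (such as pyarrow) is installed:

```python
from __future__ import annotations

import csv
import json
from pathlib import Path
from typing import Any


def load_dataset_rows(path: str | Path) -> list[dict[str, Any]]:
    """Hypothetical loader: choose a reader from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            # One JSON object per non-blank line.
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            # DictReader returns every value as a string.
            return list(csv.DictReader(f))
    if suffix == ".parquet":
        import pandas as pd  # optional dependency, imported only when needed

        return pd.read_parquet(path).to_dict(orient="records")
    raise ValueError(
        f"Unsupported dataset format {suffix!r}; expected .parquet, .jsonl, or .csv"
    )
```

CSV values arrive as strings, so a real loader would likely need to coerce numeric columns; JSONL and Parquet preserve types.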
…ore and GitHub Actions workflow for new build process using 'uv'.
[ci] chore: migrate PyPI publish build to uv and remove legacy packaging artifacts
…improving examples. Updated README to better outline Local and Remote Rollout modes, ensuring consistency in explanations and error handling.
…t and integrating Codecov for coverage uploads on Python 3.12.
…decov only for non-fork pull requests on Python 3.12.
[ci] chore: automate PR labeling and integrate Codecov coverage reporting
Added Codecov badge to README for coverage tracking.
[doc] chore: Add Codecov badge to README
…rubric_eval and llm_client modules. Suppress Python warnings in CLI for a cleaner user experience.
…improved environment variable resolution in user project directories.
[cli] fix: load .env from CWD and centralize warning suppression
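This fix loads `.env` from the current working directory and routes warning suppression through one place. A stdlib-only sketch of both, with hypothetical function names (`suppress_cli_warnings`, `load_env_from_cwd`, `main`); the actual change may well use python-dotenv, where `load_dotenv(Path.cwd() / ".env")` does the same job:

```python
from __future__ import annotations

import os
import warnings
from pathlib import Path


def suppress_cli_warnings() -> None:
    """Hypothetical single place to silence Python warnings for CLI runs."""
    warnings.filterwarnings("ignore")


def load_env_from_cwd(cwd: Path | None = None) -> bool:
    """Hypothetical minimal .env loader; returns True if a file was read.

    Reads KEY=VALUE lines from <cwd>/.env. Variables already set in the
    environment win, so a value exported in the shell is never overwritten.
    """
    env_file = (cwd or Path.cwd()) / ".env"
    if not env_file.is_file():
        return False
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        value = value.strip().strip('"').strip("'")  # drop simple surrounding quotes
        os.environ.setdefault(key.strip(), value)
    return True


def main() -> None:
    suppress_cli_warnings()  # before importing modules that emit warnings
    load_env_from_cwd()      # resolve .env in the user's project, not the package dir
```

Resolving against `Path.cwd()` rather than the installed package's location is what lets a user's project-level `.env` take effect when the CLI is run from that project.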
[misc] chore: increment version
1 issue found across 170 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="examples/reward_functions.py">
<violation number="1" location="examples/reward_functions.py:15">
P2: Reward function examples still omit the required **kwargs parameter, which the docs say is needed for platform compatibility. These examples will raise TypeError if extra keyword arguments are passed. Update the signatures to include **kwargs.</violation>
</file>
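The flagged issue is that a reward function without `**kwargs` raises `TypeError` when the caller passes keyword arguments it does not declare. A minimal sketch of the fixed signature; the function name and the parameters `solution_str`, `ground_truth`, and `extra_info` are hypothetical, not taken from `examples/reward_functions.py`:

```python
from typing import Any


def exact_match_reward(solution_str: str, ground_truth: str, **kwargs: Any) -> float:
    """Hypothetical reward: 1.0 on an exact match after trimming whitespace.

    **kwargs absorbs any extra keyword arguments the caller passes, so the
    function keeps working if new fields are added to the call.
    """
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


# Without **kwargs in the signature, the extra_info= keyword would raise TypeError.
score = exact_match_reward("42", " 42 ", extra_info={"split": "train"})
```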
[ci] chore: add permissions to workflows for content access
What
Why
Type of Change
How to Test
Checklist
- `[module] type: description` format
- …enhancement, bug, breaking)
- `ruff check .` and `ruff format --check .` pass
- `pyright osmosis_ai/` passes
- `pytest` passes (new tests added if applicable)

Summary by cubic
Release v0.2.16 with a docs overhaul, expanded dataset support (Parquet/JSONL/CSV), cleaner CLI behavior, and improved CI with automated release workflows and explicit permissions.
Documentation and CLI
CI/Release and Tooling
Written for commit b1cfa8c.