feat(testing): ensure consistent test behavior coverage across all implementations

## Summary

Bring every santa-lang implementation up to the same level of behavioral test coverage as Comet (the primary Rust implementation).

## Description

Comet serves as the reference implementation and has comprehensive test coverage for language behavior. Other implementations (Blitzen, Dasher, Donner, Vixen, Prancer) should have equivalent test suites to guarantee consistent behavior across all execution environments.

## Goals

- [ ] Audit Comet's test suite to establish the baseline coverage
- [ ] Identify gaps in other implementations' test coverage
- [ ] Create shared test specifications that all implementations must pass
- [ ] Ensure edge cases are tested consistently across implementations

## Proposed Approach

### 1. Shared Test Specifications
Create a language-agnostic test specification format (e.g., YAML or JSON) in which each test case defines:
- Input santa-lang code
- Expected output/result
- Expected errors (for error handling tests)

```yaml
- name: "fibonacci recursive"
  code: |
    let fib = |n| match n {
      0 => 0,
      1 => 1,
      n => fib(n - 1) + fib(n - 2)
    };
    fib(10)
  expected: 55

- name: "division by zero"
  code: "1 / 0"
  error: "division by zero"
```
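
As a rough illustration of how a harness might consume such a spec, the sketch below loads the two cases above from their JSON equivalent and checks that each case has a `name`, a `code` string, and exactly one of `expected` or `error`. The names `SpecCase` and `load_specs`, and the "exactly one of" rule, are assumptions made for this example, not an agreed format.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class SpecCase:
    """One shared test case (hypothetical shape, mirroring the YAML example)."""
    name: str
    code: str
    expected: Any = None         # expected result when the program should succeed
    error: Optional[str] = None  # expected error message when it should fail

    @property
    def expects_error(self) -> bool:
        return self.error is not None


def load_specs(text: str) -> List[SpecCase]:
    """Parse a JSON array of cases and reject malformed entries early."""
    cases = []
    for index, raw in enumerate(json.loads(text)):
        missing = {"name", "code"} - set(raw)
        if missing:
            raise ValueError(f"case {index}: missing fields {sorted(missing)}")
        # Assumption: a case states either a result or an error, never both.
        if ("expected" in raw) == ("error" in raw):
            raise ValueError(
                f"case {raw['name']!r}: need exactly one of 'expected' or 'error'"
            )
        cases.append(
            SpecCase(raw["name"], raw["code"], raw.get("expected"), raw.get("error"))
        )
    return cases


# JSON equivalent of the YAML example (raw string keeps the \n escapes for JSON).
EXAMPLE = r'''
[
  {"name": "fibonacci recursive",
   "code": "let fib = |n| match n {\n  0 => 0,\n  1 => 1,\n  n => fib(n - 1) + fib(n - 2)\n};\nfib(10)",
   "expected": 55},
  {"name": "division by zero", "code": "1 / 0", "error": "division by zero"}
]
'''

specs = load_specs(EXAMPLE)
for spec in specs:
    kind = "error" if spec.expects_error else "value"
    print(f"{spec.name}: expects {kind}")
```

Validating the spec file itself, before any implementation runs, keeps a malformed case from showing up as a spurious failure on every implementation.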

### 2. Test Categories
- **Parser tests** - syntax acceptance/rejection
- **Evaluator tests** - expression evaluation
- **Builtin tests** - all builtin functions
- **Pattern matching tests** - destructuring and match expressions
- **Sequence tests** - lazy evaluation, infinite ranges
- **Runner tests** - AoC DSL behavior
- **Error handling tests** - error messages and recovery

### 3. Implementation Matrix
Track which test suites pass on each implementation (`?` marks a status that has not been audited yet):

| Test Suite | Comet | Blitzen | Dasher | Donner | Vixen | Prancer |
|------------|-------|---------|--------|--------|-------|---------|
| Parser     | ✅    | ?       | ?      | ?      | ?     | ?       |
| Builtins   | ✅    | ?       | ?      | ?      | ?     | ?       |
| Sequences  | ✅    | ?       | ?      | ?      | ?     | ?       |
| ...        | ...   | ...     | ...    | ...    | ...   | ...     |
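
The matrix could be generated from test results rather than maintained by hand. The following is a minimal sketch under assumed conventions: results arrive as a nested dictionary of suite, implementation, and status, and the `❌` and `N/A` symbols and the `render_matrix` name are illustrative choices, not part of the proposal. Any status that is missing is rendered as `?`.

```python
from typing import Dict

IMPLEMENTATIONS = ["Comet", "Blitzen", "Dasher", "Donner", "Vixen", "Prancer"]

# Assumed status vocabulary; anything else (or nothing) renders as "?".
SYMBOLS = {"pass": "✅", "fail": "❌", "n/a": "N/A"}


def render_matrix(results: Dict[str, Dict[str, str]]) -> str:
    """Render {suite: {implementation: status}} as a Markdown table."""
    header = "| Test Suite | " + " | ".join(IMPLEMENTATIONS) + " |"
    divider = "|" + "---|" * (len(IMPLEMENTATIONS) + 1)
    rows = [header, divider]
    for suite, by_impl in results.items():
        cells = [SYMBOLS.get(by_impl.get(impl, ""), "?") for impl in IMPLEMENTATIONS]
        rows.append(f"| {suite} | " + " | ".join(cells) + " |")
    return "\n".join(rows)


# Only Comet's status is known so far; every other cell stays "?".
known = {
    "Parser": {"Comet": "pass"},
    "Builtins": {"Comet": "pass"},
    "Sequences": {"Comet": "pass"},
}
print(render_matrix(known))
```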

## Tasks

- [ ] Export/document Comet's existing test cases
- [ ] Define shared test specification format
- [ ] Create test runner that can execute specs against any implementation
- [ ] Add CI job to run shared tests against all implementations
- [ ] Document any intentional behavioral differences between implementations
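
One possible shape for the shared runner is sketched below. It assumes each implementation is wrapped in an adapter function that takes santa-lang source and returns an `Outcome`; how an adapter actually invokes Comet, Blitzen, or the others is left open here. `Outcome`, `check`, `run_specs`, and the substring match on error messages are all assumptions for illustration, and the two evaluators are stand-in stubs, not real implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Outcome:
    """Result reported by an implementation adapter (hypothetical shape)."""
    value: Any = None
    error: Optional[str] = None


Evaluate = Callable[[str], Outcome]


def check(case: Dict[str, Any], outcome: Outcome) -> bool:
    """Compare one outcome against one spec case."""
    if "error" in case:
        # Assumption: error wording may vary, so match on a substring.
        return outcome.error is not None and case["error"] in outcome.error
    return outcome.error is None and outcome.value == case["expected"]


def run_specs(
    cases: List[Dict[str, Any]], implementations: Dict[str, Evaluate]
) -> Dict[str, List[str]]:
    """Return the names of failing cases for each implementation."""
    failures: Dict[str, List[str]] = {name: [] for name in implementations}
    for name, evaluate in implementations.items():
        for case in cases:
            try:
                passed = check(case, evaluate(case["code"]))
            except Exception:
                passed = False  # a crashing adapter never counts as a pass
            if not passed:
                failures[name].append(case["name"])
    return failures


FIB_CODE = (
    "let fib = |n| match n {\n  0 => 0,\n  1 => 1,\n"
    "  n => fib(n - 1) + fib(n - 2)\n};\nfib(10)"
)
CASES = [
    {"name": "fibonacci recursive", "code": FIB_CODE, "expected": 55},
    {"name": "division by zero", "code": "1 / 0", "error": "division by zero"},
]


def stub_ok(code: str) -> Outcome:
    # Stand-in that happens to return the answers the spec expects.
    if code == "1 / 0":
        return Outcome(error="runtime error: division by zero")
    return Outcome(value=55)


def stub_broken(code: str) -> Outcome:
    # Stand-in that always returns 0 and never reports an error.
    return Outcome(value=0)


failures = run_specs(CASES, {"reference-stub": stub_ok, "broken-stub": stub_broken})
print(failures)
```

Keeping the runner independent of how each implementation is launched means the CI job only needs one adapter per implementation, while the comparison logic stays in a single place.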

## Notes

- Vixen implements a subset of santa-lang, so some tests may be marked as "not applicable"
- Performance-related tests may have different thresholds per implementation
- Focus on behavioral correctness, not implementation details
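
To keep "not applicable" distinct from a failure, each spec case could list the implementations it does not apply to. The sketch below assumes a hypothetical `not_applicable` field on a case and a hypothetical `classify` helper; the example case is a placeholder and makes no claim about which features Vixen actually supports.

```python
from typing import Any, Dict, Optional


def classify(case: Dict[str, Any], implementation: str, passed: Optional[bool]) -> str:
    """Map a test result to a status string.

    `passed` is None when the case has not been run on this implementation.
    """
    if implementation in case.get("not_applicable", []):
        return "n/a"  # excluded by the spec, so the result is ignored
    if passed is None:
        return "?"
    return "pass" if passed else "fail"


# Placeholder case, for illustration only.
placeholder = {"name": "placeholder case", "code": "...", "not_applicable": ["Vixen"]}

print(classify(placeholder, "Vixen", False))   # n/a
print(classify(placeholder, "Comet", True))    # pass
print(classify(placeholder, "Dasher", None))   # ?
```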

## Related

- Issue #1 (missing builtins) - related to consistency across implementations
