A collection of benchmark experiments for evaluating different aspects of Large Language Models (LLMs).
Evaluates code generation capabilities by prompting models to create Python raytracers. Includes visual comparisons and consistency tests across multiple models.
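A minimal sketch of the prompting loop, assuming an OpenAI-compatible API; the model names, prompt wording, and sample count are illustrative placeholders, not the exact settings used in the experiment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Write a self-contained Python raytracer that renders a simple scene to a PNG."
MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholder model list

def generate_raytracer(model: str) -> str:
    """Request one raytracer implementation and return the raw code text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,  # non-zero temperature so repeated runs can differ
    )
    return response.choices[0].message.content

# Collect several samples per model; consistency is then judged by running each
# sample and visually comparing the rendered images.
samples = {m: [generate_raytracer(m) for _ in range(3)] for m in MODELS}
```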
Analyzes the reasoning patterns and chain-of-thought processes of various LLMs by examining statistical patterns in their thinking traces.
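A minimal sketch of the kind of surface statistics that can be computed over thinking traces; the on-disk layout (plain-text files grouped per model) is an assumption for illustration only.

```python
import re
from collections import Counter
from pathlib import Path

def trace_stats(text: str) -> dict:
    """Compute simple surface statistics for one chain-of-thought trace."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(counts) / max(len(words), 1),
        "top_words": counts.most_common(10),
    }

# Aggregate per model, assuming traces are stored as traces/<model>/<n>.txt
stats_by_model = {
    model_dir.name: [trace_stats(p.read_text()) for p in model_dir.glob("*.txt")]
    for model_dir in Path("traces").iterdir() if model_dir.is_dir()
}
```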
Attempts to test LLM consistency and uniqueness across various topic domains by analyzing generation statistics for prompts that expect a single-word response. Helps identify model-specific response patterns that can serve as "fingerprints" for different LLMs.
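A minimal sketch of the fingerprinting idea, assuming the single-word responses per model have already been collected; the example responses and model names below are placeholders, not real data.

```python
from collections import Counter

# Example collected data: model -> single-word responses to the same prompt set.
responses = {
    "model_a": ["blue", "seven", "cat", "blue", "mercury"],
    "model_b": ["azure", "seven", "dog", "blue", "mars"],
}

def fingerprint(answers: list[str]) -> Counter:
    """Normalise answers and count them; the distribution acts as a fingerprint."""
    return Counter(a.strip().lower() for a in answers)

def overlap(a: Counter, b: Counter) -> float:
    """Shared mass between two response distributions (0 = disjoint, 1 = identical)."""
    shared = sum(min(a[k], b[k]) for k in a.keys() | b.keys())
    return shared / max(sum(a.values()), sum(b.values()), 1)

fingerprints = {model: fingerprint(r) for model, r in responses.items()}
print(overlap(fingerprints["model_a"], fingerprints["model_b"]))
```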
A vibe coding project to test the capabilities of the mystery model "Optimus Alpha". It implements real-time path tracing techniques for realistic lighting and shadows.
This project is licensed under CC0 1.0 Universal.