Transform mathematical concepts into beautiful animations using AI-powered generation
This repository contains the output files of a mathematical animation generation process, not the complete pipeline. Users can run these files to render the animations on their machines, but the model and methodology used to generate these animation scripts are not included.
In other words, this repo provides the Manim code that produces the visualizations, but not the AI system that creates this code from mathematical concepts. The complete pipeline from mathematical concept to animation code remains proprietary.
[March 3rd]: I will soon publish a @smolagents agent trained to take basic prompts and turn them into the prompts the LLM needs. You need roughly a 2,000-token prompt to get fully working Manim code out; the agent will write that for you. Rendering will still happen on your machine. The output is Python, and depending on the scene, render time can range from 5 minutes to 4 hours. There are a wide number of examples already in the repo. The /Doc folder is the LaTeX output from the model, rendered into a PDF. An agent seems like what would help most people, so I'll publish that soon.
This project uses DeepSeek AI (and some Google Gemini, and now #Grok3) to generate mathematical animations with Manim from better prompts. It includes many examples of complex mathematical concepts visualized through animation. The intent is to automatically chart concepts that far exceed most humans' capacity to visualize complex connections across math and physics, in a single one-shot animation. The future intent is to use RL to fine-tune a model on all the working verbose prompts, to arrive at 100% one-shot animations from only text descriptions.
- LaTeX Matters: The core prompt-engineering technique, yielding much better results for displaying formulas on screen.
- Dual-Stream Output: Simultaneous animation code + study notes generation. No model fine-tuning necessary: just pass any working Python scene script back as a prompt and ask for "verbose explanations fully rendered as LaTeX study notes," and you will get working LaTeX that renders into a PDF (for example, in Overleaf).
- Cross-Model Synergy: Leveraging multiple AI models (DeepSeek, Gemini, Grok3) allows for unique perspectives on mathematical visualization, often catching edge cases a single model might miss.
- Educational Impact: The generated animations serve as powerful teaching tools, breaking down complex mathematical concepts into visually digestible sequences.
- Automated Documentation: The system not only generates animations but also produces comprehensive LaTeX documentation, creating a complete learning package.
- Adaptive Complexity: Can handle everything from basic geometric proofs to advanced topics like quantum mechanics and optimal transport theory.
- Interactive Development: The project includes a feedback loop where successful animations can be used to improve prompt engineering for future generations.
- Academic Research: Visualizing complex mathematical proofs and theories
- Education: Creating engaging materials for STEM courses
- Scientific Communication: Bridging the gap between abstract mathematics and visual understanding
- Research Validation: Providing visual verification of mathematical concepts and relationships
The model is not yet a fully fine-tuned version of DeepSeek's R1-Zero, but I am working on that (for now, better prompting still works best). Most errors you will encounter when attempting one-shot animations on your own will relate to whether LaTeX is being interpreted as a formula to be rendered on screen or as part of the code itself.
An interesting new thing to ask for is the capacity to generate simultaneous "study notes" that accompany each animation with a complete explanation of the math and context of the animation. The Benamou animation and notes were the first attempt at this. It works straight from the prompt if you pass the scene code directly back to the model.
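A minimal sketch of that round trip, assuming DeepSeek's OpenAI-compatible endpoint and a `DEEPSEEK_API_KEY` available in your environment (the file names, model name, and prompt wording here are illustrative, not part of this repo):

```python
# study_notes_sketch.py -- illustrative only; file names and model choice are assumptions
import os
from openai import OpenAI  # pip install openai

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

scene_code = open("QED.py").read()  # any working Manim scene script

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{
        "role": "user",
        "content": "Here is a working Manim scene:\n\n" + scene_code +
                   "\n\nProduce verbose explanations fully rendered as LaTeX study notes.",
    }],
)

# Save the returned LaTeX source; compile it locally or in Overleaf
with open("study_notes.tex", "w") as f:
    f.write(response.choices[0].message.content)
```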
1. Clone & Setup

   ```bash
   git clone https://github.com/HarleyCoops/Math-To-Manim
   cd Math-To-Manim
   ```

2. Environment Setup

   ```bash
   # Create and configure .env file with your API key
   echo "DEEPSEEK_API_KEY=your_key_here" > .env
   # Install dependencies
   pip install -r requirements.txt
   ```

3. Install FFmpeg

   Windows
   - Download from https://www.gyan.dev/ffmpeg/builds/
   - Add to PATH or use:

   ```bash
   choco install ffmpeg
   ```

   Linux

   ```bash
   sudo apt-get install ffmpeg
   ```

   macOS

   ```bash
   brew install ffmpeg
   ```

4. Launch Interface

   ```bash
   python app.py
   ```
Your prompts need extreme detail for this to work. For example, the prompt below is a BASIC prompt. You MUST have this level of detail. Most people can't write half of this, so the project uses training to turn what someone might write as a basic prompt into what the LLMs are actually looking for. Anyone can do this on their own; I promise this is all prompting, but the secret is NOT prompting in English: you have to prompt in LaTeX. Happy hunting!
"Begin by slowly fading in a panoramic star field backdrop to set a cosmic stage. As the camera orients itself to reveal a three-dimensional axis frame, introduce a large title reading 'Quantum Field Theory:
A Journey into the Electromagnetic Interaction,' written in bold, glowing text at the center of the screen. The title shrinks and moves into the upper-left corner, making room for a rotating wireframe representation of 4D Minkowski spacetime—though rendered in 3D for clarity—complete with a light cone that stretches outward. While this wireframe slowly rotates, bring in color-coded equations of the relativistic metric, such as
ds^2 = -c^2 dt^2 + dx^2 + dy^2 + dz^2, with each component highlighted in a different hue to emphasize the negative time component and positive spatial components.
Next, zoom the camera into the wireframe's origin to introduce the basic concept of a quantum field. Show a ghostly overlay of undulating plane waves in red and blue, symbolizing an electric field and a magnetic field respectively, oscillating perpendicularly in sync. Label these fields as \vec{E} and \vec{B}, placing them on perpendicular axes with small rotating arrows that illustrate their directions over time. Simultaneously, use a dynamic 3D arrow to demonstrate that the wave propagates along the z-axis.
As the wave advances, display a short excerpt of Maxwell's equations, morphing from their classical form in vector calculus notation to their elegant, relativistic compact form: \partial_\mu F^{\mu\nu} = \mu_0 J^\nu. Animate each transformation by dissolving and reassembling the symbols, underscoring the transition from standard form to four-vector notation.
Then, shift the focus to the Lagrangian density for quantum electrodynamics (QED):
\mathcal{L}_{\text{QED}} = \bar{\psi}(i\gamma^\mu D_\mu - m)\psi - \tfrac{1}{4}F_{\mu\nu}F^{\mu\nu}.
Project this equation onto a semi-transparent plane hovering in front of the wireframe spacetime, with each symbol color-coded: the Dirac spinor \psi in orange, the covariant derivative D_\mu in green, the gamma matrices \gamma^\mu in bright teal, and the field strength tensor F_{\mu\nu} in gold. Let these terms gently pulse to indicate they are dynamic fields in spacetime, not just static quantities.
While the Lagrangian is on screen, illustrate the gauge invariance by showing a quick animation where \psi acquires a phase factor e^{i\alpha(x)}, while the gauge field transforms accordingly. Arrows and short textual callouts appear around the equation to explain how gauge invariance enforces charge conservation.
Next, pan the camera over to a large black background to present a simplified Feynman diagram. Show two electron lines approaching from the left and right, exchanging a wavy photon line in the center.
The electron lines are labeled e^- in bright blue, and the photon line is labeled \gamma in yellow. Subtitles and small pop-up text boxes narrate how this basic vertex encapsulates the electromagnetic interaction between charged fermions, highlighting that the photon is the force carrier. Then, animate the coupling constant \alpha \approx \frac{1}{137} flashing above the diagram, gradually evolving from a numeric approximation to the symbolic form \alpha = \frac{e^2}{4\pi\epsilon_0\hbar c}.
Afterward, transition to a 2D graph that plots the running of the coupling constant \alpha with respect to energy scale, using the renormalization group flow. As the graph materializes, a vertical axis labeled 'Coupling Strength' and a horizontal axis labeled 'Energy Scale' come into view, each sporting major tick marks and numerical values. The curve gentl...(truncated from 20157 characters)...nwhile, short textual captions in the corners clarify that this phenomenon arises from virtual particle-antiparticle pairs contributing to vacuum polarization.
In the final sequence, zoom back out to reveal a cohesive collage of all elements: the rotating spacetime grid, the undulating electromagnetic fields, the QED Lagrangian, and the Feynman diagram floating in the foreground. Fade in an overarching summary text reading 'QED: Unifying Light and Matter Through Gauge Theory,' emphasized by a halo effect. The camera then slowly pulls away, letting the cosmic background re-emerge until each component gracefully dissolves, ending on a single star field reminiscent of the opening shot. A concluding subtitle, 'Finis,' appears, marking the animation's closure and prompting reflection on how fundamental quantum field theory is in describing our universe."
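For reference, the opening beats of a prompt like this map onto a Manim scene of roughly the following shape. This is a short, hand-written sketch for orientation only (the class name `QEDJourney` matches the example render commands below, but everything else is illustrative, not the model's actual output):

```python
# qed_journey_sketch.py -- hand-written illustration, not generated output
from manim import *
import numpy as np


class QEDJourney(ThreeDScene):
    def construct(self):
        # Panoramic star-field backdrop: a scatter of faint white dots
        stars = VGroup(*[
            Dot([np.random.uniform(-7, 7), np.random.uniform(-4, 4), 0],
                radius=0.02, color=WHITE)
            for _ in range(200)
        ])
        self.play(FadeIn(stars), run_time=3)

        # Glowing title that later shrinks into the upper-left corner
        title = Text(
            "Quantum Field Theory:\nA Journey into the Electromagnetic Interaction",
            weight=BOLD,
        ).scale(0.5)
        self.play(Write(title))
        self.add_fixed_in_frame_mobjects(title)  # keep it facing the camera
        self.play(title.animate.scale(0.6).to_corner(UL))

        # 3D axes standing in for the Minkowski wireframe
        axes = ThreeDAxes()
        self.set_camera_orientation(phi=70 * DEGREES, theta=-45 * DEGREES)
        self.play(Create(axes))

        # Relativistic metric pinned to the camera frame
        metric = MathTex(r"ds^2 = -c^2\,dt^2 + dx^2 + dy^2 + dz^2")
        self.add_fixed_in_frame_mobjects(metric)
        metric.to_edge(DOWN)
        self.play(Write(metric))
        self.wait()
```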
- `-ql` : 480p (development)
- `-qm` : 720p (medium quality)
- `-qh` : 1080p (high quality)
- `-qk` : 4K (ultra high quality)
- `-p` : Preview the animation when done
- `-f` : Show the output file in file browser
The rendered animation will be saved in:
`media/videos/SceneName/[quality]/SceneName.[format]`
- Use `-pql` during development for quick previews
- Use `-qh` for final renders
- Add `-f` to easily locate output files
- Use `--format gif` for easily shareable animations
For example:
```bash
# During development (preview QEDJourney scene from QED.py in low quality)
python -m manim -pql QED.py QEDJourney

# Final render (render QEDJourney scene from QED.py in high quality)
python -m manim -qh QED.py QEDJourney
```
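If you want a shareable GIF instead of a video, the same command accepts `--format gif` (as noted in the tips above; the scene and file names simply mirror the example):

```bash
# Quick, shareable GIF of the same scene
python -m manim -pql --format gif QED.py QEDJourney
```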
The upcoming smolagents integration will revolutionize how you interact with Math-To-Manim:
- Prompt Translation: The smolagent will transform simple, natural language descriptions into the detailed, LaTeX-rich prompts required by the LLM to generate high-quality Manim code.
- Workflow Automation: The agent will handle the entire pipeline from basic prompt → detailed prompt → Manim code generation → rendering configuration.
- Interactive Refinement: You'll be able to iteratively refine your animations through natural conversation with the agent.
- Knowledge Augmentation: The agent will automatically enhance your prompts with relevant mathematical context and visualization best practices.
- Error Handling: When the generated code has issues, the agent will automatically diagnose and fix common problems.
The smolagent will be published as a separate repository that you can easily integrate with this codebase. Stay tuned for the release announcement!
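As a rough illustration of the "prompt translation" step, a smolagents-based agent could look something like the sketch below. The backend, model id, and instruction text are assumptions for illustration; the published agent may be structured differently:

```python
# prompt_expander_sketch.py -- illustrative only; the real agent is not published yet
from smolagents import CodeAgent, LiteLLMModel

# Any LiteLLM-compatible backend works; DeepSeek is shown here as an assumption
model = LiteLLMModel(model_id="deepseek/deepseek-chat")
agent = CodeAgent(tools=[], model=model)

basic_prompt = "Animate the Pythagorean theorem with a visual proof."

detailed_prompt = agent.run(
    "Rewrite the following one-line animation idea as a roughly 2000-token, "
    "LaTeX-rich scene description suitable for generating Manim code. "
    "Describe camera moves, colors, and every formula in LaTeX.\n\n"
    f"Idea: {basic_prompt}"
)
print(detailed_prompt)
```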
- Examples - Showcase of various mathematical animations
- Architecture - Technical details of the system
- MCP Troubleshooting - Guide for resolving MCP server issues
- Contributing - Guidelines for contributing to the project
DeepSeek R1-Zero represents the culmination of multi-year research at DeepSeek AI into transfer learning, instruction tuning, and long-context neural architectures. Its central objective is to provide a single, all-purpose encoder-decoder model that can handle:
- Complex reading comprehension (up to 8,192 tokens)
- Scenario-based instruction following (e.g., "Given a set of constraints, produce a short plan.")
- Technical and coding tasks (including code generation, transformation, and debugging assistance)
Though R1-Zero is a "descendant" of T5, the modifications to attention, context management, and parameter initialization distinguish it significantly from vanilla T5 implementations.
While standard Transformer models rely on the "Attention is All You Need" paradigm (Vaswani et al., 2017), DeepSeek R1-Zero extends this by:
- Expanded Context Window
  - By employing distributed positional encodings and segment-based attention, R1-Zero tolerates sequences up to 8,192 tokens.
  - The extended context window leverages blockwise local attention (in certain layers) to mitigate quadratic scaling in memory usage.
- Instruction Tuning
  - Similar to frameworks like FLAN-T5 or InstructGPT, R1-Zero was exposed to curated prompts (instructions, Q&A, conversation) to improve zero-shot and few-shot performance.
  - This approach helps the model produce more stable, context-aware answers and reduces "hallucination" events.
- Semantic Compression
  - The encoder can compress textual segments into "semantic slots," enabling more efficient cross-attention in the decoder stage.
  - This is theoretically grounded in Manifold Hypothesis arguments, where the textual input can be seen as lying on a lower-dimensional manifold, thus amenable to a compressed representation.
From a cognitive science perspective, R1-Zero aspires to mimic a layered approach to knowledge assimilation, balancing short-term "working memory" (sequence tokens) with long-term "knowledge representation" (model parameters).
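To make the memory argument concrete: blockwise local attention restricts each token to attend only within a fixed-size block, so the attended-to set grows linearly rather than quadratically with sequence length. The sketch below is a generic PyTorch illustration of that masking idea, not DeepSeek's actual implementation:

```python
import torch

def blockwise_local_attention_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean mask: position i may attend to position j only if both fall in
    the same block of `block_size` tokens."""
    blocks = torch.arange(seq_len) // block_size        # block index per position
    return blocks.unsqueeze(0) == blocks.unsqueeze(1)   # (seq_len, seq_len)

# Toy demo: 16 tokens, blocks of 4. For an 8,192-token window with 512-token
# blocks, each query would attend to 512 keys instead of 8,192.
mask = blockwise_local_attention_mask(16, 4)
scores = torch.randn(16, 16)                            # toy attention logits
scores = scores.masked_fill(~mask, float("-inf"))       # drop cross-block attention
weights = torch.softmax(scores, dim=-1)                 # rows normalize within each block
```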
- Parameter Count: ~6.7B
- Encoder-Decoder: Maintains T5's text-to-text approach but with specialized gating and partial reordering in cross-attention blocks.
- Context Window: 8,192 tokens (a 4× expansion over many standard T5 models).
- Layer Stacking: The modifications allow some dynamic scheduling of attention heads, facilitating better throughput in multi-GPU environments.
| Aspect | Specification |
|---|---|
| Architecture Type | Modified T5 (custom config named `deepseek_v3`) |
| Heads per Attention | 32 heads (in deeper layers) |
| Layer Count | 36 encoder blocks, 36 decoder blocks |
| Vocabulary Size | 32k tokens (SentencePiece-based) |
| Positional Encoding | Absolute + learned segment-based for 8k tokens |
| Training Paradigm | Instruction-tuned + additional domain tasks |
| Precision | FP32, FP16, 4-bit, 8-bit quantization (via bitsandbytes) |
DeepSeek R1-Zero supports multi-bit quantization to optimize memory usage:
- 4-Bit Quantization
  - Pros: Minimizes VRAM usage (~8GB).
  - Cons: Potentially minor losses in numeric accuracy or generative quality.
- 8-Bit Quantization
  - Pros: Still significantly reduces memory (~14GB VRAM).
  - Cons: Slight overhead vs. 4-bit, but often better fidelity.
- Full Precision (FP32)
  - Pros: The highest theoretical accuracy.
  - Cons: ~28GB VRAM usage; not feasible on smaller GPUs.
Sample quantized load (4-bit) with bitsandbytes:
```python
from transformers import AutoModelForSeq2SeqLM

# Load the checkpoint in 4-bit precision (requires bitsandbytes)
model_4bit = AutoModelForSeq2SeqLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero",
    trust_remote_code=True,
    device_map="auto",    # spread layers across available GPUs/CPU
    load_in_4bit=True,    # bitsandbytes 4-bit quantization
)
```
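A hedged usage sketch continuing from the load above; the prompt text and generation settings are arbitrary examples, and the tokenizer call assumes the checkpoint ships the SentencePiece tokenizer described in the table above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Zero", trust_remote_code=True
)

prompt = "Explain the running of the QED coupling constant in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model_4bit.device)

# Seq2seq generation, matching the encoder-decoder description above
outputs = model_4bit.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```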
- Python >= 3.8
- PyTorch >= 2.0
- Transformers >= 4.34.0
- Accelerate >= 0.24.0
- bitsandbytes >= 0.39.0 (if using 4-bit/8-bit)
- FFmpeg (required for video rendering)
FFmpeg is required for Manim to render animations. Here's how to install it:
Windows
- Download from https://www.gyan.dev/ffmpeg/builds/
- Recommended: "ffmpeg-release-essentials.7z"
- Extract the archive
- Add the `bin` folder to your system PATH
- Or install via package manager:

```bash
choco install ffmpeg
```

Linux
- Install via package manager:

```bash
sudo apt-get update
sudo apt-get install ffmpeg
```

macOS

```bash
brew install ffmpeg
```
```bash
pip install --upgrade torch transformers accelerate bitsandbytes
```
If your environment's default PyTorch is older than 2.0, consider updating or installing from PyPI/conda channels that provide a recent version.
[Insert license information here]
[Insert acknowledgments here]
Welcome to the Math-To-Manim documentation! This directory contains comprehensive documentation for the Math-To-Manim project.
- Architecture - Technical details of the system architecture
- Examples - Showcase of various mathematical animations
- MCP Troubleshooting - Guide for resolving MCP server issues
- Smolagents Implementation - Details on the smolagents integration
If you're new to Math-To-Manim, we recommend starting with the main README.md file in the root directory, which provides an overview of the project and installation instructions.
We welcome contributions to the documentation! If you find any errors or have suggestions for improvements, please feel free to submit a pull request or open an issue.
When contributing to documentation, please follow these guidelines:
- Use clear, concise language
- Include examples where appropriate
- Use proper Markdown formatting
- Keep documentation up-to-date with code changes
The documentation is written in Markdown and can be viewed directly on GitHub or in any Markdown viewer.
If you want to generate a more polished documentation site, you can use tools like MkDocs or Sphinx with the markdown extension. Instructions for setting up these tools will be added in the future.