
Can Large Models Fool the Eye? A New Turing Test for Biological Animation 👀

Even young infants can easily interpret biological motion from point-light displays without any prior knowledge.

¹Shanghai Jiao Tong University
²Shanghai AI Laboratory
³Macao Polytechnic University
*Corresponding author


We propose BioMotion Arena, the first biological motion-based visual preference evaluation framework for large models. We focus on ten typical human motions and introduce fine-grained control over gender, weight, mood, and direction. In total, we collect more than 45k votes for 53 mainstream LLMs and MLLMs on 90 biological motion variants.

Release 🚀

  • [2025/08/22] 🔥 We added the Gradio version of BioMotion Arena.
  • [2025/08/11] 🔥 BioMotion Arena was highlighted in a Medium article by Berend Watchus!
  • [2025/08/08] ⚡️ The project website for BioMotion Arena is online!

Motivations 💡

Evaluating the abilities of large models and exposing the gaps between them is challenging. Current benchmarks adopt either ground-truth-based, score-form evaluation on static datasets or indistinct, chatbot-style collection of textual human preferences, neither of which gives users immediate, intuitive, and perceptible feedback on performance differences. In this paper, we introduce BioMotion Arena, a novel framework for evaluating large language models (LLMs) and multimodal large language models (MLLMs) via visual animation.

Motion Space 🧩

We include 10 typical human actions as well as four fine-grained attributes (see the sketch after this list for one way they can be combined into motion variants):

  • Action: Walking, running, waving a hand, jumping up, jumping forward, bowing, lying down, sitting down, turning around, and forward rolling
  • Gender: Man, woman
  • Happiness: Happy, sad
  • Weight: Heavy, light
  • Direction: Left, right, facing forward
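
For reference, below is a minimal sketch of how the motion variants could be enumerated. It assumes each action is paired with one attribute value at a time (10 actions × 9 attribute values = 90), which matches the variant count reported above but may not be the exact scheme or prompt phrasing used in the paper.

# Hypothetical enumeration of motion variants (10 actions x 9 attribute values = 90).
# The exact variant composition and prompt templates in the paper may differ.
ACTIONS = ["walking", "running", "waving a hand", "jumping up", "jumping forward",
           "bowing", "lying down", "sitting down", "turning around", "forward rolling"]
ATTRIBUTES = {  # the four fine-grained attributes listed above
    "gender": ["man", "woman"],
    "happiness": ["happy", "sad"],
    "weight": ["heavy", "light"],
    "direction": ["left", "right", "facing forward"],
}

variants = [(action, attribute, value)
            for action in ACTIONS
            for attribute, values in ATTRIBUTES.items()
            for value in values]
print(len(variants))  # 90 (action, attribute, value) combinations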

Participating LLMs and MLLMs 🤖

Our BioMotion Arena currently includes 53 large models (both LLMs and MLLMs) in total, with a mix of cutting-edge proprietary models, open-source models, and code-specific models.

Run with Gradio 🎮

We use an unofficial third-party API. If you rely on a different provider, you need to update the API-calling part of the code (the call_model_api function) accordingly.
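
A minimal sketch of such a function is shown below, assuming an OpenAI-compatible chat-completions endpoint via the openai Python SDK; the signature, base_url, and key handling are our assumptions, and the actual implementation in biomotion_gradio.py may differ.

# Hypothetical drop-in for call_model_api, assuming an OpenAI-compatible endpoint.
# Adjust base_url, model names, and key handling to match your provider.
from openai import OpenAI

def call_model_api(model_name, prompt, api_key, base_url="https://api.openai.com/v1"):
    client = OpenAI(api_key=api_key, base_url=base_url)
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    # Return the generated code/text from the first choice.
    return response.choices[0].message.content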

python biomotion_gradio.py --default-key xxxxxxxxxxx --special-key xxxxxxxxxx

Running on local URL: http://127.0.0.1:7860

  1. Use the default recommended prompt, or enter your own action prompt.
  2. Click the Generate Code button and wait for the responses from two anonymous models.
  3. Click the Run Code A and Run Code B buttons respectively.
  4. Make a preference selection based on the motion animations produced by the executed code. The result is automatically saved locally to preferences.csv.
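
The exact columns written to preferences.csv are not documented here; for illustration only, a vote row might follow a hypothetical schema like the one below (also assumed by the Elo sketch later in this README).

model_a,model_b,winner
gpt-4o,qwen2.5-72b-instruct,A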

Code 💻

We recommend directly installing the environment for the model to be evaluated, such as Qwen2.5-VL, Qwen2.5, Llama3.3-70B, InternVL2.5, or the OpenAI API.

Two code examples are given, covering a proprietary LLM (OpenAI) and an open-source LLM (Qwen); openai-MLLM.py additionally provides a demo for MLLMs with reference-image input. A sketch of such a query script follows the commands below.

python openai.py
python qwen.py
python openai-MLLM.py
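
For reference, here is a minimal sketch of a qwen.py-style query script, assuming the Hugging Face transformers library and the Qwen/Qwen2.5-7B-Instruct checkpoint; the prompt, checkpoint, and decoding settings are illustrative and not necessarily those used by the scripts in this repo.

# Hypothetical qwen.py-style script: ask an open-source LLM to write point-light
# biological motion code. Model choice, prompt, and decoding settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = ("Write a Python program that shows a point-light stimulus of a man walking, "
          "rendered as white points moving against a solid black background.")
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=2048)
# Strip the prompt tokens and decode only the newly generated code.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))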

Human Preference Collection

Configure the evaluation pool and output path in anmoy-subjective-exp.py, then launch the UI code for anonymous subjective experiments.
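
For illustration, the configuration might look like the following; the variable names are hypothetical, so check the actual script for the real ones.

# Hypothetical configuration block in anmoy-subjective-exp.py (names are assumptions).
MODEL_POOL = ["gpt-4o", "qwen2.5-72b-instruct", "llama3.3-70b-instruct"]  # models whose animations are compared
OUTPUT_PATH = "results/preferences.csv"  # where the collected pairwise votes are written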

cd subjective-exp-tool
python anmoy-subjective-exp.py

Calculate Elo scores from the collected human preferences:

python elo_score.py
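
For reference, below is a minimal sketch of how Elo scores can be computed from pairwise votes, assuming a CSV with columns model_a, model_b, and winner ("A", "B", or "tie"); elo_score.py may use different column names, a different K or initial rating, or a more elaborate procedure such as averaging over shuffled vote orders.

# Minimal online Elo update over the collected votes (schema and constants are assumptions).
import csv
from collections import defaultdict

K = 32              # update step size
INIT_RATING = 1000  # starting rating for every model

def update(ratings, a, b, score_a):
    """score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    expected_a = 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
    ratings[a] += K * (score_a - expected_a)
    ratings[b] += K * ((1 - score_a) - (1 - expected_a))

ratings = defaultdict(lambda: INIT_RATING)
with open("preferences.csv", newline="") as f:
    for row in csv.DictReader(f):
        score = {"A": 1.0, "B": 0.0, "tie": 0.5}[row["winner"]]
        update(ratings, row["model_a"], row["model_b"], score)

for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")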

Main Results 📌

  • Average lines of code for biological motion representation
  • Win-rate and the rate of ‘Both-are-bad’
  • Elo scores of a subset of models
  • Comparison with other benchmarks

Contact ✉️

Please contact the first author of this paper for queries.

  • Zijian Chen, zijian.chen@sjtu.edu.cn

Citation 📎

If you find our work interesting, please feel free to cite our paper:

@article{chen2025can,
  title={Can Large Models Fool the Eye? A New Turing Test for Biological Animation},
  author={Chen, Zijian and Deng, Lirong and Chen, Zhengyu and Zhang, Kaiwei and Jia, Qi and Tian, Yuan and Zhu, Yucheng and Zhai, Guangtao},
  journal={arXiv preprint arXiv:2508.06072},
  year={2025}
}
