A large-scale dataset of non-humanoid (animal) head identities, each rendered across 10 consistent facial expressions.
This is the official image dataset accompanying [ECCV 2026] RegHead: Non-Humanoid Head Blendshapes via Feed-Forward Registration. It provides paired multi-expression imagery for a large, diverse population of non-humanoid (animal) head identities: the kind of consistent, expression-aligned supervision needed to learn and evaluate head blendshapes beyond the human face.
Each identity is one distinct animal head (e.g. a photorealistic Doberman with round eyes wearing a pirate hat) rendered in the same 10 facial expressions. Because every identity shares the same expression vocabulary, the dataset gives you aligned cross-expression pairs for a fixed identity, directly usable for expression editing, blendshape learning, identity-preserving animation, and registration.
| Property | Value |
|---|---|
| Identities | 33,724 |
| Expressions per identity | 10 (complete for every identity) |
| Total images | 337,240 |
| Per-sample packaging | 1 identity = 1 WebDataset sample (10 images + 1 JSON) |
| Available resolutions | 512×512 and 1024×1024 |
| Image format | JPEG (quality 95) |
| Container | WebDataset .tar shards |
| License | Snap Inc. Non-Commercial License (research only) |
Every identity contains exactly these ten expression images:
| Field | Meaning |
|---|---|
| `open_m_open_e` | open mouth, open eyes (neutral reference) |
| `halfo_m_o_e` | half-open mouth, open eyes |
| `close_m_o_e` | closed mouth, open eyes |
| `close_m_halfo_e` | closed mouth, 50%-open eyes |
| `close_m_0_25o_e` | closed mouth, 25%-open eyes |
| `close_m_0_75o_e` | closed mouth, 75%-open eyes |
| `close_m_close_e` | closed mouth, closed eyes |
| `close_m_smile` | closed-mouth smile |
| `raise_eyebrows` | raised eyebrows |
| `frown` | frown |
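The closed-mouth eye states in the table form a graded sequence, which is convenient for ordering frames into a blink. A minimal sketch, assuming the percentages map linearly onto an openness value in [0, 1]; the `EYE_OPENNESS` mapping and `blink_sequence` helper are illustrative names, not part of the release:

```python
# Eye openness for each closed-mouth frame, read off the expression table.
# The numeric scale (0.0 = closed, 1.0 = fully open) is an assumption.
EYE_OPENNESS = {
    "close_m_close_e": 0.0,
    "close_m_0_25o_e": 0.25,
    "close_m_halfo_e": 0.5,
    "close_m_0_75o_e": 0.75,
    "close_m_o_e": 1.0,
}

def blink_sequence():
    """Return the closed-mouth field names ordered from open eyes to closed eyes."""
    return sorted(EYE_OPENNESS, key=EYE_OPENNESS.get, reverse=True)

print(blink_sequence())
```

Reversing the returned list gives the eye-opening direction.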
Identities are sampled as combinations of four attributes (an animal, a render style, a facial feature, and 1–2 accessories), yielding broad visual diversity across the population.
| Attribute | Pool size | Examples |
|---|---|---|
| Animal | ~270 | Doberman, Polar Bear, Cormorant, Mole, Antelope, Owl, Reindeer, Giant Anteater, … |
| Render style | 4 | Photorealistic · Plastic Toy Render · Computer Animated Film-Style 3D · Video Game NPC Style |
| Feature | ~38 | round eyes, fluffy cheeks, bushy eyebrows, arched eyebrows, curved horns, … |
| Accessories | ~57 | pirate hat, bunny-ear headband, party hat, space helmet, laurel crown, … (1 or 2 per identity) |
Render-style distribution (≈ uniform across the four styles):
| Style | Identities | Share |
|---|---|---|
| Video Game NPC Style | 8,600 | 25.5% |
| Plastic Toy Render | 8,427 | 25.0% |
| Photorealistic | 8,426 | 25.0% |
| Computer Animated Film-Style 3D | 8,271 | 24.5% |
Expression completeness: 100%. All 33,724 identities have all 10 expressions (337,240 / 337,240 image fields present). Identities with missing or corrupt frames were excluded prior to release (11 removed from an inspected set of 33,735).
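The same completeness property can be re-checked on any decoded sample. A minimal sketch, assuming a sample is a dict keyed by `<expression>.jpg` field names; `missing_expressions` is a hypothetical helper, not part of the release:

```python
EXPRESSIONS = [
    "open_m_open_e", "halfo_m_o_e", "close_m_o_e", "close_m_halfo_e",
    "close_m_0_25o_e", "close_m_0_75o_e", "close_m_close_e",
    "close_m_smile", "raise_eyebrows", "frown",
]

# Sanity check on the published counts: identities x expressions = images.
assert 33_724 * len(EXPRESSIONS) == 337_240

def missing_expressions(sample):
    """Return the expression names whose '<name>.jpg' field is absent from a sample."""
    return [e for e in EXPRESSIONS if f"{e}.jpg" not in sample]

# Toy sample with one frame removed (values are placeholders, not real image bytes).
toy = {f"{e}.jpg": b"" for e in EXPRESSIONS}
del toy["frown.jpg"]
print(missing_expressions(toy))  # ['frown']
```

An empty list means the sample carries all ten expression images.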
Provenance split: ~59% (19,935) of identities are newly generated for this release via the pipeline below; ~41% (13,789) come from an earlier consistent generation batch using the same expression vocabulary and models.
The dataset is produced by a three-stage generative pipeline. A base head is generated, neutralized into a clean reference, then edited into each target expression with a chain of expression-specific image-editing models.
               ┌─────────────────────┐     ┌─────────────────────┐     ┌────────────────────────────┐
    prompt ──▶ │ Stage 0: text→image │ ──▶ │ Stage 1: neutralize │ ──▶ │ Stage 2: expression edits  │
               │ OpenAI gpt-image-1  │     │ OpenAI (→ open_m_   │     │ Qwen-Image + LoRA adapters │
               │ (sample identity)   │     │ open_e reference)   │     │ (chained, 10 expressions)  │
               └─────────────────────┘     └─────────────────────┘     └────────────────────────────┘
- Stage 0: Text-to-image (identity creation). A prompt is assembled from a seeded random combination of animal × style × feature × accessories and rendered with OpenAI's image model to create the base identity.
- Stage 1: Neutralization. The base image is converted into a clean, floating-head neutral reference on a white background: the `open_m_open_e` (open-mouth / open-eyes) frame.
- Stage 2: Expression editing. Expression-specific Qwen-Image LoRA adapters edit the reference into the remaining expressions. Edits are chained for consistency:
  - `open_m_open_e` → (close-mouth adapter) → `halfo_m_o_e`, `close_m_o_e`
  - `close_m_o_e` → (close-eye adapter) → `close_m_close_e`, `close_m_halfo_e`, `close_m_0_25o_e`, `close_m_0_75o_e`
  - `close_m_o_e` → (frown/smile/eyebrow adapter) → `close_m_smile`, `raise_eyebrows`, `frown`

  The closed-mouth / open-eyes frame (`close_m_o_e`) is used as the editing base for the seven downstream expressions so that the mouth stays correctly closed throughout.
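The edit chain above can be written down as a small parent map, which makes it easy to see which frame each expression was edited from. This is only a sketch transcribed from the list; `EDIT_PARENT` and `edit_path` are illustrative names, and the adapter labels are the informal ones used in this document rather than real checkpoint names:

```python
# expression -> (frame it was edited from, adapter family used)
EDIT_PARENT = {
    "halfo_m_o_e": ("open_m_open_e", "close-mouth"),
    "close_m_o_e": ("open_m_open_e", "close-mouth"),
    "close_m_close_e": ("close_m_o_e", "close-eye"),
    "close_m_halfo_e": ("close_m_o_e", "close-eye"),
    "close_m_0_25o_e": ("close_m_o_e", "close-eye"),
    "close_m_0_75o_e": ("close_m_o_e", "close-eye"),
    "close_m_smile": ("close_m_o_e", "frown/smile/eyebrow"),
    "raise_eyebrows": ("close_m_o_e", "frown/smile/eyebrow"),
    "frown": ("close_m_o_e", "frown/smile/eyebrow"),
}

def edit_path(expression):
    """Return the chain of frames from the neutral reference to `expression`."""
    path = [expression]
    while path[-1] in EDIT_PARENT:
        path.append(EDIT_PARENT[path[-1]][0])
    return path[::-1]

print(edit_path("frown"))  # ['open_m_open_e', 'close_m_o_e', 'frown']
```

Every expression other than `halfo_m_o_e` and `close_m_o_e` is two edits away from the neutral reference, which is why `close_m_o_e` matters as the shared base.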
Generation is seeded for reproducibility, so the identity set is deterministic and a smaller sample is a strict prefix of a larger one.
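The prefix property follows from drawing identities sequentially from a single seeded generator. The sketch below illustrates the idea with Python's `random` module and tiny pools borrowed from the attribute examples; the actual sampler, pools, and seed used for the release are not described here, so everything in this block is hypothetical:

```python
import random

# Tiny illustrative pools; the real pools are much larger.
ANIMALS = ["Doberman", "Polar Bear", "Owl", "Mole"]
STYLES = ["Photorealistic", "Plastic Toy Render"]
FEATURES = ["round eyes", "fluffy cheeks", "curved horns"]
ACCESSORIES = ["pirate hat", "party hat", "space helmet", "laurel crown"]

def sample_identities(n, seed=0):
    """Draw n attribute combinations sequentially from one seeded generator."""
    rng = random.Random(seed)
    identities = []
    for _ in range(n):
        accessories = rng.sample(ACCESSORIES, rng.randint(1, 2))  # 1 or 2 accessories
        identities.append((rng.choice(ANIMALS), rng.choice(STYLES),
                           rng.choice(FEATURES), tuple(accessories)))
    return identities

small, large = sample_identities(5), sample_identities(20)
print(large[:5] == small)  # True: same seed, same draw order
```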
Image origin & third-party tools. All images are model-generated (no captured/real photos). Stages 0–1 use the OpenAI GPT image model; Stage 2 uses the Qwen-Image editing model (Apache-2.0) with custom LoRA adapters. See License for terms.
The dataset ships in two resolution variants with identical content, identity sets, expressions, metadata, and per-sample layout. They differ only in image resolution and total size. Pick the one that fits your compute and resolution needs.
| | 512 release | 1024 release |
|---|---|---|
| Release tag | `reghead-webdataset-v1-512` | `reghead-webdataset-v1-1024` |
| Resolution | 512 × 512 | 1024 × 1024 |
| Image format | JPEG q95 | JPEG q95 |
| Identities | 33,724 | 33,724 |
| Images | 337,240 | 337,240 |
| Shards (`.tar`) | 11 | 34 |
| Total size | 18.96 GiB (19,412 MiB) | 58.79 GiB (60,200 MiB) |
| Max shard size | < 1900 MiB (GitHub-Release safe) | < 1900 MiB (GitHub-Release safe) |
| Best for | fast prototyping, lightweight training, limited bandwidth/disk | high-resolution training, detail-sensitive editing & evaluation |
Both variants encode the same source frames; the 512 release is a downscaled copy. Train/develop on 512, then scale to 1024 without changing any data-loading code. Only the shard URLs change.
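Since only the shard list differs between variants, the resolution can be kept as a single parameter. A minimal sketch, assuming that `<res>` in the `reghead-v1-<res>-NNNNNN.tar` shard names is the plain resolution (`512` or `1024`), that shard indices run contiguously from `000000`, and that shards sit in a local `data/` folder; `shard_paths` is a hypothetical helper:

```python
SHARD_COUNTS = {512: 11, 1024: 34}  # shard counts from the release table

def shard_paths(resolution, root="reghead_download/data"):
    """List local shard paths for one resolution variant (naming is an assumption)."""
    count = SHARD_COUNTS[resolution]
    return [f"{root}/reghead-v1-{resolution}-{i:06d}.tar" for i in range(count)]

urls = shard_paths(512)
print(len(urls), urls[0])
```

Switching to the high-resolution variant is then `shard_paths(1024)`; check the names against the files you actually downloaded before relying on this.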
Each release consists of:
    reghead-v1-<res>-000000.tar     # WebDataset image shards (the bulk of the data)
    reghead-v1-<res>-000001.tar
    ...
    metadata.tar.gz                 # release_manifest.json + webdataset_samples.jsonl
    checksums.tar.gz                # SHA256SUMS for every shard
Inside each shard, every identity is one sample: a group of files sharing a common key.

    00000000_id000_Doberman_Dog_..._64896c31.open_m_open_e.jpg
    00000000_id000_Doberman_Dog_..._64896c31.close_m_o_e.jpg
    ...                                                        # 10 .jpg files, one per expression
    00000000_id000_Doberman_Dog_..._64896c31.json              # per-sample metadata
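WebDataset groups tar members into samples by splitting each file's base name at its first dot: everything before is the sample key, everything after is the field name. The sketch below mimics that rule for illustration; `split_member_name` is a hypothetical helper, and the file name is a shortened placeholder rather than a real key from the release:

```python
def split_member_name(name):
    """Split a tar member name into (sample_key, field) at the first dot of its base name."""
    directory, _, base = name.rpartition("/")
    key, _, field = base.partition(".")
    if directory:
        key = f"{directory}/{key}"
    return key, field

print(split_member_name("00000000_id000_example_64896c31.open_m_open_e.jpg"))
# ('00000000_id000_example_64896c31', 'open_m_open_e.jpg')
```

This is why the image fields appear as `open_m_open_e.jpg`, `frown.jpg`, and so on, with `json` as the metadata field.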
Per-sample JSON schema:
    {
      "identity": "id000_Doberman_Dog_Photorealistic_round_eyes_a_wreath_a_pirate_hat",
      "identity_dir": "id000_Doberman_Dog_Photorealistic_round_eyes_a_wreath_a_pirate_hat",
      "sample_key": "00000000_id000_Doberman_Dog_..._64896c31",
      "num_expressions": 10,
      "expressions": [
        {"expression": "close_m_0_25o_e", "field": "close_m_0_25o_e.jpg", "original_file": "close_m_0_25o_e.jpg"},
        ...
      ]
    }

`metadata/webdataset_samples.jsonl` provides a flat index (one row per identity with `sample_key`, `identity`, `shard`, and `num_expressions`), and `metadata/release_manifest.json` lists every shard with its size and SHA-256.
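The flat index makes it possible to locate identities without opening any shard. A minimal sketch that groups identities by shard, assuming each JSONL row is a JSON object with the four documented fields; the exact format of the `shard` value is not specified here, and the demo rows are made up:

```python
import json
from collections import defaultdict

def identities_by_shard(lines):
    """Group identity names by shard from JSONL rows (one JSON object per line)."""
    index = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            row = json.loads(line)
            index[row["shard"]].append(row["identity"])
    return dict(index)

# Two made-up rows using the documented field names; real rows would come from
# open("metadata/webdataset_samples.jsonl").
demo = [
    '{"sample_key": "k0", "identity": "id_a", "shard": "s0.tar", "num_expressions": 10}',
    '{"sample_key": "k1", "identity": "id_b", "shard": "s0.tar", "num_expressions": 10}',
]
print(identities_by_shard(demo))  # {'s0.tar': ['id_a', 'id_b']}
```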
Download one release with the GitHub CLI:

    # 512 release (smaller, ~19 GiB)
    gh release download reghead-webdataset-v1-512 \
      --repo snap-research/RegHead --dir reghead_download
    # or the 1024 release (~59 GiB)
    gh release download reghead-webdataset-v1-1024 \
      --repo snap-research/RegHead --dir reghead_download

Verify the shards against the published checksums:

    cd reghead_download
    tar -xzf checksums.tar.gz                    # extracts checksums/SHA256SUMS
    mkdir -p data && mv reghead-v1-*.tar data/   # SHA256SUMS paths are data/-relative
    sha256sum -c checksums/SHA256SUMS

Extract the metadata:

    tar -xzf metadata.tar.gz   # → metadata/{release_manifest.json,webdataset_samples.jsonl}

Install the loader dependencies:

    pip install webdataset pillow

Load identities with WebDataset (run from the directory that contains `reghead_download`):

    import glob
    import json

    import webdataset as wds

    # Local shards, as arranged by the verification step above.
    urls = sorted(glob.glob("reghead_download/data/*.tar"))

    EXPRESSIONS = [
        "open_m_open_e", "halfo_m_o_e", "close_m_o_e", "close_m_halfo_e",
        "close_m_0_25o_e", "close_m_0_75o_e", "close_m_close_e",
        "close_m_smile", "raise_eyebrows", "frown",
    ]

    def parse_identity_sample(sample):
        """Collect one identity's metadata and its expression images."""
        meta = sample["json"]
        # The JSON field may arrive as raw bytes if it was not decoded upstream.
        if isinstance(meta, (bytes, bytearray)):
            meta = json.loads(meta.decode("utf-8"))
        images = {expr: sample[f"{expr}.jpg"] for expr in EXPRESSIONS
                  if f"{expr}.jpg" in sample}
        return {"identity": meta["identity"], "images": images, "metadata": meta}

    dataset = wds.WebDataset(urls).decode("pil").map(parse_identity_sample)

    for item in dataset:
        print(item["identity"], list(item["images"].keys()))
        # item["images"]["frown"] is a PIL.Image
        break

Note: image fields use the `.jpg` extension. Decode with `.decode("pil")` to get `PIL.Image` objects (or `.decode("torchrgb")` for tensors).
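For expression editing, a common next step is to turn one identity's images into (source, target) pairs. A minimal sketch, assuming the images are held in a dict keyed by expression name and using the `open_m_open_e` reference as the source; `expression_pairs` is a hypothetical helper, and plain strings stand in for images so the block runs without the dataset:

```python
def expression_pairs(images, source="open_m_open_e"):
    """Yield (source_image, target_name, target_image) for every other expression present."""
    if source not in images:
        return
    for name, image in images.items():
        if name != source:
            yield images[source], name, image

# Stand-in strings replace PIL images for illustration.
toy = {"open_m_open_e": "neutral", "frown": "frown_img", "close_m_smile": "smile_img"}
print([name for _, name, _ in expression_pairs(toy)])  # ['frown', 'close_m_smile']
```

Passing `source="close_m_o_e"` instead pairs each frame with the closed-mouth base that the downstream edits were generated from.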
Per the license, this release contains only the 2D animal-head images and their accompanying metadata. It does not include any 3D models, meshes, textures, blendshape rigs, model weights, checkpoints, the registration model, or training code.
This dataset is released under the Snap Inc. Non-Commercial License, for non-commercial
research purposes only. See the LICENSE file for the full text.
Third-party tools used to generate the imagery are governed by their own terms:
- OpenAI GPT image model: used to generate neutral-expression animal-head images.
- Qwen-Image editing model: used to generate the expression edits; Qwen-Image is licensed under the Apache License 2.0.
No third-party software, model weights, checkpoints, or code is licensed under this license.
If you use this dataset, please cite RegHead:
    @inproceedings{reghead2026,
      title     = {RegHead: Non-Humanoid Head Blendshapes via Feed-Forward Registration},
      booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
      year      = {2026}
    }