Authors: Divyam Goel, Yufei Wang, Tiancheng Wu, Guixiu Qiao, Pavel Piliptchak, David Held†, Zackory Erickson†
Venue: Accepted as an Oral at CoRL 2025 (selection rate 5.7%).
Abstract:
Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in‑distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce a red‑teaming framework that probes robustness through object‑centric geometric perturbations, automatically generating CrashShapes—structurally valid, user‑constrained mesh deformations that trigger catastrophic failures in pre‑trained manipulation policies. The method integrates a Jacobian field–based deformation model with a gradient‑free, simulator‑in‑the‑loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task‑level policy rollouts with constraint‑aware shape exploration, we aim to build a general purpose framework for structured, object‑centric robustness evaluation in robotic manipulation. We additionally show that fine‑tuning on individual CrashShapes, a process we refer to as blue‑teaming, improves task success by up to 60 percentage points on those shapes, while preserving performance on the original object, demonstrating the utility of red‑teamed geometries for targeted policy refinement. Finally, we validate both red‑teaming and blue‑teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue‑teaming recovers performance to up to 90% on the corresponding real‑world geometry—closely matching simulation outcomes.
This repository does not ship a ready‑made environment.yml or requirements.txt. Instead, the project relies on the environments defined by its supporting submodules. The parent conda environment for all red‑teaming experiments should be the isaacgym environment created when installing the simulation back‑end. You will also need an apap environment for mesh deformation. Consult the submodule READMEs for their respective dependencies.
Two submodules contain critical components and must be set up before running any experiments:
- `src/apap` – provides the As‑Plausible‑As‑Possible (APAP) deformation code used to generate CrashShapes. Follow the README inside the submodule for setup and tests. Create its own environment (often named `apap`) and run the basic tests described there.
- `src/isaacgym` – wraps the IsaacGym simulation environments. Follow the README inside the submodule to set up the task environments and train the baseline policies before red‑teaming. The trained policies (checkpoints) produced here will be referenced by the red‑teaming pipeline.
The code in this repository implements a population‑based optimisation loop to discover CrashShapes—confounding mesh geometries that cause a trained manipulation policy to fail. Each red‑teaming run samples deformations of a nominal object (using the APAP submodule), assembles the deformed object in the simulator, evaluates the policy’s success on it, and updates the distribution of deformations over multiple iterations.
High‑level flow:
- Nominal asset: A nominal object mesh (e.g. `io.source_mesh_file` in the per-task YAML, such as `conf/cem_drawer.yaml`) and associated keypoint files (`io.anchor_points_file`, `io.handle_points_file`, `io.target_mesh`, `io.alignment_reference`) are prepared in `apap_output_dir`.
- Sampling and deformation: Each iteration, the CEM sampler (standard, clustering, TOPDM, etc.) proposes handle displacements. These displacements are passed to APAP via the `./apap` submodule (see `DrawerOptimizer._run_apap_deformation` for an example). APAP produces deformed meshes in `${io.apap_output_dir}/outputs` and writes them to `output_dir` as `.obj` files.
- Simulation: Deformed meshes are converted to simulator assets (via COACD and URDF generation) and evaluated using the IndustReal/IsaacGym environment (see the `src/core/evaluation/isaac_*` classes). The policy and task are specified under `sim.policy` in the task-specific YAML file; the checkpoint should point to a trained policy from the `isaacgym` submodule.
- Logging: Each trial’s result is logged to a Parquet dataset under `io.parquet_dataset`. The dataset captures the displacement vectors, success rate, and any per‑environment metrics. You can later analyse the data using the provided scripts.
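The flow above can be sketched as a cross‑entropy‑method (CEM) search. The following is a minimal, self‑contained illustration, not the repository's API: the function and argument names are hypothetical, and `evaluate` stands in for the full APAP‑deform, asset‑assembly, and policy‑rollout step, returning the policy's success rate for one candidate displacement vector.

```python
import numpy as np

def red_team_cem(evaluate, n_handles, max_iters=10, population_size=10,
                 elite_frac=0.4, seed=42):
    """Minimal CEM sketch of the red-teaming loop (illustrative names only).

    `evaluate` maps a flat vector of per-handle 3-D displacements to the
    policy's success rate in [0, 1]; lower success = better CrashShape.
    """
    rng = np.random.default_rng(seed)
    dim = 3 * n_handles                      # one 3-D displacement per handle
    mean, std = np.zeros(dim), np.full(dim, 0.05)
    n_elite = max(1, int(elite_frac * population_size))
    best_x, best_score = None, float("inf")
    for _ in range(max_iters):
        pop = rng.normal(mean, std, size=(population_size, dim))
        scores = np.array([evaluate(x) for x in pop])
        order = np.argsort(scores)           # ascending: lowest success first
        elite = pop[order[:n_elite]]
        # Refit the sampling distribution to the elite set.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        if scores[order[0]] < best_score:
            best_x, best_score = pop[order[0]], scores[order[0]]
    return best_x, best_score
```

In the actual pipeline each `evaluate` call is a batch of simulator rollouts on the deformed mesh, and the sampler variants (clustering, TOPDM) change how the population is proposed.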
Outputs from a run include:
- Deformed mesh files in `${output_dir}` and `${io.apap_out_root}`.
- Parquet logs in `${io.parquet_dataset}` documenting per‑iteration successes and failures.
- Optional simulator logs in `${output_dir}/sim_logs` (cleaned up automatically after runs).
Before running red‑teaming, you must prepare assets by extracting anchor and handle points for each object. We provide a two‑stage Visual–Language Model (VLM) inference pipeline based on ChatGPT‑4o to identify meaningful keypoints from both the point cloud and rendered images of each mesh. This pipeline is implemented in src/scripts/prepare_dataset.py.
To run the preparation pipeline, supply an input directory of OBJ meshes, a metadata file describing which objects to process, an output directory and an API key for ChatGPT. The script writes a subdirectory for each processed object containing:
- `mesh.obj` – the nominal mesh used by APAP.
- `alignment_ref.obj` and `target.obj` – centred reference meshes used for evaluation.
- `handle_points.txt` and/or `anchor_points.txt` – lists of keypoint indices and coordinates returned by the VLM. The presence of these files is governed by the `top_down_anchors` field in the metadata.
Run the pipeline as follows:
conda activate apap
export API_KEY=<your_chatgpt_api_key>
python -m src.scripts.prepare_dataset \
--input_dir <preparation_dir> \
--output_dir <output_dir> \
--api_key $API_KEY \
--metadata metadata.json

The `preparation_dir` should contain the OBJ meshes to be processed. The `output_dir` will be populated with one subdirectory per mesh, each holding the APAP input files mentioned above. After preparation, set `apap_output_dir` in your task-specific YAML file to point to this `output_dir` so that the optimiser reads the correct anchor and handle files.
The metadata.json file is a list of objects with flags controlling the VLM prompts. For example:
[
{
"object_name": "obj1",
"red_team_asset": true,
"top_down_anchors": true
},
{
"object_name": "obj2",
"red_team_asset": true,
"top_down_anchors": false
}
]

The `red_team_asset` flag selects which objects to process. When `top_down_anchors` is true, the anchor points written to `anchor_points.txt` are chosen so that the base of the object remains stable. For tasks other than grasping (e.g. insertion), you may need to adjust the prompts used by the VLM; edit the `"task"` field in `src/utils/vlm_utils.py` accordingly.
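For illustration, the selection logic these flags imply can be sketched as follows. This is a hypothetical helper, not the repository's `prepare_dataset` implementation:

```python
import json

def select_red_team_objects(metadata_path):
    """Return (object_name, top_down_anchors) pairs for objects flagged
    with red_team_asset (an illustrative sketch, not the repo's code)."""
    with open(metadata_path) as f:
        metadata = json.load(f)
    return [(e["object_name"], e.get("top_down_anchors", False))
            for e in metadata
            if e.get("red_team_asset", False)]
```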
The repository defines several entry points under `src/`, each paired with its own Hydra configuration file under `conf/` (e.g. `cem_drawer.yaml`, `cem_manipulation.yaml`, `cem_grasping.yaml`). Before running any of these commands, train or load the corresponding policy using the `src/isaacgym` submodule, and update the task-specific YAML with the correct checkpoint path. Some tasks require additional setup; those instructions are collated into the task-specific documentation in `src/isaacgym/docs`.
conda activate isaacgym
python -m src.cem_drawer

This launches the TOPDM optimiser for the drawer-opening environment. It reads the Hydra configuration from `conf/cem_drawer.yaml` and writes deformed meshes to `${output_dir}` (derived from `io.apap_output_dir`). Parquet logs are stored in `${io.parquet_dataset}`. Before running, set `sim.policy.task: FrankaCabinet` and point `sim.policy.checkpoint` to the state-based drawer policy trained in `src/isaacgym`. The objectives for this task should reflect opening distance; set `sim.objectives: ["Opening Distance"]` and adjust weights and directions as appropriate. Follow the drawer task instructions in the isaacgym submodule to obtain the policy checkpoint.
conda activate isaacgym
python -m src.cem_manipulation

This runs the red-teaming pipeline for the peg insertion task using a state-based policy. The default `conf/cem_manipulation.yaml` sets `sim.policy.task: IndustRealTaskPegsInsert` and `sim.objectives: ["Insertion Success"]`, and expects a checkpoint. Update `sim.policy.checkpoint` to the correct insertion policy file from `src/isaacgym` before running. Outputs are logged to the same directories as above.
conda activate isaacgym
python -m src.cem_manipulation +sim.policy.pcd_policy=True

This command enables the point-cloud observation mode for the insertion policy by setting `sim.policy.pcd_policy=True` via Hydra. Ensure that the insertion policy in `src/isaacgym` has been trained in point-cloud mode and update `sim.policy.checkpoint` accordingly. Other configuration keys remain the same as the state-based insertion run.
conda activate isaacgym
python -m src.cem_grasping

This entry point red-teams a grasp-and-pick policy. The optimisation loop samples deformations, calls APAP, assembles the objects, and evaluates them using the grasp simulation. Set `sim.policy.task: GraspTaskPick`. A checkpoint is not required for this task because the weights for Contact Graspnet are automatically loaded by the task script. The `sim.objectives` list should include `"Grasp Success"` with corresponding weights and directions. Outputs mirror the previous tasks.
Each entry point has a dedicated configuration file under conf/, named to match the Python entry point:
- `conf/cem_drawer.yaml` – for drawer opening
- `conf/cem_manipulation.yaml` – for insertion (state-based or point-cloud via `+sim.policy.pcd_policy=True`)
- `conf/cem_grasping.yaml` – for grasping
All three share the same schema for io, sim, and hparams. You will typically only need to edit:
- Paths in `io`: point to your prepared assets and output directories.
- `sim.policy.checkpoint`: set to the trained policy checkpoint (if required for the task).
- Objectives in `sim.objectives`: use the defaults provided in each YAML unless you have custom evaluation goals.
Example (conf/cem_manipulation.yaml):
io:
source_mesh_file: "mesh.obj"
apap_output_dir: "~/Documents/data/apap_data/usb"
apap_out_root: "~/Documents/data/apap_outputs" # output directories for mesh deformations
parquet_dataset: "~/Documents/data/parquet_dataset"
sim:
objectives: ["Insertion Success"]
objective_weights: [1.0]
objective_directions: ["min"]
policy:
task: IndustRealTaskPegsInsert
checkpoint: "/path/to/trained_policy.pth"
hparams:
approach: topdm
seed: 42
max_iters: 10
population_size: 10
elite_frac: 0.4
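As a quick sanity check of the search budget these hyperparameters imply (back-of-the-envelope arithmetic, not repository code):

```python
population_size = 10   # candidates sampled per iteration (hparams.population_size)
elite_frac = 0.4       # fraction kept to refit the sampler (hparams.elite_frac)
max_iters = 10         # optimisation iterations (hparams.max_iters)

n_elite = int(elite_frac * population_size)   # elites kept each iteration
total_evals = population_size * max_iters     # candidate shapes evaluated per run
print(n_elite, total_evals)  # → 4 100
```

So a run at these defaults evaluates 100 deformed shapes in simulation, refitting the sampling distribution to the best 4 candidates after each batch of 10.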
After running red‑teaming experiments, you can aggregate and analyse the results using the scripts in src/scripts/. Replace paths as appropriate for your environment.
Aggregate per‑object statistics:
python analyze_data.py \
--parquet-dir ~/Documents/data/parquet_dataset \
--baseline-file ../../results.json \
--output-file ../../data_stats.jsonl

This script reads all Parquet files in `--parquet-dir` (produced during red-teaming), compares them against a JSON of baseline success rates (`--baseline-file`, one object per entry), and writes aggregated statistics to `--output-file`. The baseline JSON should contain nominal success rates and shape complexities for each object; consult our evaluation protocol for its exact format.
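The aggregation at the heart of this step can be illustrated with a toy frame. The column names below are assumptions about the log schema, not the repository's guaranteed format; on real logs you would start from `pd.read_parquet` instead of the hand-built frame:

```python
import pandas as pd

# Toy stand-in for the Parquet logs; columns ("object", "iteration",
# "success") are assumed for illustration.
logs = pd.DataFrame({
    "object":    ["usb", "usb", "usb", "plug", "plug", "plug"],
    "iteration": [0, 1, 2, 0, 1, 2],
    "success":   [0.9, 0.5, 0.2, 0.8, 0.6, 0.3],
})

# Per-object success rate at each iteration, then the overall drop from
# the first to the last iteration - the kind of per-object statistic the
# analysis script compares against the baseline file.
per_iter = logs.groupby(["object", "iteration"])["success"].mean().unstack()
final_drop = per_iter.iloc[:, 0] - per_iter.iloc[:, -1]
```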
Compute high‑level metrics:
python metrics.py ../../data_stats.jsonl

This prints the mean area under the degradation curve, the final drop in success rate, the number of iterations until the first catastrophic failure, and the median Δ shape complexity across all objects in the JSONL file.
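One way these summaries can be computed, assuming the degradation curve is the success rate per optimisation iteration and treating zero success as a catastrophic failure (our assumptions for illustration, not the script's exact definitions):

```python
import numpy as np

def degradation_metrics(success_per_iter):
    """Sketch of the per-object summary metrics under the assumptions above."""
    s = np.asarray(success_per_iter, dtype=float)
    # Trapezoidal area under the success-vs-iteration curve (unit step).
    auc = float(np.sum((s[:-1] + s[1:]) / 2.0))
    final_drop = float(s[0] - s[-1])      # initial minus final success rate
    fails = np.flatnonzero(s == 0.0)      # iterations with zero success
    first_fail = int(fails[0]) if fails.size else None
    return auc, final_drop, first_fail

degradation_metrics([1.0, 0.5, 0.0])  # → (1.0, 1.0, 2)
```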
Render mesh GIFs:
python render_all.py <path_to_mesh_dir> <output_dir>

This renders turntable GIFs for all `.obj` files in `<path_to_mesh_dir>` using Open3D and writes them to `<output_dir>`. Use this for qualitative inspection of CrashShapes.
The repository is organised as follows:
- `conf/` – Hydra configuration files for all entry points.
- `src/` – Python source code. Key entry points live in `src/cem_drawer.py`, `src/cem_manipulation.py` and `src/cem_grasping.py`. The core optimisation logic is in `src/core/optimization/` and `src/core/orchestration/`. Evaluators for different environments are under `src/core/evaluation/`.
- `src/scripts/` – analysis and utility scripts. Notable ones include `analyze_data.py`, `metrics.py` and `render_all.py` for post-processing; `mesh_utils.py` and `shape_complexity.py` for mesh manipulation; and `prepare_dataset.py` for asset preparation using the VLM pipeline.
- `src/apap/` and `src/isaacgym/` – git submodules providing the deformation and simulation back-ends. These are external repositories; follow their documentation for installation and testing.
Short answer: a modern NVIDIA GPU with at least 24 GB of VRAM is recommended.
Details:
- Reference setup used for the paper: four NVIDIA RTX A6000 GPUs.
- Single-GPU configurations verified: RTX 4090 and RTX 3090.
- Architectural compatibility: tested on modern NVIDIA architectures with SM 86+.
Guidance:
- A single 24 GB+ GPU is sufficient to run the code and reproduce our experiments at the recommended scales.
- Multi-GPU setups improve throughput and wall-clock time but are not required for correctness.
- If you intend to match the scale of our reported runs, plan for resources comparable to the reference setup above.
We thank the authors of the following repositories for open-sourcing their code, which our codebase is built upon:
- APAP: https://as-plausible-as-possible.github.io/
- IsaacGym: https://github.com/isaac-sim/IsaacGymEnvs/tree/automate
If you build on this work, please cite the associated CoRL 2025 paper.
@misc{goel2025geometricredteamingroboticmanipulation,
title={Geometric Red-Teaming for Robotic Manipulation},
author={Divyam Goel and Yufei Wang and Tiancheng Wu and Guixiu Qiao and Pavel Piliptchak and David Held and Zackory Erickson},
year={2025},
eprint={2509.12379},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2509.12379},
}