Releases · SenseTime-FVG/OpenDWM

The CTSD pipeline supports action control (training data is converted from the ego transform) and PyTorch distributed checkpointing.
- Distributed checkpointing reduces peak host memory and GPU memory usage during the loading, resuming, and saving stages, which makes it friendlier to distributed systems with limited host or GPU memory.
- Warning: incompatible with optimizer checkpoints from previous versions. To resume from a checkpoint saved by a previous version (<0.3.3), launch a new training stage using only the most recently saved model checkpoint.
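The resume rule in the warning above can be sketched as a small version check. This is only an illustration: `parse_version`, `resume_plan`, and the `model`/`optimizer` keys are hypothetical names, not part of the OpenDWM code base.

```python
def parse_version(text):
    """Turn a dotted version such as '0.3.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def resume_plan(saved_version, first_compatible="0.3.3"):
    """Decide which checkpoint parts to restore (hypothetical helper)."""
    if parse_version(saved_version) < parse_version(first_compatible):
        # Older optimizer checkpoints are incompatible: restore the model
        # weights only and start a new training stage with a fresh optimizer.
        return {"model": True, "optimizer": False}
    return {"model": True, "optimizer": True}


if __name__ == "__main__":
    print(resume_plan("0.3.1"))
    print(resume_plan("0.3.3"))
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as "0.10.0" sorting before "0.3.3".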
Update LiDAR VQVAE and Maskgit pipelines for temporal and auto-regressive generation.
Release the Diffusion Forcing Transformer (DFoT) model config and checkpoint for CTSD 3.5. DFoT is a self-supervision target with a "soft mask", which allows the model to reduce the accumulated degradation during auto-regressive generation. The interactive generation pipeline prefers a DFoT model.
Release LiDAR VQVAE and Maskgit model configs and checkpoints whose training data includes KITTI-360.
Add scripts to make blank code (LiDAR VQVAE), make CARLA camera parameters (interactive generation), and convert steering to CARLA control (interactive generation).
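The "soft mask" idea behind DFoT can be illustrated with a toy sketch. It assumes the usual diffusion forcing reading, in which every frame receives its own independent noise level, so that a frame is partially rather than fully masked. The linear blend below is a simplification chosen for readability, not the noise schedule used by the released model, and both function names are hypothetical.

```python
import random


def sample_noise_levels(num_frames, rng):
    """Draw one independent noise level in [0, 1] for every frame."""
    return [rng.uniform(0.0, 1.0) for _ in range(num_frames)]


def add_noise(frame, level, rng):
    """Blend a frame (list of floats) with Gaussian noise.

    Level 0.0 keeps the frame untouched; level 1.0 replaces it with noise.
    """
    return [(1.0 - level) * value + level * rng.gauss(0.0, 1.0) for value in frame]


if __name__ == "__main__":
    rng = random.Random(0)
    frames = [[0.5, -0.5, 1.0] for _ in range(4)]
    levels = sample_noise_levels(len(frames), rng)
    noisy = [add_noise(frame, level, rng) for frame, level in zip(frames, levels)]
    for level, frame in zip(levels, noisy):
        print(round(level, 3), [round(value, 3) for value in frame])
```

During auto-regressive generation, already generated frames can then be treated as lightly noised context instead of exact ground truth, which is one way to read the reduced degradation mentioned above.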

Fixes

Fix the export script for nuScenes-format data.
Other minor fixes to the models, datasets, and metrics.

17 Mar 06:50

wzhgba

v0.3.1

a59b33e

Functionality

Add experimental interactive generation pipeline code, config, and documentation.
The CTSD pipeline supports actions (speed, steering) as additional conditions.
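As a rough illustration of how action conditions could be derived from ego poses, the sketch below computes speed and yaw rate between consecutive poses. It is an assumption-laden example: the `(x, y, yaw)` pose layout, the function name, and the use of yaw rate as a stand-in for steering are all hypothetical and do not describe the project's actual conversion.

```python
import math


def actions_from_ego_poses(poses, timestamps):
    """Return (speed, yaw_rate) for each step between consecutive poses.

    poses: list of (x, y, yaw) with x, y in meters and yaw in radians.
    timestamps: list of times in seconds, one per pose.
    """
    actions = []
    steps = zip(poses, poses[1:], timestamps, timestamps[1:])
    for (x0, y0, yaw0), (x1, y1, yaw1), t0, t1 in steps:
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        # Wrap the heading change into [-pi, pi) before dividing by time.
        dyaw = (yaw1 - yaw0 + math.pi) % (2.0 * math.pi) - math.pi
        actions.append((speed, dyaw / dt))
    return actions


if __name__ == "__main__":
    poses = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (2.0, 0.2, 0.2)]
    print(actions_from_ego_poses(poses, [0.0, 0.5, 1.0]))
```

Wrapping the heading difference matters when the yaw crosses the ±π boundary, where a naive subtraction would report a near full turn.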

Fix & minor updates

Fix a dataset text-condition bug in random partial text drop (triggered when a seed is given in image_description_settings).
Update dependencies to address a security issue.
Update the metric records for the released LiDAR models.

07 Mar 14:48

wzhgba

v0.3.0

2f9506b

Release LiDAR generation models (LiDAR VQVAE and LiDAR Maskgit, reproducing UltraLiDAR), training code, examples, and metrics.
- LiDAR VQVAE is trained on nuScenes, Waymo, and Argoverse.
- LiDAR Maskgit is trained on nuScenes and controlled by a layout condition.

04 Mar 01:49

wzhgba

v0.2.1

2420aa0

Functionality

Release the layout-conditioned CTSD 3.5 config, checkpoint, and example.
- Trained with 4 datasets (nuScenes, Waymo, Argoverse, OpenDV) in a single stage for better generalization.
- Bucketing for multi-resolution, following OpenSora ...
The nuScenes dataset allows configuring BEV condition drawing as solid shapes or outlines.
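The multi-resolution bucketing mentioned above can be sketched as assigning each sample to the bucket with the closest aspect ratio, so that every batch is drawn from a single bucket and shares one resolution. The bucket sizes and function names here are hypothetical and are not taken from the released config.

```python
def nearest_bucket(width, height, buckets):
    """Pick the (width, height) bucket whose aspect ratio is closest to the sample's."""
    ratio = width / height
    return min(buckets, key=lambda bucket: abs(bucket[0] / bucket[1] - ratio))


def group_by_bucket(sizes, buckets):
    """Group sample sizes by their assigned bucket."""
    groups = {}
    for width, height in sizes:
        groups.setdefault(nearest_bucket(width, height, buckets), []).append((width, height))
    return groups


if __name__ == "__main__":
    # Hypothetical buckets: one wide, one square.
    buckets = [(448, 256), (256, 256)]
    sizes = [(1600, 900), (1920, 1280), (512, 512)]
    print(group_by_bucket(sizes, buckets))
```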

Fix

Fix S3FS being unable to list more than 1k items.
Fix reshape error for CTSD 3.x pointwise temporal attention.
Fix link to UniMLVG config in the README.
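A 1k-item ceiling of the kind fixed above is typical of S3-style listing, where a single list request returns at most 1000 keys together with a continuation token. The sketch below shows the general pagination pattern against an in-memory stand-in. `list_page` and `list_all` are hypothetical names, and this is not presented as the fix applied in the repository.

```python
def list_page(all_keys, start, page_size=1000):
    """Stand-in for one list request: return a page and the next token (or None)."""
    page = all_keys[start:start + page_size]
    next_start = start + page_size
    return page, (next_start if next_start < len(all_keys) else None)


def list_all(all_keys, page_size=1000):
    """Follow continuation tokens until the listing is exhausted."""
    results, token = [], 0
    while token is not None:
        page, token = list_page(all_keys, token, page_size)
        results.extend(page)
    return results


if __name__ == "__main__":
    keys = ["frame_%05d.jpg" % index for index in range(2500)]
    first_page, _ = list_page(keys, 0)
    print(len(first_page), len(list_all(keys)))
```

A caller that stops after the first page silently sees only 1000 items, which matches the symptom described in the fix.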

03 Mar 06:40

wzhgba

v0.1.1

9fd62f9

Release UniMLVG config and checkpoints.
Support loading the CTSD model in fp16 and loading text encoders with a bitsandbytes quantization config (8-bit, 4-bit).
