Releases: SenseTime-FVG/OpenDWM

v0.6.0

19 Jun 01:41
8829d64

v0.5.1

23 May 05:20
d35d325

Redirect the downloadable resources to the Hugging Face model and data repos. The old file server will be shut down in the future.

v0.5.0

20 May 05:04
1e2b25b

Functionality

  • Release the temporal Maskgit model for LiDAR generation

Fixes

  • Fix video streaming generation, which was broken by the temporal VAE code change
  • Fix the security warning raised by transformers

v0.4.0

06 May 04:24
6eb4914

Functionality

  1. Release CTSD 3.5 with the CogVideoX VAE for faster generation (1 latent per 4 frames).
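
To illustrate the "1 latent for 4 frames" ratio, the sketch below estimates how many temporal latents a clip would need. The helper name `latent_count` and the round-up handling of a trailing partial group are assumptions for illustration only; the actual CogVideoX VAE may treat the first frame differently.

```python
import math


def latent_count(num_frames: int, frames_per_latent: int = 4) -> int:
    """Estimate the number of temporal latents for a clip.

    Hypothetical helper: assumes every group of `frames_per_latent` frames
    maps to one latent and a trailing partial group is rounded up.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return math.ceil(num_frames / frames_per_latent)


for frames in (4, 16, 18):
    print(frames, "frames ->", latent_count(frames), "latents")
```

Fewer latents per clip means fewer positions for the diffusion model to denoise, which is where the speed-up in this release would come from.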

Fix

  1. Fix the rotation conversion for the side-back views in the script that makes Carla camera parameters.

v0.3.3

23 Apr 12:11
eab42d6

Functionality

  1. The CTSD pipeline supports action control (training data is converted from the ego transform) and PyTorch distributed checkpointing.
    • Distributed checkpointing reduces peak host memory and GPU memory usage when loading, resuming, and saving, which suits distributed systems with limited host or GPU memory.
    • Warning: incompatible with optimizer checkpoints from previous versions. To resume from a checkpoint saved by a previous version (<0.3.3), launch a new training stage using only the most recently saved model checkpoint.
  2. Update LiDAR VQVAE and Maskgit pipelines for temporal and auto-regressive generation.
  3. Release the Diffusion Forcing Transformer (DFoT) config and checkpoint for the CTSD 3.5 model. DFoT is a kind of self-supervision target with a "soft mask", which allows the model to reduce accumulated degradation during auto-regressive generation. A DFoT model is preferred for the interactive generation pipeline.
  4. Release LiDAR VQVAE and Maskgit model configs and checkpoints whose training data includes KITTI-360.
  5. Add scripts to make the blank code (LiDAR VQVAE), make Carla camera parameters (interactive generation), and control Carla from steering (interactive generation).
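
The "soft mask" idea behind DFoT can be sketched as follows. This is a minimal illustration that loosely follows the Diffusion Forcing idea of per-frame noise levels, not the OpenDWM implementation; the function names and the uniform sampling of noise levels are assumptions.

```python
import random


def binary_mask(num_frames: int, num_context: int) -> list[float]:
    """Hard conditioning: context frames are clean (0.0), the rest fully noised (1.0)."""
    return [0.0 if i < num_context else 1.0 for i in range(num_frames)]


def soft_mask(num_frames: int, rng: random.Random) -> list[float]:
    """Soft conditioning: every frame draws its own noise level in [0, 1)."""
    return [rng.random() for _ in range(num_frames)]


rng = random.Random(0)
print(binary_mask(6, 2))
print([round(level, 2) for level in soft_mask(6, rng)])
```

Training on partially noised frames exposes the model to imperfect history, which is one plausible reason it degrades less when it later conditions on its own generated frames.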

Fixes

  1. Fix the script that exports data in the nuScenes format.
  2. Other minor fixes to the models, datasets, and metrics.

v0.3.1

17 Mar 06:50
a59b33e

Functionality

  1. Add experimental interactive generation pipeline code, config, and documentation.
  2. The CTSD pipeline supports actions (speed, steering) as additional conditions.

Fix & minor updates

  1. Fix a dataset text condition bug in the random partial text drop (triggered when a seed is given in the image_description_settings).
  2. Update dependencies to address a security issue.
  3. Update the metric records for the released LiDAR models.

v0.3.0

07 Mar 14:48
2f9506b

  • Release LiDAR generation models (LiDAR VQVAE and LiDAR Maskgit, reproducing UltraLiDAR), training code, examples, and metrics.
    • LiDAR VQVAE is trained on nuScenes, Waymo, and Argoverse.
    • LiDAR Maskgit is trained on nuScenes and controlled by the layout condition.

v0.2.1

04 Mar 01:49
2420aa0

Functionality

  • Release the layout-conditioned CTSD 3.5 config, checkpoint, and example.
    • Trained on 4 datasets (nuScenes, Waymo, Argoverse, OpenDV) in a single stage for better generalization.
    • Bucket for multi-resolution, following OpenSora ...
  • The nuScenes dataset can be configured to draw the BEV condition as solid shapes or as outlines.
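
As a rough illustration of multi-resolution bucketing, the sketch below assigns each sample to the bucket with the closest aspect ratio so that a batch can hold a single resolution. The bucket list and the helper names `nearest_bucket` and `group_by_bucket` are hypothetical and are not taken from the OpenDWM or OpenSora configs.

```python
import math

# Hypothetical (width, height) buckets; not taken from the OpenDWM configs.
BUCKETS = [(256, 256), (448, 256), (256, 448), (512, 288)]


def nearest_bucket(width: int, height: int, buckets=BUCKETS) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the sample's (in log space)."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def group_by_bucket(sizes):
    """Group sample indices by bucket so each batch holds a single resolution."""
    groups = {}
    for index, (width, height) in enumerate(sizes):
        groups.setdefault(nearest_bucket(width, height), []).append(index)
    return groups


samples = [(1600, 900), (1920, 1280), (1080, 1920), (1024, 1024)]
print(group_by_bucket(samples))
```

Comparing ratios in log space treats "twice as wide" and "twice as tall" symmetrically, which is a design choice of this sketch rather than a documented detail of the release.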

Fix

  • Fix S3FS failing to list more than 1k items.
  • Fix reshape error for CTSD 3.x pointwise temporal attention.
  • Fix link to UniMLVG config in the README.
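
A common reason for a 1k listing cap is that S3 list requests are paginated: each response returns at most 1000 keys, and the caller has to follow a continuation token to get the rest. Whether this was the root cause of the S3FS fix above is an assumption. The sketch below uses an in-memory stand-in (`fake_list_page`, a hypothetical function) rather than a real S3 client.

```python
PAGE_LIMIT = 1000  # S3 list requests return at most 1000 keys per response


def fake_list_page(keys, token=None, limit=PAGE_LIMIT):
    """Stand-in for one list request: returns (page, next_token)."""
    start = token or 0
    page = keys[start:start + limit]
    next_start = start + len(page)
    return page, (next_start if next_start < len(keys) else None)


def list_all(keys):
    """Follow continuation tokens until the listing is exhausted."""
    results, token = [], None
    while True:
        page, token = fake_list_page(keys, token)
        results.extend(page)
        if token is None:
            return results


store = [f"frame_{i:05d}.jpg" for i in range(2500)]
first_page, _ = fake_list_page(store)
print(len(first_page), len(list_all(store)))
```

Reading only the first page silently truncates the listing, which matches the symptom described in the fix.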

v0.1.1

03 Mar 06:40
9fd62f9

  1. Release UniMLVG config and checkpoints.
  2. Support loading the CTSD model in fp16 and loading the text encoders with a bitsandbytes quantization config (8-bit, 4-bit).
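
To show why fp16 and 8-bit or 4-bit loading matter, the sketch below estimates weight storage at each precision. The parameter count is a hypothetical round number, not an OpenDWM figure, and the estimate ignores quantization metadata, activations, and optimizer state.

```python
BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "int8": 8, "4bit": 4}


def weight_gib(num_params: int, dtype: str) -> float:
    """Approximate weight storage in GiB, ignoring quantization metadata and activations."""
    return num_params * BITS_PER_PARAM[dtype] / 8 / 2**30


params = 2**30  # hypothetical model size, not an OpenDWM figure
for dtype in BITS_PER_PARAM:
    print(f"{dtype}: {weight_gib(params, dtype):.2f} GiB")
```

Each halving of the bit width halves the weight footprint, so a 4-bit text encoder needs roughly one eighth of the fp32 weight memory.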