
Train-test flow scale mismatch when using SSF model #307

Open
sybahk opened this issue Sep 10, 2024 · 0 comments
sybahk commented Sep 10, 2024

Hi,
I think there is an issue in the SSF model implementation that prevents the model from reaching the expected R-D performance.
In the code,

```python
def warp_volume(self, volume, flow, scale_field, padding_mode: str = "border"):
    """3D volume warping."""
    if volume.ndimension() != 5:
        raise ValueError(
            f"Invalid number of dimensions for volume {volume.ndimension()}"
        )
    N, C, _, H, W = volume.size()
    grid = meshgrid2d(N, C, H, W, volume.device)
    update_grid = grid + flow.permute(0, 2, 3, 1).float()
    update_scale = scale_field.permute(0, 2, 3, 1).float()
    volume_grid = torch.cat((update_grid, update_scale), dim=-1).unsqueeze(1)
    out = F.grid_sample(
        volume.float(), volume_grid, padding_mode=padding_mode, align_corners=False
    )
    return out.squeeze(2)
```
the estimated flow is passed directly to `F.grid_sample`.
This would be fine if the grid were in absolute (pixel) coordinates, but the grid is actually in normalized coordinates, where the top-left corner is [-1, -1] and the bottom-right corner is [1, 1].

This introduces problematic behavior: during training, flow is estimated on (256, 256) patches, but during testing it is estimated on (1152, 1920) frames (including padding), so the same normalized flow value corresponds to a much larger pixel displacement than expected.

A quick workaround is to apply a weighting that accounts for the train-test size change, like:
sybahk@df138a9
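The idea behind the workaround can be sketched as follows. This is my own illustration, not the code from the commit above; the names (`TRAIN_H`, `TRAIN_W`, `rescale_flow`) are assumptions. Since the flow is consumed as a normalized grid offset, multiplying it by the train/test size ratio keeps the implied pixel displacement at test time consistent with what the network saw during training:

```python
import torch

# Assumed training patch size (the SSF models are trained on 256x256 crops).
TRAIN_H, TRAIN_W = 256, 256

def rescale_flow(flow: torch.Tensor, test_h: int, test_w: int) -> torch.Tensor:
    """Rescale a normalized flow (N, 2, H, W) for a different test resolution.

    Channel 0 is the horizontal offset, channel 1 the vertical offset.
    A normalized offset f moves f * W/2 pixels, so multiplying by
    train_size / test_size preserves the pixel-space magnitude.
    """
    scale = flow.new_tensor([TRAIN_W / test_w, TRAIN_H / test_h]).view(1, 2, 1, 1)
    return flow * scale
```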

And I ran the evaluation with this command:
python3 -m compressai.utils.video.eval_model pretrained $UVG_PATH outputs -a ssf2020 -q 1,2,3,4 -o ssf2020-mse-ans-vimeo-modified.json
sybahk@b9f5610

With this workaround applied, the model's R-D curve is much higher than before, closely matching the authors' results.
python3 -m compressai.utils.video.plot -f results/video/UVG-1080p/ssf* -o outputs/fig.png
(figure: R-D curves for the SSF models on UVG-1080p)
(ssf2020-mse is the one that used the workaround.)

When using the pretrained models, applying the workaround is fine, but when training a new model, I think we should take the training-time input size into account, like DCVC does:
https://github.com/microsoft/DCVC/blob/4df94295c8dbe0a26456582d1a0eddb3465f1597/DCVC-TCM/src/models/video_net.py#L83-L94
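The resolution-independent alternative that the DCVC code points at can be sketched like this. This is my own sketch, not the DCVC or CompressAI implementation, and it assumes the flow is predicted in pixel units; the key point is that the pixel flow is divided by the current input size when building the sampling grid, so a one-pixel displacement means the same thing at any resolution:

```python
import torch
import torch.nn.functional as F

def warp_with_normalized_flow(volume, flow, padding_mode="border"):
    """Warp a (N, C, H, W) volume with a pixel-space flow (N, 2, H, W).

    Sketch only: normalizes the pixel flow by the current input size, so the
    warp behaves identically on 256x256 training patches and full-resolution
    test frames.
    """
    N, C, H, W = volume.shape
    device = volume.device
    # Identity grid in grid_sample's normalized coordinates:
    # top-left [-1, -1], bottom-right [1, 1] (align_corners=True convention).
    ys = torch.linspace(-1.0, 1.0, H, device=device)
    xs = torch.linspace(-1.0, 1.0, W, device=device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack((grid_x, grid_y), dim=-1).unsqueeze(0).expand(N, -1, -1, -1)
    # One pixel of displacement corresponds to 2/(W-1) (resp. 2/(H-1))
    # normalized units, regardless of the frame size.
    norm = torch.tensor([2.0 / max(W - 1, 1), 2.0 / max(H - 1, 1)], device=device)
    sample_grid = grid + flow.permute(0, 2, 3, 1) * norm
    return F.grid_sample(
        volume, sample_grid, padding_mode=padding_mode, align_corners=True
    )
```

With zero flow this reduces to an identity resampling at every resolution, which is exactly the invariance the current implementation lacks.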
