
Train-test flow scale mismatch when using SSF model #307

Open
sybahk opened this issue Sep 10, 2024 · 0 comments
sybahk commented Sep 10, 2024

Hi,
I think there is an issue in the SSF model implementation that prevents the model from reaching the expected R-D performance.
In the code,

```python
def warp_volume(self, volume, flow, scale_field, padding_mode: str = "border"):
    """3D volume warping."""
    if volume.ndimension() != 5:
        raise ValueError(
            f"Invalid number of dimensions for volume {volume.ndimension()}"
        )
    N, C, _, H, W = volume.size()
    grid = meshgrid2d(N, C, H, W, volume.device)
    update_grid = grid + flow.permute(0, 2, 3, 1).float()
    update_scale = scale_field.permute(0, 2, 3, 1).float()
    volume_grid = torch.cat((update_grid, update_scale), dim=-1).unsqueeze(1)
    out = F.grid_sample(
        volume.float(), volume_grid, padding_mode=padding_mode, align_corners=False
    )
    return out.squeeze(2)
```
the estimated flow is passed directly to `F.grid_sample`.
This would be fine if the grid were in absolute (pixel) coordinates, but the grid is actually in normalized coordinates, where the top-left corner is [-1, -1] and the bottom-right corner is [1, 1].

This introduces problematic behavior: during training, flow is estimated on (256, 256) patches, but during testing it is estimated on (1152, 1920) frames (including padding), so the same normalized flow value corresponds to a much larger pixel displacement than expected.

A quick workaround is to apply a weighting that accounts for the train-test size change, like:
sybahk@df138a9
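The idea behind the workaround can be sketched as follows. This is my own illustration, not the code from the commit above; the names (`TRAIN_H`, `TRAIN_W`, `rescale_flow`) are assumptions. Since the flow is consumed as a normalized grid offset, multiplying it by the train/test size ratio keeps the implied pixel displacement at test time consistent with what the network saw during training:

```python
import torch

# Assumed training patch size (the SSF models are trained on 256x256 crops).
TRAIN_H, TRAIN_W = 256, 256

def rescale_flow(flow: torch.Tensor, test_h: int, test_w: int) -> torch.Tensor:
    """Rescale a normalized flow (N, 2, H, W) for a different test resolution.

    Channel 0 is the horizontal offset, channel 1 the vertical offset.
    A normalized offset f moves f * W/2 pixels, so multiplying by
    train_size / test_size preserves the pixel-space magnitude.
    """
    scale = flow.new_tensor([TRAIN_W / test_w, TRAIN_H / test_h]).view(1, 2, 1, 1)
    return flow * scale
```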

And I ran the evaluation with this command:
python3 -m compressai.utils.video.eval_model pretrained $UVG_PATH outputs -a ssf2020 -q 1,2,3,4 -o ssf2020-mse-ans-vimeo-modified.json
sybahk@b9f5610

With this workaround applied, the model's R-D curve is much higher than before, closely matching the authors' results.
python3 -m compressai.utils.video.plot -f results/video/UVG-1080p/ssf* -o outputs/fig.png
(figure: R-D curves for the SSF models on UVG-1080p)
(ssf2020-mse is the one that used the workaround.)

When using the pretrained models, applying the workaround is fine, but when training a new model, I think we should take the training-time input size into account, like DCVC does:
https://github.com/microsoft/DCVC/blob/4df94295c8dbe0a26456582d1a0eddb3465f1597/DCVC-TCM/src/models/video_net.py#L83-L94
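The resolution-independent alternative that the DCVC code points at can be sketched like this. This is my own sketch, not the DCVC or CompressAI implementation, and it assumes the flow is predicted in pixel units; the key point is that the pixel flow is divided by the current input size when building the sampling grid, so a one-pixel displacement means the same thing at any resolution:

```python
import torch
import torch.nn.functional as F

def warp_with_normalized_flow(volume, flow, padding_mode="border"):
    """Warp a (N, C, H, W) volume with a pixel-space flow (N, 2, H, W).

    Sketch only: normalizes the pixel flow by the current input size, so the
    warp behaves identically on 256x256 training patches and full-resolution
    test frames.
    """
    N, C, H, W = volume.shape
    device = volume.device
    # Identity grid in grid_sample's normalized coordinates:
    # top-left [-1, -1], bottom-right [1, 1] (align_corners=True convention).
    ys = torch.linspace(-1.0, 1.0, H, device=device)
    xs = torch.linspace(-1.0, 1.0, W, device=device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack((grid_x, grid_y), dim=-1).unsqueeze(0).expand(N, -1, -1, -1)
    # One pixel of displacement corresponds to 2/(W-1) (resp. 2/(H-1))
    # normalized units, regardless of the frame size.
    norm = torch.tensor([2.0 / max(W - 1, 1), 2.0 / max(H - 1, 1)], device=device)
    sample_grid = grid + flow.permute(0, 2, 3, 1) * norm
    return F.grid_sample(
        volume, sample_grid, padding_mode=padding_mode, align_corners=True
    )
```

With zero flow this reduces to an identity resampling at every resolution, which is exactly the invariance the current implementation lacks.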
