
Conversation

@SBCV (Contributor) commented Sep 3, 2020

When running an experiment with some satellite images, I observed that the fill_calibration and fill_inverse_calibration methods are (probably) incorrect.
I tried to compute a dense point cloud using some depth maps imported with the recently added Colmap-Importer. The fused result showed that the triangulations of the different depth maps have a huge displacement relative to each other. The following image shows an example; the mesh in green was reconstructed with Colmap and serves as a reference.

[Image: fused_result_incorrect_with_mesh]

The reason for that is that the pinhole camera models used to approximate the pushbroom cameras of satellites are very special: they potentially have dramatically different focal length values fx and fy as well as unusual principal points cx and cy. This makes them well suited for testing the pipeline with extreme values. In addition, even small errors in the camera models / poses result in large errors in the reconstructed point cloud, since the satellites are thousands of kilometers away from the reconstructed scene.

Note that when running the same experiments with "standard" camera models (i.e. fx almost identical to fy, and cx and cy close to the image center), I did not observe cases like this.

The following image shows the fused point cloud of all 50 cameras using the changes of this PR.
[Image: fused_result_correct]

The next image shows an overlay with the green mesh. As one can see, the differences are very small (i.e. the pose and shape of the fused point cloud and the mesh are almost identical).
[Image: fused_result_correct_with_mesh]

Here is a final textured mesh created with the different pipeline components of MVE, except for the Structure-from-Motion and depth map computation steps (for both steps it is necessary to use this Colmap fork to address specific properties of satellite images).

[Image: textured_mesh]

I'm aware that this is a fundamental change, but I think the experiments above nicely show that it is correct. Any second thoughts on this?

@SBCV (Contributor, Author) commented Sep 3, 2020

Maybe one additional note: why did this not appear earlier? I think in many situations the case differentiation in camera.cc does not matter. The problem only becomes apparent when paspect deviates from 1 strongly enough to flip the portrait/landscape decision that dim_aspect alone would make, which is almost never the case for standard camera models.

@flanggut (Collaborator) commented Sep 3, 2020

Yes, I think we very rarely, if at all, had a case where paspect wasn't 1. To me the changes look reasonable. If we define paspect as fy / fx, this is definitely the correct way to compute the calibration matrices.

@SBCV (Contributor, Author) commented Sep 3, 2020

If you check the original implementation of fill_calibration and fill_inverse_calibration in camera.cc, you see that both functions use

    if (image_aspect < 1.0f) /* Portrait. */
    {
        ax = this->flen * height / this->paspect;
        ay = this->flen * height;
    }
    else /* Landscape. */
    {
        ax = this->flen * width;
        ay = this->flen * width * this->paspect;
    } 

If one substitutes this->flen * height in the if-case, one can see that ax = ay / this->paspect, i.e. this->paspect = ay / ax. Similarly, if one substitutes this->flen * width in the else-case, one can see that ay = ax * this->paspect, i.e. this->paspect = ay / ax. Thus, I'm pretty sure that paspect = ay / ax is correct.
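
To make the equivalence concrete, here is a minimal standalone check with arbitrarily chosen (hypothetical) values for flen, paspect, width, and height, showing that both branches satisfy paspect == ay / ax:

    #include <cassert>
    #include <cmath>
    #include <cstdlib>

    int main()
    {
        float flen = 0.9f, paspect = 0.25f;
        float width = 200.0f, height = 100.0f;
        float image_aspect = (width / height) * paspect; /* 0.5 -> portrait. */
        float ax, ay;
        if (image_aspect < 1.0f) /* Portrait. */
        {
            ax = flen * height / paspect; /* 360 */
            ay = flen * height;           /* 90 */
        }
        else /* Landscape. */
        {
            ax = flen * width;
            ay = flen * width * paspect;
        }
        assert(std::abs(ay / ax - paspect) < 1e-6f); /* Holds in either branch. */
        return EXIT_SUCCESS;
    }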

@simonfuhrmann (Owner) commented Sep 3, 2020

I am a little bit more skeptical about this change. It is correct that we rarely have to deal with pixel aspect ratios != 1. But we have tested datasets with pixel aspect ratios != 1, for example the Strecha datasets, and it was when that dataset came out that we implemented this code.

The reason why I'm skeptical is that what Sebastian uses as empirical proof, while it does look convincing, comes from a different SfM and stereo system, which may use a different convention than we do. So I would at least like to try to reconstruct the fountain with both implementations (the baseline and this PR) and see what the results look like.

Problem is, the ground truth camera parameters for the images seem to be gone?

Edit: @pmoulon seems to have this dataset in his repo. But WHO KNOWS what the format of the camera files is.

@simonfuhrmann (Owner) commented Sep 3, 2020

In MVE conventions, if I remember correctly, the focal length corresponds to the (inverse of the) field of view of the longer side of the image. And "longer side" doesn't mean more pixels; it means the image plane is wider than tall. That has to take the aspect ratio of the pixels into account, hence we have the code image_aspect = (width / height) * (pwidth / pheight) = (width * pwidth) / (height * pheight), which seems reasonable to me. The code in this pull request doesn't take this into consideration. Am I missing something?

Here, width/height is the image dimensions in pixels, and pwidth/pheight is the pixel size in some (irrelevant) unit.
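
To illustrate the convention with hypothetical numbers: an image with more pixel rows than columns can still be "landscape" if the pixels are wide enough. A minimal sketch (physical_aspect is a made-up helper, not MVE API):

    /* Hypothetical helper: physical aspect of the image plane, i.e. pixel
     * counts weighted by the physical pixel size. */
    float physical_aspect (float width, float height, float pwidth, float pheight)
    {
        return (width * pwidth) / (height * pheight);
    }

    /* E.g. width = 100, height = 200, pwidth = 3, pheight = 1 yields 1.5:
     * landscape, although the image has twice as many rows as columns. */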

@flanggut (Collaborator) commented Sep 3, 2020

Ok yeah, I think I remember. So if "longer side" is the convention, then we need to take that into account.

@SBCV is right in that we always have paspect = ay / ax. We just don't compute ax as ax = flen * max(width, height) but instead do something like ax = width * pwidth > height * pheight ? flen * width : flen * height.

I guess this is not taken into account when the parameters are imported and that causes the issues?

@SBCV (Contributor, Author) commented Sep 3, 2020

I am a little bit more skeptical about this change.

I absolutely understand that, since it affects such a basic component of the library.

So I would at least like to try to reconstruct the fountain with both implementations (the baseline and this PR), and see what the results look like.

Okay, I'll make a comparison tomorrow and post the results here.

We just don't compute ax as ax = flen * max(width, height) but instead do something like ax = width * pwidth > height * pheight ? flen * width : flen * height.

Before submitting this PR, I tried to modify the create_camera_info_from_params() function in bundle_io.cc (see lines 547-557), unfortunately without success. However, I'm open to suggestions :D

@simonfuhrmann (Owner) commented Sep 3, 2020

Question: did you consider that our focal length is relative to max(width, height)? I can't see that here. For example, our focal length is focal_length_in_px / max(width, height), essentially to make the focal length independent of the image size. How is the focal length from Colmap defined?

Crazy idea, but I think this might actually be the issue you're running into, and the code modified in the PR might actually be correct. So, the values ax and ay we produce here are the un-normalized focal length values in pixels in x and y direction (perhaps the exact values you get from Colmap?), and you have to have a similar if-statement in your code when you convert TO our model (essentially the inverse of what we do here).

Does this make sense?
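
A minimal sketch of that normalization, assuming paspect == 1 so that the max(width, height) rule applies directly (to_mve_flen is a made-up name, not MVE API):

    #include <algorithm>

    /* Convert an un-normalized focal length in pixels to MVE's convention,
     * which is relative to the larger image dimension. Sketch for the
     * paspect == 1 case only. */
    float to_mve_flen (float flen_px, int width, int height)
    {
        return flen_px / static_cast<float>(std::max(width, height));
    }

    /* E.g. a 4000 x 3000 image with a 2000 px focal length stores
     * flen = 2000 / 4000 = 0.5, independent of the image resolution. */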

@simonfuhrmann (Owner) commented Sep 3, 2020

Also, comparing our old code:

    float dim_aspect = width / height;
    float image_aspect = dim_aspect * this->paspect;
    float ax, ay;
    if (image_aspect < 1.0f) /* Portrait. */
    {
        ax = this->flen * height / this->paspect;
        ay = this->flen * height;
    }
    else /* Landscape. */
    {
        ax = this->flen * width;
        ay = this->flen * width * this->paspect;
    }

with your new code,

    size_t max_extent = std::max(width, height);
    float ax = this->flen * max_extent;
    float ay = ax * this->paspect;

the code is equivalent if the image is wider than tall, so evaluating it on the fountain dataset is probably not very useful. :(

Sidenote: max_extent should be a float.
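
To see the claimed equivalence concretely, a trace (a fragment with hypothetical values; width > height and paspect close to 1, so both versions pick the width):

    float flen = 1.0f, paspect = 1.0f;
    float width = 200.0f, height = 100.0f;
    /* Old code: image_aspect = 2.0 >= 1, so the landscape branch runs. */
    float ax_old = flen * width;                   /* 200 */
    float ay_old = flen * width * paspect;         /* 200 */
    /* New code: max_extent = 200 == width, same result. */
    float ax_new = flen * std::max(width, height); /* 200 */
    float ay_new = ax_new * paspect;               /* 200 */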

@SBCV (Contributor, Author) commented Sep 4, 2020

Question -- did you consider that our focal length is relative to max(width, height), because I can't see that here. For example, our focal length is focal_length_in_px / max(width, height), essentially to make the focal length independent of the image size.

I think that this is only true when paspect == 1, since only then is the if-else condition effectively a test for dim_aspect < 1.0f rather than image_aspect < 1.0f.
Consider the case where dim_aspect = 2 (i.e. width > height) and paspect = 0.25. This yields image_aspect == 0.5, thus the if-branch is executed and ax as well as ay are defined w.r.t. height. However, in this example, height is smaller than width. So in this case the statement

our focal length is relative to max(width, height)

is not true. And this also illustrates why I find the original if-else condition a bit odd: we are testing for width * pwidth < height * pheight, but then we use only width and height to determine the final focal length. However, I'm still unsure if that is the actual problem ...
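
A trace of that counterexample through the original branches (a fragment with hypothetical values; the point is that ax and ay end up defined via height even though width is the larger dimension):

    float flen = 1.0f, paspect = 0.25f;
    float width = 200.0f, height = 100.0f;            /* dim_aspect = 2 */
    float image_aspect = (width / height) * paspect;  /* 0.5 -> "portrait" */
    float ax = flen * height / paspect;               /* 400, via height */
    float ay = flen * height;                         /* 100, via height */
    /* flen is thus NOT relative to max(width, height) = width here. */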

@SBCV (Contributor, Author) commented Sep 4, 2020

How is the focal length from Colmap defined?

I think that the focal lengths fx and fy in Colmap describe the physical focal length using the width and height of a pixel as units. I thought that fx corresponds to the largest image dimension (i.e. max(height, width)).

@ahojnnes Could you shed some light on this? It would help a lot. In particular, it would be useful to know whether "largest image dimension" means max(height, width) or max(height * pheight, width * pwidth), where pheight and pwidth denote the pixel dimensions.

@SBCV (Contributor, Author) commented Sep 4, 2020

Crazy idea, but I think this might actually be the issue you're running into, and the code modified in the PR might actually be correct. So, the values ax and ay we produce here are the un-normalized focal length values in pixels in x and y direction (perhaps the exact values you get from Colmap?), and you have to have a similar if-statement in your code when you convert TO our model (essentially the inverse of what we do here).

Can you elaborate on this a little bit? Like I said, before creating this PR, I already tried something like the following, without success.

    else if (model == "PINHOLE")
    {
        // Pinhole: fx, fy, cx, cy
        float fx = params[0];
        float fy = params[1];
        float dim_aspect = width / height;
        float paspect = fy / fx;
        float image_aspect = dim_aspect * paspect;

        if (image_aspect < 1.0f) /* Portrait. */
            camera_info.flen = fy;
        else /* Landscape. */
            camera_info.flen = fx;
        camera_info.paspect = paspect;
        camera_info.ppoint[0] = params[2] / width;
        camera_info.ppoint[1] = params[3] / height;
    }

@SBCV (Contributor, Author) commented Sep 4, 2020

I just realized that this issue is actually more complicated than expected...
In this function I use the same mechanism to convert the different semantics of Colmap depth maps to MVE depth maps.

@simonfuhrmann (Owner) commented Sep 4, 2020

I just realized that this issue is actually more complicated than expected...
In this function I use the same mechanism to convert the different semantics of Colmap depth maps to MVE depth maps.

Converting the depth map conventions is relatively easy. For each pixel (u, v) you can compute a scaling factor by back-projecting it to a 3D point at depth 1, similar to here, and then dividing by the inverse of the z component. This way you can re-use the camera model code.

Actually we have a function for that here.
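
A minimal sketch of that per-pixel scale factor, assuming invcalib is the row-major 3x3 inverse calibration matrix and the input depth is a z-depth (depth_scale is a made-up helper, not the MVE function referenced above):

    #include <cmath>

    /* Back-project pixel (u, v) at depth 1 and return the factor that
     * converts a z-depth into a depth along the viewing ray. */
    float depth_scale (float const* invcalib, float u, float v)
    {
        float x = invcalib[0] * u + invcalib[1] * v + invcalib[2];
        float y = invcalib[3] * u + invcalib[4] * v + invcalib[5];
        float z = invcalib[6] * u + invcalib[7] * v + invcalib[8];
        return std::sqrt(x * x + y * y + z * z) / z;
    }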

@simonfuhrmann (Owner) commented Sep 4, 2020

Can you elaborate on this a little bit?

Let me try. Because I don't see where you normalize fx or fy in your code with either width or height, I'd try this:

    // Pinhole: fx, fy, cx, cy
    float fx = params[0];
    float fy = params[1];
    float dim_aspect = static_cast<float>(width) / height;  // CAST!
    float pixel_aspect = fy / fx;
    float img_aspect = dim_aspect * pixel_aspect;
    if (img_aspect < 1.0f) {
      // Portrait: normalize the focal length with the image height.
      camera_info.flen = fy / height;
    } else {
      // Landscape: normalize the focal length with the image width.
      camera_info.flen = fx / width;
    }
    camera_info.paspect = pixel_aspect;
    camera_info.ppoint[0] = params[2] / width;
    camera_info.ppoint[1] = params[3] / height;

Edit:

I can see that MVE's code is not ideally structured: parts of the conversion happen in bundle_io.h/cc and other parts, such as normalizing the focal length, happen in makescene.cc. I think the original idea was that bundle_io.h/cc was supposed to read the file format only, with conversions happening outside; this may have been a bad design.

In any case, when the normalization happens here, you have to skip the normalization in makescene.cc.

And that means your code and my code are really not that different except that I have a cast to compute the aspect ratio... :(

@SBCV (Contributor, Author) commented Sep 4, 2020

Let me try. ...

Thank you! I'll give it a try the next time I have access to my desktop PC, which will be on Monday.

@SBCV (Contributor, Author) commented Sep 7, 2020

I can confirm that the problem was caused by the missing adjustment in convert_depth_map_semantics() (I did not consider the changes in the focal length there) and by the incorrect normalization in makescene.cc.

See #518 for the corresponding adjustments.

@SBCV closed this Sep 7, 2020
@SBCV deleted the fix_calibration branch Sep 9, 2020