
mEstimator Error with Trust Region Optimizers #1129

Closed
DanMcGann opened this issue Mar 9, 2022 · 6 comments · Fixed by #1161
@DanMcGann (Contributor)

Description

When using robust noise models, trust region optimizers use an incorrect value for the system error when evaluating their convergence and step rejection criteria. Specifically, these optimizers evaluate convergence on the sum of weighted squared residuals $\sum_i \frac{1}{2} w(r_i)\, r_i^2$ rather than the robust cost $\sum_i \rho(r_i)$, where I have used the notation style of the robust estimator reference provided in the documentation (LINK), but the exact form computed by GTSAM. Plugging in the definition of the weight, $w(r) = \rho'(r) / r$, we can quickly see the inequality of these terms for many mEstimator forms.

For example, for Cauchy ($c = 1$ scale value for brevity) the weight is $w(r) = \frac{1}{1 + r^2}$; plugging in the form of the Cauchy loss we get

$$\frac{1}{2} w(r)\, r^2 = \frac{1}{2} \frac{r^2}{1 + r^2} \;\neq\; \frac{1}{2} \log(1 + r^2) = \rho(r)$$
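As a concrete sanity check, here is a minimal numpy-only sketch (assuming the $c=1$ Cauchy weight $w(r) = 1/(1+r^2)$) showing the two quantities differ at, e.g., $r = 2$:

import numpy as np

r = 2.0                          # sample whitened residual, c = 1
w = 1.0 / (1.0 + r**2)           # Cauchy weight w(r)
weighted = 0.5 * w * r**2        # what the optimizer currently sums: 0.4
rho = 0.5 * np.log(1.0 + r**2)   # actual Cauchy cost: ~0.805
print(weighted, rho)             # 0.4 != 0.8047...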

The discrepancy can greatly affect the determination of convergence and step acceptance by our trust region optimizers, especially for mEstimators that are asymptotically constant, like Geman McClure (see below). For these estimators the system error (in its current implementation) can actually increase as the estimate gets closer to the mean, meaning steps will be rejected by LM or Dogleg and the system will converge prematurely.
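
To see this behavior concretely, here is a minimal numpy-only sketch (assuming the $c=1$ Geman McClure weight $w(r) = 1/(1+r^2)^2$, obtained from $w(r) = \rho'(r)/r$): the weighted squared residual rises as the residual shrinks toward $|r| = 1$, while the true cost falls monotonically.

import numpy as np

r = np.array([5.0, 3.0, 1.0, 0.5])   # residuals shrinking toward the mean
w = 1.0 / (1.0 + r**2) ** 2          # GM weight (c = 1)
weighted = 0.5 * w * r**2            # what LM / Dogleg currently see
rho = 0.5 * r**2 / (1.0 + r**2)      # true GM cost, monotone in |r|
print(weighted)  # [0.0185 0.045  0.125  0.08 ]  -- rises toward |r| = 1
print(rho)       # [0.4808 0.45   0.25   0.1  ]  -- falls monotonically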

Steps to reproduce

  1. Implement a factor with a robust noise model.
  2. Plot its error against the mEstimator loss (script below).
import gtsam
import numpy as np
import matplotlib.pyplot as plt


def makeValues(x):
    values = gtsam.Values()
    values.insert(0, x)
    return values


def cauchy_loss(x):  # c=1
    return 0.5 * np.log(1 + x ** 2)


def gm_loss(x):
    return (0.5 * x ** 2) / (1 + x ** 2)


# Constants
x = np.arange(-10, 10, 0.1)
noise_model = gtsam.noiseModel.Isotropic.Sigma(1, 1)


# Cauchy
mEst = gtsam.noiseModel.mEstimator.Cauchy(1.0)
robust_model = gtsam.noiseModel.Robust(mEst, noise_model)
factor = gtsam.PriorFactorDouble(0, 0, robust_model)

error = [factor.error(makeValues(xi)) for xi in x]
loss = cauchy_loss(x)
plt.title("Cauchy")
plt.plot(x, error, label="FactorError")
plt.plot(x, loss, label="mEstimator Loss")
plt.legend()
plt.show()


# Geman McClure
mEst = gtsam.noiseModel.mEstimator.GemanMcClure(1.0)
robust_model = gtsam.noiseModel.Robust(mEst, noise_model)
factor = gtsam.PriorFactorDouble(0, 0, robust_model)

error = [factor.error(makeValues(xi)) for xi in x]
loss = gm_loss(x)
plt.title("Geman McClure")
plt.plot(x, error, label="FactorError")
plt.plot(x, loss, label="mEstimator Loss")
plt.legend()
plt.show()

[Figures: bug_report_cauchy, bug_report_gm — FactorError vs. mEstimator Loss for the Cauchy and Geman McClure models]

Expected behavior

I would expect that, when evaluating system error, optimizers would use the actual robust cost rather than the value of the weighted squared residual. In parallel, I would expect to see better convergence from trust region optimizers than empirically found. For an empirical evaluation of convergence see the script below.

Convergence Evaluation Script

import gtsam
from gtsam import noiseModel
import matplotlib.pyplot as plt
import numpy as np

OPTIMIZERS = [
    gtsam.GaussNewtonOptimizer,
    gtsam.LevenbergMarquardtOptimizer,
    gtsam.DoglegOptimizer,
]
OPTIM_PARAMS = [
    gtsam.GaussNewtonParams,
    gtsam.LevenbergMarquardtParams,
    gtsam.DoglegParams,
]
OPTIM_NAMES = ["GaussNewton", "LM", "DOGLEG"]

POSES = [
    gtsam.Point2(0, 0),
    gtsam.Point2(1, 0),
    gtsam.Point2(1, 1),
    gtsam.Point2(0, 1),
]

INITIAL_ESTS = [
    gtsam.Point2(-5, -5),
    gtsam.Point2(6, -5),
    gtsam.Point2(6, 6),
    gtsam.Point2(-5, 6),
]

M_ESTS = {
    "NULL": lambda x: noiseModel.mEstimator.Null(),
    "Huber": noiseModel.mEstimator.Huber,
    "Cauchy": noiseModel.mEstimator.Cauchy,
    "Welsch": noiseModel.mEstimator.Welsch,
    "GemanMcClure": noiseModel.mEstimator.GemanMcClure,
}

# Expected rho(r) for each mEstimator above (c = 1), using the same
# 0.5-scaled convention as the loss functions in the reproduction script.
RHO_FUNCS = [
    lambda r: 0.5 * r ** 2,                                        # Null
    lambda r: 0.5 * r ** 2 if np.abs(r) < 1 else np.abs(r) - 0.5,  # Huber
    lambda r: 0.5 * np.log(1 + r ** 2),                            # Cauchy
    lambda r: 0.5 * (1 - np.exp(-(r ** 2))),                       # Welsch
    lambda r: 0.5 * (r ** 2) / (1 + r ** 2),                       # Geman McClure
]

for (mest_name, mest), rho in zip(M_ESTS.items(), RHO_FUNCS):
    graph = gtsam.NonlinearFactorGraph()
    graph.add(
        gtsam.PriorFactorPoint2(
            0,
            POSES[0],
            noiseModel.Robust(mest(1), noiseModel.Isotropic.Sigma(2, 1)),
        )
    )

    for i in range(1, len(POSES)):
        graph.add(
            gtsam.BetweenFactorPoint2(
                i - 1,
                i,
POSES[i] - POSES[i - 1],
                noiseModel.Robust(
                    mest(1),
                    noiseModel.Isotropic.Sigma(2, 1),
                ),
            )
        )
    values = gtsam.Values()
    for i, pose in enumerate(INITIAL_ESTS):
        values.insert(i, pose)

    fig = plt.figure(constrained_layout=True)
    ax_dict = fig.subplot_mosaic(
        """
        ABC
        DDD
        """
    )
    fig.suptitle(mest_name)
    init_pts = np.stack(INITIAL_ESTS)

    gt_pts = np.stack(POSES)
    print(mest_name)
    print(
        "------------------------------------------------------------------------------------------"
    )
    for opt_class, param_class, name, ax_name in zip(
        OPTIMIZERS, OPTIM_PARAMS, OPTIM_NAMES, ["A", "B", "C"]
    ):
        print(name)
        ax = ax_dict[ax_name]
        params = param_class()
        params.setVerbosity("ERROR")
        optimizer = opt_class(graph, values, params)
        result = optimizer.optimize()
        print("")
        pts = []
        for i in range(len(POSES)):
            pts.append(result.atPoint2(i))
        pts = np.stack(pts)
        ax.set_title(name)
        ax.plot(init_pts.T[0], init_pts.T[1], marker="o", label="Initialization")
        ax.plot(gt_pts.T[0], gt_pts.T[1], marker="o", label="GroundTruth")
        ax.plot(pts.T[0], pts.T[1], marker="o", label=name, alpha=0.6)
        ax.set_aspect("equal")

    prior = gtsam.PriorFactorDouble(
        0, 0, gtsam.noiseModel.Robust(mest(1), gtsam.noiseModel.Isotropic.Sigma(1, 1))
    )
    gtsam_vals = []
    expected_vals = []
    x = np.arange(-10, 10, 0.1)
    for xi in x:
        vals = gtsam.Values()
        vals.insert(0, xi)
        gtsam_vals.append(prior.error(vals))
        expected_vals.append(rho(xi))
    ax_dict["D"].set_ylim(0, 3)
    ax_dict["D"].plot(x, gtsam_vals, label="GTSAM Error")
    ax_dict["D"].plot(x, expected_vals, label="Expected Error")
    ax_dict["D"].legend()
    plt.savefig("mest_{}.png".format(mest_name))
plt.show()

Environment

  • OS: Linux
  • Language: Python and C++
  • Version: 4.1.1 (built from source)

Additional information

I was tempted to just submit a PR with a fix, as this is a fairly small (though important) bug. However, due to implementation details it is not a trivial fix. It could be fixed by adding two functions to the factor interface: one used for system linearization, the other used for evaluating system error. However, this would be redundant for most cases. The alternative would be to change the entire implementation scheme for robust estimators, having them wrap factors rather than noiseModels, so that weighting can be applied during linearization while error is still evaluated using the robust cost. This, however, would be a massive interface change. So I opted to just submit a bug report.
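
To make the first option concrete, here is a minimal illustrative sketch of the idea (the names RobustFactorSketch and linearization_residual are hypothetical, not GTSAM API): keep the weighted residual for building the linear system, but report the robust cost as the error.

import numpy as np

class RobustFactorSketch:
    def __init__(self, rho, weight):
        self.rho = rho        # robust loss rho(r)
        self.weight = weight  # weight w(r) = rho'(r) / r

    def linearization_residual(self, r):
        # sqrt(w(r)) * r: the entry that would populate the b vector
        return np.sqrt(self.weight(r)) * r

    def error(self, r):
        # rho(r): what trust region optimizers should use for convergence
        # and step acceptance
        return self.rho(r)

# Cauchy with c = 1
f = RobustFactorSketch(
    rho=lambda r: 0.5 * np.log(1 + r ** 2),
    weight=lambda r: 1.0 / (1 + r ** 2),
)
print(f.linearization_residual(2.0))  # ~0.894
print(f.error(2.0))                   # ~0.805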

@ProfFan (Collaborator) commented Apr 7, 2022

@yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

@yetongumich (Contributor)

> @yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...

Let's do a quick chat about this tonight?

@ProfFan (Collaborator) commented Apr 7, 2022

Sure

> > @yetongumich may have some insights here? I am not really sure as last time I looked at robust is about 2 years ago...
>
> Let's do a quick chat about this tonight?

@ProfFan (Collaborator) commented May 3, 2022

I actually stumbled upon this yesterday while doing some robust SfM work. Yes, this is indeed annoying...

@jlblancoc (Member)

@DanMcGann Thanks for the detailed report and analysis!

> It could be fixed by adding two functions to the factor interface: one used for system linearization, the other used for evaluating system error. However, this would be redundant for most cases.

You have all the details fresher in mind... would you mind quickly writing down (just as a comment here) what two functions you are thinking of? Then we might find a way to make it less redundant via a virtual method or the like...

@DanMcGann (Contributor, Author)

> what two functions are you thinking of?

My initial thoughts were to have

  1. A function that returns the whitened, weighted residual $\sqrt{w(r)}\, r$, which would be used to populate the $b$ vector during linearization.
  2. A function that returns the robust cost $\rho(r)$, which would be used to evaluate the system error.

However, taking a look at the code again, I cannot figure out how I previously came to the conclusion that this would be necessary.

  • NoiseModelFactor::linearize computes the unwhitened b vector and later whitens it (it never makes use of factor.error).
  • NoiseModelFactor::error already calls the noise model loss (for robust models this will be $\rho$).

This fresh look leads me towards the same fix that got proposed in #1161.
