
Added an option to flatten_dims in RobustCostFunction. #503

Merged — luisenp merged 5 commits into main from lep.robust_flatten_dims on May 8, 2023

Conversation

luisenp (Contributor) commented on Apr 28, 2023:

Closes #497

luisenp requested a review from fantaosha on Apr 28, 2023, 14:26
luisenp self-assigned this on Apr 28, 2023
facebook-github-bot added the CLA Signed label on Apr 28, 2023
luisenp mentioned this pull request on Apr 28, 2023
fantaosha (Contributor) left a comment:

LGTM. Some minor comments on how to deal with rescaling.

squared_norm = torch.sum(weighted_error**2, dim=1, keepdim=True)
error_loss = self.loss.evaluate(squared_norm, self.log_loss_radius.tensor)

if self.flatten_dims:
    return (error_loss.reshape(-1, self.dim()) + RobustCostFunction._EPS).sqrt()
fantaosha (Contributor) commented on this snippet:
Not sure if something like (error_loss.reshape(-1, self.dim()).sum(dim=-1, keepdims=True) + RobustCostFunction._EPS).sqrt() is better and more consistent.

luisenp (Contributor Author) replied:
Wouldn't this change the result? The test I added checks that the current behavior is the same as flatten_dims=False for separate cost functions, which is what we want.
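(For context, not part of the PR: a minimal shape check of the two expressions above, with hypothetical sizes and a stand-in epsilon; in the PR, error_loss comes from self.loss.evaluate and the epsilon from RobustCostFunction._EPS.)

import torch

batch_size, dim = 4, 3
error_loss = torch.rand(batch_size * dim, 1)  # stand-in for the per-element robust loss
eps = 1e-10                                   # stand-in for RobustCostFunction._EPS

per_dim = (error_loss.reshape(-1, dim) + eps).sqrt()                           # shape (4, 3)
summed = (error_loss.reshape(-1, dim).sum(dim=-1, keepdim=True) + eps).sqrt()  # shape (4, 1)

# The per-dim version matches dim independent robust costs of dim=1 (what the
# test checks); the summed version collapses each cost to a single residual.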

squared_norm = torch.sum(weighted_error**2, dim=1, keepdim=True)
rescale = (
    self.loss.linearize(squared_norm, self.log_loss_radius.tensor)
    + RobustCostFunction._EPS
).sqrt()

- return [
+ rescaled_jacobians = [
      rescale.view(-1, 1, 1) * jacobian for jacobian in weighted_jacobians
fantaosha (Contributor) commented:

Following my previous comments, this will be something like torch.einsum("ni, nij->nij", rescale.view(-1, self.dims()), jacobians)

luisenp (Contributor Author) replied:

Similar comment as above.

Comment:

Using .diagonal presumes that the Jacobian is a square matrix. But a Jacobian is dE/dX, so its shape is (dim(E), dim(X)). My question is: will dim(E) and dim(X) necessarily be equal?

luisenp (Contributor Author) replied on Apr 29, 2023:

Good observation! I forgot to mention that this implementation assumes we want to model N robust cost functions of dim=1, each with $k$ variables of dof=1, as a single robust cost function of dimension N with $k$ variables of dof=N. Basically, it assumes you want to use a single RobustCost object to model a problem of the form

$$J = \sum_{i=1}^{N} \rho\big(f_i(x_i, y_i, \ldots)^2\big)$$

Supporting cost functions of dim > 1 and other combinations of variable dofs gets messier, and I'm not sure it's worth the effort.

Note that this feature is mostly for programming convenience. The alternative of creating individual cost function objects doesn't hurt performance, because TheseusLayer by default vectorizes cost functions with the same pattern, so they are evaluated in a single call.
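(To make the two interpretations concrete, here is a rough standalone sketch. It is not the theseus implementation; the Huber-style rho below is a common convention and may differ in detail from theseus's HuberLoss.)

import torch

def huber_rho(s, radius):
    # Huber-style robust loss applied to a squared residual s = e**2
    # (common convention; the exact form used by theseus may differ).
    return torch.where(s <= radius, s, 2 * (radius * s).sqrt() - radius)

batch_size, n = 4, 3
err = torch.randn(batch_size, n)  # one cost function of dim N
radius = torch.tensor(1.0)

# flatten_dims=True: behaves like N independent robust costs of dim=1,
# i.e. J = sum_i rho(f_i(...)**2)
j_flatten = huber_rho(err**2, radius).sum(dim=1)

# flatten_dims=False: a single robust cost of dim N, with rho applied once
# to the full squared norm, i.e. J = rho(sum_i f_i(...)**2)
j_whole = huber_rho((err**2).sum(dim=1), radius)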

Reply:

But it seems this isn't feasible for a case like the motion model in tutorial 4 (first figure). There, the error is a combination of two vectors, which can be common and convenient in many cases. That's why, in my original code, I obtained the dimension of the input X through the Jacobian matrix, since the error itself only carries the batch size and the error dimension. And I believe that, by the chain rule, dRobustLoss/dOriLoss can be multiplied directly by the original Jacobian.
[three images attached]

Reply:

Yes, you are right. Currently I don't see an optimization problem that would result in dim(E) != dim(X), or that cannot be decoupled into multiple cost functions satisfying dim(E) = dim(X). We can discuss it further if someone runs into such a problem. Thank you! I have learned a lot from your project!

luisenp (Contributor Author) replied:

No problem, and thank you for all your useful feedback!

luisenp (Contributor Author) commented:

@xphsean12 I have to apologize, because it turns out I was wrong about some of the above! We definitely want to support the case where dim(error) != dim(X). For example, in the regression problem, dim(error) = N_points, but dim(X) = 1 for both A and B. Above, I was mixing up optim vars with aux vars.

Luckily, the fix is still pretty simple, and I have rewritten the test to check for the case above, using the same regression example. You can look at the last few commits if you are curious.

The code still assumes that flatten_dims acts as if there were dim(err) independent costs of dim=1.
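(Shapes for illustration only, taken from the discussion above with hypothetical names: the residual has one entry per data point while the optim vars A and B are scalars, so dim(E) != dof(X).)

import torch

batch_size, n_points = 2, 100
error = torch.randn(batch_size, n_points)     # dim(error) = N_points
jac_A = torch.randn(batch_size, n_points, 1)  # dE/dA has shape (batch, dim(E), dof(A))
jac_B = torch.randn(batch_size, n_points, 1)  # dE/dB has shape (batch, dim(E), dof(B))

# Nothing forces dim(E) == dof(X) here, which is why an approach that assumes
# a square Jacobian (e.g. taking its diagonal) does not work in general.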

Reply:

Thanks for your feedback! By the way, I may have found a new issue with the radius. Since I don't have very concrete evidence, I'm not going to open an issue yet. In my previous commit, I changed radius = radius.exp() into .square(); @fantaosha mentioned that exp() is better for differentiation. However, the easiest approach would be to just use radius = radius and require the user to pass in a radius >= 0.

The main differences are: (1) based on my observation, .exp() is much slower than radius = radius when I try to train the Huber radius; and (2) training a value that is then passed through exp() may be very difficult, because changing the trained value from 0 to 5 changes the actual radius used from 1 to about 148. This exponential growth may make it hard to train an accurate radius. I'm not sure whether my second concern is correct; it's just a thought.
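(A small numeric illustration of the point above; the values are chosen for the example and are not taken from the code.)

import torch

# Learned parameter vs. the radius actually used under the exp() parameterization:
log_radius = torch.tensor([0.0, 1.0, 5.0])
print(log_radius.exp())           # ~[1.0, 2.72, 148.4]

# A direct parameterization (radius = radius) would instead need the value to be
# kept non-negative, e.g. via user input constraints or clamping:
raw_radius = torch.tensor([-0.3, 1.0, 5.0])
print(raw_radius.clamp_min(0.0))  # [0.0, 1.0, 5.0]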

luisenp (Contributor Author) replied:

@fantaosha Any comments? I don't think using exp() should be particularly problematic. Maybe you need to lower your learning rate?

      rescale.view(-1, 1, 1) * jacobian for jacobian in weighted_jacobians
- ], rescale * weighted_error
+ ]
+ rescaled_error = rescale * weighted_error
fantaosha (Contributor) commented:

This line might also need changes.

mhmukadam (Member) left a comment:

LGTM!

luisenp merged commit 8e1139d into main on May 8, 2023
luisenp deleted the lep.robust_flatten_dims branch on May 8, 2023, 16:33
Labels: CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Huber Loss Not Correct
5 participants