
add embedding learning example #9165

Merged
merged 3 commits into apache:master on Jan 31, 2018

Conversation

chaoyuaw
Contributor

Description

Add gluon example for embedding learning.

Checklist

Essentials

  • [x] Passed code style checking (make lint)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [x] All changes have test coverage:
    • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
    • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
    • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [x] Code is well-documented:
    • For user-facing API changes, API doc string has been updated.
    • For new C++ functions in header files, their functionalities and arguments are documented.
    • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • [x] To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Add embedding learning example.

Comments

  • More datasets to come later.

@chaoyuaw
Contributor Author

Just sent a PR to web-data for the image (dmlc/web-data#40).
Will update the path and remove the image here once that's merged. Thank you!

@@ -0,0 +1,72 @@
# Image Embedding Learning

This example implements embedding learning based on a margin-based loss with distance weighted sampling [(Wu et al., 2017)](http://www.philkr.net/papers/2017-10-01-iccv/2017-10-01-iccv.pdf). The model obtains a validation Recall@1 of ~64% on the [Caltech-UCSD Birds-200-2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset.
Contributor


Does the performance match the original paper?

Contributor Author


Yes, approximately (slightly higher Recall@1 and slightly lower Recall@16 than what's reported in the paper). The difference is < 1%.

The difference between this implementation and the original is that here we perform sampling within each GPU, whereas the original paper implements cross-GPU sampling. Since the performance is almost identical (at least on this dataset), I use per-GPU sampling here for simplicity.
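For readers who want to see the mechanics, here is a minimal NumPy sketch of distance weighted sampling run independently on one GPU's slice of the batch. The function name and hyperparameter defaults are illustrative (they follow the paper's notation), not the example's exact Gluon code:

```python
import numpy as np

def distance_weighted_sample(embeddings, labels, cutoff=0.5,
                             nonzero_loss_cutoff=1.4):
    """Pick one negative per anchor with probability proportional to
    1/q(d), where q(d) is the density of pairwise distances between
    points on the unit sphere (Wu et al., 2017). Sketch only."""
    n, dim = embeddings.shape
    # Pairwise Euclidean distances of L2-normalized embeddings.
    dist = np.sqrt(np.maximum(2.0 - 2.0 * embeddings @ embeddings.T, 1e-12))
    dist = np.maximum(dist, cutoff)  # clip small distances for stability

    # log 1/q(d), with q(d) proportional to d^(dim-2) * (1 - d^2/4)^((dim-3)/2).
    log_w = (-(dim - 2.0) * np.log(dist)
             - 0.5 * (dim - 3.0) * np.log(np.maximum(1.0 - 0.25 * dist ** 2,
                                                     1e-12)))
    weights = np.exp(log_w - log_w.max())

    # Keep only true negatives whose loss can be non-zero.
    same = labels[:, None] == labels[None, :]
    weights = weights * (~same) * (dist < nonzero_loss_cutoff)

    picks = []
    for i in range(n):
        w = weights[i]
        if w.sum() > 0:
            picks.append(np.random.choice(n, p=w / w.sum()))
        else:
            # Fallback: uniform over this anchor's negatives.
            picks.append(np.random.choice(np.flatnonzero(~same[i])))
    return np.array(picks)
```

Cross-GPU sampling would gather embeddings from all devices before computing these weights; per-GPU sampling just applies the same procedure to each device's slice of the batch.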

Member


@chaoyuaw Thanks for the contribution! Do you mind noting the difference from the original paper (per-GPU sampling) explicitly in the README? That will be good information to have when others reuse the example.

Contributor Author


Many thanks for the good suggestion! Yes, I will update this soon.

@cjolivier01
Member

LGTM

@cjolivier01 cjolivier01 merged commit 4957c7c into apache:master Jan 31, 2018
try:
    n_indices += np.random.choice(n, k-1, p=np_weights[i]).tolist()
except:
    n_indices += np.random.choice(n, k-1).tolist()
Contributor


@chaoyuaw It seems that you also sample positive elements into n_indices here. If I understand correctly, the exception can occur when no negative partners induce non-zero loss, meaning that there are none left to sample from due to the (distance < self.nonzero_loss_cutoff) mask. So shouldn't you sample some of the negative elements that induce zero loss instead of possibly positive elements here? Please correct me if I misunderstood something. Thanks!

Contributor Author


Thanks a lot for pointing this out! Yes, you're absolutely right. I'll fix this soon.
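For anyone reading along, here is a hedged sketch of the kind of fix being discussed. It slots into the per-anchor sampling loop quoted above and reuses its variables (n, k, np_weights, i, n_indices); labels is assumed to hold the batch labels. The actual fix in the repository may differ:

```python
# (uses the same np, n, k, np_weights, labels, i, n_indices as the loop above)
# Restrict the fallback to true negatives so that positives are never
# drawn as negative partners when the weighted distribution is empty.
negatives = np.flatnonzero(labels != labels[i])
try:
    n_indices += np.random.choice(n, k - 1, p=np_weights[i]).tolist()
except ValueError:
    # np_weights[i] is all-zero or invalid; sample uniformly over this
    # anchor's negatives (in this failure mode they all induce zero loss)
    # instead of over all n indices.
    n_indices += np.random.choice(negatives, k - 1).tolist()
```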

    Outputs:
        - Loss.
    """
    def __init__(self, margin=0.2, nu=0.0, weight=None, batch_axis=0, **kwargs):
Contributor

@leezu commented on Feb 4, 2018


@chaoyuaw Is ν=0 the same as what you use for the results with learned β^class in Table 3? Thanks for clarifying!

Contributor Author


Hi @leezu, thanks for your question! ν = 0 doesn't disable learning of β^class. ν is the regularization hyperparameter; you can think of it as setting a prior on β. In experiments, I found ν = 0 usually works well. Setting --lr-beta=0, however, will disable learning of β^class. I hope that clarifies!
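To make the role of ν concrete, here is a minimal sketch of the margin-based loss with the β regularizer, in scalar form for one triple of distances (names and the averaging over pairs are simplified relative to the actual Gluon implementation):

```python
import numpy as np

def margin_loss(d_ap, d_an, beta, margin=0.2, nu=0.0):
    """Margin-based loss (Wu et al., 2017) for one anchor with a
    positive at distance d_ap and a negative at distance d_an.

    beta is a learnable boundary (e.g. one value per class for
    beta^class); nu scales a penalty on beta, acting as a prior.
    With nu = 0, beta is still learned, because its gradient flows
    through the two hinge terms below.
    """
    pos_loss = np.maximum(0.0, d_ap - beta + margin)  # pull positives inside beta
    neg_loss = np.maximum(0.0, beta - d_an + margin)  # push negatives outside beta
    return pos_loss + neg_loss + nu * beta
```

Training with --lr-beta=0, by contrast, zeroes the update to β regardless of ν, which is what actually disables learning of β^class.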

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request on Jun 4, 2018
* add embedding learning example

* image from web-data

* fix typos
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request on Jun 28, 2018
* add embedding learning example

* image from web-data

* fix typos