Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wide-deep hyperdrive notebook Azureml API update #847

Merged
merged 10 commits into from
Jul 9, 2019

Conversation

loomlike
Copy link
Collaborator

Description

AzureML API update fix

Related Issues

#842

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.

@review-notebook-app
Copy link

Check out this pull request on ReviewNB: https://app.reviewnb.com/microsoft/recommenders/pull/847

You'll be able to see visual diffs and write comments on notebook cells. Powered by ReviewNB.

@yueguoguo
Copy link
Collaborator

yueguoguo commented Jun 24, 2019

@loomlike I re-run the unit-test build and it passed. Can you trigger it again?

@@ -7,7 +7,7 @@
"<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any thoughts on why these values dropped so much?


also, i think using getattr is a bit obscure, it would be clearer to just call the metrics code directly


Reply via ReviewNB

@@ -7,7 +7,7 @@
"<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also what changed here to impact the metrics? are we not setting a random seed when training the model?


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The w&d notebook does use the default seed unless we explicitly set None.
Will test again and see if it reports the same number again (should be).

Use Steps instead of Epochs as most TF examples do.
Add model export fn and input-fn-for-saved-model fn.
Update notebooks and tests accordingly
Copy link
Collaborator

@gramhagen gramhagen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just one question on steps vs epochs

tests/integration/test_notebooks_gpu.py Show resolved Hide resolved
Copy link
Collaborator

@miguelgfierro miguelgfierro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@gramhagen gramhagen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question on path creation in tests

assert wide_columns[2].hash_bucket_size == 10
# Check model type
model = build_model(
str(tmp_path / ('wide_' + MODEL_DIR)), wide_columns=wide_columns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this does?
str(tmp_path / ('wide_' + MODEL_DIR))

why is there a division operator here? should it be os.path.join(str(tmp_path), 'wide_' + MODEL_DIR), would this be causing the problem with windows path cleanup?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gramhagen when I use our fixture (tmp) for temporary directory, it causes error (only) on Windows DSVM. So I end up using pytest's fixture tmp_path witch returns Path object. And the codes use Path object's syntax /.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, the error is -- somehow Windows' DSVM holds TF model checkpoints folder and thus TemporaryDirectory cannot cleanup the folder by the time the test has done.

pytest's fixture tmp_path keeps the folder for last three test sessions instead of removing it right away. That's how it avoids the cleanup error. Not a clean solution but at least it is working for now for Windows testing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see, I wasn't familiar with that syntax. Do you think there is something that needs to be done to close the tf session or something in order to release the tmp path for cleanup?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the best thing to do is keep it as is and file an issue for using the normal yield fixture with TF

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried del model and no luck. Kind of hard to debug because that is happening only from Win-DSVM. @roalexan Any idea about this? FYI no issues when use local Windows machines (both laptop and workstation). Does Win-DSVM handle processes and folder read/write permission differently from local Windows'?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just be Windows Server 2016 - don't see how it would be different than local. Link to product page. Maybe we could use one of these email aliases to ask a question about this: dsvm-product-team@ or Dsvm-interest@.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat related issue found from Tensorflow github issues: tensorflow/tensorflow#20606

I will try to force close tf's FileWriter and see if that solves this issue on Windows DSVM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it does solve. I'm working on adding those changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, feel free to merge if everything passes

@loomlike loomlike merged commit 279bdb2 into staging Jul 9, 2019
@loomlike loomlike deleted the jumin/w-d-hyperdrive branch July 9, 2019 14:18
yueguoguo pushed a commit that referenced this pull request Sep 9, 2019
* AzureML api update
* Wide-deep and tf-utils update
  - Use Steps instead of Epochs as most TF examples do.
  - Add model export fn and input-fn-for-saved-model fn.
  - Update notebooks and tests accordingly
* Simplify wd test
* Simplify deeprec and ncf unit tests
* Safely close tf event file in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants