Commit

[ML-7027] Document artifact store locations and update docstrings. (mlflow#1253)

Update documentation for new runs:/ scheme.
Stephanie Bodoff authored and sueann committed Jun 3, 2019
1 parent b0c66b5 commit c09761e
Showing 16 changed files with 210 additions and 161 deletions.
5 changes: 2 additions & 3 deletions docs/source/python_api/index.rst
@@ -3,14 +3,13 @@
Python API
==========

The MLflow Python API is organized into the following modules. The most common functions are also
The MLflow Python API is organized into the following modules. The most common functions are
exposed in the :py:mod:`mlflow` module, so we recommend starting there.

.. toctree::
:glob:
:maxdepth: 1

*


See also an :ref:`index of all functions and classes<genindex>`.
See also the :ref:`index of all functions and classes<genindex>`.
58 changes: 47 additions & 11 deletions docs/source/tracking.rst
@@ -6,8 +6,7 @@ MLflow Tracking

The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files
when running your machine learning code and for later visualizing the results.
MLflow Tracking lets you log and query experiments using :ref:`Python <python-api>`, :ref:`REST <rest-api>`, :ref:`R-api`,
and :ref:`java_api` APIs.
MLflow Tracking lets you log and query experiments using :ref:`Python <python-api>`, :ref:`REST <rest-api>`, :ref:`R-api`, and :ref:`java_api` APIs.

.. contents:: Table of Contents
:local:
@@ -61,8 +60,7 @@ Where Runs Are Recorded
=======================

MLflow runs can be recorded to local files, to a SQLAlchemy compatible database, or remotely
to a tracking server.
By default, the MLflow Python API logs runs locally to files in an ``mlruns`` directory wherever you
to a tracking server. By default, the MLflow Python API logs runs locally to files in an ``mlruns`` directory wherever you
ran your program. You can then run ``mlflow ui`` to see the logged runs.
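
For example, a minimal sketch of a script that records one run to the local ``mlruns`` directory (the parameter and metric names are illustrative):

.. code-block:: py

    import mlflow

    with mlflow.start_run():
        mlflow.log_param("alpha", 0.5)    # a hyperparameter for this run
        mlflow.log_metric("rmse", 0.27)   # an evaluation result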

To log runs remotely, set the ``MLFLOW_TRACKING_URI`` environment variable to a tracking server's URI or
@@ -287,13 +285,46 @@ The UI contains the following key features:
Querying Runs Programmatically
==============================

All of the functions in the Tracking UI can be accessed programmatically. This makes it easy to do several common tasks:
You can access all of the functions in the Tracking UI programmatically. This makes it easy to do several common tasks (a short code sketch follows this list):

* Query and compare runs using any data analysis tool of your choice, for example, **pandas**.
* Determine the artifact URI for a run to feed some of its artifacts into a new run when executing a workflow. For an example of querying runs and constructing a multistep workflow, see the MLflow `Multistep Workflow Example project <https://github.com/mlflow/mlflow/blob/15cc05ce2217b7c7af4133977b07542934a9a19f/examples/multistep_workflow/main.py#L63>`_.
* Load artifacts from past runs as :ref:`models`. For an example of training, exporting, and loading a model, and predicting using the model, see the MLflow `TensorFlow example <https://github.com/mlflow/mlflow/tree/master/examples/tensorflow>`_.
* Run automated parameter search algorithms, where you query the metrics from various runs to submit new ones. For an example of running automated parameter search algorithms, see the MLflow `Hyperparameter Tuning Example project <https://github.com/mlflow/mlflow/blob/master/examples/hyperparam/README.rst>`_.
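
For instance, a minimal sketch of the first two tasks using the ``MlflowClient`` API (the run ID is a placeholder):

.. code-block:: py

    from mlflow.tracking import MlflowClient

    client = MlflowClient()                  # uses the current tracking URI
    run = client.get_run("<mlflow_run_id>")  # fetch a run by its ID
    print(run.data.metrics)                  # logged metrics, e.g. for comparison
    print(run.info.artifact_uri)             # root URI of the run's artifacts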

.. _artifact-locations:

Referencing Artifacts
---------------------

When you specify the location of an artifact in MLflow APIs, the syntax depends on whether you
are invoking the Tracking, Models, or Projects API. For the Tracking API, you specify the artifact location using a (run ID, relative path) tuple. For the Models and Projects APIs, you specify the artifact location in the following ways:

- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``<scheme>/<scheme-dependent-path>``. For example:

- ``s3://my_bucket/path/to/model``
- ``hdfs://<host>:<port>/<path>``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``

For example:

.. rubric:: Tracking API

.. code-block:: py

    # The Tracking API takes a (run ID, local path) pair; a sketch using the
    # client API, since artifact logging is exposed on MlflowClient.
    from mlflow.tracking import MlflowClient

    MlflowClient().log_artifacts("<mlflow_run_id>", "/path/to/artifact")

.. rubric:: Models API

.. code-block:: py

    mlflow.pytorch.load_model("runs:/<mlflow_run_id>/run-relative/path/to/model")
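
As a hypothetical end-to-end sketch, log a model in a run and load it back through a ``runs:/`` URI (the scikit-learn flavor and the artifact path ``model`` are illustrative):

.. code-block:: py

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LinearRegression

    toy_model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])

    with mlflow.start_run() as run:
        mlflow.sklearn.log_model(toy_model, artifact_path="model")

    # Build a runs:/ URI from the run ID plus the run-relative artifact path.
    model_uri = "runs:/{}/model".format(run.info.run_id)
    loaded_model = mlflow.sklearn.load_model(model_uri)
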
.. _tracking_server:

@@ -336,9 +367,9 @@ For backwards compatibility, ``--file-store`` is an alias for ``--backend-store-
.. important::

``mlflow server`` will fail against a database-backed store with an out-of-date database schema.
To prevent this, upgrade your database schema to the latest supported version via
``mlflow db upgrade [db_uri]``. Note that schema migrations can result in database downtime, may
take longer on larger databases, and are not guaranteed to be transactional. As such, always
To prevent this, upgrade your database schema to the latest supported version using
``mlflow db upgrade [db_uri]``. Schema migrations can result in database downtime, may
take longer on larger databases, and are not guaranteed to be transactional. You should always
take a backup of your database prior to running ``mlflow db upgrade`` - consult your database's
documentation for instructions on taking a backup.

@@ -373,8 +404,12 @@ See `Set up AWS Credentials and Region for Development <https://docs.aws.amazon.
is a path inside the file store. Typically this is not an appropriate location, as the client and
server probably refer to different physical locations (that is, the same path on different disks).

Supported Artifact Stores
~~~~~~~~~~~~~~~~~~~~~~~~~
Artifact Stores
~~~~~~~~~~~~~~~~

.. contents:: In this section:
:local:
:depth: 1

In addition to local file paths, MLflow supports the following storage systems as artifact
stores: Amazon S3, Azure Blob Storage, Google Cloud Storage, SFTP server, and NFS.
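
For instance, a minimal sketch of pointing an experiment's artifact store at S3 (the experiment name and bucket are placeholders):

.. code-block:: py

    import mlflow

    # Artifacts for runs in this experiment go under the given S3 root;
    # run metadata still goes to the configured backend store.
    mlflow.create_experiment(
        "s3-backed-experiment",
        artifact_location="s3://my_bucket/path/to/artifacts")
    mlflow.set_experiment("s3-backed-experiment")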
@@ -419,7 +454,7 @@ to access Google Cloud Storage; MLflow does not declare a dependency on this pac
FTP server
^^^^^^^^^^^

Specify a URI of the form ftp://user@host/path/to/directory to store artifacts in a FTP server.
To store artifacts in an FTP server, specify a URI of the form ``ftp://user@host/path/to/directory``.
The URI may optionally include a password for logging into the server, e.g. ``ftp://user:pass@host/path/to/directory``.

SFTP Server
@@ -470,6 +505,7 @@ Optionally one can select a different version of the HDFS driver library using:
The default is ``libhdfs``.
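
A minimal sketch of selecting a different driver, assuming MLflow reads the ``MLFLOW_HDFS_DRIVER`` environment variable for this setting:

.. code-block:: py

    import os

    # Select the libhdfs3 driver instead of the default libhdfs
    # (assumption: the HDFS artifact repository consults MLFLOW_HDFS_DRIVER).
    os.environ["MLFLOW_HDFS_DRIVER"] = "libhdfs3"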


Networking
----------

31 changes: 16 additions & 15 deletions mlflow/azureml/__init__.py
@@ -38,17 +38,17 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
For information about the input data formats accepted by this webserver, see the
:ref:`MLflow deployment tools documentation <azureml_deployment>`.
:param model_uri: The location, in URI format, of the MLflow model for which to build an Azure
ML deployment image, for example:
:param model_uri: The location, in URI format, of the MLflow model used to build the Azure
ML deployment image. For example:
- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``s3://my_bucket/path/to/model``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``
For more information about supported URI schemes, see the
`Artifacts Documentation <https://www.mlflow.org/docs/latest/tracking.html#
supported-artifact-stores>`_.
For more information about supported URI schemes, see
`Referencing Artifacts <https://www.mlflow.org/docs/latest/tracking.html#
artifact-locations>`_.
:param image_name: The name to assign the Azure Container Image that will be created. If
unspecified, a unique image name will be generated.
@@ -67,20 +67,21 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
azureml.core.model.model?view=azure-ml-py#register>`_.
:param tags: A collection of tags, represented as a dictionary of string key-value pairs, to
associate with the Azure Container Image and the Azure Model that will be created.
These tags will be added to a set of default tags that include the model path,
the model run id (if specified), and more. For more information, see
These tags are added to a set of default tags that include the model URI,
and more. For more information, see
`<https://docs.microsoft.com/en-us/python/api/azureml-core/
azureml.core.image.container.containerimageconfig>`_ and
`<https://docs.microsoft.com/en-us/python/api/azureml-core/
azureml.core.model.model?view=azure-ml-py#register>`_.
:param synchronous: If `True`, this method will block until the image creation procedure
terminates before returning. If `False`, the method will return immediately,
:param synchronous: If ``True``, this method blocks until the image creation procedure
terminates before returning. If ``False``, the method returns immediately,
but the returned image will not be available until the asynchronous
creation process completes. The `azureml.core.Image.wait_for_creation()`
function can be used to wait for the creation process to complete.
creation process completes. Use the
``azureml.core.Image.wait_for_creation()`` function to wait for the creation
process to complete.
:return: A tuple containing the following elements in order:
- An `azureml.core.image.ContainerImage` object containing metadata for the new image.
- An `azureml.core.model.Model` object containing metadata for the new model.
- An ``azureml.core.image.ContainerImage`` object containing metadata for the new image.
- An ``azureml.core.model.Model`` object containing metadata for the new model.
>>> import mlflow.azureml
>>> from azureml.core import Workspace
@@ -100,7 +101,7 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
>>>
>>> # Build an Azure ML Container Image for an MLflow model
>>> azure_image, azure_model = mlflow.azureml.build_image(
>>> model_path="<model_path>",
>>> model_uri="<model_uri>",
>>> workspace=azure_workspace,
>>> synchronous=True)
>>> # If your image build failed, you can access build logs at the following URI:
@@ -126,7 +127,7 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
if model_python_version is not None and\
StrictVersion(model_python_version) < StrictVersion("3.0.0"):
raise MlflowException(
message=("Azure ML can only deploy models trained in Python 3 or above! Please see"
message=("Azure ML can only deploy models trained in Python 3 and above. See"
" the following MLflow GitHub issue for a thorough explanation of this"
" limitation and a workaround to enable support for deploying models"
" trained in Python 2: https://github.com/mlflow/mlflow/issues/668"),
2 changes: 1 addition & 1 deletion mlflow/azureml/cli.py
@@ -42,7 +42,7 @@ def commands():
@click.option("--tags", "-t", default=None,
help=("A collection of tags, represented as a JSON-formatted dictionary of string"
" key-value pairs, to associate with the Azure Container Image and the Azure"
" Model that are created. These tags will be added to a set of default tags"
" Model that are created. These tags are added to a set of default tags"
" that include the model path, the model run id (if specified), and more."))
@experimental
def build_image(model_uri, workspace_name, subscription_id, image_name, model_name,
12 changes: 6 additions & 6 deletions mlflow/h2o.py
@@ -47,7 +47,7 @@ def save_model(h2o_model, path, conda_env=None, mlflow_model=Model(), settings=N
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this describes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
contained in :func:`get_default_conda_env()`. If ``None``, the default
:func:`get_default_conda_env()` environment is added to the model.
The following is an *example* dictionary representation of a Conda
environment::
@@ -111,7 +111,7 @@ def log_model(h2o_model, artifact_path, conda_env=None, **kwargs):
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this describes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
contained in :func:`get_default_conda_env()`. If ``None``, the default
:func:`get_default_conda_env()` environment is added to the model.
The following is an *example* dictionary representation of a Conda
environment::
@@ -171,16 +171,16 @@ def load_model(model_uri):
Load an H2O model from a local file or a run.
This function expects that an H2O instance has been initialized with ``h2o.init``.
:param model_uri: The location, in URI format, of the MLflow model, for example:
:param model_uri: The location, in URI format, of the MLflow model. For example:
- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``s3://my_bucket/path/to/model``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``
For more information about supported URI schemes, see the
`Artifacts Documentation <https://www.mlflow.org/docs/latest/tracking.html#
supported-artifact-stores>`_.
For more information about supported URI schemes, see
`Referencing Artifacts <https://www.mlflow.org/docs/latest/tracking.html#
artifact-locations>`_.
:return: An `H2OEstimator model object
<http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/intro.html#models>`_.
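A hypothetical usage sketch in the doctest style used elsewhere in this module (the URI and ``test_frame`` are placeholders):
>>> import mlflow.h2o
>>> model = mlflow.h2o.load_model("runs:/<mlflow_run_id>/run-relative/path/to/model")
>>> predictions = model.predict(test_frame)  # test_frame: an H2OFrame of inputs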
40 changes: 21 additions & 19 deletions mlflow/keras.py
@@ -53,11 +53,11 @@ def save_model(keras_model, path, conda_env=None, mlflow_model=Model()):
:param path: Local path where the model is to be saved.
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this describes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
:func:`get_default_conda_env()` environment is added to the model.
The following is an *example* dictionary representation of a Conda
environment::
this model should be run in. At minimum, it should specify the
dependencies contained in :func:`get_default_conda_env()`. If
``None``, the default :func:`get_default_conda_env()` environment is
added to the model. The following is an *example* dictionary
representation of a Conda environment::
{
'name': 'mlflow-env',
@@ -111,13 +111,14 @@ def log_model(keras_model, artifact_path, conda_env=None, **kwargs):
:param keras_model: Keras model to be saved.
:param artifact_path: Run-relative artifact path.
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this decribes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
:func:`mlflow.keras.get_default_conda_env()` environment is added to the
model. The following is an *example* dictionary representation of a Conda
environment::
:param conda_env: Either a dictionary representation of a Conda environment or
the path to a Conda environment yaml file.
If provided, this describes the environment this model should be
run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If ``None``, the default
:func:`mlflow.keras.get_default_conda_env()` environment is added to
the model. The following is an *example* dictionary representation of a
Conda environment::
{
'name': 'mlflow-env',
@@ -203,24 +204,25 @@ def _load_pyfunc(path):

def load_model(model_uri, **kwargs):
"""
Load a Keras model from a local file (if ``run_id`` is None) or a run.
Load a Keras model from a local file or a run.
Extra arguments are passed through to keras.load_model.
:param model_uri: The location, in URI format, of the MLflow model, for example:
:param model_uri: The location, in URI format, of the MLflow model. For example:
- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``s3://my_bucket/path/to/model``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``
For more information about supported URI schemes, see the
`Artifacts Documentation <https://www.mlflow.org/docs/latest/tracking.html#
supported-artifact-stores>`_.
For more information about supported URI schemes, see
`Referencing Artifacts <https://www.mlflow.org/docs/latest/tracking.html#
artifact-locations>`_.
:return: A Keras model instance.
>>> # Load persisted model as a Keras model or as a PyFunc, call predict() on a Pandas DataFrame
>>> keras_model = mlflow.keras.load_model("models", run_id="96771d893a5e46159d9f3b49bf9013e2")
>>> # Load persisted model as a Keras model or as a PyFunc, call predict() on a pandas DataFrame
>>> keras_model = mlflow.keras.load_model("runs:/96771d893a5e46159d9f3b49bf9013e2" + "/models")
>>> predictions = keras_model.predict(x_test)
"""
local_model_path = _download_artifact_from_uri(artifact_uri=model_uri)
6 changes: 3 additions & 3 deletions mlflow/mleap.py
@@ -169,10 +169,10 @@ def add_to_model(mlflow_model, path, spark_model, sample_input):

def _get_mleap_schema(dataframe):
"""
:param dataframe: A PySpark dataframe object
:param dataframe: A PySpark DataFrame object
:return: The schema of the supplied dataframe, in MLeap format. This serialized object of type
`ml.combust.mleap.core.types.StructType`, represented as a JSON dictionary.
``ml.combust.mleap.core.types.StructType``, represented as a JSON dictionary.
"""
from pyspark.ml.util import _jvm
ReflectionUtil = _jvm().py4j.reflection.ReflectionUtil
@@ -204,5 +204,5 @@ def _handle_py4j_error(reraised_error_type, reraised_error_text):


class MLeapSerializationException(MlflowException):
"""Exception thrown when a model or dataframe cannot be serialized in MLeap format"""
"""Exception thrown when a model or DataFrame cannot be serialized in MLeap format"""
pass