Commit

[ML-7027] Document artifact store locations and update docstrings. (mlflow#1253)

Update documentation for new runs:/ scheme.
Stephanie Bodoff authored and sueann committed Jun 3, 2019
1 parent b0c66b5 commit c09761e
Showing 16 changed files with 210 additions and 161 deletions.
5 changes: 2 additions & 3 deletions docs/source/python_api/index.rst
@@ -3,14 +3,13 @@
Python API
==========

The MLflow Python API is organized into the following modules. The most common functions are also
The MLflow Python API is organized into the following modules. The most common functions are
exposed in the :py:mod:`mlflow` module, so we recommend starting there.

.. toctree::
:glob:
:maxdepth: 1

*


See also an :ref:`index of all functions and classes<genindex>`.
See also the :ref:`index of all functions and classes<genindex>`.
58 changes: 47 additions & 11 deletions docs/source/tracking.rst
@@ -6,8 +6,7 @@ MLflow Tracking

The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files
when running your machine learning code and for later visualizing the results.
MLflow Tracking lets you log and query experiments using :ref:`Python <python-api>`, :ref:`REST <rest-api>`, :ref:`R-api`,
and :ref:`java_api` APIs.
MLflow Tracking lets you log and query experiments using :ref:`Python <python-api>`, :ref:`REST <rest-api>`, :ref:`R-api`, and :ref:`java_api` APIs.

.. contents:: Table of Contents
:local:
@@ -61,8 +60,7 @@ Where Runs Are Recorded
=======================

MLflow runs can be recorded to local files, to a SQLAlchemy compatible database, or remotely
to a tracking server.
By default, the MLflow Python API logs runs locally to files in an ``mlruns`` directory wherever you
to a tracking server. By default, the MLflow Python API logs runs locally to files in an ``mlruns`` directory wherever you
ran your program. You can then run ``mlflow ui`` to see the logged runs.
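
For example, a minimal sketch of a script that records one run to the local ``mlruns`` directory (the parameter and metric names are illustrative):

.. code-block:: py

    import mlflow

    with mlflow.start_run():
        mlflow.log_param("alpha", 0.5)    # a hyperparameter for this run
        mlflow.log_metric("rmse", 0.27)   # an evaluation result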

To log runs remotely, set the ``MLFLOW_TRACKING_URI`` environment variable to a tracking server's URI or
@@ -287,13 +285,46 @@ The UI contains the following key features:
Querying Runs Programmatically
==============================

All of the functions in the Tracking UI can be accessed programmatically. This makes it easy to do several common tasks:
You can access all of the functions in the Tracking UI programmatically. This makes it easy to do several common tasks (a short code sketch follows this list):

* Query and compare runs using any data analysis tool of your choice, for example, **pandas**.
* Determine the artifact URI for a run to feed some of its artifacts into a new run when executing a workflow. For an example of querying runs and constructing a multistep workflow, see the MLflow `Multistep Workflow Example project <https://github.com/mlflow/mlflow/blob/15cc05ce2217b7c7af4133977b07542934a9a19f/examples/multistep_workflow/main.py#L63>`_.
* Load artifacts from past runs as :ref:`models`. For an example of training, exporting, and loading a model, and predicting using the model, see the MLflow `TensorFlow example <https://github.com/mlflow/mlflow/tree/master/examples/tensorflow>`_.
* Run automated parameter search algorithms, where you query the metrics from various runs to submit new ones. For an example of running automated parameter search algorithms, see the MLflow `Hyperparameter Tuning Example project <https://github.com/mlflow/mlflow/blob/master/examples/hyperparam/README.rst>`_.
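
For instance, a minimal sketch of the first two tasks using the ``MlflowClient`` API (the run ID is a placeholder):

.. code-block:: py

    from mlflow.tracking import MlflowClient

    client = MlflowClient()                  # uses the current tracking URI
    run = client.get_run("<mlflow_run_id>")  # fetch a run by its ID
    print(run.data.metrics)                  # logged metrics, e.g. for comparison
    print(run.info.artifact_uri)             # root URI of the run's artifacts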

.. _artifact-locations:

Referencing Artifacts
---------------------

When you specify the location of an artifact in MLflow APIs, the syntax depends on whether you
are invoking the Tracking, Models, or Projects API. For the Tracking API, you specify the artifact location using a (run ID, relative path) tuple. For the Models and Projects APIs, you specify the artifact location in the following ways:

- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``<scheme>/<scheme-dependent-path>``. For example:

- ``s3://my_bucket/path/to/model``
- ``hdfs://<host>:<port>/<path>``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``

For example:

.. rubric:: Tracking API

.. code-block:: py

    # The Tracking API takes a (run ID, local path) pair; a sketch using the
    # client API, since artifact logging is exposed on MlflowClient.
    from mlflow.tracking import MlflowClient

    MlflowClient().log_artifacts("<mlflow_run_id>", "/path/to/artifact")

.. rubric:: Models API

.. code-block:: py

    mlflow.pytorch.load_model("runs:/<mlflow_run_id>/run-relative/path/to/model")
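
As a hypothetical end-to-end sketch, log a model in a run and load it back through a ``runs:/`` URI (the scikit-learn flavor and the artifact path ``model`` are illustrative):

.. code-block:: py

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LinearRegression

    toy_model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])

    with mlflow.start_run() as run:
        mlflow.sklearn.log_model(toy_model, artifact_path="model")

    # Build a runs:/ URI from the run ID plus the run-relative artifact path.
    model_uri = "runs:/{}/model".format(run.info.run_id)
    loaded_model = mlflow.sklearn.load_model(model_uri)
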
.. _tracking_server:

@@ -336,9 +367,9 @@ For backwards compatibility, ``--file-store`` is an alias for ``--backend-store-
.. important::

``mlflow server`` will fail against a database-backed store with an out-of-date database schema.
To prevent this, upgrade your database schema to the latest supported version via
``mlflow db upgrade [db_uri]``. Note that schema migrations can result in database downtime, may
take longer on larger databases, and are not guaranteed to be transactional. As such, always
To prevent this, upgrade your database schema to the latest supported version using
``mlflow db upgrade [db_uri]``. Schema migrations can result in database downtime, may
take longer on larger databases, and are not guaranteed to be transactional. You should always
take a backup of your database prior to running ``mlflow db upgrade`` - consult your database's
documentation for instructions on taking a backup.

@@ -373,8 +404,12 @@ See `Set up AWS Credentials and Region for Development <https://docs.aws.amazon.
is a path inside the file store. Typically this is not an appropriate location, as the client and
server probably refer to different physical locations (that is, the same path on different disks).

Supported Artifact Stores
~~~~~~~~~~~~~~~~~~~~~~~~~
Artifact Stores
~~~~~~~~~~~~~~~~

.. contents:: In this section:
:local:
:depth: 1

In addition to local file paths, MLflow supports the following storage systems as artifact
stores: Amazon S3, Azure Blob Storage, Google Cloud Storage, SFTP server, and NFS.
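
For instance, a minimal sketch of pointing an experiment's artifact store at S3 (the experiment name and bucket are placeholders):

.. code-block:: py

    import mlflow

    # Artifacts for runs in this experiment go under the given S3 root;
    # run metadata still goes to the configured backend store.
    mlflow.create_experiment(
        "s3-backed-experiment",
        artifact_location="s3://my_bucket/path/to/artifacts")
    mlflow.set_experiment("s3-backed-experiment")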
@@ -419,7 +454,7 @@ to access Google Cloud Storage; MLflow does not declare a dependency on this pac
FTP server
^^^^^^^^^^^

Specify a URI of the form ftp://user@host/path/to/directory to store artifacts in a FTP server.
To store artifacts in an FTP server, specify a URI of the form ``ftp://user@host/path/to/directory``.
The URI may optionally include a password for logging into the server, e.g. ``ftp://user:pass@host/path/to/directory``.

SFTP Server
@@ -470,6 +505,7 @@ Optionally one can select a different version of the HDFS driver library using:
The default is ``libhdfs``.
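
A minimal sketch of selecting a different driver, assuming MLflow reads the ``MLFLOW_HDFS_DRIVER`` environment variable for this setting:

.. code-block:: py

    import os

    # Select the libhdfs3 driver instead of the default libhdfs
    # (assumption: the HDFS artifact repository consults MLFLOW_HDFS_DRIVER).
    os.environ["MLFLOW_HDFS_DRIVER"] = "libhdfs3"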


Networking
----------

31 changes: 16 additions & 15 deletions mlflow/azureml/__init__.py
@@ -38,17 +38,17 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
For information about the input data formats accepted by this webserver, see the
:ref:`MLflow deployment tools documentation <azureml_deployment>`.
:param model_uri: The location, in URI format, of the MLflow model for which to build an Azure
ML deployment image, for example:
:param model_uri: The location, in URI format, of the MLflow model used to build the Azure
ML deployment image. For example:
- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``s3://my_bucket/path/to/model``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``
For more information about supported URI schemes, see the
`Artifacts Documentation <https://www.mlflow.org/docs/latest/tracking.html#
supported-artifact-stores>`_.
For more information about supported URI schemes, see
`Referencing Artifacts <https://www.mlflow.org/docs/latest/tracking.html#
artifact-locations>`_.
:param image_name: The name to assign the Azure Container Image that will be created. If
unspecified, a unique image name will be generated.
@@ -67,20 +67,21 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
azureml.core.model.model?view=azure-ml-py#register>`_.
:param tags: A collection of tags, represented as a dictionary of string key-value pairs, to
associate with the Azure Container Image and the Azure Model that will be created.
These tags will be added to a set of default tags that include the model path,
the model run id (if specified), and more. For more information, see
These tags are added to a set of default tags that include the model URI,
and more. For more information, see
`<https://docs.microsoft.com/en-us/python/api/azureml-core/
azureml.core.image.container.containerimageconfig>`_ and
`<https://docs.microsoft.com/en-us/python/api/azureml-core/
azureml.core.model.model?view=azure-ml-py#register>`_.
:param synchronous: If `True`, this method will block until the image creation procedure
terminates before returning. If `False`, the method will return immediately,
:param synchronous: If ``True``, this method blocks until the image creation procedure
terminates before returning. If ``False``, the method returns immediately,
but the returned image will not be available until the asynchronous
creation process completes. The `azureml.core.Image.wait_for_creation()`
function can be used to wait for the creation process to complete.
creation process completes. Use the
``azureml.core.Image.wait_for_creation()`` function to wait for the creation
process to complete.
:return: A tuple containing the following elements in order:
- An `azureml.core.image.ContainerImage` object containing metadata for the new image.
- An `azureml.core.model.Model` object containing metadata for the new model.
- An ``azureml.core.image.ContainerImage`` object containing metadata for the new image.
- An ``azureml.core.model.Model`` object containing metadata for the new model.
>>> import mlflow.azureml
>>> from azureml.core import Workspace
@@ -100,7 +101,7 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
>>>
>>> # Build an Azure ML Container Image for an MLflow model
>>> azure_image, azure_model = mlflow.azureml.build_image(
>>> model_path="<model_path>",
>>> model_uri="<model_uri>",
>>> workspace=azure_workspace,
>>> synchronous=True)
>>> # If your image build failed, you can access build logs at the following URI:
@@ -126,7 +127,7 @@ def build_image(model_uri, workspace, image_name=None, model_name=None,
if model_python_version is not None and\
StrictVersion(model_python_version) < StrictVersion("3.0.0"):
raise MlflowException(
message=("Azure ML can only deploy models trained in Python 3 or above! Please see"
message=("Azure ML can only deploy models trained in Python 3 and above. See"
" the following MLflow GitHub issue for a thorough explanation of this"
" limitation and a workaround to enable support for deploying models"
" trained in Python 2: https://github.com/mlflow/mlflow/issues/668"),
2 changes: 1 addition & 1 deletion mlflow/azureml/cli.py
@@ -42,7 +42,7 @@ def commands():
@click.option("--tags", "-t", default=None,
help=("A collection of tags, represented as a JSON-formatted dictionary of string"
" key-value pairs, to associate with the Azure Container Image and the Azure"
" Model that are created. These tags will be added to a set of default tags"
" Model that are created. These tags are added to a set of default tags"
" that include the model path, the model run id (if specified), and more."))
@experimental
def build_image(model_uri, workspace_name, subscription_id, image_name, model_name,
12 changes: 6 additions & 6 deletions mlflow/h2o.py
@@ -47,7 +47,7 @@ def save_model(h2o_model, path, conda_env=None, mlflow_model=Model(), settings=N
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this describes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
contained in :func:`get_default_conda_env()`. If ``None``, the default
:func:`get_default_conda_env()` environment is added to the model.
The following is an *example* dictionary representation of a Conda
environment::
@@ -111,7 +111,7 @@ def log_model(h2o_model, artifact_path, conda_env=None, **kwargs):
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this describes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
contained in :func:`get_default_conda_env()`. If ``None``, the default
:func:`get_default_conda_env()` environment is added to the model.
The following is an *example* dictionary representation of a Conda
environment::
@@ -171,16 +171,16 @@ def load_model(model_uri):
Load an H2O model from a local file or a run.
This function expects that an H2O instance has been initialized with ``h2o.init``.
:param model_uri: The location, in URI format, of the MLflow model, for example:
:param model_uri: The location, in URI format, of the MLflow model. For example:
- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``s3://my_bucket/path/to/model``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``
For more information about supported URI schemes, see the
`Artifacts Documentation <https://www.mlflow.org/docs/latest/tracking.html#
supported-artifact-stores>`_.
For more information about supported URI schemes, see
`Referencing Artifacts <https://www.mlflow.org/docs/latest/tracking.html#
artifact-locations>`_.
:return: An `H2OEstimator model object
<http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/intro.html#models>`_.
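A hypothetical usage sketch in the doctest style used elsewhere in this module (the URI and ``test_frame`` are placeholders):
>>> import mlflow.h2o
>>> model = mlflow.h2o.load_model("runs:/<mlflow_run_id>/run-relative/path/to/model")
>>> predictions = model.predict(test_frame)  # test_frame: an H2OFrame of inputs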
40 changes: 21 additions & 19 deletions mlflow/keras.py
@@ -53,11 +53,11 @@ def save_model(keras_model, path, conda_env=None, mlflow_model=Model()):
:param path: Local path where the model is to be saved.
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this describes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
:func:`get_default_conda_env()` environment is added to the model.
The following is an *example* dictionary representation of a Conda
environment::
this model should be run in. At minimum, it should specify the
dependencies contained in :func:`get_default_conda_env()`. If
``None``, the default :func:`get_default_conda_env()` environment is
added to the model. The following is an *example* dictionary
representation of a Conda environment::
{
'name': 'mlflow-env',
@@ -111,13 +111,14 @@ def log_model(keras_model, artifact_path, conda_env=None, **kwargs):
:param keras_model: Keras model to be saved.
:param artifact_path: Run-relative artifact path.
:param conda_env: Either a dictionary representation of a Conda environment or the path to a
Conda environment yaml file. If provided, this decribes the environment
this model should be run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If `None`, the default
:func:`mlflow.keras.get_default_conda_env()` environment is added to the
model. The following is an *example* dictionary representation of a Conda
environment::
:param conda_env: Either a dictionary representation of a Conda environment or
the path to a Conda environment yaml file.
If provided, this describes the environment this model should be
run in. At minimum, it should specify the dependencies
contained in :func:`get_default_conda_env()`. If ``None``, the default
:func:`mlflow.keras.get_default_conda_env()` environment is added to
the model. The following is an *example* dictionary representation of a
Conda environment::
{
'name': 'mlflow-env',
@@ -203,24 +204,25 @@ def _load_pyfunc(path):

def load_model(model_uri, **kwargs):
"""
Load a Keras model from a local file (if ``run_id`` is None) or a run.
Load a Keras model from a local file or a run.
Extra arguments are passed through to keras.load_model.
:param model_uri: The location, in URI format, of the MLflow model, for example:
:param model_uri: The location, in URI format, of the MLflow model. For example:
- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``s3://my_bucket/path/to/model``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``
For more information about supported URI schemes, see the
`Artifacts Documentation <https://www.mlflow.org/docs/latest/tracking.html#
supported-artifact-stores>`_.
For more information about supported URI schemes, see
`Referencing Artifacts <https://www.mlflow.org/docs/latest/tracking.html#
artifact-locations>`_.
:return: A Keras model instance.
>>> # Load persisted model as a Keras model or as a PyFunc, call predict() on a Pandas DataFrame
>>> keras_model = mlflow.keras.load_model("models", run_id="96771d893a5e46159d9f3b49bf9013e2")
>>> # Load persisted model as a Keras model or as a PyFunc, call predict() on a pandas DataFrame
>>> keras_model = mlflow.keras.load_model("runs:/96771d893a5e46159d9f3b49bf9013e2" + "/models")
>>> predictions = keras_model.predict(x_test)
"""
local_model_path = _download_artifact_from_uri(artifact_uri=model_uri)
6 changes: 3 additions & 3 deletions mlflow/mleap.py
@@ -169,10 +169,10 @@ def add_to_model(mlflow_model, path, spark_model, sample_input):

def _get_mleap_schema(dataframe):
"""
:param dataframe: A PySpark dataframe object
:param dataframe: A PySpark DataFrame object
:return: The schema of the supplied dataframe, in MLeap format. This serialized object of type
`ml.combust.mleap.core.types.StructType`, represented as a JSON dictionary.
``ml.combust.mleap.core.types.StructType``, represented as a JSON dictionary.
"""
from pyspark.ml.util import _jvm
ReflectionUtil = _jvm().py4j.reflection.ReflectionUtil
@@ -204,5 +204,5 @@ def _handle_py4j_error(reraised_error_type, reraised_error_text):


class MLeapSerializationException(MlflowException):
"""Exception thrown when a model or dataframe cannot be serialized in MLeap format"""
"""Exception thrown when a model or DataFrame cannot be serialized in MLeap format"""
pass