Add page about data location options
keller-mark committed Jul 5, 2023
1 parent 6d27295 commit c70b557
Showing 2 changed files with 63 additions and 48 deletions.
110 changes: 62 additions & 48 deletions docs/data_options.rst
@@ -1,96 +1,110 @@
Data location options
###############
#####################

There are several possibilities for how to configure data for the Vitessce Jupyter widget depending on:

* where the Python process powering Jupyter is running
* where the data is located relative to the Python process for Jupyter
* where the web browser accessing Jupyter is located relative to the Python process for Jupyter
* where the Python process powering Jupyter is running (i.e., on which machine, relative to the web browser in which Jupyter is being accessed)
* where the data is relative to the Python process for Jupyter
* where the web browser accessing Jupyter is relative to the Python process for Jupyter

=================
The following sections describe the various possibilities and the implications that each has on Vitessce's automatic data export functionality.
The goal of automatic data export is, given an input Vitessce configuration (i.e., a ``VitessceConfig`` instance), to generate a directory of data files that, when served via HTTP, can be accessed by Vitessce in a web browser using the corresponding JSON-serialized configuration (i.e., ``VitessceConfig.to_dict()``).

====================================
Jupyter process: local; Files: local
=================
====================================

In this case, you ran `jupyter lab` in a terminal local to your computer (i.e., not a cluster or remote machine) and the data files you want to visualize are located on the same machine.
In this case, you ran ``jupyter lab`` in a terminal local to your computer (i.e., not a cluster or remote machine) and the data files you want to visualize are located on the same machine.

-----------------
Configuration via relative file paths
-----------------
-------------------------------------------------------
Configuration via file paths relative to a ``base_dir``
-------------------------------------------------------

Export requires copying
For more information, see the VitessceConfig `constructor <api_config.html#vitessce.config.VitessceConfig>`_ or the `Configure relative to a base_dir <notebooks/widget_brain_with_base_dir.html>`_ example notebook.

-----------------
Configuration via absolute file paths
-----------------
**Note**: Export (and thereby copying of files) is not required, as ``base_dir`` is already equivalent to an exported data directory, which the file URLs in the configuration will be relative to.
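
As a rough illustration of why no copying is needed in this case, the sketch below maps a file path relative to ``base_dir`` onto a URL relative to the address at which ``base_dir`` is served. The helper name ``url_for_relative_path`` and the example base URL are hypothetical, and this is not Vitessce's own implementation.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def url_for_relative_path(path_relative_to_base_dir, base_url):
    """Map a path relative to base_dir onto a URL under base_url.

    Hypothetical helper for illustration; not part of the Vitessce API.
    """
    rel = PurePosixPath(path_relative_to_base_dir)
    # Paths that escape base_dir cannot be expressed as URLs under base_url.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("path must stay inside base_dir")
    # Percent-encode each segment while keeping the slashes between segments.
    return base_url.rstrip("/") + "/" + quote(rel.as_posix())
```

For instance, ``url_for_relative_path("data/cells.h5ad.zarr", "http://localhost:8000")`` yields a URL under the served directory without any file being moved.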

-------------------------------------
Configuration via relative file paths
-------------------------------------

Export requires copying
**Note**: Export requires copying, as we need to generate a single data directory for all of the files, which the file URLs in the configuration will be relative to.

-----------------
Configuration via file paths relative to a `base_dir`
-----------------
-------------------------------------
Configuration via absolute file paths
-------------------------------------

Export not required; `base_dir` is equivalent to an exported data directory
**Note**: Export requires copying, as we need to generate a single data directory for all of the files, which the file URLs in the configuration will be relative to.
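
The copying step described in the notes above can be pictured with the following sketch, which gathers files from arbitrary locations into a single directory and returns the URL each one would have once that directory is served. The function name ``gather_into_data_dir`` and the index-prefix naming scheme are assumptions made for illustration; Vitessce's actual export logic may differ, and this sketch handles plain files only (not directory-based stores).

```python
import shutil
from pathlib import Path


def gather_into_data_dir(file_paths, out_dir, base_url):
    """Copy files into out_dir and return a mapping from source path to URL.

    Hypothetical helper for illustration; not part of the Vitessce API.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    urls = {}
    for index, src in enumerate(Path(p) for p in file_paths):
        # Prefix with an index so that same-named files from different
        # source directories do not overwrite each other.
        dest = out_dir / f"{index}_{src.name}"
        shutil.copy2(src, dest)
        urls[str(src)] = f"{base_url.rstrip('/')}/{dest.name}"
    return urls
```

After the call, every file lives under one directory, so a single HTTP server rooted there can serve all of the URLs in the returned mapping.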


=================
Jupyter process: anywhere; Files: remote & already being served via HTTP
=================
==========================================================
Jupyter process: anywhere; Files: remote & served via HTTP
==========================================================

In this case, Jupyter is running anywhere (e.g., local, or remote machine, or service like Colab/Binder).
Files are being served via HTTP on internet-accessible URLs (i.e., non-localhost).
In this case, Jupyter may be running anywhere (e.g., on a local machine, on a remote machine, or on a service like Colab/Binder).
Files are already being served via HTTP on internet-accessible URLs (i.e., non-localhost).

For more information about how to host data for Vitessce on remote servers, please see http://vitessce.io/docs/data-hosting/.


Export not required; http urls already absolute

**Note**: Exporting the data is not required (nor possible) as the files are already referenced via HTTP URLs in the configuration.

-----------------
--------------
Range requests
-----------------
--------------

Certain file formats (e.g., OME-TIFF) are loaded by Vitessce via HTTP range requests, so ensure your hosting service supports these when using such file formats.
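
One way to check a hosting service before relying on it is to request a small byte range and inspect the response. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and a server that answers ``200`` with the full file (rather than ``206 Partial Content``) is treated as not supporting range requests.

```python
import urllib.request


def is_partial_response(status, headers):
    """Return True if a response to a ranged GET shows the range was honored."""
    # 206 Partial Content together with a Content-Range header means the
    # server returned only the requested bytes.
    return status == 206 and headers.get("Content-Range") is not None


def supports_range_requests(url, timeout=10):
    """Request the first two bytes of url and report whether the range was honored."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_partial_response(response.status, response.headers)
```

Calling ``supports_range_requests`` with the URL of one of your hosted files gives a quick yes/no answer before configuring Vitessce to load it.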

=================
========================================================
Jupyter process: local; Files: remote & accessed via SSH
=================
========================================================

In this case, you can serve the files using a local HTTP server such as `http-server` on a particular port, and then use port forwarding when initializing the SSH session.
In this case, you can serve the files using a local HTTP server such as `http-server <https://github.com/http-party/http-server>`_ on a particular port, and then use port forwarding when initializing the SSH session.

If running the HTTP server within an interactive compute session (e.g., with SLURM), you may also need to configure port forwarding when initializing the compute session.

TODO: copy code snippet from https://github.com/keller-mark/snippets#serve-directory-of-files
TODO: link to docs to install `http-server`

.. code-block:: bash

    # Replace LOCAL_PORT and REMOTE_PORT below with the desired port numbers.
    # Replace my_username and cluster.university.edu.
    # With ssh -L, the first port is the one opened on your local machine.
    ssh -L LOCAL_PORT:127.0.0.1:REMOTE_PORT my_username@cluster.university.edu
    # Optionally SSH to a particular login node first.
    ssh -L REMOTE_PORT:127.0.0.1:REMOTE_PORT login01
    # Optionally start an interactive compute session first, but make sure tunneling/port forwarding is enabled.
    # Note: this command may be different depending on the cluster and job management system.
    srun -t 0-3:00 --pty -p interactive --tunnel REMOTE_PORT:REMOTE_PORT /bin/bash
    # cd to some directory with files to serve.
    http-server --cors='*' --port REMOTE_PORT .

Then, you can configure Vitessce using localhost HTTP urls as if the files are being served locally.
Note: Automatic export not possible.
Then, you can configure Vitessce using localhost HTTP URLs as if the files are being served locally (i.e., on ``http://localhost:LOCAL_PORT/``).

=================
**Note**: Exporting the data is not possible as the files are already referenced via HTTP URLs in the configuration.
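
If ``http-server`` is not available on the remote machine, a similar CORS-enabled static server can be sketched with the Python standard library. This is an alternative technique, not what the instructions above use; the class and function names are hypothetical. As far as we are aware, Python's built-in ``SimpleHTTPRequestHandler`` does not implement range requests, so this sketch is only suitable for file formats that are loaded in full.

```python
import functools
import http.server


class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that allows cross-origin requests from any origin."""

    def end_headers(self):
        # Add the CORS header to every response before the headers are flushed.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory, port):
    """Create (but do not start) a server for directory on 127.0.0.1:port."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To serve the current directory on the remote machine, call ``make_server(".", REMOTE_PORT).serve_forever()`` with ``REMOTE_PORT`` replaced by the port number chosen for the tunnel.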

=====================================================================================
Jupyter process: remote & accessed via SSH; Files: on same machine as Jupyter process
=================
=====================================================================================

This case can be treated almost the same as when the Jupyter process is local and the files are local (the first case).

However, when accessing the notebook from your local web browser, you may need to use the `proxy` parameter:
However, when accessing the notebook from your local web browser, you may need to use the ``proxy`` parameter:

.. code-block:: python

    vc.widget(proxy=True)

=================
===============================================================
Jupyter process: remote service like Colab/Binder; Files: local
=================
===============================================================

Unfortunately, this will not work.
Unfortunately, this will not work because the remote server cannot access your local files.

=================
===================================================================================
Jupyter process: remote service like Colab/Binder; Files: remote & accessed via SSH
=================
===================================================================================

Unfortunately, this will not work.
Unfortunately, this will not work because the remote server cannot access the files that are on another machine behind SSH.



1 change: 1 addition & 0 deletions docs/index.rst
@@ -47,6 +47,7 @@ The Vitessce widget is compatible with the following interactive Python platform
data_examples
api_config
api_data
data_options
screenshots

