Commit

Merge pull request #149 from chrishavlin/load_sample_data
Adding sample data
chrishavlin authored Oct 23, 2024
2 parents cacb559 + 1bb93d0 commit c132a66
Showing 28 changed files with 801 additions and 43 deletions.
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -1,3 +1,8 @@
## v0.5.0dev

### New Features
* sample data now available!

## v0.5.0

### New Features
43 changes: 39 additions & 4 deletions README.md
@@ -152,13 +152,21 @@ Contributions are very welcome! Development follows a fork and pull request workflow

### development environment

To start developing, fork the repository and clone your fork to get a local copy. You can then install in development mode with
To start developing, fork the repository and clone your fork to get a local copy. You can then install in development mode along with
all the extra requirements for developing:

pip install -e .
pip install -e .[full,dev]

### tests and style checks

Both bug fixes and new features will need to pass the existing test suite and style checks. While both will be run automatically when you submit a pull request, it is helpful to run the test suites locally and run style checks throughout development. For testing, you can use [tox] to test different python versions on your platform.
Both bug fixes and new features will need to pass the existing test suite and style checks. While both will be run
automatically when you submit a pull request, it is helpful to run the test suites locally and run style checks
throughout development. For testing, you can use [tox] to test different python versions on your platform or
simply run `pytest` and rely on GitHub Actions to test the additional python environments.

#### testing with tox

first install `tox` with:

pip install tox

@@ -168,16 +176,27 @@ And then from the top level of the `yt-napari` directory, run

Tox will then run a series of tests in isolated environments. In addition to checking the terminal output for test results, the tox run will generate a test coverage report: a `coverage.xml` file and a `htmlcov` folder -- to view the results, open `htmlcov/index.html` in a browser.

#### testing with pytest

If you prefer a lighter-weight test run, you can also use `pytest` directly and rely on the GitHub CI to test different python versions and systems. To do so, first install `pytest` and some related plugins:

pip install pytest pytest-qt pytest-cov

Now, to run the tests:
Note that if you set up your dev environment with `pip install -e .[full,dev]` as suggested above, you'll already
have these dependencies.

To run the tests, you can use the `pytest` command:

pytest -v --cov=yt_napari --cov-report=html

Or the `taskipy` task:

task test

In addition to telling you whether or not the tests pass, the above command will write out a code coverage report to the `htmlcov` directory. You can open up `htmlcov/index.html` in a browser and check out the lines of code that were missed by existing tests.

#### style checks

For style checks, you can use [pre-commit](https://pre-commit.com/) to run checks as you develop. To set up `pre-commit`:

pip install pre-commit
@@ -237,6 +256,22 @@ task update_schema_docs -v vX.X.X
It will write a schema file for the current pydantic model, overwriting any on-disk schema files for
the provided version.

### updating the sample data

The sample data relies on another helper script, `repo_utilities/update_sample_data.py`, which you can invoke
with `taskipy` as:

task update_sample_data

To adjust which sample datasets are included, edit the `enabled` list in `repo_utilities/update_sample_data.py`. The names in `enabled` must match those accepted by `yt.load_sample`. In addition to enabling
a dataset, you may need to adjust the field settings for the sample dataset that you are adding: see the `sample_field` and `log_field` dictionaries.
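For illustration, enabling a new dataset might look like the following sketch, which mirrors the `enabled`, `sample_field`, and `log_field` structures in the script (the name `MyNewDataset` is a placeholder, not a real `yt.load_sample` dataset):

```python
from collections import defaultdict

# mirrors the structures in repo_utilities/update_sample_data.py
enabled = ["Enzo_64", "IsolatedGalaxy"]
sample_field = defaultdict(lambda: ("gas", "density"))  # default field to load
log_field = defaultdict(lambda: True)  # log-scale by default

# enable a (hypothetical) dataset -- the name must match one
# accepted by yt.load_sample
enabled.append("MyNewDataset")
enabled.sort()

# override the field defaults only when ("gas", "density") with
# log scaling is not appropriate for the new dataset
sample_field["MyNewDataset"] = ("gas", "temperature")
log_field["MyNewDataset"] = False
```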

When you run `update_sample_data`, a number of things happen:

1. The napari plugin manifest is updated. For every dataset in the `enabled` list, `yt_napari/napari.yaml` gains two entries: one in `commands` and one in `sample_data`.
2. For every dataset in the `enabled` list, a `json` file will be generated in `yt_napari/sample_data/` along with a single `yt_napari/sample_data/sample_registry.json`. These `json` files are used for actually loading the sample data.
3. `yt_napari/sample_data/_sample_data.py` is rewritten with one function per dataset in the `enabled` list. Each function name corresponds to the `python_name` entry in `yt_napari/napari.yaml` (the plugin manifest file). If `yt_napari/sample_data/_sample_data.py` is incorrect, update the code generation in `repo_utilities/update_sample_data.py`; do not edit `yt_napari/sample_data/_sample_data.py` directly.
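As a rough illustration based on the naming helpers in `repo_utilities/update_sample_data.py`, the entries generated for the `Enzo_64` dataset would look something like:

```python
# naming helpers, mirroring repo_utilities/update_sample_data.py
def get_sample_func_name(sample: str) -> str:
    return f"sample_{sample.lower()}"

def get_command_name(sample_name: str) -> str:
    return f"yt-napari.data.{sample_name.lower()}"

# the command entry added to yt_napari/napari.yaml for "Enzo_64":
command_entry = {
    "id": get_command_name("Enzo_64"),
    "title": "Load Enzo_64",
    "python_name": f"yt_napari.sample_data._sample_data:{get_sample_func_name('Enzo_64')}",
}

# the matching sample_data entry:
sample_entry = {
    "key": "enzo_64",
    "display_name": "Enzo_64",
    "command": command_entry["id"],
}

# and the generated loader in _sample_data.py is a function like:
#
#     def sample_enzo_64() -> List[Layer]:
#         return gl.load_sample_data("Enzo_64")
```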

## License

Distributed under the terms of the [BSD-3] license,
Binary file added assets/images/readme_sample_data.gif
8 changes: 7 additions & 1 deletion docs/installation.rst
@@ -27,12 +27,18 @@ See the :code:`yt` `documentation <https://yt-project.org/doc/installing.html#le
2. install :code:`yt-napari`
****************************

You can install the `yt-napari` plugin with:
You can install the `yt-napari` plugin with minimal dependencies using:

.. code-block:: bash

    pip install yt-napari

To include optional dependencies required for loading sample data:

.. code-block:: bash

    pip install yt-napari[full]

If you are missing either :code:`yt` or :code:`napari` (or they need to be updated), the above installation will fetch and install minimal versions of both.

To install the latest development version of the plugin instead, use:
21 changes: 19 additions & 2 deletions docs/quickstart.rst
@@ -7,7 +7,10 @@ After installation, there are three modes of using :code:`yt-napari`:
2. :ref:`loading a json file from the napari gui<jsonload>`
3. :ref:`napari gui plugins<naparigui>`

Additionally, you can configure some behavior between napari sessions: see :ref:`Configuring yt-napari<configfile>`.
Additional quick start topics include:

* Configuring some :code:`yt-napari` behavior between napari sessions: see :ref:`Configuring yt-napari<configfile>`.
* Loading sample data: see :ref:`Loading sample data<sampledata>`.

.. _jupyusage:

@@ -148,10 +151,24 @@ The following options are available:

* :code:`in_memory_cache`, :code:`bool` (default :code:`true`). When :code:`true`,
the widget and json-readers will store references to yt datasets in an in-memory
cache. Subsequents loads of the same dataset will then use the available dataset
cache. Subsequent loads of the same dataset will then use the available dataset
handle. This behavior can also be manually controlled in the widget and json
options -- changing it in the configuration will simply change the default value.


Note that boolean values in :code:`toml` files start with lowercase: :code:`true` and
:code:`false` (instead of :code:`True` and :code:`False`).

.. _sampledata:

Loading sample data
*******************

A full install of :code:`yt-napari` (:code:`pip install yt-napari[full]`) will
allow you to load a selection of the
`yt sample datasets <https://yt-project.org/data/>`_ from the napari GUI.

Note that some of the sample datasets are large (multiple GBs); the first time
you load a dataset, you will have to wait for the data file to download.

.. image:: _static/readme_sample_data.gif
1 change: 1 addition & 0 deletions pyproject.toml
@@ -10,4 +10,5 @@ skip = ["venv", "benchmarks"]
[tool.taskipy.tasks]
validate_release = { cmd = "python repo_utilities/validate.py", help = "validates for a release" }
update_schema_docs = { cmd = "python repo_utilities/update_schema_docs.py", help = "updates the schema related documentation" }
update_sample_data = { cmd = "python repo_utilities/update_sample_data.py", help = "updates sample data code" }
test = "pytest -v --color=yes --cov=yt_napari --cov-report=html"
194 changes: 194 additions & 0 deletions repo_utilities/update_sample_data.py
@@ -0,0 +1,194 @@
import json
import os
from collections import defaultdict

import yaml

# requirements: cartesian, 3D, grid-based
enabled = [
"DeeplyNestedZoom",
"Enzo_64",
"HiresIsolatedGalaxy",
"IsolatedGalaxy",
"PopIII_mini",
# 'MHDSloshing',
"GaussianCloud",
"SmartStars",
# 'ENZOE_orszag-tang_0.5',  # can't handle '-' or '.' in names
"GalaxyClusterMerger", # big but neat
# 'InteractingJets',
"cm1_tornado_lofs",
]
enabled.sort()

# default field to load, whether or not to log
sample_field = defaultdict(lambda: ("gas", "density"))
log_field = defaultdict(lambda: True)


# override the default for some datasets
sample_field["cm1_tornado_lofs"] = ("cm1", "dbz")
log_field["cm1_tornado_lofs"] = False


def get_sample_func_name(sample: str):
return f"sample_{sample.lower()}"


def pop_a_command(command: str, napari_config: dict):

popid = None
for icmd, cmd in enumerate(napari_config["contributions"]["commands"]):
if cmd["id"] == command:
popid = icmd

if popid is not None:
napari_config["contributions"]["commands"].pop(popid)


def get_command_name(sample_name: str):
return f"yt-napari.data.{sample_name.lower()}"


def get_command_entry(sample_name: str):
cmmnd = {}
cmmnd["id"] = get_command_name(sample_name)
cmmnd["title"] = f"Load {sample_name}"
funcname = get_sample_func_name(sample_name)
cmmnd["python_name"] = f"yt_napari.sample_data._sample_data:{funcname}"
return cmmnd


def get_sample_table_entry(sample_name: str):
entry = {}
entry["key"] = sample_name.lower()
entry["display_name"] = sample_name
entry["command"] = get_command_name(sample_name)
return entry


def update_napari_hooks(napari_yaml):

with open(napari_yaml, "r") as file:
napari_config = yaml.safe_load(file)

existing = []
if "sample_data" in napari_config["contributions"]:
existing = napari_config["contributions"]["sample_data"]

# first remove existing commands
for sample in existing:
pop_a_command(sample["command"], napari_config)

# now remove the sample data entries
napari_config["contributions"]["sample_data"] = []

# now repopulate
for sample in enabled:
entry = get_sample_table_entry(sample)
napari_config["contributions"]["sample_data"].append(entry)

new_command = get_command_entry(sample)
napari_config["contributions"]["commands"].append(new_command)

with open(napari_yaml, "w") as file:
yaml.dump(napari_config, file)


def get_load_dict(sample_name):
load_dict = {"datasets": []}

field_type, field_name = sample_field[sample_name]
ds_entry = {
"filename": sample_name,
"selections": {
"regions": [
{
"fields": [
{
"field_name": field_name,
"field_type": field_type,
"take_log": log_field[sample_name],
}
]
}
]
},
}
load_dict["datasets"].append(ds_entry)
return load_dict


def write_sample_jsons(json_dir):

# first clear out
for fname in os.listdir(json_dir):
if fname.endswith(".json"):
os.remove(os.path.join(json_dir, fname))

# and add back
for sample in enabled:
json_name = os.path.join(json_dir, f"sample_{sample.lower()}.json")
load_dict = get_load_dict(sample)
with open(json_name, "w") as fi:
json.dump(load_dict, fi, indent=4)
# add newline at end of file to satisfy linting
with open(json_name, "a") as fi:
fi.write("\n")
print(f" {json_name}")

enabled_j = {"enabled": enabled}
enabled_file = os.path.join(json_dir, "sample_registry.json")
with open(enabled_file, "w") as fi:
json.dump(enabled_j, fi, indent=4)
with open(enabled_file, "a") as fi:
fi.write("\n")
print(f" {enabled_file}")


def single_sample_loader(sample: str):
code = []
code.append(f"def {get_sample_func_name(sample)}() -> List[Layer]:")
loadstr = ' return gl.load_sample_data("'
loadstr += sample
loadstr += '")'
code.append(loadstr)
code.append("")
code.append("")
return code


def write_sample_data_python_loaders(sample_data_dir):
sd_py = []
sd_py.append("# this file is autogenerated by the taskipy update_sample_data task")
sd_py.append("# to re-generate it, along with all the json files in this dir, run:")
sd_py.append("# task update_sample_data")
sd_py.append("# (requires taskipy: pip install taskipy)")
sd_py.append("# do NOT edit this file directly, instead go modify")
sd_py.append("# repo_utilities/update_sample_data.py and then re-run the task.")
sd_py.append("from typing import List")
sd_py.append("")
sd_py.append("from yt_napari._types import Layer")
sd_py.append("from yt_napari.sample_data import _generic_loader as gl")
sd_py.append("")
sd_py.append("")

for sample in enabled:
sample_code = single_sample_loader(sample)
sd_py += sample_code

sd_py.pop(-1) # only want one blank line at the end

loader_file = os.path.join(sample_data_dir, "_sample_data.py")
with open(loader_file, "w") as fi:
fi.write("\n".join(sd_py))


if __name__ == "__main__":

print("updating src/yt_napari/napari.yaml")
update_napari_hooks("src/yt_napari/napari.yaml")
print("writing out sample jsons to src/yt_napari/sample_data/")
write_sample_jsons("src/yt_napari/sample_data/")
print("writing src/yt_napari/sample_data/_sample_data.py")
write_sample_data_python_loaders("src/yt_napari/sample_data/")
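As a sanity check on the output of `write_sample_jsons`, the following self-contained sketch mirrors `get_load_dict` above and prints the JSON that would be written for a dataset using the default field settings:

```python
import json
from collections import defaultdict

# field defaults, mirroring update_sample_data.py
sample_field = defaultdict(lambda: ("gas", "density"))
log_field = defaultdict(lambda: True)

def get_load_dict(sample_name):
    # same nested structure as the helper in update_sample_data.py
    field_type, field_name = sample_field[sample_name]
    return {
        "datasets": [
            {
                "filename": sample_name,
                "selections": {
                    "regions": [
                        {
                            "fields": [
                                {
                                    "field_name": field_name,
                                    "field_type": field_type,
                                    "take_log": log_field[sample_name],
                                }
                            ]
                        }
                    ]
                },
            }
        ]
    }

print(json.dumps(get_load_dict("IsolatedGalaxy"), indent=4))
```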
3 changes: 3 additions & 0 deletions setup.cfg
@@ -55,6 +55,9 @@ napari.manifest =
[options.extras_require]
full =
dask[distributed,array]
pooch
pandas
yt[enzo]
docs =
sphinx
nbsphinx<0.8.8
