Commit

Merge pull request #149 from chrishavlin/load_sample_data
Adding sample data
chrishavlin authored Oct 23, 2024
2 parents cacb559 + 1bb93d0 commit c132a66
Showing 28 changed files with 801 additions and 43 deletions.
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -1,3 +1,8 @@
## v0.5.0dev

### New Features
* sample data now available!

## v0.5.0

### New Features
43 changes: 39 additions & 4 deletions README.md
@@ -152,13 +152,21 @@ Contributions are very welcome! Development follows a fork and pull request workflow

### development environment

To start developing, fork the repository and clone your fork to get a local copy. You can then install in development mode with
To start developing, fork the repository and clone your fork to get a local copy. You can then install in development mode along with
all the extra requirements for developing:

pip install -e .
pip install -e .[full,dev]

### tests and style checks

Both bug fixes and new features will need to pass the existing test suite and style checks. While both will be run automatically when you submit a pull request, it is helpful to run the test suites locally and run style checks throughout development. For testing, you can use [tox] to test different python versions on your platform.
Both bug fixes and new features will need to pass the existing test suite and style checks. While both will be run
automatically when you submit a pull request, it is helpful to run the test suites locally and run style checks
throughout development. For testing, you can use [tox] to test different python versions on your platform or
simply run `pytest` and rely on GitHub Actions to test the additional python environments.

#### testing with tox

first install `tox` with:

pip install tox

@@ -168,16 +176,27 @@ And then from the top level of the `yt-napari` directory, run

Tox will then run a series of tests in isolated environments. In addition to checking the terminal output for test results, the tox run will generate a test coverage report: a `coverage.xml` file and a `htmlcov` folder -- to view the results, open `htmlcov/index.html` in a browser.

#### testing with pytest

If you prefer a lighter-weight test run, you can also use `pytest` directly and rely on the GitHub CI to test different python versions and systems. To do so, first install `pytest` and some related plugins:

pip install pytest pytest-qt pytest-cov

Now, to run the tests:
Note that if you set up your dev environment with `pip install -e .[full,dev]` as suggested above, you'll already
have these dependencies.

To run the tests, you can use the `pytest` command:

pytest -v --cov=yt_napari --cov-report=html

Or the `taskipy` task:

task test

In addition to telling you whether or not the tests pass, the above command will write out a code coverage report to the `htmlcov` directory. You can open up `htmlcov/index.html` in a browser and check out the lines of code that were missed by existing tests.

#### style checks

For style checks, you can use [pre-commit](https://pre-commit.com/) to run checks as you develop. To set up `pre-commit`:

pip install pre-commit
@@ -237,6 +256,22 @@ task update_schema_docs -v vX.X.X
It will write a schema file for the current pydantic model, overwriting any on-disk schema files for
the provided version.

### updating the sample data

The sample data relies on another helper script, `repo_utilities/update_sample_data.py`, which you can invoke
with `taskipy` as:

task update_sample_data

To adjust which sample datasets are included, edit the `enabled` list in `repo_utilities/update_sample_data.py`. The names in `enabled` must match those accepted by `yt.load_sample`. In addition to enabling
a dataset, you may need to adjust the field settings for the sample dataset that you are adding: see the `sample_field` and `log_field` dictionaries.
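For illustration, enabling a new dataset might look like the following sketch, which mirrors the `enabled`, `sample_field`, and `log_field` structures in the script (the name `MyNewDataset` is a placeholder, not a real `yt.load_sample` dataset):

```python
from collections import defaultdict

# mirrors the structures in repo_utilities/update_sample_data.py
enabled = ["Enzo_64", "IsolatedGalaxy"]
sample_field = defaultdict(lambda: ("gas", "density"))  # default field to load
log_field = defaultdict(lambda: True)  # log-scale by default

# enable a (hypothetical) dataset -- the name must match one
# accepted by yt.load_sample
enabled.append("MyNewDataset")
enabled.sort()

# override the field defaults only when ("gas", "density") with
# log scaling is not appropriate for the new dataset
sample_field["MyNewDataset"] = ("gas", "temperature")
log_field["MyNewDataset"] = False
```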

When you run `update_sample_data`, a number of things happen:

1. The napari plugin manifest is updated. For every dataset in the `enabled` list, `yt_napari/napari.yaml` gains two entries: one in `commands` and one in `sample_data`.
2. For every dataset in the `enabled` list, a `json` file will be generated in `yt_napari/sample_data/` along with a single `yt_napari/sample_data/sample_registry.json`. These `json` files are used for actually loading the sample data.
3. `yt_napari/sample_data/_sample_data.py` is rewritten with one function per dataset in the `enabled` list. Each function name corresponds to the `python_name` entry in `yt_napari/napari.yaml` (the plugin manifest file). If `yt_napari/sample_data/_sample_data.py` is incorrect, update the code generation in `repo_utilities/update_sample_data.py`; do not edit `yt_napari/sample_data/_sample_data.py` directly.
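As a rough illustration based on the naming helpers in `repo_utilities/update_sample_data.py`, the entries generated for the `Enzo_64` dataset would look something like:

```python
# naming helpers, mirroring repo_utilities/update_sample_data.py
def get_sample_func_name(sample: str) -> str:
    return f"sample_{sample.lower()}"

def get_command_name(sample_name: str) -> str:
    return f"yt-napari.data.{sample_name.lower()}"

# the command entry added to yt_napari/napari.yaml for "Enzo_64":
command_entry = {
    "id": get_command_name("Enzo_64"),
    "title": "Load Enzo_64",
    "python_name": f"yt_napari.sample_data._sample_data:{get_sample_func_name('Enzo_64')}",
}

# the matching sample_data entry:
sample_entry = {
    "key": "enzo_64",
    "display_name": "Enzo_64",
    "command": command_entry["id"],
}

# and the generated loader in _sample_data.py is a function like:
#
#     def sample_enzo_64() -> List[Layer]:
#         return gl.load_sample_data("Enzo_64")
```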

## License

Distributed under the terms of the [BSD-3] license,
Binary file added assets/images/readme_sample_data.gif
8 changes: 7 additions & 1 deletion docs/installation.rst
@@ -27,12 +27,18 @@ See the :code:`yt` `documentation <https://yt-project.org/doc/installing.html#le
2. install :code:`yt-napari`
****************************

You can install the `yt-napari` plugin with:
You can install the `yt-napari` plugin with minimal dependencies using:

.. code-block:: bash

    pip install yt-napari

To include optional dependencies required for loading sample data:

.. code-block:: bash

    pip install yt-napari[full]

If you are missing either :code:`yt` or :code:`napari` (or they need to be updated), the above installation will fetch and install minimal versions of both.

To install the latest development version of the plugin instead, use:
21 changes: 19 additions & 2 deletions docs/quickstart.rst
@@ -7,7 +7,10 @@ After installation, there are three modes of using :code:`yt-napari`:
2. :ref:`loading a json file from the napari gui<jsonload>`
3. :ref:`napari gui plugins<naparigui>`

Additionally, you can configure some behavior between napari sessions: see :ref:`Configuring yt-napari<configfile>`.
Additional quick start topics include:

* Configuring some :code:`yt-napari` behavior between napari sessions: see :ref:`Configuring yt-napari<configfile>`.
* Loading sample data: see :ref:`Loading sample data<sampledata>`.

.. _jupyusage:

@@ -148,10 +151,24 @@ The following options are available:

* :code:`in_memory_cache`, :code:`bool` (default :code:`true`). When :code:`true`,
the widget and json-readers will store references to yt datasets in an in-memory
cache. Subsequents loads of the same dataset will then use the available dataset
cache. Subsequent loads of the same dataset will then use the available dataset
handle. This behavior can also be manually controlled in the widget and json
options -- changing it in the configuration will simply change the default value.


Note that boolean values in :code:`toml` files start with lowercase: :code:`true` and
:code:`false` (instead of :code:`True` and :code:`False`).

.. _sampledata:

Loading sample data
*******************

A full install of :code:`yt-napari` (:code:`pip install yt-napari[full]`) will
allow you to load a selection of the
`yt sample datasets <https://yt-project.org/data/>`_ from the napari GUI.

Note that some of the sample datasets are large (multiple GBs); the first time
you load a dataset, you will have to wait for the data file to download.

.. image:: _static/readme_sample_data.gif
1 change: 1 addition & 0 deletions pyproject.toml
@@ -10,4 +10,5 @@ skip = ["venv", "benchmarks"]
[tool.taskipy.tasks]
validate_release = { cmd = "python repo_utilities/validate.py", help = "validates for a release" }
update_schema_docs = { cmd = "python repo_utilities/update_schema_docs.py", help = "updates the schema related documentation" }
update_sample_data = { cmd = "python repo_utilities/update_sample_data.py", help = "updates sample data code" }
test = "pytest -v --color=yes --cov=yt_napari --cov-report=html"
194 changes: 194 additions & 0 deletions repo_utilities/update_sample_data.py
@@ -0,0 +1,194 @@
import json
import os
from collections import defaultdict

import yaml

# requirements: cartesian, 3D, grid-based
enabled = [
"DeeplyNestedZoom",
"Enzo_64",
"HiresIsolatedGalaxy",
"IsolatedGalaxy",
"PopIII_mini",
# 'MHDSloshing',
"GaussianCloud",
"SmartStars",
# 'ENZOE_orszag-tang_0.5',  # can't handle '-' or '.' in names
"GalaxyClusterMerger", # big but neat
# 'InteractingJets',
"cm1_tornado_lofs",
]
enabled.sort()

# default field to load, whether or not to log
sample_field = defaultdict(lambda: ("gas", "density"))
log_field = defaultdict(lambda: True)


# override the default for some datasets
sample_field["cm1_tornado_lofs"] = ("cm1", "dbz")
log_field["cm1_tornado_lofs"] = False


def get_sample_func_name(sample: str):
return f"sample_{sample.lower()}"


def pop_a_command(command: str, napari_config: dict):

popid = None
for icmd, cmd in enumerate(napari_config["contributions"]["commands"]):
if cmd["id"] == command:
popid = icmd

if popid is not None:
napari_config["contributions"]["commands"].pop(popid)


def get_command_name(sample_name: str):
return f"yt-napari.data.{sample_name.lower()}"


def get_command_entry(sample_name: str):
cmmnd = {}
cmmnd["id"] = get_command_name(sample_name)
cmmnd["title"] = f"Load {sample_name}"
funcname = get_sample_func_name(sample_name)
cmmnd["python_name"] = f"yt_napari.sample_data._sample_data:{funcname}"
return cmmnd


def get_sample_table_entry(sample_name: str):
entry = {}
entry["key"] = sample_name.lower()
entry["display_name"] = sample_name
entry["command"] = get_command_name(sample_name)
return entry


def update_napari_hooks(napari_yaml):

with open(napari_yaml, "r") as file:
napari_config = yaml.safe_load(file)

existing = []
if "sample_data" in napari_config["contributions"]:
existing = napari_config["contributions"]["sample_data"]

# first remove existing commands
for sample in existing:
pop_a_command(sample["command"], napari_config)

# now remove the sample data entries
napari_config["contributions"]["sample_data"] = []

# now repopulate
for sample in enabled:
entry = get_sample_table_entry(sample)
napari_config["contributions"]["sample_data"].append(entry)

new_command = get_command_entry(sample)
napari_config["contributions"]["commands"].append(new_command)

with open(napari_yaml, "w") as file:
yaml.dump(napari_config, file)


def get_load_dict(sample_name):
load_dict = {"datasets": []}

field_type, field_name = sample_field[sample_name]
ds_entry = {
"filename": sample_name,
"selections": {
"regions": [
{
"fields": [
{
"field_name": field_name,
"field_type": field_type,
"take_log": log_field[sample_name],
}
]
}
]
},
}
load_dict["datasets"].append(ds_entry)
return load_dict


def write_sample_jsons(json_dir):

# first clear out
for fname in os.listdir(json_dir):
if fname.endswith(".json"):
os.remove(os.path.join(json_dir, fname))

# and add back
for sample in enabled:
json_name = os.path.join(json_dir, f"sample_{sample.lower()}.json")
load_dict = get_load_dict(sample)
with open(json_name, "w") as fi:
json.dump(load_dict, fi, indent=4)
# add newline at end of file to satisfy linting
with open(json_name, "a") as fi:
fi.write("\n")
print(f" {json_name}")

enabled_j = {"enabled": enabled}
enabled_file = os.path.join(json_dir, "sample_registry.json")
with open(enabled_file, "w") as fi:
json.dump(enabled_j, fi, indent=4)
with open(enabled_file, "a") as fi:
fi.write("\n")
print(f" {enabled_file}")


def single_sample_loader(sample: str):
code = []
code.append(f"def {get_sample_func_name(sample)}() -> List[Layer]:")
loadstr = ' return gl.load_sample_data("'
loadstr += sample
loadstr += '")'
code.append(loadstr)
code.append("")
code.append("")
return code


def write_sample_data_python_loaders(sample_data_dir):
sd_py = []
sd_py.append("# this file is autogenerated by the taskipy update_sample_data task")
sd_py.append("# to re-generate it, along with all the json files in this dir, run:")
sd_py.append("# task update_sample_data")
sd_py.append("# (requires taskipy: pip install taskipy)")
sd_py.append("# do NOT edit this file directly, instead go modify")
sd_py.append("# repo_utilities/update_sample_data.py and then re-run the task.")
sd_py.append("from typing import List")
sd_py.append("")
sd_py.append("from yt_napari._types import Layer")
sd_py.append("from yt_napari.sample_data import _generic_loader as gl")
sd_py.append("")
sd_py.append("")

for sample in enabled:
sample_code = single_sample_loader(sample)
sd_py += sample_code

sd_py.pop(-1) # only want one blank line at the end

loader_file = os.path.join(sample_data_dir, "_sample_data.py")
with open(loader_file, "w") as fi:
fi.write("\n".join(sd_py))


if __name__ == "__main__":

print("updating src/yt_napari/napari.yaml")
update_napari_hooks("src/yt_napari/napari.yaml")
print("writing out sample jsons to src/yt_napari/sample_data/")
write_sample_jsons("src/yt_napari/sample_data/")
print("writing src/yt_napari/sample_data/_sample_data.py")
write_sample_data_python_loaders("src/yt_napari/sample_data/")
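As a sanity check on the output of `write_sample_jsons`, the following self-contained sketch mirrors `get_load_dict` above and prints the JSON that would be written for a dataset using the default field settings:

```python
import json
from collections import defaultdict

# field defaults, mirroring update_sample_data.py
sample_field = defaultdict(lambda: ("gas", "density"))
log_field = defaultdict(lambda: True)

def get_load_dict(sample_name):
    # same nested structure as the helper in update_sample_data.py
    field_type, field_name = sample_field[sample_name]
    return {
        "datasets": [
            {
                "filename": sample_name,
                "selections": {
                    "regions": [
                        {
                            "fields": [
                                {
                                    "field_name": field_name,
                                    "field_type": field_type,
                                    "take_log": log_field[sample_name],
                                }
                            ]
                        }
                    ]
                },
            }
        ]
    }

print(json.dumps(get_load_dict("IsolatedGalaxy"), indent=4))
```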
3 changes: 3 additions & 0 deletions setup.cfg
@@ -55,6 +55,9 @@ napari.manifest =
[options.extras_require]
full =
dask[distributed,array]
pooch
pandas
yt[enzo]
docs =
sphinx
nbsphinx<0.8.8
