Add "datastores" to represent input data from zarr, npy, etc #66

Merged (358 commits, Nov 21, 2024)
Changes from 30 commits
Commits
c52f98e
npy mllam nearly done
leifdenby Jul 6, 2024
80f3639
minor adjustment
leifdenby Jul 7, 2024
048f8c6
Merge branch 'main' of https://github.com/mllam/neural-lam into maint…
leifdenby Jul 11, 2024
5aaa239
add pooch and tweak pip cicd testing
leifdenby Jul 11, 2024
66c3b03
combine cicd tests with caching
leifdenby Jul 11, 2024
8566b8f
linting
leifdenby Jul 11, 2024
29bd9e5
add pyg dep
leifdenby Jul 11, 2024
bc7f028
set cirun aws region to frankfurt
leifdenby Jul 11, 2024
2070166
adapt image
leifdenby Jul 11, 2024
e4e86e5
set image
leifdenby Jul 11, 2024
1fba8fe
try different image
leifdenby Jul 11, 2024
02b77cf
add pooch to cicd
leifdenby Jul 11, 2024
b481929
add pdm gpu test
leifdenby Jul 16, 2024
bcec472
start work on readme
leifdenby Jul 16, 2024
c5beec9
Merge branch 'maint/deps-in-pyproject-toml' into datastore
leifdenby Jul 16, 2024
e89facc
Merge branch 'main' into maint/refactor-as-package
leifdenby Jul 16, 2024
0b5687a
Merge branch 'main' of https://github.com/mllam/neural-lam into maint…
leifdenby Jul 16, 2024
095fdbc
turn meps testdata download into pytest fixture
leifdenby Jul 16, 2024
49e9bfe
adapt README for package
leifdenby Jul 16, 2024
12cc02b
remove pdm cicd test (will be in separate PR)
leifdenby Jul 16, 2024
b47f50b
remove pdm in gitignore
leifdenby Jul 16, 2024
90d99ca
remove pdm and pyproject files (will be sep PR)
leifdenby Jul 16, 2024
a91eaaa
add pyproject.toml from main
leifdenby Jul 16, 2024
5508cea
clean out tests
leifdenby Jul 16, 2024
5c623c3
fix linting
leifdenby Jul 16, 2024
08ec168
add cli entrypoints import test
leifdenby Jul 16, 2024
d9cf7ba
Merge branch 'maint/refactor-as-package' into datastore
leifdenby Jul 16, 2024
3954f04
tweak cicd pytest execution
leifdenby Jul 16, 2024
f99fdce
Merge branch 'maint/refactor-as-package' into datastore
leifdenby Jul 16, 2024
db9d96f
Update tests/test_mllam_dataset.py
leifdenby Jul 17, 2024
3c864b2
grid-shape ok
leifdenby Jul 17, 2024
1f54b0e
get_vars_names and units
leifdenby Jul 17, 2024
9b88160
get_vars_names and units 2
leifdenby Jul 17, 2024
a9fdad5
test for stats
leifdenby Jul 23, 2024
555154f
get_dataarray test
leifdenby Jul 24, 2024
8b8a77e
get_dataarray test
leifdenby Jul 24, 2024
41f11cd
boundary_mask
leifdenby Jul 24, 2024
a17de0f
get_xy
leifdenby Jul 24, 2024
0a38a7d
remove TrainingSample dataclass
leifdenby Jul 24, 2024
f65f6b5
test for WeatherDataset.__getitem__
leifdenby Jul 24, 2024
a35100e
test for graph creation
leifdenby Jul 24, 2024
cfb0618
more graph creation tests
leifdenby Jul 24, 2024
8698719
check for consistency of num features across splits
leifdenby Jul 24, 2024
3381404
test for single batch from mllam through model
leifdenby Jul 24, 2024
2a6796c
Add init files to expose classes in editable package
joeloskarsson Jul 24, 2024
8f4e0e0
Linting
joeloskarsson Jul 24, 2024
e657abb
working training_step with datastores!
Jul 25, 2024
effc99b
remove superfluous tests
Jul 25, 2024
a047026
fix for dataset length
Jul 25, 2024
d2c62ed
step length should be int
Jul 25, 2024
58f5d99
step length should be int
Jul 25, 2024
64d43a6
training working with mllam datastore!
Jul 25, 2024
07444f8
adapt neural_lam.train_model for datastores
Jul 25, 2024
d1b6fc1
fixes for npy
Jul 25, 2024
6fe19ac
npyfiles datastore complete
leifdenby Jul 26, 2024
fe65a4d
cleanup for datastore examples
leifdenby Jul 26, 2024
e533794
training on ohm with danra!
Jul 26, 2024
640ac05
use mllam-data-prep v0.2.0
Aug 5, 2024
0f16f13
remove py3.12 from pre-commit
Aug 5, 2024
724548e
cleanup
Aug 8, 2024
a1b2037
all tests passing!
Aug 12, 2024
e35958f
use mllam-data-prep v0.3.0
Aug 12, 2024
8b92318
delete requirements.txt
Aug 13, 2024
658836a
remove .DS_Store
Aug 13, 2024
421efed
use tmate in gpu pdm cicd
Aug 13, 2024
05f1e9f
remove requirements
Aug 13, 2024
3afe0e4
update pdm gpu cicd setup to pdm venv on nvme drive
Aug 13, 2024
f3d028b
don't try to use pdm venv in-project
Aug 13, 2024
2c35662
remove tmate
Aug 13, 2024
5f30255
update README with install instructions
Aug 14, 2024
b2b5631
changelog
Aug 14, 2024
c8ae829
update ci/cd badges to include gpu + gpu
Aug 14, 2024
e7cf2c0
Merge pull request #1 from mllam/package_inits
leifdenby Aug 14, 2024
0b72e9d
add pyproject-flake8 to precommit config
Aug 14, 2024
190d1de
use Flake8-pyproject instead
Aug 14, 2024
791af0a
update README
Aug 14, 2024
58fab84
Merge branch 'maint/deps-in-pyproject-toml' into feat/datastores
Aug 14, 2024
dbe2e6d
Merge branch 'maint/refactor-as-package' into maint/deps-in-pyproject…
Aug 14, 2024
eac6e35
Merge branch 'maint/deps-in-pyproject-toml' into feat/datastores
Aug 14, 2024
799d55e
linting fixes
Aug 14, 2024
57bbb81
train only 1 epoch in cicd and print to stdout
Aug 14, 2024
a955cee
log datastore config
Aug 14, 2024
0a79c74
cleanup doctrings
Aug 15, 2024
9f3c014
Merge branch 'maint/refactor-as-package' into datastore
leifdenby Aug 19, 2024
41364a8
Merge branch 'main' of https://github.com/mllam/neural-lam into maint…
leifdenby Aug 19, 2024
3422298
update changelog
leifdenby Aug 19, 2024
689ef69
move dev deps optional dependencies group
leifdenby Aug 20, 2024
9a0d538
update cicd tests to install dev deps
leifdenby Aug 20, 2024
bddfcaf
update readme with new dev deps group
leifdenby Aug 20, 2024
b96cfdc
quote the skip step the install readme
leifdenby Aug 20, 2024
2600dee
remove unused files
leifdenby Aug 20, 2024
65a8074
Merge branch 'feat/datastores' of https://github.com/leifdenby/neural…
leifdenby Aug 20, 2024
6adf6cc
revert to line length of 80
leifdenby Aug 20, 2024
46b37f8
revert docstring formatting changes
leifdenby Aug 20, 2024
3cd0f8b
pin numpy to <2.0.0
leifdenby Aug 20, 2024
826270a
Merge branch 'maint/deps-in-pyproject-toml' into feat/datastores
leifdenby Aug 20, 2024
4ba22ea
Merge branch 'main' into feat/datastores
leifdenby Aug 20, 2024
1f661c6
fix flake8 linting errors
leifdenby Aug 20, 2024
4838872
Update neural_lam/weather_dataset.py
leifdenby Sep 8, 2024
b59e7e5
Update neural_lam/datastore/multizarr/create_normalization_stats.py
leifdenby Sep 8, 2024
75b1fe7
Update neural_lam/datastore/npyfiles/store.py
leifdenby Sep 8, 2024
7e736cb
Update neural_lam/datastore/npyfiles/store.py
leifdenby Sep 8, 2024
613a7e2
Update neural_lam/datastore/npyfiles/store.py
leifdenby Sep 8, 2024
65e199b
Update tests/test_training.py
leifdenby Sep 8, 2024
4435e26
Update tests/test_datasets.py
leifdenby Sep 8, 2024
4693408
Update README.md
leifdenby Sep 8, 2024
2dfed2c
update README
leifdenby Sep 10, 2024
c3d033d
Merge branch 'main' of https://github.com/mllam/neural-lam into feat/…
leifdenby Sep 10, 2024
4a70268
Merge branch 'feat/datastores' of https://github.com/leifdenby/neural…
leifdenby Sep 10, 2024
66c663f
column_water -> open_water_fraction
leifdenby Sep 10, 2024
11a7978
fix linting
leifdenby Sep 10, 2024
a41c314
static data same for all splits
leifdenby Sep 10, 2024
6f1efd6
forcing_window_size from args
leifdenby Sep 10, 2024
bacb9ec
Update neural_lam/datastore/base.py
leifdenby Sep 10, 2024
4a9db4e
only use first ensemble member in datastores
leifdenby Sep 10, 2024
4fc2448
Merge branch 'feat/datastores' of https://github.com/leifdenby/neural…
leifdenby Sep 10, 2024
bcaa919
Update neural_lam/datastore/base.py
leifdenby Sep 10, 2024
90bc594
Update neural_lam/datastore/base.py
leifdenby Sep 10, 2024
5bda935
Update neural_lam/datastore/base.py
leifdenby Sep 10, 2024
8e7931d
remove all multizarr functionality
leifdenby Sep 10, 2024
6998683
cleanup and test fixes for recent changes
leifdenby Sep 10, 2024
c415008
Merge branch 'feat/datastores' of https://github.com/leifdenby/neural…
leifdenby Sep 10, 2024
735d324
fix linting
leifdenby Sep 10, 2024
5f2d919
remove multizar example files
leifdenby Sep 10, 2024
5263d2c
normalization -> standardization
leifdenby Sep 10, 2024
ba1bec3
fix import for tests
leifdenby Sep 10, 2024
d04d15e
Update neural_lam/datastore/base.py
leifdenby Sep 10, 2024
743d7a1
fix coord issues and add datastore example plotting cli
leifdenby Sep 12, 2024
ac10d7d
add lru_cache to get_xy_extent
leifdenby Sep 12, 2024
bf8172a
MLLAMDatastore -> MDPDatastore
leifdenby Sep 12, 2024
90ca400
missed renames for MDPDatastore
leifdenby Sep 12, 2024
154139d
update graph plot for datastores
leifdenby Sep 12, 2024
50ee0b0
use relative import
leifdenby Sep 12, 2024
7dfd570
add long_names and refactor npyfiles create weights
leifdenby Sep 12, 2024
2b45b5a
Update neural_lam/weather_dataset.py
leifdenby Sep 23, 2024
aee0b1c
Update neural_lam/weather_dataset.py
leifdenby Sep 23, 2024
8453c2b
Update neural_lam/models/ar_model.py
leifdenby Sep 27, 2024
7f32557
Update neural_lam/weather_dataset.py
leifdenby Sep 27, 2024
67998b8
read projection from datastore config extra section
leifdenby Sep 27, 2024
ac7e46a
NpyFilesDatastore -> NpyFilesDatastoreMEPS
leifdenby Sep 27, 2024
b7bf506
revert tp training with 1 AR step by default
leifdenby Sep 27, 2024
5df2ecf
add missing kwarg to BaseHiGraphModel.__init__
leifdenby Sep 27, 2024
d4d438f
add missing kwarg to HiLAM.__init__
leifdenby Sep 27, 2024
1889771
add missing kwarg to HiLAMParallel
leifdenby Sep 27, 2024
2c3bbde
check that for enough forecast steps given ar_steps
leifdenby Sep 27, 2024
f0a151b
remove numpy<2.0.0 version cap
leifdenby Sep 27, 2024
f3566b0
tweak print statement working in mdp
Oct 1, 2024
dba94b3
fix missed removed argument from cli
Oct 1, 2024
bca1482
remove wandb config log comment, we log now
Oct 1, 2024
fc973c4
ensure loading from checkpoint during train possible
Oct 1, 2024
9fcf06e
get step_length from datastore in plot_error_map
leifdenby Oct 1, 2024
2bbe666
remove step_legnth attr in ARModel
leifdenby Oct 1, 2024
b41ed2f
remove unused obs_mask arg for vis.plot_prediction
leifdenby Oct 1, 2024
7e46194
ensure no reference to multizarr "data_config"
leifdenby Oct 1, 2024
b57bc7a
introduce neural-lam config
leifdenby Oct 2, 2024
2b30715
include meps neural-lam config example
leifdenby Oct 2, 2024
8e7b2e6
fix extra space typo in BaseDatastore
leifdenby Oct 2, 2024
e0300fb
add check and print of train/test/val split in MDPDatastore
leifdenby Oct 2, 2024
d1b4ca7
BaseCartesianDatastore -> BaseRegularGridDatastore
leifdenby Oct 3, 2024
de46fb4
removed `control_only' arg
sadamov Oct 23, 2024
c1a7159
All flags are explicit
sadamov Oct 23, 2024
5b02761
removed multizarr, obsolete
sadamov Oct 23, 2024
f80fe4a
robust import of conftest
sadamov Oct 23, 2024
0222759
fixed torch List typing
sadamov Oct 23, 2024
3d91f7c
graph creation is handled by WMG
sadamov Oct 23, 2024
65cb4a8
graph creation now handled in WGM
sadamov Oct 23, 2024
b1e2097
Making sure that all tensors, arrays and datesets follow the same ord…
sadamov Oct 24, 2024
96900c1
expanding the dummy class to support all tests
sadamov Oct 24, 2024
4281a12
clarify comment about array shape
sadamov Oct 24, 2024
84ea4d3
Add caching decorator
sadamov Oct 24, 2024
930a13d
by default dataset is written to_zarr
sadamov Oct 24, 2024
2b9f00d
prevent removal of old zarr-archives
sadamov Oct 24, 2024
d9e4822
Removed dev-dependencies
sadamov Oct 24, 2024
4bed96e
rename sampling to slicing
sadamov Oct 24, 2024
1f58798
Align datastore_config_path arguments
sadamov Oct 25, 2024
53f32aa
imlement flexible window slices for forcings (past & future)
sadamov Oct 25, 2024
6a20a9d
Expanded docstring for stacked get_xy
sadamov Oct 25, 2024
5c9b4c5
Implementation of feature weights
sadamov Oct 25, 2024
0a41d0c
Removing some obsolete "better"-comments
sadamov Oct 25, 2024
239aad7
Bugfixes and better documentation of time slicing operations
sadamov Oct 27, 2024
98dde82
reintroduction of create_graphp
sadamov Oct 28, 2024
01e6dff
implementation of state_feature_weights
sadamov Oct 28, 2024
0878fce
bugfix for length of forcing window
sadamov Oct 28, 2024
1b9d253
formatting
sadamov Oct 28, 2024
85aa170
update README.md to reflect renaming of create_mesh to create_graph
khintz Oct 29, 2024
b9c9951
update instructions on creating graph
khintz Oct 29, 2024
9e586d8
Replace component_dependencies figure with mermaid diagram
khintz Oct 29, 2024
f6c6404
add index selection to datastore example plot cli
leifdenby Nov 4, 2024
8421a6a
more work on readme
leifdenby Nov 5, 2024
3c045c7
Merge pull request #2 from sadamov/feat/datastores
leifdenby Nov 6, 2024
8deace8
Merge branch 'feat/datastores' of https://github.com/leifdenby/neural…
leifdenby Nov 6, 2024
8149d65
Use "datastore" in config filename for datastores
leifdenby Nov 6, 2024
514b0d1
add loss-weighting config and implementations
leifdenby Nov 8, 2024
e642cb0
Bugfix for forcing window calculation
sadamov Nov 8, 2024
599917d
Fix shift by init_steps
sadamov Nov 8, 2024
ef3da41
Cover cases where include_past_forcing > init_steps
sadamov Nov 8, 2024
74828fa
Merge main branch into datastores and resolve README conflicts
joeloskarsson Nov 11, 2024
8cc6c3d
Add test for dataset length using different configs
joeloskarsson Nov 11, 2024
731910f
Updates for datastore examples and neural-lam config
leifdenby Nov 12, 2024
ff02af7
linting fix
leifdenby Nov 12, 2024
b5844c0
only prepare npymeps test example files that don't exist
leifdenby Nov 12, 2024
b7a10ef
ensure dimension order from BaseRegularGridDatastore.stack_grid_coords
leifdenby Nov 12, 2024
31ebfc8
get datastore static data in ARModel without defining split
leifdenby Nov 12, 2024
f022365
Update neural_lam/train_model.py
leifdenby Nov 12, 2024
b33e863
Update neural_lam/weather_dataset.py
leifdenby Nov 12, 2024
a8362ce
suggest to reduce ar_steps and forcing window with small dataset
leifdenby Nov 12, 2024
b2e0874
adapt dummy datastore to generate on equal area grid
leifdenby Nov 12, 2024
772cc20
adapt all cli to use --config arg instead of `config`
leifdenby Nov 12, 2024
89b10b5
add test for datastores example plot function
leifdenby Nov 12, 2024
d355ef5
bugfix for earlier unstacking dim order fix in datastores
leifdenby Nov 12, 2024
1121d9f
add enforcement of datastores output dimension order
leifdenby Nov 13, 2024
9afaf6e
fix bugs introduced with dimension order during stack/unstack
leifdenby Nov 13, 2024
3df627f
update meps test to point to new dataset on aws
leifdenby Nov 13, 2024
89fac82
remove unused print statement
leifdenby Nov 13, 2024
a95eb5a
fix config-path arg bug in CLIs
leifdenby Nov 13, 2024
b56e47a
renaming the forcing arguments
sadamov Nov 13, 2024
258079c
Merge branch 'feat/datastores' of github.com:leifdenby/neural-lam int…
sadamov Nov 13, 2024
d458677
fix bug for datastore ref in ARModel.plot_examples()
leifdenby Nov 13, 2024
223db37
improved docstring for forcings
sadamov Nov 13, 2024
bcc3e51
Adjusting index of flux based on datastore
sadamov Nov 13, 2024
f97719b
defined forcings to be 0, meaningless for stats_calc in MEPS
sadamov Nov 13, 2024
df4d39c
fix typo (missing datastore) in ARModel.on_test_epoch_end
leifdenby Nov 13, 2024
46f161c
setting ar_steps to 63 for stats calc in MEPS
sadamov Nov 13, 2024
38cdfe6
Merge branch 'feat/datastores' of https://github.com/leifdenby/neural…
sadamov Nov 13, 2024
f36d1ce
more verbose ci/cd testing and update meps cache
leifdenby Nov 13, 2024
cd53b21
Bugfix, `idx` removed from forecast forcing window indices
sadamov Nov 13, 2024
98706c1
Merge branch 'feat/datastores' of https://github.com/leifdenby/neural…
sadamov Nov 13, 2024
7c62778
absolute imports
sadamov Nov 14, 2024
8ddd0de
change datastore arg type
sadamov Nov 14, 2024
0f24924
calling cli() instead of main();
sadamov Nov 14, 2024
acb8ffa
add test for state/forcing values from time-slicing
leifdenby Nov 14, 2024
7fe1726
Update tests/test_time_slicing.py
leifdenby Nov 14, 2024
4fe2cea
Fix typo in time slicing test
joeloskarsson Nov 14, 2024
dfec1ec
formatting
sadamov Nov 16, 2024
f56a999
enable worker argument, but set to zero for tests
sadamov Nov 16, 2024
4a5ae6c
reduce workers to zero
sadamov Nov 16, 2024
9f0120b
revert num_workers to 1 in test
sadamov Nov 16, 2024
665368d
Fix missing datastore kind in plot script
joeloskarsson Nov 18, 2024
a90a979
replace transpose in WeatherDataset.__getitem__ with assert
leifdenby Nov 18, 2024
6fedea5
Merge torch.load change from main into datastores
joeloskarsson Nov 18, 2024
0180ca0
Merge branch 'main' into datastores
joeloskarsson Nov 18, 2024
93c20fc
default config path should be None for datastore plote example
leifdenby Nov 18, 2024
f6da2b2
return stacked coords by default from BaseRegularGridDatastore.get_xy()
leifdenby Nov 18, 2024
fc6be8d
Fix typos and clarifications in readme
joeloskarsson Nov 19, 2024
9787869
Fix dim ordering in time slicing test
joeloskarsson Nov 19, 2024
4cb44de
Reduce example size of single batch and training tests to save memory
joeloskarsson Nov 19, 2024
daf1dbc
Add changelog entry
joeloskarsson Nov 19, 2024
f922542
use mllam-data-prep v0.5.0
leifdenby Nov 20, 2024
9e8c08f
add support for setting globe properties in projection
leifdenby Nov 20, 2024
4302d58
update path for meps data chache in ci/cd
leifdenby Nov 20, 2024
175 changes: 122 additions & 53 deletions README.md
@@ -46,6 +46,55 @@ Still, some restrictions are inevitable:
</p>


# Using Neural-LAM
Below are instructions on how to use Neural-LAM to train and evaluate models. Once `neural-lam` has been installed, the general process is:

1. Run any pre-processing scripts to generate the necessary derived data that your chosen datastore requires
2. Run the graph-creation step
3. Train the model

## Data

To enable flexibility in what input-data sources can be used with neural-lam,
the input-data representation is split into two parts:

1. a "datastore" (represented by instances of
[neural_lam.datastore.BaseDataStore](neural_lam/datastore/base.py)) which
takes care of loading a given category (state, forcing or static) and split
(train/val/test) of data from disk and returning it as a `xarray.DataArray`.
The returned data-array is expected to have the spatial coordinates
flattened into a single `grid_index` dimension and all variables and vertical
levels stacked into a feature dimension (named as `{category}_feature`) The
datastore also provides information about the number, names and units of
variables in the data, the boundary mask, normalisation values and grid
information.

2. a `torch.utils.data.Dataset`-derived class (called
`neural_lam.weather_dataset.WeatherDataset`) which takes care of sampling in
time to create individual samples for training, validation and testing. The
`WeatherDataset` class is also responsible for normalising the values and
returning `torch.Tensor`-objects.
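
As a minimal sketch of the layout described above (the variable names and
coordinates here are illustrative; a real datastore attaches more metadata):

```python
import numpy as np
import xarray as xr

# Hypothetical example of what a datastore returns for category="state":
# 2 time steps, a 10x10 grid flattened into `grid_index`, and 3 variables
# stacked into `state_feature`.
da_state = xr.DataArray(
    np.random.rand(2, 100, 3),
    dims=("time", "grid_index", "state_feature"),
    coords={
        "time": np.array(
            ["2024-01-01T00", "2024-01-01T03"], dtype="datetime64[ns]"
        ),
        "state_feature": ["u10m", "v10m", "t2m"],
    },
)
print(dict(da_state.sizes))  # {'time': 2, 'grid_index': 100, 'state_feature': 3}
```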

There are currently three different datastores implemented in the codebase:

1. `neural_lam.datastore.NpyDataStore` which reads MEPS data from `.npy`-files in
the format introduced in neural-lam `v0.1.0`. Note that this datastore is specific to the format of the MEPS dataset, but it can act as an example of how to create similar numpy-based datastores.

2. `neural_lam.datastore.MultizarrDatastore` which combines multiple zarr
files during train/val/test sampling, with the transformations required to
facilitate this implemented within `neural_lam.datastore.MultizarrDatastore` itself.

3. `neural_lam.datastore.MDPDatastore` which can combine multiple zarr
datasets, either as a preprocessing step or during sampling, but offloads
the implementation of the transformations to the
[mllam-data-prep](https://github.com/mllam/mllam-data-prep) package.

If none of these options fits your needs, you can create your own datastore by
subclassing the `neural_lam.datastore.BaseDataStore` class or, if your data is
stored on a Cartesian grid, the `neural_lam.datastore.BaseCartesianDatastore`
class, and implementing the abstract methods. A rough skeleton is sketched below.
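
A hedged sketch of such a subclass (class and method names follow the text
above; the exact set of abstract methods is defined in
[neural_lam/datastore/base.py](neural_lam/datastore/base.py) and may differ):

```python
import xarray as xr

# Assumed import path, per the README text above.
from neural_lam.datastore.base import BaseDataStore


class MyZarrDatastore(BaseDataStore):
    """Illustrative skeleton only; implement every abstract method that
    BaseDataStore actually defines."""

    def get_vars_names(self, category):
        # e.g. ["u10m", "v10m", "t2m"] for category == "state"
        raise NotImplementedError

    def get_dataarray(self, category, split):
        # Return an xr.DataArray with dims
        # ("time", "grid_index", f"{category}_feature") for the given
        # split ("train", "val" or "test").
        raise NotImplementedError

    @property
    def boundary_mask(self):
        # 1 for boundary grid points, 0 for interior points.
        raise NotImplementedError
```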


## Installation

When installing `neural-lam` you have a choice of either installing with
@@ -235,68 +284,42 @@ All graphs used in the paper are also available for download at the same link (b
Note that this is far too little data to train any useful models, but all pre-processing and training steps can be run with it.
It should thus be useful to make sure that your python environment is set up correctly and that all the code can be run without any issues.

## Pre-processing
**Review comment (Collaborator):** This entire pre-processing section, including the figure, requires updating.


There are two main steps in the pre-processing pipeline: creating the graph and creating additional features/normalisation/boundary-masks.

The amount of pre-processing required will depend on what kind of datastore you will be using for training.

### Additional inputs

#### MultiZarr Datastore

* `python -m neural_lam.create_boundary_mask`
* `python -m neural_lam.create_datetime_forcings`
* `python -m neural_lam.create_norm`

Create remaining static features
To create the remaining static files run `python -m neural_lam.create_grid_features` and `python -m neural_lam.create_parameter_weights`.
#### NpyFiles Datastore

## Graph creation
#### MDP (mllam-data-prep) Datastore

Once you have your datastore set up and have run any pre-processing steps that your datastore requires, the next step is to create the graph structure that the model will use.
This is done with the `neural_lam.create_graph` CLI. The CLI has a number of options that can be used to create different graph structures, including hierarchical graphs and multiscale graphs.
An overview of how the different pre-processing steps, training and files depend on each other is given in this figure:
<p align="middle">
<img src="figures/component_dependencies.png"/>
</p>
In order to start training models at least three pre-processing steps have to be run:

Run `python -m neural_lam.create_graph <neural-lam-config-path> --graph <name>` with suitable options to create a graph named `<name>` (see `python -m neural_lam.create_graph --help` for a list of options).
### Create graph
Run `python -m neural_lam.create_mesh` with suitable options to generate the graph you want to use (see `python neural_lam.create_mesh --help` for a list of options).
**Review comment (Contributor):** With the rename of create_mesh.py to create_graph.py this should be updated. Also, `python neural_lam.create_mesh --help` is missing an `-m`.

**Reply (Member Author):** Yes, the whole README needs a big reworking!

The graphs used for the different models in the [paper](https://arxiv.org/abs/2309.17370) can be created as:

* **GC-LAM**: `python -m neural_lam.create_graph <neural-lam-config-path> --graph multiscale`
* **Hi-LAM**: `python -m neural_lam.create_graph <neural-lam-config-path> --graph hierarchical --hierarchical` (also works for Hi-LAM-Parallel)
* **L1-LAM**: `python -m neural_lam.create_graph <neural-lam-config-path> --graph 1level --levels 1`

### Format of graph directory
The `graphs` directory contains generated graph structures that can be used by different graph-based models.
The structure is shown with examples below:
```
graphs
├── graph1 - Directory with a graph definition for "graph1"
│ ├── m2m_edge_index.pt - Edges in mesh graph (neural_lam.create_graph)
│ ├── g2m_edge_index.pt - Edges from grid to mesh (neural_lam.create_graph)
│ ├── m2g_edge_index.pt - Edges from mesh to grid (neural_lam.create_graph)
│ ├── m2m_features.pt - Static features of mesh edges (neural_lam.create_graph)
│ ├── g2m_features.pt - Static features of grid to mesh edges (neural_lam.create_graph)
│ ├── m2g_features.pt - Static features of mesh to grid edges (neural_lam.create_graph)
│ └── mesh_features.pt - Static features of mesh nodes (neural_lam.create_graph)
├── graph2
├── ...
└── graphN
```
* **GC-LAM**: `python -m neural_lam.create_mesh --graph multiscale`
* **Hi-LAM**: `python -m neural_lam.create_mesh --graph hierarchical --hierarchical` (also works for Hi-LAM-Parallel)
* **L1-LAM**: `python -m neural_lam.create_mesh --graph 1level --levels 1`

#### Mesh hierarchy format
To keep track of levels in the mesh graph, a list format is used for the files with mesh graph information.
In particular, the files
```
│ ├── m2m_edge_index.pt - Edges in mesh graph (neural_lam.create_graph)
│ ├── m2m_features.pt - Static features of mesh edges (neural_lam.create_graph)
│ ├── mesh_features.pt - Static features of mesh nodes (neural_lam.create_graph)
```
all contain lists of length `L`, for a hierarchical mesh graph with `L` layers.
For non-hierarchical graphs `L == 1` and these are all just single-entry lists.
Each entry in the list contains the corresponding edge set or features of that level.
Note that the first level (index 0 in these lists) corresponds to the lowest level in the hierarchy.
The graph-related files are stored in a directory called `graphs`.

In addition, hierarchical mesh graphs (`L > 1`) feature a few additional files with static data:
```
├── graph1
│ ├── ...
│ ├── mesh_down_edge_index.pt - Downward edges in mesh graph (neural_lam.create_graph)
│ ├── mesh_up_edge_index.pt - Upward edges in mesh graph (neural_lam.create_graph)
│ ├── mesh_down_features.pt - Static features of downward mesh edges (neural_lam.create_graph)
│ ├── mesh_up_features.pt - Static features of upward mesh edges (neural_lam.create_graph)
│ ├── ...
```
These files have the same list format as the ones above, but each list has length `L-1` (as these edges describe connections between levels).
The first entries (index 0) in these lists describe edges between the two lowest levels, 1 and 2.
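
To make the list format concrete, here is a small sketch of inspecting the
generated files (assuming a graph named `graph1` was created as above):

```python
import torch

# m2m_edge_index.pt holds a list of length L (one entry per mesh level);
# entry 0 is the lowest level. Each entry is an edge-index tensor of
# shape (2, num_edges).
m2m_edge_index = torch.load("graphs/graph1/m2m_edge_index.pt")
print(f"L = {len(m2m_edge_index)} mesh levels")
for level, edge_index in enumerate(m2m_edge_index):
    print(f"level {level}: {edge_index.shape[1]} edges")
```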
### Create remaining static features
To create the remaining static files run `python -m neural_lam.create_grid_features` and `python -m neural_lam.create_parameter_weights`.

## Weights & Biases Integration
The project is fully integrated with [Weights & Biases](https://www.wandb.ai/) (W&B) for logging and visualization, but can just as easily be used without it.
@@ -315,13 +338,15 @@
```
wandb off
```

## Train Models
**Review comment (Collaborator):** The arguments for `train_model` have changed now; this should be updated.

Models can be trained using the `python -m neural_lam.train_model <config-path>` CLI.
Models can be trained using `python -m neural_lam.train_model <datastore_type> <datastore_config_path>`.
Run `python -m neural_lam.train_model --help` for a full list of training options.
A few of the key ones are outlined below:

* `<config-path>`: the path to the neural-lam config
* `<datastore_type>`: The kind of datastore that you are using (should be one of `npyfiles`, `multizarr` or `mllam`)
* `<datastore_config_path>`: Path to the data store configuration file
* `--model`: Which model to train
* `--graph`: Which graph to use with the model
* `--epochs`: Number of epochs to train for
* `--processor_layers`: Number of GNN layers to use in the processing part of the model
* `--ar_steps`: Number of time steps to unroll when making predictions and computing the loss
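
As a concrete illustration, a full training invocation might look like the
following (a hedged sketch: the positional arguments follow the list above,
while the model name, graph name and config path are hypothetical, and the
exact flag set changed during this PR):

```python
import subprocess

# Illustrative CLI call only; adjust the datastore kind, config path and
# flags to your setup.
subprocess.run(
    [
        "python", "-m", "neural_lam.train_model",
        "mllam", "datastore_config.yaml",
        "--model", "graph_lam",
        "--graph", "multiscale",
        "--epochs", "1",
        "--ar_steps", "1",
    ],
    check=True,
)
```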

@@ -378,6 +403,50 @@ Except for training and pre-processing scripts all the source code can be found
Model classes, including abstract base classes, are located in `neural_lam/models`.
Notebooks for visualization and analysis are located in `docs`.

## Format of graph directory
The `graphs` directory contains generated graph structures that can be used by different graph-based models.
The structure is shown with examples below:
```
graphs
├── graph1 - Directory with a graph definition
│ ├── m2m_edge_index.pt - Edges in mesh graph (neural_lam.create_mesh)
│ ├── g2m_edge_index.pt - Edges from grid to mesh (neural_lam.create_mesh)
│ ├── m2g_edge_index.pt - Edges from mesh to grid (neural_lam.create_mesh)
│ ├── m2m_features.pt - Static features of mesh edges (neural_lam.create_mesh)
│ ├── g2m_features.pt - Static features of grid to mesh edges (neural_lam.create_mesh)
│ ├── m2g_features.pt - Static features of mesh to grid edges (neural_lam.create_mesh)
│ └── mesh_features.pt - Static features of mesh nodes (neural_lam.create_mesh)
├── graph2
├── ...
└── graphN
```

### Mesh hierarchy format
To keep track of levels in the mesh graph, a list format is used for the files with mesh graph information.
In particular, the files
```
│ ├── m2m_edge_index.pt - Edges in mesh graph (neural_lam.create_mesh)
│ ├── m2m_features.pt - Static features of mesh edges (neural_lam.create_mesh)
│ ├── mesh_features.pt - Static features of mesh nodes (neural_lam.create_mesh)
```
all contain lists of length `L`, for a hierarchical mesh graph with `L` layers.
For non-hierarchical graphs `L == 1` and these are all just single-entry lists.
Each entry in the list contains the corresponding edge set or features of that level.
Note that the first level (index 0 in these lists) corresponds to the lowest level in the hierarchy.

In addition, hierarchical mesh graphs (`L > 1`) feature a few additional files with static data:
```
├── graph1
│ ├── ...
│ ├── mesh_down_edge_index.pt - Downward edges in mesh graph (neural_lam.create_mesh)
│ ├── mesh_up_edge_index.pt - Upward edges in mesh graph (neural_lam.create_mesh)
│ ├── mesh_down_features.pt - Static features of downward mesh edges (neural_lam.create_mesh)
│ ├── mesh_up_features.pt - Static features of upward mesh edges (neural_lam.create_mesh)
│ ├── ...
```
These files have the same list format as the ones above, but each list has length `L-1` (as these edges describe connections between levels).
The first entries (index 0) in these lists describe edges between the two lowest levels, 1 and 2.

# Development and Contributing
Any push or Pull-Request to the main branch will trigger a selection of pre-commit hooks.
These hooks will run a series of checks on the code, like formatting and linting.
Binary file removed figures/component_dependencies.png
18 changes: 0 additions & 18 deletions neural_lam/config.py
@@ -102,22 +102,4 @@ def load_config_and_datastore(
datastore_kind=config.datastore.kind, config_path=datastore_config_path
)

# TODO: This check should maybe be moved somewhere else, but I'm not sure
# where right now... check that the config state feature weights include a
# weight for each state feature
state_feature_names = datastore.get_vars_names(category="state")
named_feature_weights = config.training.state_feature_weights.keys()

if set(named_feature_weights) != set(state_feature_names):
additional_features = set(named_feature_weights) - set(
state_feature_names
)
missing_features = set(state_feature_names) - set(named_feature_weights)
raise ValueError(
f"State feature weights must be provided for each state feature in "
f"the datastore ({state_feature_names}). {missing_features} are "
"missing and weights are defined for the features "
f"{additional_features} which are not in the datastore."
)

return config, datastore
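
For reference, a hedged sketch of calling this helper after the change (the
keyword name `config_path` is an assumption based on the snippet above):

```python
from neural_lam.config import load_config_and_datastore

# Load the neural-lam config and construct the datastore it points at.
config, datastore = load_config_and_datastore(config_path="config.yaml")

# The datastore exposes per-category variable metadata, e.g.:
print(datastore.get_vars_names(category="state"))
```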
28 changes: 16 additions & 12 deletions neural_lam/create_graph.py
@@ -109,28 +109,28 @@


def mk_2d_graph(xy, nx, ny):
xm, xM = np.amin(xy[0][0, :]), np.amax(xy[0][0, :])
ym, yM = np.amin(xy[1][:, 0]), np.amax(xy[1][:, 0])
xm, xM = np.amin(xy[:, :, 0][:, 0]), np.amax(xy[:, :, 0][:, 0])
ym, yM = np.amin(xy[:, :, 1][0, :]), np.amax(xy[:, :, 1][0, :])

# avoid nodes on border
dx = (xM - xm) / nx
dy = (yM - ym) / ny
lx = np.linspace(xm + dx / 2, xM - dx / 2, nx)
ly = np.linspace(ym + dy / 2, yM - dy / 2, ny)

mg = np.meshgrid(lx, ly)
g = networkx.grid_2d_graph(len(ly), len(lx))
mg = np.meshgrid(lx, ly, indexing="ij") # Use 'ij' indexing for (Nx,Ny)
g = networkx.grid_2d_graph(len(lx), len(ly))

for node in g.nodes:
g.nodes[node]["pos"] = np.array([mg[0][node], mg[1][node]])

# add diagonal edges
g.add_edges_from(
[((x, y), (x + 1, y + 1)) for x in range(nx - 1) for y in range(ny - 1)]
[((x, y), (x + 1, y + 1)) for y in range(ny - 1) for x in range(nx - 1)]
+ [
((x + 1, y), (x, y + 1))
for x in range(nx - 1)
for y in range(ny - 1)
for x in range(nx - 1)
]
)

@@ -213,7 +213,7 @@ def create_graph(
graph_dir_path : str
Path to store the graph components.
xy : np.ndarray
Grid coordinates, expected to be of shape (2, Ny, Nx).
Grid coordinates, expected to be of shape (Nx, Ny, 2).
n_max_levels : int
Limit multi-scale mesh to given number of levels, from bottom up
(default: None (no limit)).
@@ -239,8 +239,8 @@
#

# graph geometry
nx = 3 # number of children = nx**2
nlev = int(np.log(max(xy.shape)) / np.log(nx))
nx = 3 # number of children =nx**2
nlev = int(np.log(max(xy.shape[:2])) / np.log(nx))
nleaf = nx**nlev # leaves at the bottom = nleaf**2

mesh_levels = nlev - 1
@@ -432,15 +432,17 @@ def create_graph(
)

# grid nodes
Ny, Nx = xy.shape[1:]
Nx, Ny = xy.shape[:2]

G_grid = networkx.grid_2d_graph(Ny, Nx)
G_grid.clear_edges()

# vg features (only pos introduced here)
for node in G_grid.nodes:
# pos is in feature but here explicit for convenience
G_grid.nodes[node]["pos"] = np.array([xy[0][node], xy[1][node]])
G_grid.nodes[node]["pos"] = xy[
node[1], node[0]
] # xy is already (Nx,Ny,2)

# add 1000 to node key to separate grid nodes (1000,i,j) from mesh nodes
# (i,j) and impose sorting order such that vm are the first nodes
@@ -449,7 +451,9 @@
# build kd tree for grid point pos
# order in vg_list should be same as in vg_xy
vg_list = list(G_grid.nodes)
vg_xy = np.array([[xy[0][node[1:]], xy[1][node[1:]]] for node in vg_list])
vg_xy = np.array(
[xy[node[2], node[1]] for node in vg_list]
) # xy is already (Nx,Ny,2)
kdt_g = scipy.spatial.KDTree(vg_xy)

# now add (all) mesh nodes, include features (pos)
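
To make the new coordinate convention concrete, here is a small standalone
sketch (illustrative, not taken from the PR) of building grid coordinates in
the `(Nx, Ny, 2)` layout that `create_graph` and `mk_2d_graph` now expect:

```python
import numpy as np

# Old convention: xy had shape (2, Ny, Nx).
# New convention: xy has shape (Nx, Ny, 2), i.e. x varies along the first
# axis ('ij' meshgrid indexing) and the last axis holds the (x, y)
# coordinate pair of each grid node.
nx_pts, ny_pts = 4, 3
x = np.linspace(0.0, 300.0, nx_pts)
y = np.linspace(0.0, 200.0, ny_pts)
mg = np.meshgrid(x, y, indexing="ij")  # two arrays of shape (Nx, Ny)
xy = np.stack(mg, axis=-1)             # shape (Nx, Ny, 2)

assert xy.shape == (nx_pts, ny_pts, 2)
print(xy[2, 1])  # coordinate pair of grid node (i=2, j=1) -> [200. 100.]
```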