"""
Version 1.5.6
-------------
- - -
Introduction
============
netcdf4-python is a Python interface to the netCDF C library.
[netCDF](http://www.unidata.ucar.edu/software/netcdf/) version 4 has many features
not found in earlier versions of the library and is implemented on top of
[HDF5](http://www.hdfgroup.org/HDF5). This module can read and write
files in both the new netCDF 4 and the old netCDF 3 format, and can create
files that are readable by HDF5 clients. The API is modelled after
[Scientific.IO.NetCDF](http://dirac.cnrs-orleans.fr/ScientificPython/),
and should be familiar to users of that module.
Most new features of netCDF 4 are implemented, such as multiple
unlimited dimensions, groups and zlib data compression. All the new
numeric data types (such as 64 bit and unsigned integer types) are
implemented. Compound (struct), variable length (vlen) and
enumerated (enum) data types are supported, but not the opaque data type.
Mixtures of compound, vlen and enum data types (such as
compound types containing enums, or vlens containing compound
types) are not supported.

Download
========
- Latest bleeding-edge code from the
[github repository](http://github.com/Unidata/netcdf4-python).
- Latest [releases](https://pypi.python.org/pypi/netCDF4)
(source code and binary installers).

Requires
========
- [numpy array module](http://numpy.scipy.org), version 1.10.0 or later.
- [Cython](http://cython.org), version 0.21 or later.
- [setuptools](https://pypi.python.org/pypi/setuptools), version 18.0 or
later.
- [cftime](https://github.com/Unidata/cftime) for
the time and date handling utility functions (`netCDF4.num2date`,
`netCDF4.date2num` and `netCDF4.date2index`).
- The HDF5 C library version 1.8.4-patch1 or higher (1.8.x recommended)
 from [ftp://ftp.hdfgroup.org/HDF5/current/src](ftp://ftp.hdfgroup.org/HDF5/current/src).
***netCDF version 4.4.1 or higher is recommended if using HDF5 1.10.x -
otherwise resulting files may be unreadable by clients using earlier
versions of HDF5. For netCDF < 4.4.1, HDF5 version 1.8.x is recommended.***
Be sure to build with `--enable-hl --enable-shared`.
- [Libcurl](http://curl.haxx.se/libcurl), if you want
[OPeNDAP](http://opendap.org) support.
- [HDF4](http://www.hdfgroup.org/products/hdf4), if you want
to be able to read HDF4 "Scientific Dataset" (SD) files.
- The netCDF-4 C library from the [github releases
page](https://github.com/Unidata/netcdf-c/releases).
Version 4.1.1 or higher is required (4.2 or higher recommended).
Be sure to build with `--enable-netcdf-4 --enable-shared`, and set
`CPPFLAGS="-I $HDF5_DIR/include"` and `LDFLAGS="-L $HDF5_DIR/lib"`,
where `$HDF5_DIR` is the directory where HDF5 was installed.
If you want [OPeNDAP](http://opendap.org) support, add `--enable-dap`.
If you want HDF4 SD support, add `--enable-hdf4` and add
the location of the HDF4 headers and library to `$CPPFLAGS` and `$LDFLAGS`.
- For MPI parallel IO support, an MPI-enabled version of the netCDF library
 is required, as is the [mpi4py](http://mpi4py.scipy.org) python module.
 Parallel IO further depends on the existence of MPI-enabled HDF5 or the
 [PnetCDF](https://parallel-netcdf.github.io/) library.

Install
=======
- install the requisite python modules and C libraries (see above). It's
easiest if all the C libs are built as shared libraries.
- By default, the utility `nc-config`, installed with netcdf 4.1.2 or higher,
 will be run and used to determine where all the dependencies live.
- If `nc-config` is not in your default `PATH`, you can set the `NETCDF4_DIR`
environment variable and `setup.py` will look in `$NETCDF4_DIR/bin`.
You can also use the file `setup.cfg` to set the path to `nc-config`, or
enter the paths to the libraries and include files manually. Just edit the `setup.cfg` file
in a text editor and follow the instructions in the comments.
To disable the use of `nc-config`, set the env var `USE_NCCONFIG` to 0.
To disable the use of `setup.cfg`, set `USE_SETUPCFG` to 0.
As a last resort, the library and include paths can be set via environment variables.
If you go this route, set `USE_NCCONFIG` and `USE_SETUPCFG` to 0, and specify
`NETCDF4_LIBDIR`, `NETCDF4_INCDIR`, `HDF5_LIBDIR` and `HDF5_INCDIR`.
Similarly, environment variables
(all capitalized) can be used to set the include and library paths for
`hdf4`, `szip`, `jpeg`, `curl` and `zlib`. If the dependencies are not found
in any of the paths specified by environment variables, then standard locations
(such as `/usr` and `/usr/local`) are searched.
- run `python setup.py build`, then `python setup.py install` (as root if
necessary). `pip install` can be used to install pre-compiled binary wheels from
[pypi](https://pypi.org/project/netCDF4).
- run the tests in the 'test' directory by running `python run_all.py`.
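
The "last resort" route above (environment variables instead of `nc-config` or `setup.cfg`) can be sketched in Python as follows. This is a minimal illustration, not part of the package: the prefixes `/opt/netcdf` and `/opt/hdf5` are hypothetical placeholders, and the assumption that each prefix contains `lib` and `include` subdirectories may not match your system. Only the variable names listed in the text are set.

```python
import os
import subprocess
import sys


def build_env(netcdf_prefix, hdf5_prefix, base=None):
    # Copy the current environment (or a supplied mapping) so that the
    # caller's own process environment is left untouched.
    env = dict(os.environ if base is None else base)
    # Turn off both automatic discovery mechanisms, as the text requires
    # when paths are supplied through environment variables.
    env["USE_NCCONFIG"] = "0"
    env["USE_SETUPCFG"] = "0"
    # Assumption: each prefix has the usual lib/ and include/ subdirectories.
    env["NETCDF4_LIBDIR"] = os.path.join(netcdf_prefix, "lib")
    env["NETCDF4_INCDIR"] = os.path.join(netcdf_prefix, "include")
    env["HDF5_LIBDIR"] = os.path.join(hdf5_prefix, "lib")
    env["HDF5_INCDIR"] = os.path.join(hdf5_prefix, "include")
    return env


def run_build(env, source_dir="."):
    # Equivalent to running `python setup.py build` in the source tree.
    subprocess.run([sys.executable, "setup.py", "build"],
                   cwd=source_dir, env=env, check=True)


# Hypothetical install prefixes; replace them with your own.
env = build_env("/opt/netcdf", "/opt/hdf5")
```

Calling `run_build(env)` from the top of the source tree would then perform the build step with those settings.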

Tutorial
========
1. [Creating/Opening/Closing a netCDF file.](#section1)
2. [Groups in a netCDF file.](#section2)
3. [Dimensions in a netCDF file.](#section3)
4. [Variables in a netCDF file.](#section4)
5. [Attributes in a netCDF file.](#section5)
6. [Writing data to and retrieving data from a netCDF variable.](#section6)
7. [Dealing with time coordinates.](#section7)
8. [Reading data from a multi-file netCDF dataset.](#section8)
9. [Efficient compression of netCDF variables.](#section9)
10. [Beyond homogeneous arrays of a fixed type - compound data types.](#section10)
11. [Variable-length (vlen) data types.](#section11)
12. [Enum data type.](#section12)
13. [Parallel IO.](#section13)
14. [Dealing with strings.](#section14)
15. [In-memory (diskless) Datasets.](#section15)
## <div id='section1'>1) Creating/Opening/Closing a netCDF file.
To create a netCDF file from python, you simply call the `netCDF4.Dataset`
constructor. This is also the method used to open an existing netCDF
file. If the file is open for write access (`mode='w'`, `'r+'` or `'a'`), you may
write any type of data including new dimensions, groups, variables and
attributes. netCDF files come in five flavors (`NETCDF3_CLASSIC`,
`NETCDF3_64BIT_OFFSET`, `NETCDF3_64BIT_DATA`, `NETCDF4_CLASSIC`, and `NETCDF4`).
`NETCDF3_CLASSIC` was the original netCDF binary format, and was limited
to file sizes less than 2 Gb. `NETCDF3_64BIT_OFFSET` was introduced
in version 3.6.0 of the library, and extended the original binary format
to allow for file sizes greater than 2 Gb.
`NETCDF3_64BIT_DATA` is a new format that requires version 4.4.0 of
the C library; it extends the `NETCDF3_64BIT_OFFSET` binary format to
allow for unsigned/64-bit integer data types and 64-bit dimension sizes.
`NETCDF3_64BIT` is an alias for `NETCDF3_64BIT_OFFSET`.
`NETCDF4_CLASSIC` files use the version 4 disk format (HDF5), but omit features
not found in the version 3 API. They can be read by netCDF 3 clients
only if they have been relinked against the netCDF 4 library. They can
also be read by HDF5 clients. `NETCDF4` files use the version 4 disk
format (HDF5) and use the new features of the version 4 API. The
`netCDF4` module can read and write files in any of these formats. When
creating a new file, the format may be specified using the `format`
keyword in the `Dataset` constructor. The default format is
`NETCDF4`. To see how a given file is formatted, you can examine the
`data_model` attribute. Closing the netCDF file is
accomplished via the `netCDF4.Dataset.close` method of the `netCDF4.Dataset`
instance.
Here's an example:

    :::python
    >>> from netCDF4 import Dataset
    >>> rootgrp = Dataset("test.nc", "w", format="NETCDF4")
    >>> print(rootgrp.data_model)
    NETCDF4
    >>> rootgrp.close()

Remote [OPeNDAP](http://opendap.org)-hosted datasets can be accessed for
reading over http if a URL is provided to the `netCDF4.Dataset` constructor instead of a
filename. However, this requires that the netCDF library be built with
OPeNDAP support, via the `--enable-dap` configure option (added in
version 4.0.1).
## <div id='section2'>2) Groups in a netCDF file.
netCDF version 4 added support for organizing data in hierarchical
groups, which are analogous to directories in a filesystem. Groups serve
as containers for variables, dimensions and attributes, as well as other
groups. A `netCDF4.Dataset` creates a special group, called
the 'root group', which is similar to the root directory in a unix
filesystem. To create `netCDF4.Group` instances, use the
`netCDF4.Dataset.createGroup` method of a `netCDF4.Dataset` or `netCDF4.Group`
instance. `netCDF4.Dataset.createGroup` takes a single argument, a
python string containing the name of the new group. The new `netCDF4.Group`
instances contained within the root group can be accessed by name using
the `groups` dictionary attribute of the `netCDF4.Dataset` instance. Only
`NETCDF4` formatted files support Groups; if you try to create a Group
in a netCDF 3 file you will get an error message.

    :::python
    >>> rootgrp = Dataset("test.nc", "a")
    >>> fcstgrp = rootgrp.createGroup("forecasts")
    >>> analgrp = rootgrp.createGroup("analyses")
    >>> print(rootgrp.groups)
    {'forecasts': <class 'netCDF4._netCDF4.Group'>
    group /forecasts:
    dimensions(sizes):
    variables(dimensions):
    groups: , 'analyses': <class 'netCDF4._netCDF4.Group'>
    group /analyses:
    dimensions(sizes):
    variables(dimensions):
    groups: }

Groups can exist within groups in a `netCDF4.Dataset`, just as directories
exist within directories in a unix filesystem. Each `netCDF4.Group` instance
has a `groups` attribute dictionary containing all of the group
instances contained within that group. Each `netCDF4.Group` instance also has a
`path` attribute that contains a simulated unix directory path to
that group. To simplify the creation of nested groups, you can
use a unix-like path as an argument to `netCDF4.Dataset.createGroup`.

    :::python
    >>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
    >>> fcstgrp2 = rootgrp.createGroup("/forecasts/model2")

If any of the intermediate elements of the path do not exist, they are created,
just as with the unix command `'mkdir -p'`. If you try to create a group
that already exists, no error will be raised, and the existing group will be
returned.
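
The `mkdir -p` behaviour described above can be illustrated with a small pure-Python stand-in. The `Node` class and its `create_group` method are hypothetical: they only mimic the documented semantics (missing intermediate groups are created, an existing group is returned instead of raising an error) and are not how the library implements `netCDF4.Dataset.createGroup`.

```python
class Node:
    # Hypothetical stand-in for a group: a name, a parent and child groups.
    def __init__(self, name="/", parent=None):
        self.name = name
        self.parent = parent
        self.groups = {}

    @property
    def path(self):
        # Build a unix-like path such as "/forecasts/model1".
        if self.parent is None:
            return "/"
        parent_path = self.parent.path
        if parent_path == "/":
            return "/" + self.name
        return parent_path + "/" + self.name

    def create_group(self, path):
        # Walk the path one component at a time, like "mkdir -p":
        # missing components are created, existing ones are reused.
        node = self
        for part in path.strip("/").split("/"):
            if part not in node.groups:
                node.groups[part] = Node(part, node)
            node = node.groups[part]
        return node


root = Node()
model1 = root.create_group("/forecasts/model1")
again = root.create_group("/forecasts/model1")
```

Here the intermediate group `forecasts` is created on the first call, and the second call returns the very same `model1` object (`again is model1`), whose `path` is `"/forecasts/model1"`.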
Here's an example that shows how to navigate all the groups in a
`netCDF4.Dataset`. The function `walktree` is a Python generator that is used
to walk the directory tree. Note that printing the `netCDF4.Dataset` or `netCDF4.Group`
object yields summary information about its contents.

    :::python
    >>> def walktree(top):
    ...     values = top.groups.values()
    ...     yield values
    ...     for value in top.groups.values():
    ...         for children in walktree(value):
    ...             yield children
    >>> print(rootgrp)
    <class 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes):
    variables(dimensions):
    groups: forecasts, analyses
    >>> for children in walktree(rootgrp):
    ...     for child in children:
    ...         print(child)
    <class 'netCDF4._netCDF4.Group'>
    group /forecasts:
    dimensions(sizes):
    variables(dimensions):
    groups: model1, model2
    <class 'netCDF4._netCDF4.Group'>
    group /analyses:
    dimensions(sizes):
    variables(dimensions):
    groups:
    <class 'netCDF4._netCDF4.Group'>
    group /forecasts/model1:
    dimensions(sizes):
    variables(dimensions):
    groups:
    <class 'netCDF4._netCDF4.Group'>
    group /forecasts/model2:
    dimensions(sizes):
    variables(dimensions):
    groups:

## <div id='section3'>3) Dimensions in a netCDF file.
netCDF defines the sizes of all variables in terms of dimensions, so
before any variables can be created the dimensions they use must be
created first. A special case, not often used in practice, is that of a
scalar variable, which has no dimensions. A dimension is created using
the `netCDF4.Dataset.createDimension` method of a `netCDF4.Dataset`
or `netCDF4.Group` instance. A Python string is used to set the name of the
dimension, and an integer value is used to set the size. To create an
unlimited dimension (a dimension that can be appended to), the size
value is set to `None` or 0. In this example, both the `time` and
`level` dimensions are unlimited. Having more than one unlimited
dimension is a new netCDF 4 feature; in netCDF 3 files there may be only
one, and it must be the first (leftmost) dimension of the variable.

    :::python
    >>> level = rootgrp.createDimension("level", None)
    >>> time = rootgrp.createDimension("time", None)
    >>> lat = rootgrp.createDimension("lat", 73)
    >>> lon = rootgrp.createDimension("lon", 144)

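
The netCDF 3 restrictions on unlimited dimensions can be written down as a simple check. The helper below is a hypothetical sketch, not part of the `netCDF4` API: it takes the set of unlimited dimension names in a file and a mapping from variable names to their dimension tuples, and lists what would violate the netCDF 3 rules.

```python
def check_netcdf3_layout(unlimited, variables):
    # unlimited: set of unlimited dimension names in the file.
    # variables: mapping of variable name -> tuple of dimension names.
    problems = []
    # netCDF 3 allows at most one unlimited dimension per file ...
    if len(unlimited) > 1:
        problems.append("more than one unlimited dimension: %s" % sorted(unlimited))
    # ... and it must be the first (leftmost) dimension of each variable.
    for var, dims in variables.items():
        for i, name in enumerate(dims):
            if name in unlimited and i != 0:
                problems.append("%s: unlimited dimension %r is not leftmost" % (var, name))
    return problems


# The layout used in this tutorial, with time and level both unlimited.
tutorial_vars = {
    "time": ("time",),
    "level": ("level",),
    "temp": ("time", "level", "lat", "lon"),
}
problems = check_netcdf3_layout({"time", "level"}, tutorial_vars)
```

For the tutorial's layout this reports two problems: the file has two unlimited dimensions, and `level` is not the leftmost dimension of `temp`. Both are fine in a `NETCDF4` file.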
All of the `netCDF4.Dimension` instances are stored in a python dictionary.

    :::python
    >>> print(rootgrp.dimensions)
    {'level': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0, 'time': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0, 'lat': <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73, 'lon': <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144}

Calling the python `len` function with a `netCDF4.Dimension` instance returns
the current size of that dimension.
The `netCDF4.Dimension.isunlimited` method of a `netCDF4.Dimension` instance
can be used to determine if the dimension is unlimited, or appendable.

    :::python
    >>> print(len(lon))
    144
    >>> print(lon.isunlimited())
    False
    >>> print(time.isunlimited())
    True

Printing the `netCDF4.Dimension` object
provides useful summary info, including the name and length of the dimension,
and whether it is unlimited.

    :::python
    >>> for dimobj in rootgrp.dimensions.values():
    ...     print(dimobj)
    <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
    <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
    <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73
    <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144

`netCDF4.Dimension` names can be changed using the
`netCDF4.Dataset.renameDimension` method of a `netCDF4.Dataset` or
`netCDF4.Group` instance.
## <div id='section4'>4) Variables in a netCDF file.
netCDF variables behave much like python multidimensional array objects
supplied by the [numpy module](http://numpy.scipy.org). However,
unlike numpy arrays, netCDF4 variables can be appended to along one or
more 'unlimited' dimensions. To create a netCDF variable, use the
`netCDF4.Dataset.createVariable` method of a `netCDF4.Dataset` or
`netCDF4.Group` instance. The `netCDF4.Dataset.createVariable` method
has two mandatory arguments, the variable name (a Python string), and
the variable datatype. The variable's dimensions are given by a tuple
containing the dimension names (defined previously with
`netCDF4.Dataset.createDimension`). To create a scalar
variable, simply leave out the dimensions keyword. The variable
primitive datatypes correspond to the dtype attribute of a numpy array.
You can specify the datatype as a numpy dtype object, or anything that
can be converted to a numpy dtype object. Valid datatype specifiers
include: `'f4'` (32-bit floating point), `'f8'` (64-bit floating
point), `'i4'` (32-bit signed integer), `'i2'` (16-bit signed
integer), `'i8'` (64-bit signed integer), `'i1'` (8-bit signed
integer), `'u1'` (8-bit unsigned integer), `'u2'` (16-bit unsigned
integer), `'u4'` (32-bit unsigned integer), `'u8'` (64-bit unsigned
integer), or `'S1'` (single-character string). The old Numeric
single-character typecodes (`'f'`,`'d'`,`'h'`,
`'s'`,`'b'`,`'B'`,`'c'`,`'i'`,`'l'`), corresponding to
(`'f4'`,`'f8'`,`'i2'`,`'i2'`,`'i1'`,`'i1'`,`'S1'`,`'i4'`,`'i4'`),
will also work. The unsigned integer types and the 64-bit integer type
can only be used if the file format is `NETCDF4` (or `NETCDF3_64BIT_DATA`,
which, as described above, also supports them).
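
Because the specifiers above are ordinary numpy dtype strings, numpy itself can confirm the sizes they denote. This sketch only exercises numpy. It deliberately leaves out the old Numeric typecodes, because numpy interprets some of them differently from the correspondence listed above (for example, numpy reads `'B'` as an unsigned 8-bit integer).

```python
import numpy

# Datatype specifiers from the text, with the sizes (in bits) they denote.
SPECIFIERS = {
    "f4": 32, "f8": 64,
    "i1": 8, "i2": 16, "i4": 32, "i8": 64,
    "u1": 8, "u2": 16, "u4": 32, "u8": 64,
    "S1": 8,
}


def describe(spec):
    # Return (kind, size in bits) as numpy interprets the specifier.
    # kind is 'f' (float), 'i' (signed int), 'u' (unsigned int) or 'S' (bytes).
    dt = numpy.dtype(spec)
    return dt.kind, dt.itemsize * 8


summary = {spec: describe(spec) for spec in SPECIFIERS}
```

For instance, `summary["u4"]` is `("u", 32)` and `summary["S1"]` is `("S", 8)`.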
The dimensions themselves are usually also defined as variables, called
coordinate variables. The `netCDF4.Dataset.createVariable`
method returns an instance of the `netCDF4.Variable` class whose methods can be
used later to access and set variable data and attributes.

    :::python
    >>> times = rootgrp.createVariable("time","f8",("time",))
    >>> levels = rootgrp.createVariable("level","i4",("level",))
    >>> latitudes = rootgrp.createVariable("lat","f4",("lat",))
    >>> longitudes = rootgrp.createVariable("lon","f4",("lon",))
    >>> # two dimensions unlimited
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
    >>> temp.units = "K"

To get summary info on a `netCDF4.Variable` instance in an interactive session,
just print it.

    :::python
    >>> print(temp)
    <class 'netCDF4._netCDF4.Variable'>
    float32 temp(time, level, lat, lon)
    units: K
    unlimited dimensions: time, level
    current shape = (0, 0, 73, 144)
    filling on, default _FillValue of 9.969209968386869e+36 used

You can use a path to create a Variable inside a hierarchy of groups.

    :::python
    >>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))

If the intermediate groups do not yet exist, they will be created.
You can also query a `netCDF4.Dataset` or `netCDF4.Group` instance directly to obtain `netCDF4.Group` or
`netCDF4.Variable` instances using paths.

    :::python
    >>> print(rootgrp["/forecasts/model1"]) # a Group instance
    <class 'netCDF4._netCDF4.Group'>
    group /forecasts/model1:
    dimensions(sizes):
    variables(dimensions): float32 temp(time,level,lat,lon)
    groups:
    >>> print(rootgrp["/forecasts/model1/temp"]) # a Variable instance
    <class 'netCDF4._netCDF4.Variable'>
    float32 temp(time, level, lat, lon)
    path = /forecasts/model1
    unlimited dimensions: time, level
    current shape = (0, 0, 73, 144)
    filling on, default _FillValue of 9.969209968386869e+36 used

All of the variables in the `netCDF4.Dataset` or `netCDF4.Group` are stored in a
Python dictionary, in the same way as the dimensions:

    :::python
    >>> print(rootgrp.variables)
    {'time': <class 'netCDF4._netCDF4.Variable'>
    float64 time(time)
    unlimited dimensions: time
    current shape = (0,)
    filling on, default _FillValue of 9.969209968386869e+36 used, 'level': <class 'netCDF4._netCDF4.Variable'>
    int32 level(level)
    unlimited dimensions: level
    current shape = (0,)
    filling on, default _FillValue of -2147483647 used, 'lat': <class 'netCDF4._netCDF4.Variable'>
    float32 lat(lat)
    unlimited dimensions:
    current shape = (73,)
    filling on, default _FillValue of 9.969209968386869e+36 used, 'lon': <class 'netCDF4._netCDF4.Variable'>
    float32 lon(lon)
    unlimited dimensions:
    current shape = (144,)
    filling on, default _FillValue of 9.969209968386869e+36 used, 'temp': <class 'netCDF4._netCDF4.Variable'>
    float32 temp(time, level, lat, lon)
    units: K
    unlimited dimensions: time, level
    current shape = (0, 0, 73, 144)
    filling on, default _FillValue of 9.969209968386869e+36 used}

`netCDF4.Variable` names can be changed using the
`netCDF4.Dataset.renameVariable` method of a `netCDF4.Dataset`
instance.
## <div id='section5'>5) Attributes in a netCDF file.
There are two types of attributes in a netCDF file, global and variable.
Global attributes provide information about a group, or the entire
dataset, as a whole. `netCDF4.Variable` attributes provide information about
one of the variables in a group. Global attributes are set by assigning
values to `netCDF4.Dataset` or `netCDF4.Group` instance variables. `netCDF4.Variable`
attributes are set by assigning values to `netCDF4.Variable` instance
variables. Attributes can be strings, numbers or sequences. Returning to
our example,

    :::python
    >>> import time
    >>> rootgrp.description = "bogus example script"
    >>> rootgrp.history = "Created " + time.ctime(time.time())
    >>> rootgrp.source = "netCDF4 python module tutorial"
    >>> latitudes.units = "degrees north"
    >>> longitudes.units = "degrees east"
    >>> levels.units = "hPa"
    >>> temp.units = "K"
    >>> times.units = "hours since 0001-01-01 00:00:00.0"
    >>> times.calendar = "gregorian"

The `netCDF4.Dataset.ncattrs` method of a `netCDF4.Dataset`, `netCDF4.Group` or
`netCDF4.Variable` instance can be used to retrieve the names of all the netCDF
attributes. This method is provided as a convenience, since the
built-in `dir` Python function also lists many private methods
and attributes that cannot (or should not) be modified by the user.

    :::python
    >>> for name in rootgrp.ncattrs():
    ...     print("Global attr {} = {}".format(name, getattr(rootgrp, name)))
    Global attr description = bogus example script
    Global attr history = Created Mon Jul 8 14:19:41 2019
    Global attr source = netCDF4 python module tutorial

The `__dict__` attribute of a `netCDF4.Dataset`, `netCDF4.Group` or `netCDF4.Variable`
instance provides all the netCDF attribute name/value pairs in a python
dictionary:

    :::python
    >>> print(rootgrp.__dict__)
    {'description': 'bogus example script', 'history': 'Created Mon Jul 8 14:19:41 2019', 'source': 'netCDF4 python module tutorial'}

Attributes can be deleted from a netCDF `netCDF4.Dataset`, `netCDF4.Group` or
`netCDF4.Variable` using the python `del` statement (i.e. `del grp.foo`
removes the attribute `foo` from the group `grp`).
## <div id='section6'>6) Writing data to and retrieving data from a netCDF variable.
Now that you have a netCDF `netCDF4.Variable` instance, how do you put data
into it? You can just treat it like an array and assign data to a slice.

    :::python
    >>> import numpy
    >>> lats = numpy.arange(-90,91,2.5)
    >>> lons = numpy.arange(-180,180,2.5)
    >>> latitudes[:] = lats
    >>> longitudes[:] = lons
    >>> print("latitudes =\\n{}".format(latitudes[:]))
    latitudes =
    [-90. -87.5 -85. -82.5 -80. -77.5 -75. -72.5 -70. -67.5 -65. -62.5
    -60. -57.5 -55. -52.5 -50. -47.5 -45. -42.5 -40. -37.5 -35. -32.5
    -30. -27.5 -25. -22.5 -20. -17.5 -15. -12.5 -10. -7.5 -5. -2.5
    0. 2.5 5. 7.5 10. 12.5 15. 17.5 20. 22.5 25. 27.5
    30. 32.5 35. 37.5 40. 42.5 45. 47.5 50. 52.5 55. 57.5
    60. 62.5 65. 67.5 70. 72.5 75. 77.5 80. 82.5 85. 87.5
    90. ]

Unlike NumPy's array objects, netCDF `netCDF4.Variable`
objects with unlimited dimensions will grow along those dimensions if you
assign data outside the currently defined range of indices.

    :::python
    >>> # append along two unlimited dimensions by assigning to slice.
    >>> nlats = len(rootgrp.dimensions["lat"])
    >>> nlons = len(rootgrp.dimensions["lon"])
    >>> print("temp shape before adding data = {}".format(temp.shape))
    temp shape before adding data = (0, 0, 73, 144)
    >>>
    >>> from numpy.random import uniform
    >>> temp[0:5, 0:10, :, :] = uniform(size=(5, 10, nlats, nlons))
    >>> print("temp shape after adding data = {}".format(temp.shape))
    temp shape after adding data = (5, 10, 73, 144)
    >>>
    >>> # levels have grown, but no values yet assigned.
    >>> print("levels shape after adding pressure data = {}".format(levels.shape))
    levels shape after adding pressure data = (10,)

Note that the size of the levels variable grows when data is appended
along the `level` dimension of the variable `temp`, even though no
data has yet been assigned to levels.
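
One way to see why `levels` grows is that a variable's shape is derived from the current sizes of the dimensions it uses, and those sizes are shared by every variable in the group. The pure-Python sketch below mimics only that bookkeeping with a plain dictionary; the `write` and `shape_of` helpers are hypothetical and track sizes, not data.

```python
# Current dimension sizes, as in the tutorial (time and level unlimited).
dim_sizes = {"time": 0, "level": 0, "lat": 73, "lon": 144}
var_dims = {
    "temp": ("time", "level", "lat", "lon"),
    "level": ("level",),
}


def shape_of(var):
    # A variable's shape is looked up from the shared dimension sizes.
    return tuple(dim_sizes[d] for d in var_dims[var])


def write(var, stops):
    # Writing up to index `stop` along a dimension grows it if needed.
    # Only unlimited dimensions may grow in netCDF; this sketch does not
    # enforce that, it simply keeps the largest extent seen so far.
    for dim, stop in zip(var_dims[var], stops):
        dim_sizes[dim] = max(dim_sizes[dim], stop)


write("temp", (5, 10, 73, 144))  # like temp[0:5, 0:10, :, :] = ...
```

After the write, `shape_of("temp")` is `(5, 10, 73, 144)` and `shape_of("level")` is `(10,)`, matching the shapes printed above, even though nothing was written to `level` itself.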

    :::python
    >>> # now, assign data to levels dimension variable.
    >>> levels[:] = [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]

However, there are some differences between NumPy and netCDF
variable slicing rules. Slices behave as usual, being specified as a
`start:stop:step` triplet. Using a scalar integer index `i` takes the ith
element and reduces the rank of the output array by one. Boolean array and
integer sequence indexing behaves differently for netCDF variables
than for numpy arrays. Only 1-d boolean arrays and integer sequences are
allowed, and these indices work independently along each dimension (similar
to the way vector subscripts work in fortran). This means that

    :::python
    >>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
    (4, 4)

returns an array of shape (4,4) when slicing a netCDF variable, but for a
numpy array it returns an array of shape (4,).
Similarly, a netCDF variable of shape `(2,3,4,5)` indexed
with `[0, array([True, False, True]), array([False, True, True, True]), :]`
would return a `(2, 3, 5)` array. In NumPy, this would raise an error since
it would be equivalent to `[0, [0,1], [1,2,3], :]`. When slicing with integer
sequences, the indices ***need not be sorted*** and ***may contain
duplicates*** (both of these are new features in version 1.2.1).
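
If you want the same orthogonal ("outer") selection from an ordinary numpy array, `numpy.ix_` builds index arrays that act independently along each axis. The sketch below compares the two behaviours on a small array of arbitrary values with the same `(2, 3, 4, 5)` shape used above; it only exercises numpy, not netCDF variables.

```python
import numpy as np

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
idx = [0, 1, 2, 3]

# numpy pairs the two index lists element by element: 4 points.
pointwise = a[0, 0, idx, idx]
# np.ix_ makes the lists act independently per axis, like a netCDF variable.
outer = a[0, 0][np.ix_(idx, idx)]

# The boolean example: numpy cannot broadcast masks selecting 2 and 3 items ...
mask1 = np.array([True, False, True])
mask2 = np.array([False, True, True, True])
try:
    a[0, mask1, mask2, :]
    numpy_raises = False
except IndexError:
    numpy_raises = True
# ... but applied orthogonally they select a (2, 3, 5) block.
boolean_outer = a[0][np.ix_(mask1, mask2)]
```

Here `pointwise.shape` is `(4,)`, `outer.shape` is `(4, 4)`, plain numpy raises `IndexError` for the boolean case, and `boolean_outer.shape` is `(2, 3, 5)`.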
While this behaviour may cause some confusion for those used to NumPy's 'fancy indexing' rules,
it provides a very powerful way to extract data from multidimensional netCDF
variables by using logical operations on the dimension arrays to create slices.
For example,

    :::python
    >>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]

will extract time indices 0,2 and 4, pressure levels
850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern
Hemisphere longitudes, resulting in a numpy array of shape (3, 3, 36, 71).

    :::python
    >>> print("shape of fancy temp slice = {}".format(tempdat.shape))
    shape of fancy temp slice = (3, 3, 36, 71)

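
The shape above can be checked with plain numpy. The snippet recreates `lats`, `lons` and the pressure levels from earlier in this tutorial so that it runs on its own; the zero-filled `temp` array is only a stand-in for the netCDF variable, and `numpy.ix_` reproduces the orthogonal selection.

```python
import numpy as np

lats = np.arange(-90, 91, 2.5)      # 73 latitudes
lons = np.arange(-180, 180, 2.5)    # 144 longitudes
levels = np.array([1000., 850., 700., 500., 300., 250., 200., 150., 100., 50.])
temp = np.zeros((5, 10, lats.size, lons.size))  # stand-in for the variable

times_sel = np.arange(5)[::2]       # time indices 0, 2, 4
level_sel = [1, 3, 6]               # 850, 500 and 200 hPa
# np.ix_ accepts boolean masks and applies every selection independently.
tempdat = temp[np.ix_(times_sel, level_sel, lats > 0, lons > 0)]
```

There are 36 positive latitudes (2.5 to 90) and 71 positive longitudes (2.5 to 177.5), so `tempdat.shape` is `(3, 3, 36, 71)`, and `levels[level_sel]` gives the 850, 500 and 200 hPa levels.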
***Special note for scalar variables***: To extract data from a scalar variable
`v` with no associated dimensions, use `numpy.asarray(v)` or `v[...]`.
The result will be a numpy scalar array.
By default, netcdf4-python returns numpy masked arrays with values equal to the
`missing_value` or `_FillValue` variable attributes masked. The
`netCDF4.Dataset.set_auto_mask` method of `netCDF4.Dataset` and `netCDF4.Variable`
instances can be used to disable this feature so that
numpy arrays are always returned, with the missing values included. Prior to
version 1.4.0 the default behavior was to only return masked arrays when the
requested slice contained missing values. This behavior can be recovered
using the `netCDF4.Dataset.set_always_mask` method. If a masked array is
written to a netCDF variable, the masked elements are filled with the
value specified by the `missing_value` attribute. If the variable has
no `missing_value`, the `_FillValue` is used instead.
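
The masking behaviour described above can be pictured with `numpy.ma` directly. In this sketch, the fill value `-999.0` is an arbitrary, hypothetical stand-in for a variable's `_FillValue` or `missing_value` attribute. The code only mimics the read and write conversions the text describes; it does not call the library.

```python
import numpy as np

fill_value = -999.0  # hypothetical _FillValue / missing_value
raw = np.array([1.5, fill_value, 3.0, fill_value])  # values as stored

# Reading with auto-masking: elements equal to the fill value are masked.
masked = np.ma.masked_equal(raw, fill_value)

# Writing a masked array: masked elements are replaced by the fill value.
to_write = np.ma.array([2.0, 4.0, 6.0], mask=[False, True, False])
written = to_write.filled(fill_value)
```

Statistics on `masked` ignore the fill values (`masked.mean()` is 2.25), and `written` comes out as `[2.0, -999.0, 6.0]`.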
## <div id='section7'>7) Dealing with time coordinates.
Time coordinate values pose a special challenge to netCDF users. Most
metadata standards (such as CF) specify that time should be
measured relative to a fixed date using a certain calendar, with units
specified like `hours since YY-MM-DD hh:mm:ss`. These units can be
awkward to deal with, without a utility to convert the values to and
from calendar dates. The functions `netCDF4.num2date` and `netCDF4.date2num` are
provided with this package to do just that (starting with version 1.4.0, the
[cftime](https://unidata.github.io/cftime) package must be installed
separately). Here's an example of how they
can be used:

    :::python
    >>> # fill in times.
    >>> from datetime import datetime, timedelta
    >>> from netCDF4 import num2date, date2num
    >>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
    >>> times[:] = date2num(dates,units=times.units,calendar=times.calendar)
    >>> print("time values (in units {}):\\n{}".format(times.units, times[:]))
    time values (in units hours since 0001-01-01 00:00:00.0):
    [17533104. 17533116. 17533128. 17533140. 17533152.]
    >>> dates = num2date(times[:],units=times.units,calendar=times.calendar)
    >>> print("dates corresponding to time values:\\n{}".format(dates))
    dates corresponding to time values:
    [real_datetime(2001, 3, 1, 0, 0) real_datetime(2001, 3, 1, 12, 0)
    real_datetime(2001, 3, 2, 0, 0) real_datetime(2001, 3, 2, 12, 0)
    real_datetime(2001, 3, 3, 0, 0)]

`netCDF4.num2date` converts numeric values of time in the specified `units`
and `calendar` to datetime objects, and `netCDF4.date2num` does the reverse.
All the calendars currently defined in the
[CF metadata convention](http://cfconventions.org) are supported.
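
The `calendar` argument matters even for dates that look ordinary. Python's standard `datetime` module uses the proleptic Gregorian calendar, whereas the CF `gregorian` (or `standard`) calendar is a mixed Julian/Gregorian calendar that switches in October 1582. With an epoch of year 1, a plain standard-library computation of the first time value above therefore differs from the `date2num` result. The sketch below uses only the standard library and is an illustration, not a replacement for `netCDF4.date2num`.

```python
from datetime import datetime, timedelta

epoch = datetime(1, 1, 1)  # "hours since 0001-01-01 00:00:00.0"
dates = [datetime(2001, 3, 1) + n * timedelta(hours=12) for n in range(5)]

# Proleptic Gregorian hours since the epoch, computed with the stdlib.
hours = [(d - epoch) / timedelta(hours=1) for d in dates]

# First value printed above by date2num with calendar="gregorian".
mixed_first = 17533104.0
offset_hours = mixed_first - hours[0]
```

Here `hours[0]` is `17533056.0`, so `offset_hours` is `48.0` (two days), while the 12-hour spacing between values is the same in both calendars. Under this reasoning, `calendar="proleptic_gregorian"` should reproduce the standard-library numbers.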
A function called `netCDF4.date2index` is also provided which returns the indices
of a netCDF time variable corresponding to a sequence of datetime instances.
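
One way to picture what `date2index` does is to convert the requested dates with the variable's units and calendar, then look the resulting numbers up among the stored time values. The `find_time_indices` helper below is a hypothetical sketch of the exact-match case only. It assumes that the stored times are sorted and that the requested dates have already been converted to numbers.

```python
import numpy as np


def find_time_indices(stored, wanted):
    # stored: sorted 1-d sequence of time values (already numeric).
    # wanted: time values to look up, in the same units and calendar.
    stored = np.asarray(stored)
    wanted = np.asarray(wanted)
    idx = np.searchsorted(stored, wanted)
    # Reject values that fall past the end or do not match exactly.
    clipped = np.minimum(idx, stored.size - 1)
    found = (idx < stored.size) & (stored[clipped] == wanted)
    if not found.all():
        raise ValueError("some requested times are not present in the variable")
    return idx


stored_times = [17533104., 17533116., 17533128., 17533140., 17533152.]
indices = find_time_indices(stored_times, [17533116., 17533140.])
```

With the time values from the example above, 2001-03-01 12:00 and 2001-03-02 12:00 map to indices 1 and 3. A value that is not stored raises `ValueError`.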
## <div id='section8'>8) Reading data from a multi-file netCDF dataset.
If you want to read data from a variable that spans multiple netCDF files,
you can use the `netCDF4.MFDataset` class to read the data as if it were
contained in a single file. Instead of using a single filename to create
a `netCDF4.Dataset` instance, create a `netCDF4.MFDataset` instance with either a list
of filenames, or a string with a wildcard (which is then converted to
a sorted list of files using the python glob module).
Variables in the list of files that share the same unlimited
dimension are aggregated together, and can be sliced across multiple
files. To illustrate this, let's first create a bunch of netCDF files with
the same variable (with the same unlimited dimension). The files
must be in `NETCDF3_64BIT_OFFSET`, `NETCDF3_64BIT_DATA`, `NETCDF3_CLASSIC` or
`NETCDF4_CLASSIC` format (`NETCDF4` formatted multi-file
datasets are not supported).
:::python
>>> for nf in range(10):
...     with Dataset("mftest%s.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
...         _ = f.createDimension("x",None)
...         x = f.createVariable("x","i",("x",))
...         x[0:10] = numpy.arange(nf*10,10*(nf+1))
Now read all the files back in at once with `netCDF4.MFDataset`:
:::python
>>> from netCDF4 import MFDataset
>>> f = MFDataset("mftest*nc")
>>> print(f.variables["x"][:])
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99]
Note that `netCDF4.MFDataset` can only be used to read, not write, multi-file
datasets.
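The sketch below shows the bookkeeping needed to slice across files. It assumes each file contributes a known number of records along the shared unlimited dimension. The helpers `sorted_files` and `locate` are hypothetical and are not `MFDataset` internals. They only show how a global index can be mapped to a file number and a position within that file.

```python
import bisect
import glob
from itertools import accumulate

def sorted_files(pattern):
    # Expand a wildcard into a file list sorted in plain lexicographic order.
    return sorted(glob.glob(pattern))

def locate(index, lengths):
    # Map an index on the aggregated unlimited dimension to (file number, local index).
    offsets = [0, *accumulate(lengths)]  # offsets[k] = first global index held by file k
    if not 0 <= index < offsets[-1]:
        raise IndexError(f"index {index} out of range for total length {offsets[-1]}")
    file_no = bisect.bisect_right(offsets, index) - 1
    return file_no, index - offsets[file_no]

lengths = [10] * 10          # ten files with ten records each, as in the example
print(locate(57, lengths))   # (5, 7): record 7 of the sixth file
print(sorted_files("mftest*nc"))  # depends on which files are present
print(sorted(["mftest10.nc", "mftest2.nc", "mftest1.nc"]))
# ['mftest1.nc', 'mftest10.nc', 'mftest2.nc']
```

The wildcard is expanded with a plain lexicographic sort, so `mftest10.nc` sorts before `mftest2.nc`. Zero-padding the numbers in file names (e.g. `mftest02.nc`) keeps the aggregation in the intended order.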
## <div id='section9'>9) Efficient compression of netCDF variables.
Data stored in netCDF 4 `netCDF4.Variable` objects can be compressed and
decompressed on the fly. The parameters for the compression are
determined by the `zlib`, `complevel` and `shuffle` keyword arguments
to the `netCDF4.Dataset.createVariable` method. To turn on
compression, set `zlib=True`. The `complevel` keyword regulates the
speed and efficiency of the compression (1 being fastest, but lowest
compression ratio, 9 being slowest but best compression ratio). The
default value of `complevel` is 4. Setting `shuffle=False` will turn
off the HDF5 shuffle filter, which de-interlaces a block of data before
compression by reordering the bytes. The shuffle filter can
significantly improve compression ratios, and is on by default. Setting the
`fletcher32` keyword argument of
`netCDF4.Dataset.createVariable` to `True` (it's `False` by
default) enables the Fletcher32 checksum algorithm for error detection.
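You can see the effect of `complevel` and of byte shuffling with Python's own `zlib` module. This sketch assumes synthetic, slowly varying `float32` data. The hypothetical `shuffle` and `unshuffle` helpers follow the byte-reordering idea described above. This is not the HDF5 filter itself, and the printed sizes describe only this synthetic buffer.

```python
import struct
import zlib

def shuffle(buf, itemsize):
    # Gather byte 0 of every element, then byte 1, and so on.
    return b"".join(buf[j::itemsize] for j in range(itemsize))

def unshuffle(buf, itemsize):
    # Invert shuffle() by scattering each byte plane back into place.
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for j in range(itemsize):
        out[j::itemsize] = buf[j * n:(j + 1) * n]
    return bytes(out)

# Synthetic, slowly varying float32 values (a stand-in for a temperature field).
values = [20.0 + 0.01 * k for k in range(10_000)]
raw = struct.pack(f"<{len(values)}f", *values)

for level in (1, 4, 9):
    plain = len(zlib.compress(raw, level))
    shuffled = len(zlib.compress(shuffle(raw, 4), level))
    print(f"complevel={level}: {plain} bytes without shuffle, {shuffled} with shuffle")
```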
It's also possible to set the HDF5 chunking parameters and endian-ness
of the binary data stored in the HDF5 file with the `chunksizes`
and `endian` keyword arguments to
`netCDF4.Dataset.createVariable`. These keyword arguments are only
relevant for `NETCDF4` and `NETCDF4_CLASSIC` files (where the
underlying file format is HDF5) and are silently ignored if the file
format is `NETCDF3_CLASSIC`, `NETCDF3_64BIT_OFFSET` or `NETCDF3_64BIT_DATA`.
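HDF5 stores a chunked variable as a grid of equally shaped chunks, and filters such as zlib are applied chunk by chunk. As a result, `chunksizes` decides how much data has to be read and decompressed for a given slice. The sketch below counts the chunks and the uncompressed bytes per chunk for two candidate settings. Both the `chunk_grid` helper and the `(time, level, lat, lon)` shape are hypothetical.

```python
from functools import reduce
from operator import mul

def chunk_grid(shape, chunksizes, itemsize):
    # Chunks per dimension, total chunk count, and uncompressed bytes per chunk.
    if len(shape) != len(chunksizes):
        raise ValueError("chunksizes needs one entry per dimension")
    per_dim = tuple(-(-n // c) for n, c in zip(shape, chunksizes))  # ceiling division
    total = reduce(mul, per_dim, 1)
    chunk_bytes = reduce(mul, chunksizes, itemsize)
    return per_dim, total, chunk_bytes

shape = (5, 10, 73, 144)  # hypothetical (time, level, lat, lon) shape, 4-byte floats
print(chunk_grid(shape, (1, 10, 73, 144), 4))  # ((5, 1, 1, 1), 5, 420480)
print(chunk_grid(shape, (1, 1, 73, 144), 4))   # ((5, 10, 1, 1), 50, 42048)
```

With `(1, 10, 73, 144)`, reading a single time step touches exactly one chunk. With `(1, 1, 73, 144)`, reading one level at one time touches one chunk a tenth that size.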
If your data only has a certain number of digits of precision (say for
example, it is temperature data that was measured with a precision of
0.1 degrees), you can dramatically improve zlib compression by
quantizing (or truncating) the data using the `least_significant_digit`
keyword argument to `netCDF4.Dataset.createVariable`. The least
significant digit is the power of ten of the smallest decimal place in
the data that is a reliable value. For example if the data has a
precision of 0.1, then setting `least_significant_digit=1` will cause
the data to be quantized using `numpy.around(scale*data)/scale`, where
scale = 2**bits, and bits is determined so that a precision of 0.1 is
retained (in this case bits=4). Effectively, this makes the compression
'lossy' instead of 'lossless', that is some precision in the data is
sacrificed for the sake of disk space.
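You can check the quantization rule with a few lines of plain Python. This sketches the formula in the paragraph above and is not the library's own routine. `quantize` is a hypothetical helper, and `bits` is taken as the smallest integer with `2**bits >= 10**least_significant_digit`. Python's `round` stands in for `numpy.around`, since both round halves to even.

```python
import math

def quantize(values, least_significant_digit):
    # bits is the smallest integer with 2**bits >= 10**least_significant_digit.
    bits = math.ceil(math.log2(10 ** least_significant_digit))
    scale = 2.0 ** bits
    # round(scale*x)/scale, with Python's round() rounding halves to even like numpy.around.
    return bits, [round(scale * x) / scale for x in values]

bits, q = quantize([21.37, 19.92, 20.05], 1)
print(bits, q)  # 4 [21.375, 19.9375, 20.0625]
print(quantize([0.0], 3)[0])  # 10, the bits used for least_significant_digit=3
```

Every result is a multiple of 1/16, so its low-order mantissa bits are zero. zlib can exploit that kind of redundancy.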
In our example, try replacing the line
:::python
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
with
:::python
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True)
and then
:::python
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True,least_significant_digit=3)
and see how much smaller the resulting files are.
## <div id='section10'>10) Beyond homogeneous arrays of a fixed type - compound data types.
Compound data types map directly to numpy structured (a.k.a. 'record')
arrays. Structured arrays are akin to C structs, or derived types
in Fortran. They allow for the construction of table-like structures
composed of combinations of other data types, including other
compound types. Compound types might be useful for representing multiple
parameter values at each point on a grid, or at each time and space
location for scattered (point) data. You can then access all the
information for a point by reading one variable, instead of reading
different parameters from different variables. Compound data types
are created from the corresponding numpy data type using the
`netCDF4.Dataset.createCompoundType` method of a `netCDF4.Dataset` or `netCDF4.Group` instance.
Since there is no native complex data type in netcdf, compound types are handy
for storing numpy complex arrays. Here's an example:
:::python
>>> f = Dataset("complex.nc","w")
>>> size = 3 # length of 1-d complex array
>>> # create sample complex data.
>>> datac = numpy.exp(1j*(1.+numpy.linspace(0, numpy.pi, size)))
>>> # create complex128 compound data type.
>>> complex128 = numpy.dtype([("real",numpy.float64),("imag",numpy.float64)])
>>> complex128_t = f.createCompoundType(complex128,"complex128")
>>> # create a variable with this data type, write some data to it.
>>> x_dim = f.createDimension("x_dim",None)
>>> v = f.createVariable("cmplx_var",complex128_t,"x_dim")
>>> data = numpy.empty(size,complex128) # numpy structured array
>>> data["real"] = datac.real; data["imag"] = datac.imag
>>> v[:] = data # write numpy structured array to netcdf compound var
>>> # close and reopen the file, check the contents.
>>> f.close(); f = Dataset("complex.nc")
>>> v = f.variables["cmplx_var"]
>>> datain = v[:] # read in all the data into a numpy structured array
>>> # create an empty numpy complex array
>>> datac2 = numpy.empty(datain.shape,numpy.complex128)
>>> # .. fill it with contents of structured array.
>>> datac2.real = datain["real"]; datac2.imag = datain["imag"]
>>> print('{}: {}'.format(datac.dtype, datac)) # original data
complex128: [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>>
>>> print('{}: {}'.format(datac2.dtype, datac2)) # data from file
complex128: [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
Compound types can be nested, but you must create the 'inner'
ones first. Not every numpy structured array can be
represented as a compound variable; an error will be
raised if you try to create one that is not supported.
All of the compound types defined for a `netCDF4.Dataset` or `netCDF4.Group` are stored
in a Python dictionary, just like variables and dimensions. As always, printing
objects gives useful summary information in an interactive session:
:::python
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x_dim(3)
variables(dimensions): {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True} cmplx_var(x_dim)
groups:
>>> print(f.variables["cmplx_var"])
<class 'netCDF4._netCDF4.Variable'>
compound cmplx_var(x_dim)
compound data type: {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
unlimited dimensions: x_dim
current shape = (3,)
>>> print(f.cmptypes)
{'complex128': <class 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}}
>>> print(f.cmptypes["complex128"])
<class 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
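To see what the compound type printed above describes, the sketch below uses the standard `struct` module to pack complex numbers into the same record layout: two little-endian `float64` fields, `real` at offset 0 and `imag` at offset 8, 16 bytes per element. It only illustrates the in-memory layout of that numpy dtype, not how the netCDF library writes the file. `pack_complex` and `unpack_complex` are hypothetical helpers.

```python
import cmath
import math
import struct

# Matches the printed dtype: fields real/imag, formats <f8/<f8, offsets 0/8, itemsize 16.
RECORD = struct.Struct("<dd")

def pack_complex(values):
    # Serialize complex numbers as consecutive 16-byte (real, imag) records.
    return b"".join(RECORD.pack(z.real, z.imag) for z in values)

def unpack_complex(buf):
    # Rebuild complex numbers from consecutive (real, imag) records.
    return [complex(real, imag) for real, imag in RECORD.iter_unpack(buf)]

# Same formula as the example: exp(1j*(1 + t)) for t in 0, pi/2, pi.
datac = [cmath.exp(1j * (1.0 + t)) for t in (0.0, math.pi / 2, math.pi)]
buf = pack_complex(datac)
print(RECORD.size, len(buf))         # 16 48
print(unpack_complex(buf) == datac)  # True
```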
## <div id='section11'>11) Variable-length (vlen) data types.
NetCDF 4 has support for variable-length or "ragged" arrays. These are arrays
of variable length sequences having the same type. To create a variable-length
data type, use the `netCDF4.Dataset.createVLType` method
of a `netCDF4.Dataset` or `netCDF4.Group` instance.
:::python
>>> f = Dataset("tst_vlen.nc","w")
>>> vlen_t = f.createVLType(numpy.int32, "phony_vlen")
The numpy datatype of the variable-length sequences and the name of the
new datatype must be specified. Any of the primitive datatypes can be
used (signed and unsigned integers, 32 and 64 bit floats, and characters),
but compound data types cannot.
A new variable can then be created using this datatype.
:::python
>>> x = f.createDimension("x",3)
>>> y = f.createDimension("y",4)
>>> vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y","x"))
Since there is no native vlen datatype in numpy, vlen arrays are represented
in python as object arrays (arrays of dtype `object`). These are arrays whose
elements are Python object pointers, and can contain any type of python object.
For this application, they must contain 1-D numpy arrays all of the same type
but of varying length.
In this case, they contain 1-D numpy `int32` arrays of random length between
1 and 10.
:::python
>>> import random
>>> random.seed(54321)
>>> data = numpy.empty(len(y)*len(x),object)
>>> for n in range(len(y)*len(x)):
...     data[n] = numpy.arange(random.randint(1,10),dtype="int32")+1
>>> data = numpy.reshape(data,(len(y),len(x)))
>>> vlvar[:] = data
>>> print("vlen variable =\\n{}".format(vlvar[:]))
vlen variable =
[[array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32) array([1, 2], dtype=int32)
array([1, 2, 3, 4], dtype=int32)]
[array([1, 2, 3], dtype=int32)
array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)]
[array([1, 2, 3, 4, 5, 6, 7], dtype=int32) array([1, 2, 3], dtype=int32)
array([1, 2, 3, 4, 5, 6], dtype=int32)]
[array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
array([1, 2, 3, 4, 5], dtype=int32) array([1, 2], dtype=int32)]]
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(3), y(4)
variables(dimensions): int32 phony_vlen_var(y,x)
groups:
>>> print(f.variables["phony_vlen_var"])
<class 'netCDF4._netCDF4.Variable'>
vlen phony_vlen_var(y, x)
vlen data type: int32
unlimited dimensions:
current shape = (4, 3)
>>> print(f.vltypes["phony_vlen"])
<class 'netCDF4._netCDF4.VLType'>: name = 'phony_vlen', numpy dtype = int32
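An object array of variable-length sequences, like the one written to `phony_vlen_var` above, holds the same information as a list of lengths plus one flat list of values. The CF conventions use a similar split, a count variable plus a flat data variable, in their contiguous ragged array representation. That form can be handy when vlen types are not available. The stdlib sketch below converts between the two forms using hypothetical helper names. It is not how netCDF-4 stores vlen data internally.

```python
import random
from itertools import accumulate

def to_counts_and_values(ragged):
    # Split a ragged sequence into per-element lengths and one flat value list.
    counts = [len(seq) for seq in ragged]
    values = [v for seq in ragged for v in seq]
    return counts, values

def from_counts_and_values(counts, values):
    # Rebuild the ragged sequence from per-element lengths and flat values.
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return [values[s:e] for s, e in zip(starts, ends)]

random.seed(54321)
# Twelve sequences 1..n with n drawn from 1-10, echoing the 4 x 3 example.
ragged = [list(range(1, random.randint(1, 10) + 1)) for _ in range(12)]
counts, values = to_counts_and_values(ragged)
print(counts)
print(from_counts_and_values(counts, values) == ragged)  # True
```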
Numpy object arrays containing python strings can also be written as vlen
variables. For vlen strings, you don't need to create a vlen data type.
Instead, simply use the python `str` builtin (or a numpy string datatype
with fixed length greater than 1) when calling the
`netCDF4.Dataset.createVariable` method.
:::python
>>> z = f.createDimension("z",10)
>>> strvar = f.createVariable("strvar", str, "z")
In this example, an object array is filled with random python strings with
random lengths between 2 and 12 characters, and the data in the object
array is assigned to the vlen string variable.
:::python
>>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
>>> data = numpy.empty(10,"O")
>>> for n in range(10):
...     stringlen = random.randint(2,12)
...     data[n] = "".join([random.choice(chars) for i in range(stringlen)])
>>> strvar[:] = data
>>> print("variable-length string variable:\\n{}".format(strvar[:]))
variable-length string variable:
['Lh' '25F8wBbMI' '53rmM' 'vvjnb3t63ao' 'qjRBQk6w' 'aJh' 'QF'
'jtIJbJACaQk4' '3Z5' 'bftIIq']
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(3), y(4), z(10)
variables(dimensions): int32 phony_vlen_var(y,x), <class 'str'> strvar(z)
groups:
>>> print(f.variables["strvar"])
<class 'netCDF4._netCDF4.Variable'>
vlen strvar(z)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (10,)
It is also possible to set contents of vlen string variables with numpy arrays
of any string or unicode data type. Note, however, that accessing the contents
of such variables will always return numpy arrays with dtype `object`.
## <div id='section12'>12) Enum data type.
netCDF4 has an enumerated data type, which is an integer datatype that is
restricted to certain named values. Since Enums don't map directly to
a numpy data type, they are read and written as integer arrays.
Here's an example of using an Enum type to hold cloud type data.
The base integer data type and a python dictionary describing the allowed
values and their names are used to define an Enum data type using
`netCDF4.Dataset.createEnumType`.
:::python
>>> nc = Dataset('clouds.nc','w')
>>> # python dict with allowed values and their names.
>>> enum_dict = {'Altocumulus': 7, 'Missing': 255,
... 'Stratus': 2, 'Clear': 0,
... 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5,
... 'Cumulonimbus': 1, 'Stratocumulus': 3}
>>> # create the Enum type called 'cloud_t'.
>>> cloud_type = nc.createEnumType(numpy.uint8,'cloud_t',enum_dict)
>>> print(cloud_type)
<class 'netCDF4._netCDF4.EnumType'>: name = 'cloud_t', numpy dtype = uint8, fields/values ={'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0, 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1, 'Stratocumulus': 3}
A new variable can be created in the usual way using this data type.
Integer data is written to the variable that represents the named
cloud types in enum_dict. A `ValueError` will be raised if an attempt
is made to write an integer value not associated with one of the
specified names.
:::python
>>> time = nc.createDimension('time',None)
>>> # create a 1d variable of type 'cloud_type'.
>>> # The fill_value is set to the 'Missing' named value.
>>> cloud_var = nc.createVariable('primary_cloud',cloud_type,'time',
... fill_value=enum_dict['Missing'])
>>> # write some data to the variable.
>>> cloud_var[:] = [enum_dict[k] for k in ['Clear', 'Stratus', 'Cumulus',
... 'Missing', 'Cumulonimbus']]
>>> nc.close()
>>> # reopen the file, read the data.
>>> nc = Dataset('clouds.nc')
>>> cloud_var = nc.variables['primary_cloud']
>>> print(cloud_var)
<class 'netCDF4._netCDF4.Variable'>
enum primary_cloud(time)
_FillValue: 255
enum data type: uint8
unlimited dimensions: time
current shape = (5,)
>>> print(cloud_var.datatype.enum_dict)
{'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0, 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1, 'Stratocumulus': 3}
>>> print(cloud_var[:])
[0 2 4 -- 1]
>>> nc.close()
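The checks described for Enum variables can be mimicked in plain Python. In this sketch, `check_enum` and `decode_enum` are hypothetical helpers, not part of netCDF4. `check_enum` rejects integers that are not among the named values by raising `ValueError`, as described above. `decode_enum` maps stored integers back to names and turns the `Missing` fill value (shown as `--` in the masked output above) into `None`.

```python
# Mapping copied from the enum_dict example above.
enum_dict = {'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0,
             'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5,
             'Cumulonimbus': 1, 'Stratocumulus': 3}

def check_enum(values, mapping):
    # Raise ValueError if any integer is not one of the enum's named values.
    allowed = set(mapping.values())
    bad = [v for v in values if v not in allowed]
    if bad:
        raise ValueError(f"not valid enum values: {bad}")
    return list(values)

def decode_enum(values, mapping, fill_value=None):
    # Translate stored integers back into names; fill_value becomes None.
    names = {v: k for k, v in mapping.items()}
    return [None if v == fill_value else names[v] for v in values]

stored = check_enum([0, 2, 4, 255, 1], enum_dict)
print(decode_enum(stored, enum_dict, fill_value=enum_dict['Missing']))
# ['Clear', 'Stratus', 'Cumulus', None, 'Cumulonimbus']
```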
## <div id='section13'>13) Parallel IO.
If MPI parallel enabled versions of netcdf and hdf5 or pnetcdf are detected,
and [mpi4py](https://mpi4py.scipy.org) is installed, netcdf4-python will
be built with parallel IO capabilities enabled. Parallel IO of NETCDF4 or
NETCDF4_CLASSIC formatted files is only available if the MPI parallel HDF5
library is available. Parallel IO of classic netcdf-3 file formats is only
available if the [PnetCDF](https://parallel-netcdf.github.io/) library is
available. To use parallel IO, your program must be running in an MPI
environment using [mpi4py](https://mpi4py.scipy.org).
:::python
>>> from mpi4py import MPI
>>> import numpy as np
>>> from netCDF4 import Dataset
>>> rank = MPI.COMM_WORLD.rank # The process ID (integer 0-3 for 4-process run)
To run an MPI-based parallel program like this, you must use `mpiexec` to launch several
parallel instances of Python (for example, using `mpiexec -np 4 python mpi_example.py`).
The parallel features of netcdf4-python are mostly transparent -
when a new dataset is created or an existing dataset is opened,
use the `parallel` keyword to enable parallel access.
:::python
>>> nc = Dataset('parallel_test.nc','w',parallel=True)
The optional `comm` keyword may be used to specify a particular
MPI communicator (`MPI_COMM_WORLD` is used by default). Each process (or rank)
can now write to the file independently. In this example the process rank is
written to a different variable index on each task:
:::python
>>> d = nc.createDimension('dim',4)
>>> v = nc.createVariable('var', np.int64, 'dim')
>>> v[rank] = rank
>>> nc.close()
% ncdump parallel_test.nc
netcdf parallel_test {
dimensions:
dim = 4 ;
variables:
int64 var(dim) ;
data:
var = 0, 1, 2, 3 ;
}
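In the example, each of the 4 ranks writes exactly one element. When a dimension is longer than the number of ranks, a common pattern is to give every rank a contiguous block of indices. The hypothetical `block_range` helper below sketches one way to split `n` indices over `nprocs` ranks. It is plain Python with no MPI calls, so it can be checked without `mpiexec`. The commented lines show how it might be used inside an mpi4py program.

```python
def block_range(n, nprocs, rank):
    # [start, stop) share of n indices for one rank; the first n % nprocs ranks get one extra.
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# With 4 ranks and a dimension of length 4, each rank owns one element, as in v[rank] = rank.
print([block_range(4, 4, r) for r in range(4)])   # [(0, 1), (1, 2), (2, 3), (3, 4)]
print([block_range(10, 4, r) for r in range(4)])  # [(0, 3), (3, 6), (6, 8), (8, 10)]

# Inside an MPI program (not run here) each rank could then write its own slice:
#   start, stop = block_range(len(d), MPI.COMM_WORLD.size, rank)
#   v[start:stop] = ...
```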
There are two types of parallel IO, independent (the default) and collective.
Independent IO means that each process can do IO independently. It should not
depend on or be affected by other processes. Collective IO is a way of doing
IO defined in the MPI-IO standard; unlike independent IO, all processes must
participate in doing IO. To toggle back and forth between
the two types of IO, use the `netCDF4.Variable.set_collective`
method of `netCDF4.Variable`. All metadata
operations (such as creation of groups, types, variables, dimensions, or attributes)
are collective. There are a couple of important limitations of parallel IO:
- parallel IO for NETCDF4 or NETCDF4_CLASSIC formatted files is only available
if the netcdf library was compiled with MPI enabled HDF5.
- parallel IO for all classic netcdf-3 file formats is only available if the
netcdf library was compiled with PnetCDF.
- If a variable has an unlimited dimension, appending data must be done in collective mode.
If the write is done in independent mode, the operation will fail with a