-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathhands_on.qmd
775 lines (592 loc) · 27.9 KB
/
hands_on.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
---
title: "Spatial weights in PySAL"
format:
html: default
ipynb: default
jupyter: sds
aliases:
- ../chapter_04/hands_on.html
---
In this session, you will be learning the ins and outs of one of the key pieces in spatial
analysis: spatial weights matrices. These are structured sets of numbers that formalise
geographical relationships between the observations in a dataset. Essentially, a
spatial weights matrix of a given geography is a positive definite matrix of dimensions
$N$ by $N$, where $N$ is the total number of observations:
$$
W = \left(\begin{array}{cccc}
0 & w_{12} & \dots & w_{1N} \\
w_{21} & \ddots & w_{ij} & \vdots \\
\vdots & w_{ji} & 0 & \vdots \\
w_{N1} & \dots & \dots & 0
\end{array} \right)
$$
where each cell $w_{ij}$ contains a value that represents the degree of spatial contact
or interaction between observations $i$ and $j$. A fundamental concept in this context
is that of *neighbour* and *neighbourhood*. By convention, elements in the diagonal ($w_{ii}$)
are set to zero. A *neighbour* of a given observation $i$ is another observation with
which $i$ has some degree of connection. In terms of $W$, $i$ and $j$ are thus neighbors
if $w_{ij} > 0$. Following this logic, the neighbourhood of $i$ will be the set of
observations in the system with which it has a certain connection or those observations
with a weight greater than zero.
There are several ways to create such matrices and many more to transform them so they
contain an accurate representation that aligns with the way you understand spatial
interactions between the elements of a system. This session will introduce the
most commonly used ones and show how to compute them with `PySAL`.
```{python}
import matplotlib.pyplot as plt # <1>
import contextily
import geopandas as gpd
import numpy as np
import pandas as pd
import seaborn as sns
from libpysal import graph
```
1. A common standard is to do all the imports on top of your notebook.
## Data
For this tutorial, you will use a dataset of Basic Settlement Units (ZSJ) in Prague for
2021. The table is available as part of this course, so it can be accessed remotely
through the web. If you want to see how the table was created,
a notebook is available [here](../data/prague_zsj_2021/preprocessing.ipynb).
To make things easier, you will read data from a file posted online so, for now,
you do not need to download any dataset:
```{python}
#| classes: explore
prague = gpd.read_file(
"https://martinfleischmann.net/sds/spatial_graphs/data/zsj_prague_2021.gpkg"
)
prague = prague.set_index("NAZ_ZSJ") # <1>
prague.explore()
```
1. Use the name of each observation as an index. It will help you link them
to the weights matrix.
::: {.callout-note collapse="true"}
## Alternative
Instead of reading the file directly off the web, it is possible to download it manually,
store it on your computer, and read it locally. To do that, you can follow these steps:
1. Download the file by right-clicking on
[this link](https://martinfleischmann.net/sds/spatial_graphs/data/zsj_prague_2021.gpkg)
and saving the file
2. Place the file in the same folder as the notebook where you intend to read it
3. Replace the code in the cell above with:
```python
prague = gpd.read_file(
"zsj_prague_2021.gpkg",
)
```
:::
## Building spatial weights in `PySAL`
### Contiguity
Contiguity weights matrices define spatial connections through the existence of shared
boundaries. This makes it directly suitable to use with polygons: if two polygons share
boundaries to some degree, they will be labelled as neighbours under these kinds of
weights. Exactly how much they need to share is what differentiates the two approaches
we will learn: Queen and Rook.
#### Queen
Under the Queen criteria, two observations only need to share a vertex (a single point)
of their boundaries to be considered neighbours. Constructing a weights matrix under
these principles can be done by running:
```{python}
queen = graph.Graph.build_contiguity(prague, rook=False) # <1>
queen
```
1. The main class that represents a weights matrix is `Graph`. `rook=False` specifies to
use Queen contiguity, while `rook=True` would create Rook contiguity.
The command above creates an object `queen` of the class `Graph`. This is the
format in which spatial weights matrices are stored in `PySAL`. By default, the
weights builder (`Graph.build_contiguity(`) will use the index of the table, which
is useful so you can keep everything in line easily.
::: {.callout-caution}
# New `Graph` and old `W`
The `graph` module of `libpysal` is an implementation of spatial weights matrices released
in September 2023. In the older resources, you will find the `weights` module and the `W`
class instead. `Graph` will eventually replace `W`. Their API is similar, but there are
some differences. Pay attention to the documentation when porting code from `weights`-based
resources to `graph`-based implementation. Or use `Graph.to_W()` and `Graph.from_W()` to
convert one to the other.
:::
A `Graph` object can be queried to find out about the contiguity relations it contains. For
example, if you would like to know who is a neighbour of observation `Albertov`:
```{python}
queen['Albertov']
```
This returns a `pandas.Series` containing each neighbour's ID codes as an index,
with the weights assigned as values. Since you are looking at a raw Queen
contiguity matrix, every neighbour gets a weight of one. If you want to access the weight
of a specific neighbour, `Zderaz` for example, you can do recursive querying:
```{python}
queen['Albertov']['Zderaz']
```
All of these relations can be easily visualised using `Graph.explore()` method you know
from GeoPandas.
```{python}
#| classes: explore
m = prague.explore() # <1>
queen.explore(
prague, m=m, edge_kws=dict(style_kwds=dict(weight=1)), nodes=False # <2>
)
```
1. Use (optionally) polygons as the underlying layer providing context.
2. `Graph` itself represents only topological relations between geometries. To plot the
graph on a map, you need to pass geometries representing graph nodes.
::: {.callout-tip}
# Static plots
Similarly to `GeoDataFrame`, `Graph` can also be visualised as a static plot, using the
`Graph.plot()` method working analogously to `Graph.explore()`.
```{python}
# | fig-cap: Static plot of Queen contiguity
ax = queen.plot(prague, nodes=False, edge_kws={"linewidth": .5})
contextily.add_basemap(ax=ax, crs=prague.crs, source="CartoDB Positron")
ax.set_axis_off()
```
:::
Once created, `Graph` objects can provide much information about the matrix beyond the
basic attributes one would expect. You have direct access to the number of neighbours each
observation has via the attribute `cardinalities`. For example, to find out how many
neighbours observation `E01006524` has:
```{python}
queen.cardinalities['Albertov']
```
Since `cardinalities` is a `Series`, you can use all of the `pandas` functionality you know:
```{python}
queen.cardinalities.head()
```
You can easily compute the mean number of neighbours.
```{python}
queen.cardinalities.mean()
```
Or learn about the maximum number of neighbours a single geometry has.
```{python}
queen.cardinalities.max()
```
This also allows access to quick plotting, which comes in very handy in getting an
overview of the size of neighbourhoods in general:
```{python}
# | fig-cap: Distribution of cardinalites
sns.displot(queen.cardinalities, bins=14, kde=True)
```
The figure above shows how most observations have around five neighbours, but there is
some variation around it. The distribution also seems to follow nearly a symmetric form, where
deviations from the average occur both in higher and lower values almost evenly, with a minor tail towards higher values.
Some additional information about the spatial relationships contained in the matrix is
also easily available from a `Graph` object. You can ask about the number of observations (geometries)
encoded in the graph:
```{python}
queen.n
```
Or learn which geometries are _isolates_, the observations with no neighbours (think about islands).
```{python}
queen.isolates # <1>
```
1. In this case, this `Series` is empty since there are no isolates present.
A nice summary of these properties is available using the `summary()` method.
```{python}
# | classes: explore
queen.summary()
```
Spatial weight matrices can be explored visually in other ways. For example, you can pick
an observation and visualise it in the context of its neighbourhood. The following plot
does exactly that by zooming into the surroundings of ZSJ `Albertov` and displaying
its polygon as well as those of its neighbours together with a relevant portion of the
`Graph`:
```{python}
# | classes: explore
m = prague.loc[queen["Albertov"].index].explore(color="#25b497")
prague.loc[["Albertov"]].explore(m=m, color="#fa94a5")
queen.explore(prague, m=m, focal="Albertov")
```
#### Rook
Rook contiguity is similar to and, in many ways, superseded by Queen contiguity.
However, since it sometimes comes up in the literature, it is useful to know about it.
The main idea is the same: two observations are neighbours if they share some of their
boundary lines. However, in the Rook case, it is not enough to share only one point.
It needs to be at least a segment of their boundary. In most applied cases, these
differences usually boil down to how the geocoding was done, but in some cases, such as
when you use raster data or grids, this approach can differ more substantively, and it
thus makes more sense.
From a technical point of view, constructing a Rook matrix is very similar:
```{python}
rook = graph.Graph.build_contiguity(prague, rook=True)
rook
```
The output is of the same type as before, a `Graph` object that can be queried and used in
very much the same way as any other one.
#### Bishop
In a similar sense, you may want to create Bishop contiguity - consider two geometries
neighbouring if they share only one vertex. There's no constructor for that but you can
derive Bishop contiguity from Queen and Rook as Queen contains both shared vertex and
shared edge relations. You can, therefore, subtract one from the other to get Bishop
contiguity:
```{python}
bishop = queen.difference(rook)
bishop
```
See the remaining edges:
```{python}
# | fig-cap: Bishop contiguity derived as a difference of Queen and Rook
ax = bishop.plot(prague, nodes=False, edge_kws={"linewidth": .5})
contextily.add_basemap(ax=ax, crs=prague.crs, source="CartoDB Positron")
ax.set_axis_off()
```
Only a small fraction of Queen neighbours are not Rook neighbors at the same time
(represented by the number of edges here), but there are some.
### Distance
Distance-based matrices assign the weight to each pair of observations as a function of
how far from each other they are. How this is translated into an actual weight varies
across types and variants, but they all share that the ultimate reason why two
observations are assigned some weight is due to the distance between them.
#### K-Nearest Neighbors
One approach to define weights is to take the distances between a given observation and
the rest of the set, rank them, and consider as neighbours the $k$ closest ones. That is
exactly what the $k$-nearest neighbours (KNN) criterium does.
To calculate KNN weights, you can use a similar function as before and derive them from a
`GeoDataFrame`:
```{python}
prague["centroid"] = prague.centroid # <1>
prague = prague.set_geometry("centroid") # <2>
knn5 = graph.Graph.build_knn(prague, k=5)
knn5
```
1. Distance-based methods usually work only with points (for performance reasons). Create a centroid representation
of each geometry.
2. Assign it as an active geometry column.
See the resulting graph visually:
```{python}
# | fig-cap: A graph representing 5 nearest neighbours
ax = knn5.plot(prague, nodes=False, edge_kws={"linewidth": .5})
contextily.add_basemap(ax=ax, crs=prague.crs, source="CartoDB Positron")
ax.set_axis_off()
```
Note how you need to specify the number of nearest neighbours you want to consider with the
argument `k`. Since it is a polygon `GeoDataFrame` that you are passing, you need to
compute the centroids to derive distances between observations.
Alternatively, you can provide the points in the form of an array, skipping this way the
dependency of a file on disk:
```{python}
pts = pd.DataFrame(
{"X": prague.geometry.x, "Y": prague.geometry.y} # <1>
).values # <2>
knn5_from_pts = graph.Graph.build_knn(pts, k=5)
```
1. If your `GeoSeries` contains Point geometries, you can access their coordinates using `GeoSeries.x` and `GeoSeries.y`.
2. `.values` returns only an array of values without the `pandas` index. It is the underlying data structure coming
from the `numpy` package. You will learn more about `numpy` in due course.
#### Distance band
Another approach to building distance-based spatial weights matrices is to draw a circle of
certain radius and consider neighbour every observation that falls within the circle.
The technique has two main variations: binary and continuous. In the former one, every
neighbour is given a weight of one, while in the second one, the weights can be further
tweaked by the distance to the observation of interest.
To compute binary distance matrices in `PySAL`, you can use the following command:
```{python}
dist1kmB = graph.Graph.build_distance_band(prague, 1000)
dist1kmB
```
This creates a binary matrix that considers neighbors of an observation every polygon
whose centroid is closer than 1,000 metres (1 km) to the centroid of such observation.
Check, for example, the neighbourhood of polygon `Albertov`:
```{python}
dist1kmB['Albertov']
```
Note that the units in which you specify the distance directly depend on the CRS in
which the spatial data are projected, and this has nothing to do with the weights
building but it can affect it significantly. Recall how you can check the CRS of a
`GeoDataFrame`:
```{python}
prague.crs
```
In this case, you can see the unit is expressed in metres (`m`). Hence you set the
threshold to 1,000 for a circle of 1km radius. The whole graph looks like this:
```{python}
# | fig-cap: A graph representing distance band of 1 km.
ax = dist1kmB.plot(prague, nodes=False, edge_kws={"linewidth": .5})
contextily.add_basemap(ax=ax, crs=prague.crs, source="CartoDB Positron")
ax.set_axis_off()
```
### Block weights
Block weights connect every observation in a dataset that belongs to the same category
in a list provided ex-ante. Usually, this list will have some relation to geography and
the location of the observations but, technically speaking, all one needs to create
block weights is a list of memberships. In this class of weights, neighbouring
observations, those in the same group, are assigned a weight of one, and the rest
receive a weight of zero.
In this example, you will build a spatial weights matrix that connects every ZSJ with
all the other ones in the same KU. See how the KU code is expressed for every ZSJ:
```{python}
prague.head()
```
To build a block spatial weights matrix that connects as neighbours all the ZSJ in the
same KU, you only require the mapping of codes. Using `PySAL`, this is a one-line task:
```{python}
block = graph.Graph.build_block_contiguity(prague['NAZ_KU'])
block
```
If you query for the neighbors of observation by its name, it will work as usual:
```{python}
block['Albertov']
```
Notice the resulting blocks are clearly visible when you visualise the graph:
```{python}
# | fig-cap: A graph representing block contiguity
ax = block.plot(prague, nodes=False, edge_kws={"linewidth": .5})
contextily.add_basemap(ax=ax, crs=prague.crs, source="CartoDB Positron")
ax.set_axis_off()
```
## Weighted graphs
So far, all the graphs resulted in boolean weights. The two geometries either are
neighbors or they are not. However, there are many situations where you want to
give a different _weight_ to each neighbor. For example, you may want to give a higher
weight to those neighbors that are closer over those that are further away. Take
the example of K-nearest neighbors.
```{python}
knn5['Albertov']
```
As you can see, all have the same weight - 1.
### Kernel graphs
To make use of the actual distance, you
can build a Kernel graph, where the distance between two points is passed to the kernel
function deriving a weight from it.
```{python}
knn5_kernel = graph.Graph.build_kernel(prague, k=5) # <1>
knn5_kernel
```
1. As with KNN, kernel uses point-to-point distance. You can do kernel-weighted KNN by passing `k`.
The default kernel is Gaussian but there's a long list of built-in kernels and option to
pass a custom function as well.
The resulting Graph has the same neighbors as the `knn5` but the weight is derived from the
distance between the points.
```{python}
knn5_kernel['Albertov']
```
### Distance-weighted distance band
An extension of the distance band weights above introduces further detail by assigning different
weights to different neighbours within the radius circle based on how far they are from
the observation of interest. For example, you could think of assigning the inverse of the
distance between observations $i$ and $j$ as $w_{ij}$. This can be computed with the
following command:
```{python}
dist1kmC = graph.Graph.build_distance_band(prague, 1000, binary=False)
dist1kmC
```
In `dist1kmC`, every observation within the 1km circle is assigned a weight equal to
the inverse distance between the two:
$$
w_{ij} = \dfrac{1}{d_{ij}}
$$
This way, the further apart $i$ and $j$ are from each other, the smaller the weight $w_{ij}$ will be.
Contrast the binary neighbourhood with the continuous one for `Albertov`:
```{python}
dist1kmC['Albertov']
```
Following this logic of more detailed weights through distance, there is a temptation to
take it further and consider everyone else in the dataset as a neighbor whose weight
will then get modulated by the distance effect shown above. However, although
conceptually correct, this approach is not always the most computationally or practical
one. Because of the nature of spatial weights matrices, particularly because of the fact
their size is $N$ by $N$, they can grow substantially large. A way to cope with this
problem is by making sure they remain fairly *sparse* (with many zeros). Sparsity is
typically ensured in the case of contiguity or KNN by construction but, with inverse
distance, it needs to be imposed as, otherwise, the matrix could be potentially entirely
dense (no zero values other than the diagonal). In practical terms, what is usually done
is to impose a distance threshold beyond which no weight is assigned and interaction is
assumed to be non-existent. Beyond being computationally feasible and scalable, results
from this approach usually do not differ much from a fully "dense" one as the additional
information that is included from further observations is almost ignored due to the
small weight they receive. In this context, a commonly used threshold, although not
always best, is that which makes every observation to have at least one neighbor.
### Perimeter-weighted contiguity
Similarly, you can adapt contiguity graphs to derive weight from the proportion of
the shared boundary between adjacent polygons.
```{python}
prague = prague.set_geometry("geometry")
rook_by_perimeter = graph.Graph.build_contiguity(prague, by_perimeter=True)
rook_by_perimeter
```
By default, the weight is the length of the shared boundary:
```{python}
rook_by_perimeter['Albertov']
```
::: {.callout-tip}
# Computing threshold
Such a threshold can be calculated using the `min_threshold_distance` function from `libpysal.weights` module:
```{python}
from libpysal import weights
min_thr = weights.min_threshold_distance(pts)
min_thr
```
Which can then be used to calculate an inverse distance weights matrix:
```py
min_dist = graph.Graph.build_distance_band(prague, min_thr, binary=False)
```
:::
## Standardizing `Graph` relationships
In the context of many spatial analysis techniques, a spatial weights matrix with raw
values (e.g. ones and zeros for the binary case or lenghts for the perimeter-weighted
contiguity) is not always the best-suiting one for
analysis and some sort of transformation is required. This implies modifying each weight
so they conform to certain rules. `PySAL` has transformations baked right into the `Graph`
object, so it is possible to check the state of an object as well as to modify it.
Consider the original Queen weights for observation `Albertov`:
```{python}
queen['Albertov']
```
Since it is contiguity, every neighbour gets one, the rest zero weight. You can check if
the object `queen` has been transformed or not by calling the property `.transformation`:
```{python}
queen.transformation
```
where `"O"` stands for "original", so no transformations have been applied yet. If you want
to apply a row-based transformation so every row of the matrix sums up to one, you use
the `.transform()` method as follows:
```{python}
row_wise_queen = queen.transform("R") # <1>
```
1. `.transform()` returns a new `Graph` with transformed weights. `"R"` stands for row-wise transformation.
Now you can check the weights of the same observation as above and find they have been modified:
```{python}
row_wise_queen['Albertov']
```
The sum of weights for all the neighbours is one:
```{python}
row_wise_queen['Albertov'].sum()
```
`PySAL` currently supports the following transformations:
* `O`: original, returning the object to the initial state.
* `B`: binary, with every neighbour having assigned a weight of one.
* `R`: row, with all the neighbours of a given observation adding up to one.
* `V`: variance stabilising, with the sum of all the weights being constrained to the number of observations.
* `D`: double, with all the weights across all observations adding up to one.
The same can be done with non-binary weights. Given the example of perimeter-weighted Rook
contiguity created above, you can transform the lengths of shared boundaries to a row-normalised
value.
```{python}
rook_by_perimeter['Albertov']
```
Transform the weights:
```{python}
rook_by_perimeter_row = rook_by_perimeter.transform("R") # <1>
rook_by_perimeter_row['Albertov']
```
This may be necessary for certain statistical operations assuming a sum of weights in a
row equals to 1.
## Reading and Writing spatial weights in `PySAL`
Sometimes, suppose a dataset is very detailed or large. In that case, it can be costly
to build the spatial weights matrix of a given geography, and, despite the optimisations
in the `PySAL` code, the computation time can quickly grow out of hand. In these
contexts, it is useful not to have to rebuild a matrix from scratch every time you need
to re-run the analysis. A useful solution, in this case, is to build the matrix once and
save it to a file where it can be reloaded at a later stage if needed.
`PySAL` has a way to efficiently write any kind of `Graph` object into a file using the method
`.to_parquet()`. This will serialise the underlying adjacency table into an Apache Parquet
file format that is very fast to read and write and can be compressed to a small size.
```{python}
queen.to_parquet("queen.parquet")
```
You can then read such a file back to the `Graph` from using `graph.read_parquet()` function:
```{python}
queen_2 = graph.read_parquet("queen.parquet")
```
::: {.callout-warning}
# Interoperabilty with other tools
Weights saved as a Parquet file are efficient if PySAL is the only tool you want to use to read and write them.
If you want to save them to other file formats like GAL or GWT that are readable by tools like GeoDa, you can
save graphs to interoperable `GAL` (binary) or `GWT` (non-binary) formats using:
```py
queen.to_gal("queen.gal")
queen.to_gwt("queen.gwt")
```
Similarly, you can read such files with `graph.read_gal()` and `graph.read_gwt()`.
:::
## Using graph to aggregate variables
A typical use of a graph is a description of the distribution of values within the
neighborhood of each geometry. This can be done either solely based on the neighborhood
definition, ignoring the actual weight, or as a weighted operation.
### Describe
The first option is implemented within the `describe()` method, which returns a statistical
description of distribution of a given variable within each neighborhood.
For this illustration, you will use the area of each polygon as the variable of interest.
And to make things a bit nicer later on, you will keep the log of the area instead of the
raw measurement. Hence, let's create a column for it:
```{python}
prague = prague.set_geometry("geometry") # <1>
prague["area"] = np.log(prague.area) # <2>
```
1. Remember to set active geometry back to polygons before computing area.
2. `np.log` is a function from the `numpy` package that computes the natural logarithm of an array of values (elementwise).
```{python}
area_description = queen.describe(
prague["area"],
statistics=["mean", "median", "min", "max", "std"],
)
area_description.head()
```
The table can be directly assigned to the `prague` DataFrame, if needed.
```{python}
prague = prague.join(area_description)
prague.head()
```
Compare a subset visually on a map. You can create a subplot with 4 axes and loop over
the selected columns to do that efficiently.
```{python}
# | fig-cap: Log of area and its derivatives based on Queen contiguity
columns = ["area", "mean", "std", "max"] # <1>
fig, axs = plt.subplots(2, 2, figsize=(9, 6)) # <2>
axs = axs.flatten() # <3>
for column, ax in zip(columns, axs): # <4>
prague.plot(column, ax=ax, legend=True, cmap="YlGnBu") # <5>
ax.set_axis_off() # <6>
ax.set_title(column) # <7>
```
1. Specify a list of columns to be plotted.
2. Create a figure with 4 subplots in a 2x2 grid, sized 9x6 inches.
3. Flatten the axis for easier iteration.
4. Make a `for` loop over two arrays at the same time zipped together.
5. Make the map, show its legend and assign a custom colormap.
6. Disable axis boundary plotting.
7. Set title of each subplot.
## Spatial Lag
If you need to use the weight, you are looking for the so-called _spatial
lag_. The spatial lag of a given variable is the product of a spatial weight matrix and
the variable itself:
$$
Y_{sl} = W Y
$$
where $Y$ is a Nx1 vector with the values of the variable. Recall that the product of a
matrix and a vector equals the sum of a row by column element multiplication for the
resulting value of a given row. In terms of the spatial lag:
$$
y_{sl-i} = \displaystyle \sum_j w_{ij} y_j
$$
If you are using row-standardized weights, $w_{ij}$ becomes a proportion between zero and
one, and $y_{sl-i}$ can be seen as the mean value of $Y$ in the neighborhood of $i$, the
same you would get with `describe()`.
The spatial lag is a key element of many spatial analysis techniques, as you will see
later on and, as such, it is fully supported in `PySAL`. To compute the spatial lag of a
given variable, `area`, for example:
```{python}
queen_lag = row_wise_queen.lag(prague["area"]) # <1>
queen_lag[:5] # <2>
```
1. Compute spatial lag of `area` using row-wised standardised weights to get mean values. If you used binary weights, the resulting
spatial lag would equal to sum of values.
2. Print the first five elements
Line 1 contains the actual computation, which is highly optimised in `PySAL`. Note that,
despite passing in a `pd.Series` object, the output is a `numpy` array. This however,
can be added directly to the table `prague`:
```{python}
prague['area_lag'] = queen_lag
```
### Categorical lag
If you have categorical data, you can use the categorical lag, which returns the most
common category within the neighbors. You can try with the KU name.
```{python}
categorical_lag = row_wise_queen.lag(prague["NAZ_KU"], ties="tryself")
categorical_lag[:5]
```
Note that when there is a tie, i.e., no category is the most common, `lag()` will raise
an error by default. You can let it resolve ties either by including the value from the
focal observation to break the tie or by picking the one randomly (`ties="random"`).
## Acknowledgements {.appendix}
This section is derived from _A Course on Geographic Data Science_ by
@darribas_gds_course, licensed under CC-BY-SA 4.0. The code was updated to use the new
`libpysal.graph` module instead of `libpysal.weights`. The text was slightly adapted
to accommodate a different dataset and the module change.