The goal of midlines
is to estimate the midline of one or more
polygons, taking inputs in sf
formats.
midlines
wraps around several functions from other packages,
predominantly sf
. It’s development
has been a learning experience for the author, but hopefully some users
will find it useful. Comments, suggestions and contributions are
welcome.
You can install the current version of the package from GitHub with:
# install.packages("devtools")
# library(devtools)
devtools::install_github("RichardPatterson/midlines")
N.B. to install midlines from GitHub, you will need to install and attach the devtools package if you have not already done so.
The package contains an example polygon representing a short section of
the River Thames in central London. This was derived from
OpenStreetMap data, downloaded using
the OSMdata
package and as such,
copyright is retained by OpenStreetMap contributors.
# Load libraries
library(midlines)
library(sf)
#> Warning: package 'sf' was built under R version 4.3.3
library(units)
#> Warning: package 'units' was built under R version 4.3.3
plot(thames$geometry)
To estimate the midline of this stretch of the Thames, use the
midline_draw
function.
# create a sf collection for the midline
m1 = midlines_draw(thames)
# plot midline with a different colour for each segment (line_id)
plot(m1$geometry, col = m1$line_id, add = TRUE)
We can see the midline has been drawn but there are many unwanted short
side branches, especially where the banks of the river are not smooth.
To understand why, we need to know what midline_draw
has done.
midlines_draw
used the sf
function st_voronoi
to perform an
Voronoi
tessellation
and then removed all line segments not entirely within the polygon.
midlines
can help remove these unwanted side branches, but it is
possible you can prevent their creation (see sections on border
lines gaps and
zig-zagging).
To remove the unwanted branches, the line segments at the end of a line
can be identified using a binary variable (removed_flag
) added to the
dataset with the midlines_clean
function. The flagged line segments
can be explored and removed as required. Specifying the n_removed
option will flag that number of line segments from the end of each line.
Previous experimentation indicated that the unwanted branches in this
example are made up of ≤ 4 line segments.
# remove 4 line segments from the end of each line
m2 = midlines_clean(m1, n_removed = 4)
# plot flagged (red) and unflagged (blue) line segments
plot(m2$geometry, col = c("BLUE", "RED")[m2$removed_flag])
# plot only the cleaned midline
plot(m2$geometry[m2$removed_flag==0], col = "BLUE")
In addition to the unwanted side branches, midlines_clean
flagged for
removal the line segments at the end of the midline itself. If losing
the ends of the midline is a problem, midlines_check
can unflag some
line segments.
As the unwanted side branches are short, we can unflag longer groups of
flagged line segments. We know that the unwanted side branches contain ≤
4 line segments, whereas the midlines itself is much longer. Therefore,
if we remove up to 5 line segments, we know that any removed lines of 5
or more line segments are not unwanted side chains but part of the
midline. midlines_check
takes as an input the output from
midlines_clean
and adds another binary variable (removed_flag2
),
which should flag the unwanted side chains but not the ends of the
midlines.
m2 = midlines_clean(m1, n_removed = 5)
m3 = midlines_check(m2, n_removed = 5)
# plot initially flagged in red and remainder in blue
plot(m3$geometry, col = c("BLUE", "RED")[m3$removed_flag2])
# overlay those initially flagged but identifed as having 5+ segments in green
plot(m3$geometry[m3$removed_flag==1 & m3$removed_flag2==0], col = "GREEN", add = TRUE)
The ends of the midline are restored (green) but the unwanted side
branches are still flagged for removal (red). In addition to adding back
longer lines based on the number of line segments, the length
option
can be used to specify a length (in
Units). In this example,
specifying length = set_units(230,"m")
gives the same results as
n_removed = 5
There is a trade off between removing all unwanted side branches and retaining the desired midlines. Some experimentation might be required to get the desired results. In this case we’ve unflagged the forked midline on the right end of our river segment and also some small bits of unconnected line (although see removing unwanted small bits of midlines.
If there is a specific area of interest, this can be used to help control the estimation and cleaning of the midlines. This works best when the polygon used for estimation is larger than the area of actual interest. The border of the area of interest is specified as an sf linestring, as in this example.
# generate a linestring marking the border
bbox_line = st_cast(st_as_sfc(st_bbox(c(xmin = 535070, ymin = 177800, xmax = 542560, ymax = 181550), crs = st_crs(thames))), "LINESTRING")
m1 = midlines_draw(thames, border_line = bbox_line)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
plot(thames$geometry)
plot(m1$geometry, col = "BLUE", add = TRUE)
plot(bbox_line, add = TRUE)
Specifying the boarder_line option in midlines_clean
and
midlines_check
can also ensure the line ends that contact the boarder
line are not flagged for removal.
m2 = midlines_clean(m1, n_removed = 5)
m3 = midlines_check(m2, border_line = bbox_line)
plot(m3$geometry[m3$removed_flag2 == 0], col = "BLUE"[m3$removed_flag2])
plot(bbox_line, add = TRUE)
It is also possible to specify the border_line option with midlines_draw and midlines_clean, which crops out midline segments outside the area of interest at the outset and safes time by not flagging and unflagging those lines.
So far we’ve ignored the side branches north of the the Thames that join the main channel. Attempts to estimate the midline of the left channel has been entirely unsuccessful and the right channel midline has a marked zig-zag pattern. The zig-zagging section below covers that issue but here we look more closely at the lack of any midline in the left of the two side branches.
# generate a polygon to focus on the left side channel(s)
bbox_poly = st_as_sfc(st_bbox(c(xmin = 536070, ymin = 180700, xmax = 537800, ymax = 181850), crs = st_crs(thames)))
# a slightly smaller area to use as a border line
bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 536120, ymin = 180760, xmax = 537750, ymax = 181800), crs = st_crs(thames))),"LINESTRING")
plot(thames$geometry)
plot(bbox_poly, add = TRUE)
# crop the larger thames polygon to the area we want
side1 = st_intersection(thames, bbox_poly)
plot(side1$geometry)
plot(bbox_line_s, add = TRUE)
# estimating midline of the original polygon
m_s1 = midlines_draw(side1, border_line = bbox_line_s)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
plot(m_s1$geometry, add = TRUE, col = "RED")
The lack of lines is due to the narrowness of the channel, or more
precisely, the gaps between points in the polygon perimeter relative to
the width of the polygon. The gap between points that was sufficient for
the wider main River Thames is insufficient to allow successful midline
estimation in the narrow channel. The midlines_draw
function has a
dfMaxLength
option specify the maximum distance between points and to
add additional points as required. This argument is passed to
sf::st_segmentize so see that help file for more details.
# some trial and error revealed 8 meters to be the optimal max length between points
plot(side1$geometry)
plot(bbox_line_s, add = TRUE)
m_s1 = midlines_draw(side1, dfMaxLength = set_units(8,"m"), border_line = bbox_line_s)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
#plot(ml2$geometry, add = TRUE, col = "BLUE")
# using the border_line to prevent removal of lines of interest
m_s2 = midlines_clean(m_s1, n_removed = 20, border_line = bbox_line_s)
plot(m_s2$geometry, col = c("BLUE", "RED")[m_s2$removed_flag], add = TRUE)
Conversely, if the points on the perimeter are too dense and result in a very complicated tessellation or slow computation times, then sf::st_line_sample might also be useful.
A zig-zagging midline results from the offset between points on either side of the polygon when the Voronoi tessellation is done. To illustrate this, there are a series of midlines drawn with very simple oblong polygons.
mat1 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,11,2,9,2,7,2,5,2,3,2,1,2,0,2,0,0),ncol=2, byrow=TRUE)
pts1 = st_multipoint(mat1)
pol1 = st_polygon(list(mat1))
ml1 = midlines_draw(pol1)
plot(pol1)
plot(pts1, add = TRUE)
plot(ml1$geometry, col = ml1$line_id, add = TRUE)
When the points on the polygon perimeter are offset, this results in the zig-zag pattern. One way to resolve this is to ensure that points are directly opposite one another on the polygon.
mat2 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,12,2,10,2,8,2,6,2,4,2,2,2,0,2,0,0),ncol=2, byrow=TRUE)
pts2 = st_multipoint(mat2)
pol2 = st_polygon(list(mat2))
ml2 = midlines_draw(pol2)
plot(pol2)
plot(pts2, add = TRUE)
plot(ml2$geometry, col = ml2$line_id, add = TRUE)
However, its not always possible to control the points that make up the polygons of interest. Another option is to reduce the gap between the points so even though they are offset the zig-zag is less pronounced.
mat3 = matrix(c(0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0,8,0,9,0,10,0,11,0,12,0,
12,2,11.5,2,10.5,2,9.5,2,8.5,2,7.5,2,6.5,2,5.5,2,4.5,2,3.5,2,2.5,2,1.5,2,0.5,2,0,2,0,0),ncol=2, byrow=TRUE)
pts3 = st_multipoint(mat3)
pol3 = st_polygon(list(mat3))
ml3 = midlines_draw(pol3)
plot(pol3)
plot(pts3, add = TRUE)
plot(ml3$geometry, col = ml3$line_id, add = TRUE)
Returning to the river example and looking more closely at the zig-zag midline of the right side channel.
# generate a polygon to focus on the right side channel(s)
bbox_poly = st_as_sfc(st_bbox(c(xmin = 538700, ymin = 180750, xmax = 539800, ymax = 181850), crs = st_crs(thames)))
# a slightly smaller area to use as a border line
bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 538750, ymin = 180800, xmax = 539750, ymax = 181800), crs = st_crs(thames))),"LINESTRING")
plot(thames$geometry)
plot(bbox_poly, add = TRUE)
# crop the larger thames polygon to the area we want
side2 = st_intersection(thames,bbox_poly)
plot(side2$geometry)
plot(bbox_line_s, add = TRUE)
# estimating midline of the original polygon
m_s1 = midlines_draw(side2, border_line = bbox_line_s)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
plot(ml1$geometry, add = TRUE, col = "RED")
One option would be to use the dfMaxLength option with midlines_draw like we did on the left side channel.
plot(side2$geometry)
m_s1a = midlines_draw(side2, border_line = bbox_line_s, dfMaxLength = set_units(8,"m"))
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
# clean this to remove extraneous lines
m_s2a = midlines_clean(m_s1a, border_line = bbox_line_s, n_removed = 5)
plot(m_s2a$geometry[m_s2a$removed_flag==0], add = TRUE, col = "RED")
#plot(ml2$geometry, add = TRUE, col = "RED")
An alternative is to smooth the midline using a rolling average - basically a wrapper round the rollapply function in the excellent Zoo package.
# smooth with a rolling average
m_s2b = midlines_smooth(m_s1)
plot(side2$geometry)
plot(m_s2b$geometry, add = TRUE, col = "BLUE")
midlines_smooth
will smooth lines in a network of midlines but it does
not move the point where two or more lines join. The width
option can
be specified for midlines_smooth
, this is passed to rollapply so see
documentation there for more details. Beware that the wider the window
the greater tenancy to cut the corners and therefore it becomes less
like a midline.
Sometime it might be useful to remove small bits of midline that are not connected the to main midline. For example, in our earlier example
plot(m2$geometry)
# remove the short bits of line
m2 = midlines_debit(m2, length = set_units(500, "m") )
plot(m2$geometry)
In addition to manipulating the density of points on the perimeter of
the initial polygon, it is sometimes desirable to manipulate the density
of points that form the midline. This results in fewer longer line
segments, rather than many shorter line segments. This can be done using
midlines_dedensify
either before or after cleaning. Like with the
midlines_smooth
function, the points where lines join are not affected
but between line joins the number of points can be reduced. This
function calls sf::st_line_sample so see this for the relevant options.
The estimation of midlines is done using
Voronoi
tessellation
via the sf::st_voronoi function. Some understanding of this process can
help to understand how the attributes of the polygon will influence it’s
estimated midline. For a given set of points, Voronoi tessellation
identifies the area around each point, i.e., that are closer to that
point than any other. Below is an demonstration using the points in the
perimeter of the river Thames polygon as the starting point.
An alternative use case is to merge two versions of a road network that are measured differently and therefore do not exactly match. Use buffers to creating a polygon around both networks and then estimate the midline of this polygon to give a combined network against which each can be compared. See this blog post for more details.
If you use this package in your work it would be helpful if you would cite it. A suggested citation is:
Patterson, R. midlines: Estimate Polygon Midlines. R package version 0.0.0.9002. DOI: 10.17863/CAM.100113
- The very comprehensive R package
cmgo
does many of the same things as this package and much more. If your aim to to estimate the midline of a river, you should probably start withcmgo
- centerline is a Python library with a similar aim