Skip to content

Estimate the midline of an sf polygon

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

RichardPatterson/midlines

Repository files navigation

midlines

The goal of midlines is to estimate the midline of one or more polygons, taking inputs in sf formats.

midlines wraps around several functions from other packages, predominantly sf. It’s development has been a learning experience for the author, but hopefully some users will find it useful. Comments, suggestions and contributions are welcome.

Installation

You can install the current version of the package from GitHub with:

# install.packages("devtools")
# library(devtools)
devtools::install_github("RichardPatterson/midlines")

N.B. to install midlines from GitHub, you will need to install and attach the devtools package if you have not already done so.

An example - finding the midline of a river

The package contains an example polygon representing a short section of the River Thames in central London. This was derived from OpenStreetMap data, downloaded using the OSMdata package and as such, copyright is retained by OpenStreetMap contributors.

# Load libraries
library(midlines)
library(sf)
#> Warning: package 'sf' was built under R version 4.3.3
library(units)
#> Warning: package 'units' was built under R version 4.3.3

plot(thames$geometry)

To estimate the midline of this stretch of the Thames, use the midline_draw function.

# create a sf collection for the midline
m1 = midlines_draw(thames)

# plot midline with a different colour for each segment (line_id)
plot(m1$geometry, col = m1$line_id, add = TRUE)

We can see the midline has been drawn but there are many unwanted short side branches, especially where the banks of the river are not smooth. To understand why, we need to know what midline_draw has done. midlines_draw used the sf function st_voronoi to perform an Voronoi tessellation and then removed all line segments not entirely within the polygon. midlines can help remove these unwanted side branches, but it is possible you can prevent their creation (see sections on border lines gaps and zig-zagging).

Cleaning the midline

To remove the unwanted branches, the line segments at the end of a line can be identified using a binary variable (removed_flag) added to the dataset with the midlines_clean function. The flagged line segments can be explored and removed as required. Specifying the n_removed option will flag that number of line segments from the end of each line. Previous experimentation indicated that the unwanted branches in this example are made up of ≤ 4 line segments.

# remove 4 line segments from the end of each line
m2 = midlines_clean(m1, n_removed = 4)

# plot flagged (red) and unflagged (blue) line segments
plot(m2$geometry, col = c("BLUE", "RED")[m2$removed_flag])

# plot only the cleaned midline
plot(m2$geometry[m2$removed_flag==0], col = "BLUE")

Retaining the ends of the midline

In addition to the unwanted side branches, midlines_clean flagged for removal the line segments at the end of the midline itself. If losing the ends of the midline is a problem, midlines_check can unflag some line segments.

As the unwanted side branches are short, we can unflag longer groups of flagged line segments. We know that the unwanted side branches contain ≤ 4 line segments, whereas the midlines itself is much longer. Therefore, if we remove up to 5 line segments, we know that any removed lines of 5 or more line segments are not unwanted side chains but part of the midline. midlines_check takes as an input the output from midlines_clean and adds another binary variable (removed_flag2), which should flag the unwanted side chains but not the ends of the midlines.

m2 = midlines_clean(m1, n_removed = 5)
m3 = midlines_check(m2, n_removed = 5)

# plot initially flagged in red and remainder in blue
plot(m3$geometry, col = c("BLUE", "RED")[m3$removed_flag2])

# overlay those initially flagged but identifed as having 5+ segments in green 
plot(m3$geometry[m3$removed_flag==1 & m3$removed_flag2==0], col = "GREEN", add = TRUE)

The ends of the midline are restored (green) but the unwanted side branches are still flagged for removal (red). In addition to adding back longer lines based on the number of line segments, the length option can be used to specify a length (in Units). In this example, specifying length = set_units(230,"m") gives the same results as n_removed = 5

There is a trade off between removing all unwanted side branches and retaining the desired midlines. Some experimentation might be required to get the desired results. In this case we’ve unflagged the forked midline on the right end of our river segment and also some small bits of unconnected line (although see removing unwanted small bits of midlines.

Border line

If there is a specific area of interest, this can be used to help control the estimation and cleaning of the midlines. This works best when the polygon used for estimation is larger than the area of actual interest. The border of the area of interest is specified as an sf linestring, as in this example.

# generate a linestring marking the border
bbox_line = st_cast(st_as_sfc(st_bbox(c(xmin = 535070, ymin = 177800, xmax = 542560, ymax = 181550), crs = st_crs(thames))), "LINESTRING")

m1 = midlines_draw(thames, border_line = bbox_line)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries

plot(thames$geometry)
plot(m1$geometry, col = "BLUE", add = TRUE)
plot(bbox_line, add = TRUE)

Specifying the boarder_line option in midlines_clean and midlines_check can also ensure the line ends that contact the boarder line are not flagged for removal.

m2 = midlines_clean(m1, n_removed = 5)
m3 = midlines_check(m2, border_line = bbox_line)

plot(m3$geometry[m3$removed_flag2 == 0], col = "BLUE"[m3$removed_flag2])
plot(bbox_line, add = TRUE)

It is also possible to specify the border_line option with midlines_draw and midlines_clean, which crops out midline segments outside the area of interest at the outset and safes time by not flagging and unflagging those lines.

Gaps

So far we’ve ignored the side branches north of the the Thames that join the main channel. Attempts to estimate the midline of the left channel has been entirely unsuccessful and the right channel midline has a marked zig-zag pattern. The zig-zagging section below covers that issue but here we look more closely at the lack of any midline in the left of the two side branches.

# generate a polygon to focus on the left side channel(s)
bbox_poly = st_as_sfc(st_bbox(c(xmin = 536070, ymin = 180700, xmax = 537800, ymax = 181850), crs = st_crs(thames)))
# a slightly smaller area to use as a border line
bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 536120, ymin = 180760, xmax = 537750, ymax = 181800), crs = st_crs(thames))),"LINESTRING")

plot(thames$geometry)
plot(bbox_poly, add = TRUE)

# crop the larger thames polygon to the area we want
side1 = st_intersection(thames, bbox_poly)

plot(side1$geometry)
plot(bbox_line_s, add = TRUE)

# estimating midline of the original polygon
m_s1 = midlines_draw(side1, border_line = bbox_line_s)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
plot(m_s1$geometry, add = TRUE, col = "RED")

The lack of lines is due to the narrowness of the channel, or more precisely, the gaps between points in the polygon perimeter relative to the width of the polygon. The gap between points that was sufficient for the wider main River Thames is insufficient to allow successful midline estimation in the narrow channel. The midlines_draw function has a dfMaxLength option specify the maximum distance between points and to add additional points as required. This argument is passed to sf::st_segmentize so see that help file for more details.

# some trial and error revealed 8 meters to be the optimal max length between points
plot(side1$geometry)
plot(bbox_line_s, add = TRUE)

m_s1 = midlines_draw(side1, dfMaxLength = set_units(8,"m"), border_line = bbox_line_s)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
#plot(ml2$geometry, add = TRUE, col = "BLUE")

# using the border_line to prevent removal of lines of interest
m_s2 = midlines_clean(m_s1, n_removed = 20, border_line = bbox_line_s)
plot(m_s2$geometry, col = c("BLUE", "RED")[m_s2$removed_flag], add = TRUE)

Conversely, if the points on the perimeter are too dense and result in a very complicated tessellation or slow computation times, then sf::st_line_sample might also be useful.

Zig-zagging

A zig-zagging midline results from the offset between points on either side of the polygon when the Voronoi tessellation is done. To illustrate this, there are a series of midlines drawn with very simple oblong polygons.

mat1 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,11,2,9,2,7,2,5,2,3,2,1,2,0,2,0,0),ncol=2, byrow=TRUE) 
pts1 = st_multipoint(mat1)
pol1 = st_polygon(list(mat1))
ml1 = midlines_draw(pol1)

plot(pol1)
plot(pts1, add = TRUE)
plot(ml1$geometry, col = ml1$line_id, add = TRUE)

When the points on the polygon perimeter are offset, this results in the zig-zag pattern. One way to resolve this is to ensure that points are directly opposite one another on the polygon.

mat2 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,12,2,10,2,8,2,6,2,4,2,2,2,0,2,0,0),ncol=2, byrow=TRUE) 
pts2 = st_multipoint(mat2)
pol2 = st_polygon(list(mat2))
ml2 = midlines_draw(pol2)

plot(pol2)
plot(pts2, add = TRUE)
plot(ml2$geometry, col = ml2$line_id, add = TRUE)

However, its not always possible to control the points that make up the polygons of interest. Another option is to reduce the gap between the points so even though they are offset the zig-zag is less pronounced.

mat3 = matrix(c(0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0,8,0,9,0,10,0,11,0,12,0,
                12,2,11.5,2,10.5,2,9.5,2,8.5,2,7.5,2,6.5,2,5.5,2,4.5,2,3.5,2,2.5,2,1.5,2,0.5,2,0,2,0,0),ncol=2, byrow=TRUE) 
pts3 = st_multipoint(mat3)
pol3 = st_polygon(list(mat3))

ml3 = midlines_draw(pol3)

plot(pol3)
plot(pts3, add = TRUE)
plot(ml3$geometry, col = ml3$line_id, add = TRUE)

Returning to the river example and looking more closely at the zig-zag midline of the right side channel.

# generate a polygon to focus on the right side channel(s)
bbox_poly = st_as_sfc(st_bbox(c(xmin = 538700, ymin = 180750, xmax = 539800, ymax = 181850), crs = st_crs(thames)))
# a slightly smaller area to use as a border line
bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 538750, ymin = 180800, xmax = 539750, ymax = 181800), crs = st_crs(thames))),"LINESTRING")

plot(thames$geometry)
plot(bbox_poly, add = TRUE)

# crop the larger thames polygon to the area we want
side2 = st_intersection(thames,bbox_poly)
plot(side2$geometry)


plot(bbox_line_s, add = TRUE)

# estimating midline of the original polygon
m_s1 = midlines_draw(side2, border_line = bbox_line_s)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
plot(ml1$geometry, add = TRUE, col = "RED")

One option would be to use the dfMaxLength option with midlines_draw like we did on the left side channel.

plot(side2$geometry)
m_s1a = midlines_draw(side2, border_line = bbox_line_s, dfMaxLength = set_units(8,"m"))
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries

# clean this to remove extraneous lines
m_s2a = midlines_clean(m_s1a, border_line = bbox_line_s, n_removed = 5)
plot(m_s2a$geometry[m_s2a$removed_flag==0], add = TRUE, col = "RED")

#plot(ml2$geometry, add = TRUE, col = "RED")

An alternative is to smooth the midline using a rolling average - basically a wrapper round the rollapply function in the excellent Zoo package.

# smooth with a rolling average 
m_s2b = midlines_smooth(m_s1)

plot(side2$geometry)
plot(m_s2b$geometry, add = TRUE, col = "BLUE")

midlines_smooth will smooth lines in a network of midlines but it does not move the point where two or more lines join. The width option can be specified for midlines_smooth, this is passed to rollapply so see documentation there for more details. Beware that the wider the window the greater tenancy to cut the corners and therefore it becomes less like a midline.

And finally

Removing unwanted small bits of midlines

Sometime it might be useful to remove small bits of midline that are not connected the to main midline. For example, in our earlier example

plot(m2$geometry)

# remove the short bits of line
m2 = midlines_debit(m2, length = set_units(500, "m") )

plot(m2$geometry)

De-densifying midlines

In addition to manipulating the density of points on the perimeter of the initial polygon, it is sometimes desirable to manipulate the density of points that form the midline. This results in fewer longer line segments, rather than many shorter line segments. This can be done using midlines_dedensify either before or after cleaning. Like with the midlines_smooth function, the points where lines join are not affected but between line joins the number of points can be reduced. This function calls sf::st_line_sample so see this for the relevant options.

Voronoi polygons

The estimation of midlines is done using Voronoi tessellation via the sf::st_voronoi function. Some understanding of this process can help to understand how the attributes of the polygon will influence it’s estimated midline. For a given set of points, Voronoi tessellation identifies the area around each point, i.e., that are closer to that point than any other. Below is an demonstration using the points in the perimeter of the river Thames polygon as the starting point.

Another use case

An alternative use case is to merge two versions of a road network that are measured differently and therefore do not exactly match. Use buffers to creating a polygon around both networks and then estimate the midline of this polygon to give a combined network against which each can be compared. See this blog post for more details.

Citation

If you use this package in your work it would be helpful if you would cite it. A suggested citation is:

Patterson, R. midlines: Estimate Polygon Midlines. R package version 0.0.0.9002. DOI: 10.17863/CAM.100113

Alternatives

  • The very comprehensive R package cmgo does many of the same things as this package and much more. If your aim to to estimate the midline of a river, you should probably start with cmgo
  • centerline is a Python library with a similar aim

About

Estimate the midline of an sf polygon

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Packages

No packages published

Languages