Commit

minor changes
ks905383 committed Sep 11, 2024
1 parent dbc9e89 commit 4c78de9
1 changed file: joss_paper/paper.md (1 addition, 1 deletion)
@@ -29,7 +29,7 @@ bibliography: paper.bib
# Summary
Scientific data (e.g., gridded weather observations, pollution data, night-time lights, or other remote sensing products) are often interpolated to or created on grids or raster pixels that approximate the continuous real world, whether for ease of calculation, for standardization, or because of technical limitations. However, the geospatial or administrative boundaries that occur in the real world rarely align with a grid. For example, birds fly along complex migratory corridors, rain- and watersheds follow valleys and mountains, and many types of data, such as demographic or agricultural information, are collected at the county, city, or census-tract level. These real-world geospatial and administrative boundaries can often be represented as polygons.

When these raster and polygon worlds collide, as they often do in social or natural science research, data must be aggregated between them (e.g., @auffhammer_using_2013). This aggregation must, however, be done with care to preserve the integrity of the data and subsequent analysis. Consider a researcher working on population and mortality statistics for Los Angeles County. Using gridded temperature data in their work may require aggregating the gridded data onto a polygon representing Los Angeles County (\autoref{fig1}). The simplest way to aggregate the data would be to average across every grid cell that overlaps with the county polygon, implicitly weighting each cell equally. However, some grid cells may only slightly overlap with the county and instead primarily cover areas with different climate characteristics (for example, the grid cells that primarily cover ocean in \autoref{fig1}). Giving them the same weight as grid cells fully inside the county may produce a temperature time series that does not reflect what the county actually experiences. Additionally, some grid cells may cover sparsely populated areas of the county. Since few people experience the temperature in those areas, including those grid cells with equal weight in the aggregated result may be unhelpful when studying the relationship between temperature and mortality.
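
The following minimal Python sketch makes the effect concrete. It compares the equal-weight average with an area-weighted average for three hypothetical grid cells that overlap a single county. The temperatures and overlap fractions are invented for illustration, and the function names are hypothetical. This is not `xagg`'s implementation. The sketch also assumes that all cells have equal area. On a latitude-longitude grid, cell area shrinks toward the poles, so real weights should use overlap areas rather than fractions of a cell.

```python
# Toy illustration with hypothetical numbers: three grid cells overlapping one county.
# Each entry: (temperature in degrees C, fraction of the cell's area inside the county).
# Assumes equal-area cells, so the fraction is proportional to the overlap area.
cells = [
    (30.0, 1.00),  # inland cell fully inside the county
    (28.0, 0.60),  # cell partly inside the county
    (18.0, 0.05),  # mostly-ocean cell that barely touches the county
]


def equal_weight_mean(cells):
    """Average every overlapping cell with the same weight (the naive approach)."""
    return sum(t for t, _ in cells) / len(cells)


def area_weighted_mean(cells):
    """Weight each cell by the fraction of its area that falls inside the polygon."""
    total = sum(frac for _, frac in cells)
    return sum(t * frac for t, frac in cells) / total


print(round(equal_weight_mean(cells), 2))   # → 25.33
print(round(area_weighted_mean(cells), 2))  # → 28.91
```

In the equal-weight mean, the mostly-ocean cell pulls the county average down by several degrees. With area weighting, that cell contributes almost nothing.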

![Illustration of the `xagg` workflow. Variables stored on a geographic grid (in this case 2-meter daily temperature from ERA5 reanalysis; @hersbach_era5_2020), a set of geographic polygons (in this case US county borders, focusing on Los Angeles County as an example), and an optional second weight on a geographic grid (in this case LandScan Day Population; @rose_landscan_2017) are provided as inputs (panels a. and c.). `xagg` calculates the relative overlap between each ERA5 grid cell and each county (panel b.), regrids the population grid to the ERA5 grid (panel d.), and produces a set of final grid cell weights composed of both the area overlap and the population density (panel e.). For each county, these weights are used to calculate weighted averages of daily temperature (panel f.), which can then be exported in multiple formats for further analysis.\label{fig1}](xagg_joss_figure1.pdf)
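
The weighting in panels e. and f. can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not `xagg`'s actual code:

- The overlap areas, population densities, and temperatures are hypothetical.
- Population is assumed to be regridded onto the temperature grid as a density already.
- Each cell's final weight is taken to be its overlap area times its population density, which estimates the number of people in the overlapping part of the cell. The weights are normalized to sum to one within the county.

```python
# Minimal sketch of composite area x population weights (panels e. and f.).
# All numbers are hypothetical; this is not xagg's internal implementation.

# Area (km^2) of each grid cell that falls inside the county
overlap_area_km2 = [900.0, 400.0, 50.0]
# Population density (people per km^2) of each cell, assumed to be
# already regridded onto the temperature grid
pop_density = [2000.0, 100.0, 20.0]
# Daily temperature (degrees C) for each cell over three days
daily_temp = [
    [30.0, 31.0, 29.0],
    [28.0, 27.0, 26.0],
    [18.0, 19.0, 17.0],
]


def composite_weights(areas, densities):
    """Weight each cell by overlap area x density, normalized to sum to 1."""
    raw = [a * d for a, d in zip(areas, densities)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_daily_mean(series, weights):
    """Weighted average across cells for each time step."""
    n_steps = len(series[0])
    return [
        sum(w * cell[t] for w, cell in zip(weights, series))
        for t in range(n_steps)
    ]


weights = composite_weights(overlap_area_km2, pop_density)
county_series = weighted_daily_mean(daily_temp, weights)
print([round(w, 4) for w in weights])
print([round(x, 2) for x in county_series])
```

The densely populated inland cell receives nearly all of the weight, so the county series stays close to that cell's temperatures. The sparsely populated, mostly-ocean cell contributes almost nothing.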
