The aim of this project is to show that the rise in atmospheric CO2 concentration resulting from human activity is causing the average global temperature to rise. From the available data, we find supporting evidence: a correlation exists between the temperature anomaly and the CO2 concentration, while no such correlation exists with other climate-related measurements.
Once this evidence is established, we turn to a predictive model relating the temperature anomaly to the CO2 concentration. We develop a simple model based on differential equations and estimate its best-fitting parameters with a least-squares optimization algorithm that minimizes the sum of squared residuals between the model and the data. Finally, a Bayesian statistical approach is used to fit the same model and obtain 95% credible intervals for its parameters.
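The project's actual model and data live in the scripts listed below. As a minimal sketch of the least-squares step only, the following fits a hypothetical logarithmic-forcing ODE (not the project's exact equations) to synthetic observations with `scipy`:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Hypothetical illustrative model (not the project's exact equations):
#   dT/dt = alpha * log(C(t)/C0) - beta * T
# where C(t) is a toy CO2 trajectory and C0 an assumed baseline (ppm).
C0 = 280.0

def co2(t):
    # Toy exponentially rising CO2 concentration, starting at C0
    return C0 + 120.0 * (np.exp(0.02 * t) - 1.0)

def model(t_obs, alpha, beta):
    # Integrate the ODE and return the temperature anomaly at t_obs
    sol = solve_ivp(lambda t, T: alpha * np.log(co2(t) / C0) - beta * T,
                    (t_obs[0], t_obs[-1]), [0.0], t_eval=t_obs,
                    rtol=1e-6, atol=1e-9)
    return sol.y[0]

# Synthetic "observations" from known parameters (2.0, 0.05) plus noise
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 60.0, 61)
T_obs = model(t_obs, 2.0, 0.05) + rng.normal(0.0, 0.02, t_obs.size)

# Least-squares fit: minimise the sum of squared residuals
def residuals(p):
    return model(t_obs, *p) - T_obs

fit = least_squares(residuals, x0=[1.0, 0.1])
print(fit.x)  # estimates close to the true (2.0, 0.05)
```

The same residual function can be reused inside a Bayesian likelihood; the credible intervals in the project come from the Stan model instead.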
- `data` contains all data used throughout the project.
  - `downloads` contains the original data before processing.
  - `processed` contains the processed data.
  - `images` contains any generated images.
- `scripts` contains all code written throughout the project.
  - `preprocessing.ipynb` contains the code for pre-processing the downloaded data.
  - `cross_correlations.R` performs the cross-correlation analysis.
  - `temperature_map.R` creates various temperature maps.
  - `RStanODEModel.R` and `ode.stan` perform the Bayesian temperature modelling.
  - `main.ipynb` is the main Python script collating the analyses.
All of the data used is freely available.
- Temperature data was accessed from the NASA GISTEMP v4 dataset, which consists of monthly temperature-anomaly estimates on a 2°×2° grid from 1880 to the present.
- CO2 data was accessed from the NOAA GML dataset, which consists of monthly atmospheric CO2 concentrations at the Mauna Loa Observatory in Hawaii from 1958 to present.
- Volcanic activity data was accessed from the Global Volcanism Program dataset, which details all recorded eruptions in recent history.
- Solar irradiance data was accessed from the NOAA CDR dataset, which contains yearly averaged solar irradiance values from 1880 to present.
This data is preprocessed before analysis.
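The real pre-processing steps are in `preprocessing.ipynb`. As an illustration of the kind of operation involved, the sketch below (with made-up values) reduces a monthly series to calendar-year averages so that records sampled at different frequencies can be compared:

```python
import numpy as np
import pandas as pd

# Toy monthly CO2-like series starting March 1958 (values are made up)
months = pd.date_range("1958-03-01", periods=24, freq="MS")
monthly = pd.Series(np.linspace(315.0, 317.3, 24), index=months)

# Average the monthly values within each calendar year
annual = monthly.groupby(monthly.index.year).mean()
print(annual)  # one value each for 1958, 1959 and 1960
```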
The Python programming language (version 3.11) is used for most of the analysis in this project, with the R programming language (version 4.3) and Stan probabilistic programming language (version 2.26.1) being used for certain tasks.
- Jupyter notebooks are used for all Python code. The easiest way to install Jupyter is through the Anaconda platform.
- The standard Python tools for statistical data analysis are used, including the `pandas`, `numpy` and `scipy` packages. Pre-processing the data additionally requires the `xarray` and `netCDF4` packages. Python packages can be installed using `conda install package_name` if using Python through the Anaconda platform, or using `pip install package_name` otherwise.
- The `ggplot2`, `ggmap`, `testcorr`, `rstan` and `HDInterval` packages are required for running the R and Stan scripts. R packages can be installed using `install.packages("package_name")`.
The recommended usage is as follows:
- Download the contents of `data` and `scripts`.
- Install all necessary software and packages.
- Run `preprocessing.ipynb` to prepare the downloaded data for analysis.
- Run `cross_correlations.R` to perform the correlation analysis.
- Run `RStanODEModel.R` to perform the Bayesian temperature modelling.
- Run `temperature_map.R` to generate useful temperature plots.
- Run `main.ipynb` to view the analyses and relevant plots together.
Adam Watt
Seán O'Neill
