For this assignment, I have used a dataset obtained from the Living Planet Index website.
The Living Planet is a comprehensive study of trends in global biodiversity and the health of the planet, based on population trends of vertebrate species from terrestrial, freshwater and marine habitats. The period covered by this report is 1950 to 2016, and it has data for over 2805 unique species belonging to 11 different classes. In addition to the population data, each time series is assigned to a system (terrestrial, freshwater or marine) based on both the location of the monitored population and the habitat the species mostly relies on. The analysis covers the complete dataset, with a focus on the African region.
Below is the list of variables available in this dataset.
- ID: Unique identifier
- Binomial: Binomial Nomenclature i.e. the formal naming system of living things used by scientists
- Reference: Reference for the binomial
- Class: Contains 8 different Animal Kingdom classes
- Order: Taxonomic rank used for classification of organisms and recognized by the nomenclature codes
- Family: Taxonomic rank, classified between genus and order
- Genus: Taxonomic rank for biological classification, comes above species and below family
- Species: Basic unit of classification and a taxonomic rank of an organism, as well as a unit of biodiversity
- Subspecies: a taxonomic category that ranks below species, usually a fairly permanent geographically isolated race
- Common_name: name commonly used by general public to refer to a particular species
- Location: exact location of the species
- Country: Country where the species is found
- All_countries: List of countries where the species is found
- Region: Region where the species is found
- Latitude: Latitude of the location
- Longitude: Longitude of the location
- Specific_location: value is 0 or 1
- temperate_or_tropical: temperate has value 1 and tropical has value 2
- System: indicates whether the system is terrestrial, marine or freshwater
- T_realm: terrestrial realm data
- T_biome: terrestrial biome data
- FW_realm: freshwater realm data
- FW_biome: freshwater biome data
- M_realm: marine realm data
- M_ocean: marine ocean data
- M_biome: marine biome data
- Units: units of measurement of population
- Method: method of sampling
- 1950-2016 (columns 29-95): Years for which data is available, one column per year

There are three tidy data principles:
- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.
This dataset breaks the second principle: within the table, each year is a separate column, so a single row holds many observations. Since each observation must have its own row, I consolidated the year columns into a single year column using gather() from the tidyr package. I also used the dplyr package for data manipulation. For plotting I used the ggplot2, ggthemes and rworldmap packages.
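
The reshaping step described above can be sketched roughly as follows. This is a minimal illustration on an invented toy table, not the code used in the analysis: the object names lpi_wide and lpi_long, the Population column name, the toy values and the region label "Africa" are all assumptions, and the real file has 67 year columns whose names may differ depending on how it is read in (for example, read.csv would prefix them with "X").

```r
library(tidyr)
library(dplyr)

# Toy stand-in for the wide table: one row per monitored population,
# one column per year. All values here are invented for illustration.
lpi_wide <- data.frame(
  ID = c(1, 2, 3),
  Binomial = c("species_a", "species_b", "species_c"),
  Region = c("Africa", "Africa", "Europe"),
  `1950` = c(120, NA, 30),
  `1951` = c(115, 40, 28),
  check.names = FALSE,       # keep the year column names as "1950", "1951"
  stringsAsFactors = FALSE
)

lpi_long <- lpi_wide %>%
  # Collapse every column except the identifying ones into a
  # Year/Population pair, giving one row per population per year.
  gather(key = "Year", value = "Population", -ID, -Binomial, -Region) %>%
  # gather() returns the former column names as text; convert to integer.
  mutate(Year = as.integer(Year)) %>%
  # Drop years in which a population was not surveyed.
  filter(!is.na(Population)) %>%
  # Keep only the region of interest (the label is an assumption).
  filter(Region == "Africa")

print(lpi_long)
```

Excluding the identifying columns with a minus sign, rather than listing every year column, keeps the call short when there are many year columns. In newer versions of tidyr, pivot_longer() is the recommended replacement for gather() and could be used for the same step.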