
ThinkR-open/cranology


{cranology}: The CRAN Chronology

Lifecycle: experimental

The goal of {cranology} is to provide tools for scraping data from the CRAN and PPM websites, as well as datasets for exploring how the number of packages on CRAN has evolved.

cranology::plot_cran_monthly_package_number()

Installation

You can install the development version of {cranology} with:

remotes::install_github("ThinkR-open/cranology")

Examples

library(cranology)

Datasets

All packages ever available on CRAN.

cran_packages_history
#> # A tibble: 44,613 × 10
#>    file_name    date                time  size  package_name last_archived      
#>    <chr>        <dttm>              <chr> <chr> <chr>        <dttm>             
#>  1 A3/          2015-08-16 21:05:00 21:05 -     A3           2015-08-16 21:05:00
#>  2 aaMI/        2010-07-30 12:17:00 12:17 -     aaMI         2010-07-30 12:17:00
#>  3 aaSEA/       2022-06-21 05:12:00 05:12 -     aaSEA        2022-06-21 05:12:00
#>  4 AATtools/    2024-08-16 09:10:00 09:10 -     AATtools     2024-08-16 09:10:00
#>  5 aba/         2022-03-27 06:29:00 06:29 -     aba          2022-03-27 06:29:00
#>  6 abbyyR/      2023-11-03 04:42:00 04:42 -     abbyyR       2023-11-03 04:42:00
#>  7 abc.data/    2024-03-24 10:15:00 10:15 -     abc.data     2024-03-24 10:15:00
#>  8 abc/         2022-05-19 07:20:00 07:20 -     abc          2022-05-19 07:20:00
#>  9 abcADM/      2023-03-02 11:13:00 11:13 -     abcADM       2023-03-02 11:13:00
#> 10 ABCanalysis/ 2017-03-13 13:31:00 13:31 -     ABCanalysis  2017-03-13 13:31:00
#> # ℹ 44,603 more rows
#> # ℹ 4 more variables: archive <lgl>, first_date <dttm>, n_versions <int>,
#> #   last_modified <dttm>
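As an illustration of what this table supports, the sketch below counts how many packages first appeared on CRAN each year. It assumes that first_date holds each package's first release date (the column name suggests so, but this is an assumption), and it runs on a small hypothetical stand-in so that it is self-contained; with {cranology} loaded you would use cran_packages_history instead.

```r
# Hypothetical stand-in with the same first_date column type (POSIXct);
# these rows are made up for illustration, not real CRAN data
history <- data.frame(
  package_name = c("pkgA", "pkgB", "pkgC"),
  first_date = as.POSIXct(
    c("2010-01-12 10:00:00", "2005-03-01 09:30:00", "2010-06-20 14:15:00"),
    tz = "UTC"
  )
)

# Count packages by the calendar year of their (assumed) first release
new_per_year <- table(format(history$first_date, "%Y"))
new_per_year
```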

The monthly evolution of the number of packages on CRAN since its beginnings.

cran_monthly_package_number
#> # A tibble: 324 × 2
#>    date       number_packages
#>    <date>               <dbl>
#>  1 1997-10-08               1
#>  2 1997-11-08               1
#>  3 1997-12-08               1
#>  4 1998-01-08               2
#>  5 1998-02-08               2
#>  6 1998-03-08               3
#>  7 1998-04-08               5
#>  8 1998-05-08               6
#>  9 1998-06-08               6
#> 10 1998-07-08               7
#> # ℹ 314 more rows
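Because the table has one row per month, the net monthly change in the package count is a single diff() away. The sketch below uses a hypothetical four-row stand-in with the same two columns (the numbers are invented); in practice you would apply it to cran_monthly_package_number.

```r
# Hypothetical stand-in mirroring the two columns of cran_monthly_package_number
monthly <- data.frame(
  date = as.Date(c("2000-01-08", "2000-02-08", "2000-03-08", "2000-04-08")),
  number_packages = c(100, 104, 103, 110)
)

# Net change from one month to the next (additions minus removals);
# the first month has no predecessor, hence NA
monthly$net_change <- c(NA, diff(monthly$number_packages))
monthly
```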

Scraping tools

CRAN

Both the cran_packages_history and cran_monthly_package_number datasets are generated by the function scrape_cran(). Scraping is time-consuming, so it relies on the {furrr} package to scrape the CRAN pages in parallel; set a {future} plan before calling it:

future::plan(future::multisession)
scrape_cran_history()
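The package's scraping internals are not shown here. As a rough sketch of the {furrr} pattern described above, the example maps a hypothetical fake_scrape() function over package names in background R sessions; a real scraper would download and parse one CRAN page per call instead. It assumes {future} and {furrr} are installed.

```r
# Hypothetical stand-in for a page scraper: a real one would download
# and parse the CRAN page of each package
fake_scrape <- function(pkg) {
  paste0("scraped:", pkg)
}

# Evaluate futures in two background R sessions
future::plan(future::multisession, workers = 2)

pkgs <- c("A3", "aaMI", "abc")
# future_map_chr() mirrors purrr::map_chr(), evaluating each call as a future
results <- furrr::future_map_chr(pkgs, fake_scrape)

# Go back to sequential evaluation once done
future::plan(future::sequential)
results
```

Since furrr follows whatever plan is active, the same code runs sequentially if no parallel plan is set.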

PPM

{cranology} also provides the get_package_number_ppm() function, which gets the number of packages available on CRAN on given dates much more quickly by scraping PPM.

dates <- seq(
  from = as.Date("2018-04-10", "%Y-%m-%d"), 
  by = "1 year", 
  length.out = 4
)
get_package_number_ppm(dates)
#> Scraping ppm...
#> Scraping number packages on: 2018-04-10
#> Scraping number packages on: 2019-04-10
#> Scraping number packages on: 2020-04-10
#> Scraping number packages on: 2021-04-10
#>         date number_packages
#> 1 2018-04-10           12415
#> 2 2019-04-10           14025
#> 3 2020-04-10           15548
#> 4 2021-04-10           17388

Note, however, that this only works for dates after 2014-09-17, the day PPM first went online.

get_package_number_ppm("2013-08-28")
#> Error: Some dates are anterior to ppm launch:
#> 1: 2013-08-28
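When a vector mixes old and recent dates, one way to avoid this error is to split it first. split_by_ppm_launch() below is a hypothetical helper, not part of {cranology}, and it assumes the launch day itself is accepted by PPM.

```r
# PPM launch date quoted in this README
ppm_launch <- as.Date("2014-09-17")

# Hypothetical helper: separate dates PPM can answer from earlier ones
# (assumes the launch day itself is accepted)
split_by_ppm_launch <- function(dates) {
  dates <- as.Date(dates)
  list(
    ppm = dates[dates >= ppm_launch],
    before_ppm = dates[dates < ppm_launch]
  )
}

parts <- split_by_ppm_launch(c("2013-08-28", "2018-04-10", "2021-04-10"))
parts$ppm         # candidates for get_package_number_ppm()
parts$before_ppm  # look these up in cran_monthly_package_number
```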

For earlier dates, use cran_monthly_package_number instead. Here is a naïve example that picks the first monthly observation on or after the requested date:

date_before_ppm <- as.Date("2013-08-28")

cran_monthly_package_number[
  min(
    which(
      cran_monthly_package_number$date >= date_before_ppm
    )
  ), 
]
#> # A tibble: 1 × 2
#>   date       number_packages
#>   <date>               <dbl>
#> 1 2013-09-08            4904
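The naïve example handles one date at a time. A vectorized sketch of the same rule (first monthly observation on or after each date) can use base R's findInterval(). lookup_monthly() is a hypothetical helper, not part of {cranology}; it assumes the date column is sorted in ascending order, as the printed dataset suggests, and returns NA for dates past the last observation. It is shown on a small invented stand-in table; pass cran_monthly_package_number in practice.

```r
# Hypothetical helper: for each date, take the first monthly row on or after it
lookup_monthly <- function(monthly, dates) {
  dates <- as.Date(dates)
  # With left.open = TRUE, findInterval() counts rows strictly before each
  # date, so adding 1 gives the first row dated on or after it
  idx <- findInterval(as.numeric(dates), as.numeric(monthly$date),
                      left.open = TRUE) + 1L
  idx[idx > nrow(monthly)] <- NA  # no observation on or after that date
  data.frame(date = dates, number_packages = monthly$number_packages[idx])
}

# Invented stand-in with the same two columns
monthly <- data.frame(
  date = as.Date(c("2000-01-08", "2000-02-08", "2000-03-08")),
  number_packages = c(100, 104, 103)
)
res <- lookup_monthly(monthly, c("2000-01-20", "2000-02-08", "2000-04-01"))
res
```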

Acknowledgements

The scrape_cran() function is essentially a tidyversification of a GitHub gist written by @daroczig.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.


License

MIT (LICENSE.md); see also LICENSE.
