{cranology}: The CRAN Chronology

Lifecycle: experimental R-CMD-check

The goal of {cranology} is to provide tools to scrape data from the CRAN and PPM websites, along with datasets for exploring how the number of packages on CRAN has evolved over time.

cranology::plot_cran_monthly_package_number()

Installation

You can install the development version of {cranology} with:

remotes::install_github("ThinkR-open/cranology")

Examples

library(cranology)

Datasets

All packages ever available on CRAN.

cran_packages_history
#> # A tibble: 44,613 × 10
#>    file_name    date                time  size  package_name last_archived      
#>    <chr>        <dttm>              <chr> <chr> <chr>        <dttm>             
#>  1 A3/          2015-08-16 21:05:00 21:05 -     A3           2015-08-16 21:05:00
#>  2 aaMI/        2010-07-30 12:17:00 12:17 -     aaMI         2010-07-30 12:17:00
#>  3 aaSEA/       2022-06-21 05:12:00 05:12 -     aaSEA        2022-06-21 05:12:00
#>  4 AATtools/    2024-08-16 09:10:00 09:10 -     AATtools     2024-08-16 09:10:00
#>  5 aba/         2022-03-27 06:29:00 06:29 -     aba          2022-03-27 06:29:00
#>  6 abbyyR/      2023-11-03 04:42:00 04:42 -     abbyyR       2023-11-03 04:42:00
#>  7 abc.data/    2024-03-24 10:15:00 10:15 -     abc.data     2024-03-24 10:15:00
#>  8 abc/         2022-05-19 07:20:00 07:20 -     abc          2022-05-19 07:20:00
#>  9 abcADM/      2023-03-02 11:13:00 11:13 -     abcADM       2023-03-02 11:13:00
#> 10 ABCanalysis/ 2017-03-13 13:31:00 13:31 -     ABCanalysis  2017-03-13 13:31:00
#> # ℹ 44,603 more rows
#> # ℹ 4 more variables: archive <lgl>, first_date <dttm>, n_versions <int>,
#> #   last_modified <dttm>

The evolution of the number of packages on CRAN since its beginning.

cran_monthly_package_number
#> # A tibble: 324 × 2
#>    date       number_packages
#>    <date>               <dbl>
#>  1 1997-10-08               1
#>  2 1997-11-08               1
#>  3 1997-12-08               1
#>  4 1998-01-08               2
#>  5 1998-02-08               2
#>  6 1998-03-08               3
#>  7 1998-04-08               5
#>  8 1998-05-08               6
#>  9 1998-06-08               6
#> 10 1998-07-08               7
#> # ℹ 314 more rows
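
As a rough sketch of how this dataset can be explored, monthly growth can be derived with base R's `diff()`. The snippet below rebuilds the first rows shown above by hand so that it runs without the package installed; the object name `first_rows` and the column `added` are only illustrative.

```r
# First rows of cran_monthly_package_number, copied from the output above
first_rows <- data.frame(
  date = as.Date(c(
    "1997-10-08", "1997-11-08", "1997-12-08", "1998-01-08", "1998-02-08"
  )),
  number_packages = c(1, 1, 1, 2, 2)
)

# Packages added since the previous monthly snapshot (NA for the first row)
first_rows$added <- c(NA, diff(first_rows$number_packages))
first_rows
```

The same two lines should apply to the full dataset, assuming it keeps the `date` and `number_packages` columns shown above.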

Scraping tools

CRAN

Both the cran_packages_history and cran_monthly_package_number datasets are generated by the function scrape_cran(). The scraping process is quite time-consuming and relies on the {furrr} package to scrape the CRAN pages in parallel.

future::plan(future::multisession)
scrape_cran_history()
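
For readers unfamiliar with {furrr}, the general pattern looks roughly like the sketch below. This is not the package's actual implementation: `fetch_page()` is a hypothetical stand-in for a function that downloads and parses one page, and the choice of two workers is arbitrary.

```r
library(future)
library(furrr)

# Hypothetical stand-in for downloading and parsing one CRAN page
fetch_page <- function(letter) {
  Sys.sleep(0.1)  # simulate network latency
  paste0("page-", letter)
}

# Evaluate the calls on two background R sessions
plan(multisession, workers = 2)
pages <- future_map_chr(c("a", "b", "c"), fetch_page)

# Return to sequential evaluation once done
plan(sequential)
pages
```

Setting the plan is left to the caller, which is why `future::plan()` appears before the scraping call above rather than inside it.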

PPM

{cranology} also includes the get_package_number_ppm() function, a quicker way to get the number of packages that were available on CRAN on any given date.

dates <- seq(
  from = as.Date("2018-04-10", "%Y-%m-%d"), 
  by = "1 year", 
  length.out = 4
)
get_package_number_ppm(dates)
#> Scraping ppm...
#> Scraping number packages on: 2018-04-10
#> Scraping number packages on: 2019-04-10
#> Scraping number packages on: 2020-04-10
#> Scraping number packages on: 2021-04-10
#>         date number_packages
#> 1 2018-04-10           12415
#> 2 2019-04-10           14025
#> 3 2020-04-10           15548
#> 4 2021-04-10           17388

Note that this only works for dates after 2014-09-17, the day PPM first went online.

get_package_number_ppm("2013-08-28")
#> Error: Some dates are anterior to ppm launch:
#> 1: 2013-08-28
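
One way to avoid this error is to check dates against the launch date before calling the function. The helper below is a minimal sketch in base R; `split_dates_by_ppm_launch()` is a hypothetical name, not part of {cranology}, and treating the launch day itself as too early is an assumption.

```r
# Date on which PPM first went online, as stated above
ppm_launch <- as.Date("2014-09-17")

# Hypothetical helper: separate dates PPM can serve from earlier ones.
# Assumption: the launch day itself is treated as too early.
split_dates_by_ppm_launch <- function(dates) {
  dates <- as.Date(dates)
  list(
    ppm = dates[dates > ppm_launch],
    too_early = dates[dates <= ppm_launch]
  )
}

split_dates_by_ppm_launch(c("2013-08-28", "2018-04-10"))
```

The `ppm` element could then be passed to get_package_number_ppm(), leaving the `too_early` dates for a lookup in the bundled dataset.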

For earlier dates use cran_monthly_package_number. Here is a naïve example:

date_before_ppm <- as.Date("2013-08-28")

cran_monthly_package_number[
  min(
    which(
      cran_monthly_package_number$date >= date_before_ppm
    )
  ), 
]
#> # A tibble: 1 × 2
#>   date       number_packages
#>   <date>               <dbl>
#> 1 2013-09-08            4904
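
The lookup above can be wrapped in a small function that also guards against dates later than the last snapshot, where `min(which(...))` would otherwise return `Inf` with a warning. This is only a sketch: `first_snapshot_on_or_after()` is a hypothetical helper, and the toy data reuses the first rows of cran_monthly_package_number shown earlier so that the code runs without the package.

```r
# Hypothetical helper: first monthly snapshot taken on or after `date`
first_snapshot_on_or_after <- function(data, date) {
  date <- as.Date(date)
  idx <- which(data$date >= date)
  if (length(idx) == 0) {
    stop("No snapshot on or after ", format(date))
  }
  data[min(idx), , drop = FALSE]
}

# Toy data: first rows of cran_monthly_package_number shown earlier
snapshots <- data.frame(
  date = as.Date(c("1997-10-08", "1997-11-08", "1997-12-08", "1998-01-08")),
  number_packages = c(1, 1, 1, 2)
)

first_snapshot_on_or_after(snapshots, "1997-12-20")
```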

Acknowledgements

The scrape_cran() function is essentially a tidyversification of this GitHub gist written by @daroczig.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.