readme

PaulESantos · Mar 21, 2023 · b4e0d06
1 parent f1cec0a
commit b4e0d06

Showing 16 changed files with 282 additions and 24 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -3,3 +3,7 @@
 ^LICENSE\.md$
 ^README\.Rmd$
 ^cran-comments\.md$
+^\.github$
+^code_deprecated
+^pdf
+^data_raw
diff --git a/.github/.gitignore b/.github/.gitignore
@@ -0,0 +1 @@
+*.html
diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml
@@ -0,0 +1,49 @@
+# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
+# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
+on:
+  push:
+    branches: [main, master]
+  pull_request:
+    branches: [main, master]
+
+name: R-CMD-check
+
+jobs:
+  R-CMD-check:
+    runs-on: ${{ matrix.config.os }}
+
+    name: ${{ matrix.config.os }} (${{ matrix.config.r }})
+
+    strategy:
+      fail-fast: false
+      matrix:
+        config:
+          - {os: macos-latest,   r: 'release'}
+          - {os: windows-latest, r: 'release'}
+          - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
+          - {os: ubuntu-latest,   r: 'release'}
+          - {os: ubuntu-latest,   r: 'oldrel-1'}
+
+    env:
+      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+      R_KEEP_PKG_SOURCE: yes
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - uses: r-lib/actions/setup-pandoc@v2
+
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          r-version: ${{ matrix.config.r }}
+          http-user-agent: ${{ matrix.config.http-user-agent }}
+          use-public-rspm: true
+
+      - uses: r-lib/actions/setup-r-dependencies@v2
+        with:
+          extra-packages: any::rcmdcheck
+          needs: check
+
+      - uses: r-lib/actions/check-r-package@v2
+        with:
+          upload-snapshots: true
diff --git a/.gitignore b/.gitignore
@@ -3,3 +3,4 @@
 .Rdata
 .httr-oauth
 .DS_Store
+code_deprecated/
diff --git a/README.Rmd b/README.Rmd
@@ -20,7 +20,7 @@ knitr::opts_chunk$set(
 [![CRAN status](https://www.r-pkg.org/badges/version/perutimber)](https://CRAN.R-project.org/package=perutimber)
 <!-- badges: end -->
 
-The R package, `perutimber`, provides easy access to taxonomic information for over 1,300 vascular plant species found in the "Catalogue of the timber forest species of the Amazon and the Peruvian Yunga." This package is based on the authoritative publication by Vásquez Martínez and Rojas Gonzáles (2022) titled "Catálogo de las especies forestales maderables de la Amazonía y la Yunga Peruana" in Revista Forestal del Perú 37(3, Número Especial): 5-138. With `perutimber`, researchers and enthusiasts alike can efficiently explore and analyze the fascinating diversity of these plant species.
+The R package, `perutimber`, provides easy access to taxonomic information for over 1,300 vascular plant species found in the "Catalogue of the timber forest species of the Amazon and the Peruvian Yunga." This package is based on the authoritative publication by [Vásquez Martínez and Rojas Gonzáles (2022) titled "Catálogo de las especies forestales maderables de la Amazonía y la Yunga Peruana" in Revista Forestal del Perú 37(3, Número Especial): 5-138](https://revistas.lamolina.edu.pe/index.php/rfp/article/view/1956). With `perutimber`, researchers and enthusiasts alike can efficiently explore and analyze the fascinating diversity of these plant species.
 
 ## Installation
 
@@ -31,26 +31,3 @@ pak::pak("PaulESantos/perutimber")
 ```
 
 ## Example
-
-This is a basic example which shows you how to solve a common problem:
-
-```{r example}
-library(perutimber)
-## basic example code
-```
-
-What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
-
-```{r cars}
-summary(cars)
-```
-
-You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this. You could also use GitHub Actions to re-render `README.Rmd` every time you push. An example workflow can be found here: <https://github.com/r-lib/actions/tree/v1/examples>.
-
-You can also embed plots, for example:
-
-```{r pressure, echo = FALSE}
-plot(pressure)
-```
-
-In that case, don't forget to commit and push the resulting figure files, so they display on GitHub and CRAN.
diff --git a/README.md b/README.md
@@ -0,0 +1,33 @@
+
+<!-- README.md is generated from README.Rmd. Please edit that file -->
+
+# perutimber
+
+<!-- badges: start -->
+
+[![Lifecycle:
+experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
+[![CRAN
+status](https://www.r-pkg.org/badges/version/perutimber)](https://CRAN.R-project.org/package=perutimber)
+<!-- badges: end -->
+
+The R package, `perutimber`, provides easy access to taxonomic
+information for over 1,300 vascular plant species found in the
+“Catalogue of the timber forest species of the Amazon and the Peruvian
+Yunga.” This package is based on the authoritative publication by
+[Vásquez Martínez and Rojas Gonzáles (2022) titled “Catálogo de las
+especies forestales maderables de la Amazonía y la Yunga Peruana” in
+Revista Forestal del Perú 37(3, Número Especial):
+5-138](https://revistas.lamolina.edu.pe/index.php/rfp/article/view/1956).
+With `perutimber`, researchers and enthusiasts alike can efficiently
+explore and analyze the fascinating diversity of these plant species.
+
+## Installation
+
+You can install the development version of `perutimber` like so:
+
+``` r
+pak::pak("PaulESantos/perutimber")
+```
+
+## Example
diff --git a/catalogo.xlsx b/catalogo.xlsx
diff --git a/catalogo_v1.xlsx b/catalogo_v1.xlsx
diff --git a/catalogo_ver_2.xlsx b/catalogo_ver_2.xlsx
diff --git a/code_deprecated/data_preparation_code.R b/code_deprecated/data_preparation_code.R
@@ -0,0 +1,193 @@
+library(here)
+library(tidyverse)
+library(tabulizer)
+
+file_path <- "pdf/51277_Artículo_[RFP]_VF.pdf"
+
+text <- tabulizer::extract_text(file = file_path,
+                                pages = c(17:134),
+                                encoding = "UTF-8")
+text <- text |>
+  stringr::str_split("\n|\r") |>
+  unlist()
+
+texto_1 <- text[nchar(text) != 0]
+
+tbl_texto_1 <- tibble::tibble(especies = texto_1)
+
+new_df <- tbl_texto_1 |>
+  mutate(especies = str_squish(especies) |> str_trim()) |>
+  filter(!str_detect(especies,
+                     paste0(c("^[A-Z]{1,}CEAE",
+                              "[A-Z]{1,}MAS",
+                              "Vol\\.",
+                              "Catálogo",
+                              "Diciembre 2022",
+                              "Revista Forestal del Perú",
+                              "de la Amazonía y la Yunga Peruana"),
+                            collapse = "|"))) |>
+  filter(!str_detect(especies,
+                     paste0("^",seq(21, 138, 1), "$", collapse = "|")))
+
+new_df
+
+new_df |>
+  mutate(especies = str_squish(especies) |> str_trim()) |>
+  filter(str_detect(especies,
+                    paste0(c("^[A-Z]{1}[a-z]{1,} [a-z]{1,} ",
+                             "^[A-Z]{1}[a-z]{1,} [a-z]{1,}-[a-z]{1,} "
+                             ), collapse = "|"))) |>
+  mutate(id = row_number())
+
+
+species_list <- readxl::read_xlsx("new_especies_catalogo.xlsx") |>
+  mutate(new_especie = paste0("paul start ", id, " ", especies))
+
+species_list
+
+
+new_df |>
+  left_join(species_list, by = "especies") |>
+  mutate(group = if_else(
+    !is.na(id),
+    paste0("especie ", id),
+    as.character(id)
+    )) |>
+  fill(group, .direction = "down") |>
+  mutate(row_id = row_number()) |>
+  relocate(row_id)
+
+
+# -------------------------------------------------------------------------
+
+catalogo <- readxl::read_excel("catalogo_v1.xlsx")
+catalogo |>
+  count(group)
+
+by_especie <- catalogo  |>
+  group_nest(group)
+
+by_especie
+# functions ---------------------------------------------------------------
+
+change_split_word <- function(x){
+  split_word <- str_extract_all(x,
+                                "[a-z]{1,}-\\s[a-z]{1,}")
+
+  mgsub::mgsub(
+    x,
+    unlist(split_word),
+    str_replace_all(split_word |>  unlist(), "-\\s", "")
+               )
+}
+
+
+change_split_num_interval <- function(x){
+  num_interval <- stringr::str_extract_all(x,
+                              "[0-9]{1,}-\\s[0-9]{1,}")
+  mgsub::mgsub(
+    x ,
+    unlist(num_interval),
+    stringr::str_replace_all(num_interval |>  unlist(), "-\\s", "-")
+  )
+}
+
+get_tbl_catl_new <- function(x){
+
+  # Text after "Usos:", stopping at "Categ. UICN:" when that field is present
+  get_usos <- function(x){
+    if(!grepl("Categ\\. UICN:", x)){
+      # No capture group here, so the matched prefix is replaced with ""
+      usos <- gsub(".*Usos:", "", x)
+    } else {
+      usos <- gsub(".*Usos: (.+) Categ\\. UICN:.*", "\\1", x)
+    }
+    return(usos)
+  }
+
+  # Text after "Categ. UICN:", or NA when the entry has no IUCN category
+  get_uicn_text <- function(x){
+    if(!grepl("Categ\\. UICN:", x)){
+      uicn <- NA_character_
+    } else {
+      # No capture group, so the label and everything before it are dropped
+      uicn <- gsub(".*Categ\\. UICN:", "", x)
+    }
+    return(uicn)
+  }
+
+  #new_x <- change_split_num_interval(x) |>
+  # change_split_word()
+
+
+  return(tibble::tibble(
+    species =  gsub("N\\.Ver\\:.*", "", x),
+    nombre_comun =  gsub(".*N\\.Ver\\: (.+) Hábito:.*", "\\1", x),
+    habito = gsub(".*Hábito: (.+) Col Ref:.*", "\\1", x),
+    collecciom = gsub(".*Col Ref: (.+) Región\\(es\\):.*", "\\1", x),
+    distribucion = gsub(".*Región\\(es\\): (.+) Usos:.*", "\\1", x),
+    usos = get_usos(x),
+    categ_uicn = get_uicn_text(x)
+  ) |>
+    dplyr::mutate(dplyr::across(dplyr::everything(),
+                                stringr::str_squish)))
+
+}
+
+
+# -------------------------------------------------------------------------
+
+new_catalog <- catalogo |>
+  group_by(group) |>
+  summarise(new_text = paste0(especies, collapse = " "),
+            .groups = "drop") |>
+  select(-group)
+new_catalog_vect <- new_catalog |>  pull(new_text)
+
+clean_df <- map_dfr(new_catalog_vect,
+                    ~get_tbl_catl_new(.))
+
+#' > clean_df |> names()
+#' [1] "species"  "nombre_comun" "habito" "collecciom"   "distribucion" "usos"
+#' [7] "categ_uicn"
+
+clean_df1 <- clean_df |>
+  mutate(species = change_split_word(species)) |>
+  mutate(nombre_comun = change_split_word(nombre_comun)) |>
+  mutate(collecciom = change_split_word(collecciom)) |>
+  mutate(distribucion = change_split_word(distribucion)) |>
+  mutate(distribucion = change_split_num_interval(distribucion)) |>
+  mutate(usos = change_split_word(usos))
+
+
+clean_df2 <- clean_df1 |>
+  mutate(tag_subsp = case_when(
+    str_detect(species, "subsp\\.") ~ "subsp.",
+    str_detect(species, "var\\.") ~ "var.",
+    TRUE ~ NA_character_
+  )) |>
+  mutate(genus_ephitethon = stringr::word(species, 1, 1),
+         species_ephitethon = stringr::word(species, 2, 2),
+         subspecies_ephitethon = case_when(
+         tag_subsp == "var." ~ stringr::str_extract(species, "(?<=var\\. )(\\w+)"),
+         tag_subsp == "subsp." ~ stringr::str_extract(species, "(?<=subsp\\. )(\\w+)"),
+         TRUE ~ NA_character_
+         )) |>
+  mutate(tipo = stringr.plus::str_extract_before(habito, "\\s[0-9]{1,}"),
+         altura = stringr.plus::str_extract_after(habito, "[a-z]{1,}\\s")) |>
+  mutate(colector =  stringr.plus::str_extract_before(collecciom,
+                                                      "\\s[0-9]{1,}"),
+         id_coleccion = stringr::str_extract(collecciom, "[0-9]{1,}"),
+         deposito = stringr::str_extract(collecciom, "\\([^()]+\\)")
+         ) |>
+  mutate(regiones = stringr.plus::str_extract_before(distribucion,
+                                                     "\\sAlt\\(m\\.\\):"),
+         elevacion = stringr.plus::str_extract_after(distribucion,
+                                                     "\\sAlt\\(m\\.\\): ")) |>
+  mutate(usos_2 = gsub("\\[.*?\\]", "", usos),
+         endemismo = if_else(str_detect(usos, "Dist\\. Endémico\\."),
+                             "Endémico",
+                             NA_character_),
+         categ_uicn = tm::removePunctuation(categ_uicn))
+clean_df2 |>
+  writexl::write_xlsx("catalogo_ver_2.xlsx")
diff --git a/data_raw/peru_timber_data.rda b/data_raw/peru_timber_data.rda
diff --git a/data_raw/tnrs_data_perutimber.rda b/data_raw/tnrs_data_perutimber.rda
diff --git a/especies_cat.xlsx b/especies_cat.xlsx
diff --git a/new_especies_catalogo.xlsx b/new_especies_catalogo.xlsx
diff --git a/species_text_list.rda b/species_text_list.rda
diff --git a/~$especies_cat.xlsx b/~$especies_cat.xlsx
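
Note on `code_deprecated/data_preparation_code.R`: `get_tbl_catl_new()` pulls each field out of a flattened catalogue entry with `gsub()` patterns anchored on the field labels (`N.Ver:`, `Col Ref:`, `Usos:`, `Categ. UICN:`). One thing to watch with this approach is that `sub()` and `gsub()` return the input unchanged when the pattern does not match, so a missing label silently puts the whole entry into that column. The following is a minimal base-R sketch of a guarded variant, not the package's implementation. The helper name `extract_field()`, its arguments, and the sample entry are all hypothetical and invented for illustration, and the entry uses only a subset of the labels.

```r
# Hypothetical entry, invented for illustration; not taken from the catalogue.
entry <- paste(
  "Genus species Auth.",
  "N.Ver: nombre uno, nombre dos",
  "Col Ref: A. Collector 123 (HERB)",
  "Usos: Madera."
)

# Illustrative helper (not part of the commit): return the text between the
# `start` label and the `end` label, or up to the end of the string when `end`
# is NULL. Both labels are regular expressions; `x` is a single string.
extract_field <- function(x, start, end = NULL) {
  pattern <- if (is.null(end)) {
    paste0(".*", start, "\\s*(.*)$")
  } else {
    paste0(".*", start, "\\s*(.+?)\\s*", end, ".*")
  }
  # sub() returns its input unchanged when nothing matches, so test first
  # and return NA for a missing field.
  if (!grepl(pattern, x, perl = TRUE)) {
    return(NA_character_)
  }
  trimws(sub(pattern, "\\1", x, perl = TRUE))
}

nombre_comun <- extract_field(entry, "N\\.Ver:", "Col Ref:")
coleccion    <- extract_field(entry, "Col Ref:", "Usos:")
usos         <- extract_field(entry, "Usos:")
categ_uicn   <- extract_field(entry, "Categ\\. UICN:")

# For comparison, the unguarded call echoes the whole entry back.
unguarded <- sub(".*Categ\\. UICN: (.+)", "\\1", entry)
```

Because the helper branches with `if`, it handles one entry at a time; applying it over a vector of entries would need something like `vapply()` or `purrr::map_chr()`.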