1 parent f1cec0a, commit b4e0d06
Showing 16 changed files with 282 additions and 24 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,7 @@ | |
^LICENSE\.md$ | ||
^README\.Rmd$ | ||
^cran-comments\.md$ | ||
^\.github$ | ||
^code_deprecated | ||
^data_raw |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
*.html |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples | ||
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help | ||
on: | ||
push: | ||
branches: [main, master] | ||
pull_request: | ||
branches: [main, master] | ||
|
||
name: R-CMD-check | ||
|
||
jobs: | ||
R-CMD-check: | ||
runs-on: ${{ matrix.config.os }} | ||
|
||
name: ${{ matrix.config.os }} (${{ matrix.config.r }}) | ||
|
||
strategy: | ||
fail-fast: false | ||
matrix: | ||
config: | ||
- {os: macos-latest, r: 'release'} | ||
- {os: windows-latest, r: 'release'} | ||
- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} | ||
- {os: ubuntu-latest, r: 'release'} | ||
- {os: ubuntu-latest, r: 'oldrel-1'} | ||
|
||
env: | ||
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} | ||
R_KEEP_PKG_SOURCE: yes | ||
|
||
steps: | ||
- uses: actions/checkout@v3 | ||
|
||
- uses: r-lib/actions/setup-pandoc@v2 | ||
|
||
- uses: r-lib/actions/setup-r@v2 | ||
with: | ||
r-version: ${{ matrix.config.r }} | ||
http-user-agent: ${{ matrix.config.http-user-agent }} | ||
use-public-rspm: true | ||
|
||
- uses: r-lib/actions/setup-r-dependencies@v2 | ||
with: | ||
extra-packages: any::rcmdcheck | ||
needs: check | ||
|
||
- uses: r-lib/actions/check-r-package@v2 | ||
with: | ||
upload-snapshots: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,4 @@ | |
.Rdata | ||
.httr-oauth | ||
.DS_Store | ||
code_deprecated |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
|
||
<!-- README.md is generated from README.Rmd. Please edit that file --> | ||
|
||
# perutimber | ||
|
||
<!-- badges: start --> | ||
|
||
[![Lifecycle: | ||
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) | ||
[![CRAN | ||
status](https://www.r-pkg.org/badges/version/perutimber)](https://CRAN.R-project.org/package=perutimber) | ||
<!-- badges: end --> | ||
|
||
The R package, `perutimber`, provides easy access to taxonomic | ||
information for over 1,300 vascular plant species found in the | ||
“Catalogue of the timber forest species of the Amazon and the Peruvian | ||
Yunga.” This package is based on the authoritative publication by | ||
[Vásquez Martínez and Rojas Gonzáles (2022) titled “Catálogo de las | ||
especies forestales maderables de la Amazonía y la Yunga Peruana” in | ||
Revista Forestal del Perú 37(3, Número Especial): | ||
5-138](https://revistas.lamolina.edu.pe/index.php/rfp/article/view/1956). | ||
With `perutimber`, researchers and enthusiasts alike can efficiently | ||
explore and analyze the fascinating diversity of these plant species. | ||
|
||
## Installation | ||
|
||
You can install the development version of `perutimber` like so: | ||
|
||
``` r | ||
pak::pak("PaulESantos/perutimber") | ||
``` | ||
|
||
## Example |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
library(here) | ||
library(tidyverse) | ||
library(tabulizer) | ||
|
||
file_path <- "pdf/51277_Artículo_[RFP]_VF.pdf" | ||
|
||
text <- tabulizer::extract_text(file = file_path, | ||
pages = c(17:134), | ||
encoding = "UTF-8") | ||
text <- text |> | ||
stringr::str_split("\n|\r") |> | ||
unlist() | ||
|
||
texto_1 <- text[nchar(text) != 0] | ||
|
||
tbl_texto_1 <- tibble::tibble(especies = texto_1) | ||
|
||
new_df <- tbl_texto_1 |> | ||
mutate(especies = str_squish(especies) |> str_trim()) |> | ||
filter(!str_detect(especies, | ||
paste0(c("^[A-Z]{1,}CEAE", | ||
"[A-Z]{1,}MAS", | ||
"Vol\\.", | ||
"Catálogo", | ||
"Diciembre 2022", | ||
"Revista Forestal del Perú", | ||
"de la Amazonía y la Yunga Peruana"), | ||
collapse = "|"))) |> | ||
filter(!str_detect(especies, | ||
paste0("^",seq(21, 138, 1), "$", collapse = "|"))) | ||
|
||
new_df | ||
|
||
new_df |> | ||
mutate(especies = str_squish(especies) |> str_trim()) |> | ||
filter(str_detect(especies, | ||
paste0(c("^[A-Z]{1}[a-z]{1,} [a-z]{1,} ", | ||
"^[A-Z]{1}[a-z]{1,} [a-z]{1,}-[a-z]{1,} " | ||
), collapse = "|"))) |> | ||
mutate(id = row_number()) | ||
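A note on the species-line filter above: the two patterns accept any line that opens with a capitalised word, then a lowercase (optionally hyphenated) word, then a space. The base R sketch below applies equivalent patterns to a few made-up lines to show what passes and what does not. `Aus bus` and the other strings are placeholders invented for illustration, not catalogue entries.

```r
# Same two patterns as in the filter, written with `+` instead of `{1,}`.
species_start <- paste(
  c("^[A-Z][a-z]+ [a-z]+ ",          # "Genus epithet "
    "^[A-Z][a-z]+ [a-z]+-[a-z]+ "),  # "Genus hyphenated-epithet "
  collapse = "|"
)

# Hypothetical input lines (assumption: not taken from the PDF).
sample_lines <- c(
  "Aus bus Auth. N.Ver: nombre",  # binomial followed by an author
  "Aus bus-cus Auth.",            # hyphenated epithet
  "FABACEAE",                     # family header
  "57",                           # page number
  "continuation of an entry",     # wrapped text, starts lowercase
  "Madera para construccion"      # prose that merely looks like a binomial
)

is_species_line <- grepl(species_start, sample_lines)
# The first two lines and the last one match; the other three do not.
```

The last line is a false positive: ordinary prose that starts with a capitalised word also matches, so the detected rows likely still need a manual check.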
|
||
|
||
species_list <- readxl::read_xlsx("new_especies_catalogo.xlsx") |> | ||
mutate(new_especie = paste0("paul start ", id, " ", especies)) | ||
|
||
species_list | ||
|
||
|
||
new_df |> | ||
left_join(species_list, by = "especies") |> | ||
mutate(group = if_else( | ||
!is.na(id), | ||
paste0("especie ", id), | ||
as.character(id) | ||
)) |> | ||
fill(group, .direction = "down") |> | ||
mutate(row_id = row_number()) |> | ||
relocate(row_id) | ||
|
||
|
||
# ------------------------------------------------------------------------- | ||
|
||
catalogo <- readxl::read_excel("catalogo_v1.xlsx") | ||
catalogo |> | ||
count(group) | ||
|
||
by_especie <- catalogo |> | ||
group_nest(group) | ||
|
||
by_especie | ||
# functions --------------------------------------------------------------- | ||
|
||
# Rejoin words that were hyphenated across a PDF line break ("made- ra" -> "madera"). | ||
change_split_word <- function(x){ | ||
split_word <- str_extract_all(x, | ||
"[a-z]{1,}-\\s[a-z]{1,}") | ||
|
||
mgsub::mgsub( | ||
x, | ||
unlist(split_word), | ||
str_replace_all(split_word |> unlist(), "-\\s", "") | ||
) | ||
} | ||
|
||
|
||
# Close up numeric ranges split across a line break ("100- 500" -> "100-500"). | ||
change_split_num_interval <- function(x){ | ||
num_interval <- stringr::str_extract_all(x, | ||
"[0-9]{1,}-\\s[0-9]{1,}") | ||
mgsub::mgsub( | ||
x , | ||
unlist(num_interval), | ||
stringr::str_replace_all(num_interval |> unlist(), "-\\s", "-") | ||
) | ||
} | ||
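The two helpers above first extract every broken token and then substitute each one back with `mgsub`. As a possible alternative, the same repair can be sketched in base R with capture groups, which needs no extra package and involves no pattern list at all. The function names `rejoin_words` and `rejoin_ranges` are hypothetical and not part of this commit.

```r
# Drop the "- " that a line break left inside a lowercase word.
rejoin_words <- function(x) {
  gsub("([a-z]+)-\\s([a-z]+)", "\\1\\2", x)
}

# Drop only the whitespace inside a numeric range, keeping the hyphen.
rejoin_ranges <- function(x) {
  gsub("([0-9]+)-\\s([0-9]+)", "\\1-\\2", x)
}

# Made-up strings, for illustration only.
fixed_words  <- rejoin_words("car- pinteria y construc- cion")
fixed_ranges <- rejoin_ranges("Alt(m.): 100- 500")
untouched    <- rejoin_words("no breaks here")
```

Like the originals, this sketch cannot tell a line-break hyphen from a genuine one, so a legitimately hyphenated pair followed by a space would also be merged.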
|
||
get_tbl_catl_new <- function(x){ | ||
|
||
# Text after "Usos:", stopping before "Categ. UICN:" when that field is present. | ||
get_usos <- function(x){ | ||
if(grepl("Categ\\. UICN:", x)){ | ||
usos <- gsub(".*Usos: (.+) Categ\\. UICN:.*", "\\1", x) | ||
} else { | ||
usos <- gsub(".*Usos:", "", x) | ||
} | ||
return(usos) | ||
} | ||
|
||
# Text after "Categ. UICN:", or NA when the record has no IUCN category. | ||
get_uicn_text <- function(x){ | ||
if(grepl("Categ\\. UICN:", x)){ | ||
uicn <- gsub(".*Categ\\. UICN:", "", x) | ||
} else { | ||
uicn <- NA_character_ | ||
} | ||
return(uicn) | ||
} | ||
|
||
#new_x <- change_split_num_interval(x) |> | ||
# change_split_word() | ||
|
||
|
||
return(tibble::tibble( | ||
species = gsub("N\\.Ver:.*", "", x), | ||
nombre_comun = gsub(".*N\\.Ver: (.+) Hábito:.*", "\\1", x), | ||
habito = gsub(".*Hábito: (.+) Col Ref:.*", "\\1", x), | ||
collecciom = gsub(".*Col Ref: (.+) Región\\(es\\):.*", "\\1", x), | ||
distribucion = gsub(".*Región\\(es\\): (.+) Usos:.*", "\\1", x), | ||
usos = get_usos(x), | ||
categ_uicn = get_uicn_text(x) | ||
) |> | ||
# str_squish() also trims leading and trailing whitespace | ||
dplyr::mutate(dplyr::across(dplyr::everything(), stringr::str_squish))) | ||
|
||
} | ||
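`get_tbl_catl_new()` relies on each record carrying its field labels in a fixed order, with "Categ. UICN:" as an optional last field. The base R sketch below isolates that optional-field logic on two made-up records. The record strings and the names `extract_usos` and `extract_uicn` are assumptions for illustration, not data or functions from the catalogue.

```r
# Hypothetical records: one with an IUCN category, one without.
with_uicn    <- "Aus bus Auth. Usos: Construction, carpentry. Categ. UICN: LC"
without_uicn <- "Aus cus Auth. Usos: Firewood."

extract_usos <- function(x) {
  if (grepl("Categ\\. UICN:", x)) {
    # Capture only the text between the two labels.
    sub(".*Usos: (.+) Categ\\. UICN:.*", "\\1", x)
  } else {
    # No trailing field: keep everything after "Usos:".
    trimws(sub(".*Usos:", "", x))
  }
}

extract_uicn <- function(x) {
  if (!grepl("Categ\\. UICN:", x)) {
    return(NA_character_)
  }
  trimws(sub(".*Categ\\. UICN:", "", x))
}

usos_a <- extract_usos(with_uicn)
usos_b <- extract_usos(without_uicn)
uicn_a <- extract_uicn(with_uicn)
uicn_b <- extract_uicn(without_uicn)
```

Because the leading `.*` is greedy, each pattern anchors on the last occurrence of its label, so a record in which a label appears twice would be cut at the later one.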
|
||
|
||
# ------------------------------------------------------------------------- | ||
|
||
new_catalog <- catalogo |> | ||
group_by(group) |> | ||
summarise(new_text = paste0(especies, collapse = " "), | ||
.groups = "drop") |> | ||
select(-group) | ||
new_catalog_vect <- new_catalog |> pull(new_text) | ||
|
||
clean_df <- map_dfr(new_catalog_vect, | ||
~get_tbl_catl_new(.)) | ||
|
||
#' > clean_df |> names() | ||
#' [1] "species" "nombre_comun" "habito" "collecciom" "distribucion" "usos" | ||
#' [7] "categ_uicn" | ||
|
||
clean_df1 <- clean_df |> | ||
mutate(species = change_split_word(species)) |> | ||
mutate(nombre_comun = change_split_word(nombre_comun)) |> | ||
mutate(collecciom = change_split_word(collecciom)) |> | ||
mutate(distribucion = change_split_word(distribucion)) |> | ||
mutate(distribucion = change_split_num_interval(distribucion)) |> | ||
mutate(usos = change_split_word(usos)) | ||
|
||
|
||
clean_df2 <- clean_df1 |> | ||
mutate(tag_subsp = case_when( | ||
str_detect(species, "subsp\\.") ~ "subsp.", | ||
str_detect(species, "var\\.") ~ "var.", | ||
TRUE ~ NA_character_ | ||
)) |> | ||
mutate(genus_ephitethon = stringr::word(species, 1, 1), | ||
species_ephitethon = stringr::word(species, 2, 2), | ||
subspecies_ephitethon = case_when( | ||
tag_subsp == "var." ~ stringr::str_extract(species, "(?<=var\\. )(\\w+)"), | ||
tag_subsp == "subsp." ~ stringr::str_extract(species, "(?<=subsp\\. )(\\w+)"), | ||
TRUE ~ NA_character_ | ||
)) |> | ||
mutate(tipo = stringr.plus::str_extract_before(habito, "\\s[0-9]{1,}"), | ||
altura = stringr.plus::str_extract_after(habito, "[a-z]{1,}\\s")) |> | ||
mutate(colector = stringr.plus::str_extract_before(collecciom, | ||
"\\s[0-9]{1,}"), | ||
id_coleccion = stringr::str_extract(collecciom, "[0-9]{1,}"), | ||
deposito = stringr::str_extract(collecciom, "\\([^()]+\\)") | ||
) |> | ||
mutate(regiones = stringr.plus::str_extract_before(distribucion, | ||
"\\sAlt\\(m\\.\\):"), | ||
elevacion = stringr.plus::str_extract_after(distribucion, | ||
"\\sAlt\\(m\\.\\): ")) |> | ||
mutate(usos_2 = gsub("\\[.*?\\]", "", usos), | ||
endemismo = if_else(str_detect(usos, "Dist\\. Endémico\\."), | ||
"Endémico", | ||
NA_character_), | ||
categ_uicn = tm::removePunctuation(categ_uicn)) | ||
clean_df2 |> | ||
writexl::write_xlsx("catalogo_ver_2.xlsx") |
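The final step splits each scientific name into genus, specific epithet, and an optional infraspecific rank and epithet. A minimal base R sketch of that split follows, assuming names are laid out as "Genus epithet [subsp.|var. infra] Author". The function name `parse_name` and the placeholder names are hypothetical.

```r
parse_name <- function(species) {
  words <- strsplit(species, "\\s+")

  # Rank marker, if any ("subsp." is checked first, as in the script).
  rank <- ifelse(grepl("subsp\\.", species), "subsp.",
                 ifelse(grepl("var\\.", species), "var.", NA_character_))

  # Word that follows the rank marker; NA when there is no marker.
  infra <- ifelse(
    is.na(rank),
    NA_character_,
    sub(".*(subsp|var)\\. (\\w+).*", "\\2", species, perl = TRUE)
  )

  data.frame(
    genus   = vapply(words, `[`, character(1), 1),
    epithet = vapply(words, `[`, character(1), 2),
    rank    = rank,
    infra   = infra
  )
}

# Placeholder names, invented for illustration.
parsed <- parse_name(c(
  "Aus bus L.",
  "Aus bus var. cus Auth.",
  "Aus dus subsp. eus (Auth.) Auth."
))
```

Taking the second word as the epithet assumes no hybrid markers or other tokens sit between genus and epithet.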
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.