Commit

readme
PaulESantos committed Mar 21, 2023
1 parent f1cec0a commit b4e0d06
Showing 16 changed files with 282 additions and 24 deletions.
4 changes: 4 additions & 0 deletions .Rbuildignore
@@ -3,3 +3,7 @@
^LICENSE\.md$
^README\.Rmd$
^cran-comments\.md$
^\.github$
^code_deprecated
^pdf
^data_raw
1 change: 1 addition & 0 deletions .github/.gitignore
@@ -0,0 +1 @@
*.html
49 changes: 49 additions & 0 deletions .github/workflows/R-CMD-check.yaml
@@ -0,0 +1,49 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: R-CMD-check

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest, r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
.Rdata
.httr-oauth
.DS_Store
^code_deprecated
25 changes: 1 addition & 24 deletions README.Rmd
@@ -20,7 +20,7 @@ knitr::opts_chunk$set(
[![CRAN status](https://www.r-pkg.org/badges/version/perutimber)](https://CRAN.R-project.org/package=perutimber)
<!-- badges: end -->

-The R package, `perutimber`, provides easy access to taxonomic information for over 1,300 vascular plant species found in the "Catalogue of the timber forest species of the Amazon and the Peruvian Yunga." This package is based on the authoritative publication by Vásquez Martínez and Rojas Gonzáles (2022) titled "Catálogo de las especies forestales maderables de la Amazonía y la Yunga Peruana" in Revista Forestal del Perú 37(3, Número Especial): 5-138. With `perutimber`, researchers and enthusiasts alike can efficiently explore and analyze the fascinating diversity of these plant species.
+The R package, `perutimber`, provides easy access to taxonomic information for over 1,300 vascular plant species found in the "Catalogue of the timber forest species of the Amazon and the Peruvian Yunga." This package is based on the authoritative publication by [Vásquez Martínez and Rojas Gonzáles (2022) titled "Catálogo de las especies forestales maderables de la Amazonía y la Yunga Peruana" in Revista Forestal del Perú 37(3, Número Especial): 5-138](https://revistas.lamolina.edu.pe/index.php/rfp/article/view/1956). With `perutimber`, researchers and enthusiasts alike can efficiently explore and analyze the fascinating diversity of these plant species.

## Installation

@@ -31,26 +31,3 @@ pak::pak("PaulESantos/perutimber")
```

## Example
-
-This is a basic example which shows you how to solve a common problem:
-
-```{r example}
-library(perutimber)
-## basic example code
-```
-
-What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
-
-```{r cars}
-summary(cars)
-```
-
-You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this. You could also use GitHub Actions to re-render `README.Rmd` every time you push. An example workflow can be found here: <https://github.com/r-lib/actions/tree/v1/examples>.
-
-You can also embed plots, for example:
-
-```{r pressure, echo = FALSE}
-plot(pressure)
-```
-
-In that case, don't forget to commit and push the resulting figure files, so they display on GitHub and CRAN.
33 changes: 33 additions & 0 deletions README.md
@@ -0,0 +1,33 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

# perutimber

<!-- badges: start -->

[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN
status](https://www.r-pkg.org/badges/version/perutimber)](https://CRAN.R-project.org/package=perutimber)
<!-- badges: end -->

The R package, `perutimber`, provides easy access to taxonomic
information for over 1,300 vascular plant species found in the
“Catalogue of the timber forest species of the Amazon and the Peruvian
Yunga.” This package is based on the authoritative publication by
[Vásquez Martínez and Rojas Gonzáles (2022) titled “Catálogo de las
especies forestales maderables de la Amazonía y la Yunga Peruana” in
Revista Forestal del Perú 37(3, Número Especial):
5-138](https://revistas.lamolina.edu.pe/index.php/rfp/article/view/1956).
With `perutimber`, researchers and enthusiasts alike can efficiently
explore and analyze the fascinating diversity of these plant species.

## Installation

You can install the development version of `perutimber` like so:

``` r
pak::pak("PaulESantos/perutimber")
```

## Example
Binary file removed catalogo.xlsx
Binary file not shown.
Binary file removed catalogo_v1.xlsx
Binary file not shown.
Binary file removed catalogo_ver_2.xlsx
Binary file not shown.
193 changes: 193 additions & 0 deletions code_deprecated/data_preparation_code.R
@@ -0,0 +1,193 @@
library(here)
library(tidyverse)
library(tabulizer)

file_path <- "pdf/51277_Artículo_[RFP]_VF.pdf"

# Extract the raw text of the catalogue pages and split it into lines
text <- tabulizer::extract_text(file = file_path,
                                pages = c(17:134),
                                encoding = "UTF-8")
text <- text |>
  stringr::str_split("\n|\r") |>
  unlist()

# Drop empty lines
texto_1 <- text[nchar(text) != 0]

tbl_texto_1 <- tibble::tibble(especies = texto_1)

# Remove upper-case headings, running headers and footers, and lines that
# contain only a page number (21 to 138)
new_df <- tbl_texto_1 |>
  mutate(especies = str_squish(especies) |> str_trim()) |>
  filter(!str_detect(especies,
                     paste0(c("^[A-Z]{1,}CEAE",
                              "[A-Z]{1,}MAS",
                              "Vol\\.",
                              "Catálogo",
                              "Diciembre 2022",
                              "Revista Forestal del Perú",
                              "de la Amazonía y la Yunga Peruana"),
                            collapse = "|"))) |>
  filter(!str_detect(especies,
                     paste0("^", seq(21, 138, 1), "$", collapse = "|")))

new_df

new_df |>
  mutate(especies = str_squish(especies) |> str_trim()) |>
  filter(str_detect(especies,
                    paste0(c("^[A-Z]{1}[a-z]{1,} [a-z]{1,} ",
                             "^[A-Z]{1}[a-z]{1,} [a-z]{1,}-[a-z]{1,} "
                    ), collapse = "|"))) |>
  mutate(id = row_number())


species_list <- readxl::read_xlsx("new_especies_catalogo.xlsx") |>
  mutate(new_especie = paste0("paul start ", id, " ", especies))

species_list


new_df |>
  left_join(species_list) |>
  mutate(group = if_else(
    !is.na(id),
    paste0("especie ", id),
    as.character(id)
  )) |>
  fill(group, .direction = "down") |>
  mutate(row_id = row_number()) |>
  relocate(row_id)


# -------------------------------------------------------------------------

catalogo <- readxl::read_excel("catalogo_v1.xlsx")
catalogo |>
  count(group)

by_especie <- catalogo |>
  group_nest(group)

by_especie
# functions ---------------------------------------------------------------

# Rejoin words hyphenated across a line break, e.g. "made- rable" -> "maderable"
change_split_word <- function(x){
  split_word <- str_extract_all(x,
                                "[a-z]{1,}-\\s[a-z]{1,}")

  mgsub::mgsub(
    x,
    unlist(split_word),
    str_replace_all(split_word |> unlist(), "-\\s", "")
  )
}


# Rejoin numeric intervals split across a line break, e.g. "100- 500" -> "100-500"
change_split_num_interval <- function(x){
  num_interval <- stringr::str_extract_all(x,
                                           "[0-9]{1,}-\\s[0-9]{1,}")
  mgsub::mgsub(
    x,
    unlist(num_interval),
    stringr::str_replace_all(num_interval |> unlist(), "-\\s", "-")
  )
}
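
The two helpers above repair text that the PDF extraction split at a line break: hyphenated words lose the hyphen, and numeric intervals keep it. A minimal base-R sketch of the same idea follows. It swaps the `str_extract_all()` plus `mgsub::mgsub()` combination for a single `gsub()` with capture groups; the function names and the sample string are hypothetical and not part of the commit.

```r
# Hypothetical base-R counterparts of the two helpers (names are illustrative).

join_split_word <- function(x) {
  # "made- rable" -> "maderable": drop the hyphen and the whitespace after it
  gsub("([a-z]+)-\\s([a-z]+)", "\\1\\2", x)
}

join_split_interval <- function(x) {
  # "100- 500" -> "100-500": keep the hyphen, drop the whitespace after it
  gsub("([0-9]+)-\\s([0-9]+)", "\\1-\\2", x)
}

# Invented sample text, used only to exercise the two functions
sample_text <- "Arbol made- rable, Alt(m.): 100- 500"
cleaned <- join_split_interval(join_split_word(sample_text))
print(cleaned)
```

Because each pattern requires letters (or digits) on both sides of the hyphen, a genuine hyphen followed by a space elsewhere in the text is left alone.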

# Split one species record (a single string) into its labelled fields
get_tbl_catl_new <- function(x){

  # Text after "Usos:", stopping at "Categ. UICN:" when that field is present
  get_usos <- function(x){
    if(!grepl("Categ\\. UICN:", x)){
      usos <- gsub(".*Usos:", "", x)
    } else {
      usos <- gsub(".*Usos: (.+) Categ\\. UICN:.*", "\\1", x)
    }
    return(usos)
  }

  # Text after "Categ. UICN:", or NA when the record has no such field
  get_uicn_text <- function(x){
    if(!grepl("Categ\\. UICN:", x)){
      uicn <- NA_character_
    } else {
      uicn <- gsub(".*Categ\\. UICN:", "", x)
    }
    return(uicn)
  }

  #new_x <- change_split_num_interval(x) |>
  #  change_split_word()


  return(tibble::tibble(
    species = gsub("N\\.Ver\\:.*", "", x),
    nombre_comun = gsub(".*N\\.Ver\\: (.+) Hábito:.*", "\\1", x),
    habito = gsub(".*Hábito: (.+) Col Ref:.*", "\\1", x),
    collecciom = gsub(".*Col Ref: (.+) Región\\(es\\):.*", "\\1", x),
    distribucion = gsub(".*Región\\(es\\): (.+) Usos:.*", "\\1", x),
    usos = get_usos(x),
    categ_uicn = get_uicn_text(x)
  ) |>
    dplyr::mutate_all(~stringr::str_trim(.) |>
                        stringr::str_squish() ))

}
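
The function above cuts each record at its field labels, with `Categ. UICN:` treated as an optional trailing field. The sketch below illustrates that optional-field logic in base R. It is not the commit's method: it uses fixed-string search with `regexpr()` instead of greedy `gsub()` patterns, so it cuts at the first occurrence of a label rather than the last. `extract_field()` and the sample records are hypothetical.

```r
# Hypothetical helper: return the text between the literal label `start` and
# the literal label `end`. If `end` is NULL or does not occur, return the
# text up to the end of the string. If `start` does not occur, return NA.
extract_field <- function(x, start, end = NULL) {
  pos <- as.integer(regexpr(start, x, fixed = TRUE))
  if (pos == -1L) return(NA_character_)
  rest <- substring(x, pos + nchar(start))
  if (!is.null(end)) {
    end_pos <- as.integer(regexpr(end, rest, fixed = TRUE))
    if (end_pos != -1L) rest <- substring(rest, 1, end_pos - 1)
  }
  trimws(rest)
}

# Invented records: one with the optional IUCN field, one without it
with_uicn    <- "Genus species N.Ver: nombre Usos: Madera aserrada. Categ. UICN: LC"
without_uicn <- "Genus species N.Ver: nombre Usos: Madera aserrada."

usos_a <- extract_field(with_uicn, "Usos:", "Categ. UICN:")
usos_b <- extract_field(without_uicn, "Usos:", "Categ. UICN:")
uicn_a <- extract_field(with_uicn, "Categ. UICN:")
uicn_b <- extract_field(without_uicn, "Categ. UICN:")
```

Matching the labels as fixed strings avoids having to escape the dot and parentheses that appear in labels such as `Categ. UICN:` and `Región(es):`.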


# -------------------------------------------------------------------------

new_catalog <- catalogo |>
  group_by(group) |>
  summarise(new_text = paste0(especies, collapse = " "),
            .groups = "drop") |>
  select(-group)
new_catalog_vect <- new_catalog |> pull(new_text)

clean_df <- map_dfr(new_catalog_vect,
                    ~get_tbl_catl_new(.))

#' > clean_df |> names()
#' [1] "species" "nombre_comun" "habito" "collecciom" "distribucion" "usos"
#' [7] "categ_uicn"

clean_df1 <- clean_df |>
  mutate(species = change_split_word(species)) |>
  mutate(nombre_comun = change_split_word(nombre_comun)) |>
  mutate(collecciom = change_split_word(collecciom)) |>
  mutate(distribucion = change_split_word(distribucion)) |>
  mutate(distribucion = change_split_num_interval(distribucion)) |>
  mutate(usos = change_split_word(usos))


clean_df2 <- clean_df1 |>
  mutate(tag_subsp = case_when(
    str_detect(species, "subsp\\.") ~ "subsp.",
    str_detect(species, "var\\.") ~ "var.",
    TRUE ~ NA_character_
  )) |>
  mutate(genus_ephitethon = stringr::word(species, 1, 1),
         species_ephitethon = stringr::word(species, 2, 2),
         subspecies_ephitethon = case_when(
           tag_subsp == "var." ~ stringr::str_extract(species, "(?<=var\\. )(\\w+)"),
           tag_subsp == "subsp." ~ stringr::str_extract(species, "(?<=subsp\\. )(\\w+)"),
           TRUE ~ NA_character_
         )) |>
  mutate(tipo = stringr.plus::str_extract_before(habito, "\\s[0-9]{1,}"),
         altura = stringr.plus::str_extract_after(habito, "[a-z]{1,}\\s")) |>
  mutate(colector = stringr.plus::str_extract_before(collecciom,
                                                     "\\s[0-9]{1,}"),
         id_coleccion = stringr::str_extract(collecciom, "[0-9]{1,}"),
         deposito = stringr::str_extract(collecciom, "\\([^()]+\\)")
  ) |>
  mutate(regiones = stringr.plus::str_extract_before(distribucion,
                                                     "\\sAlt\\(m\\.\\):"),
         elevacion = stringr.plus::str_extract_after(distribucion,
                                                     "\\sAlt\\(m\\.\\): ")) |>
  mutate(usos_2 = gsub("\\[.*?\\]", "", usos),
         endemismo = if_else(str_detect(usos, "Dist\\. Endémico\\."),
                             "Endémico",
                             NA_character_),
         categ_uicn = tm::removePunctuation(categ_uicn))
clean_df2 |>
  writexl::write_xlsx("catalogo_ver_2.xlsx")
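
The last step of the script splits each scientific name into genus, species epithet, infraspecific rank (`subsp.` or `var.`), and infraspecific epithet. As a rough illustration, the same split can be sketched in base R with `strsplit()` and `regexec()` capture groups instead of `stringr::word()` and lookbehind patterns. `split_name()` and the example names are hypothetical, and the sketch takes the first rank marker it finds rather than checking `subsp.` before `var.`.

```r
# Hypothetical helper: split scientific names into their components.
split_name <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  # Capture the rank marker and the word that follows it
  m <- regmatches(x, regexec("(subsp\\.|var\\.) ([[:alnum:]_]+)", x))
  pick <- function(p, i) if (length(p) == 3) p[i] else NA_character_
  data.frame(
    genus         = vapply(parts, function(p) p[1], character(1)),
    species       = vapply(parts, function(p) p[2], character(1)),
    rank          = vapply(m, pick, character(1), i = 2),
    infra_epithet = vapply(m, pick, character(1), i = 3),
    stringsAsFactors = FALSE
  )
}

# Invented names covering the three cases the script distinguishes
example_names <- c("Genus species subsp. alpha Auth.",
                   "Genus species var. beta",
                   "Genus species Auth.")
name_parts <- split_name(example_names)
print(name_parts)
```

Names without a rank marker get `NA` in both `rank` and `infra_epithet`, which mirrors the `TRUE ~ NA_character_` fallback in the `case_when()` calls.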
Binary file added data_raw/peru_timber_data.rda
Binary file not shown.
Binary file added data_raw/tnrs_data_perutimber.rda
Binary file not shown.
Binary file removed especies_cat.xlsx
Binary file not shown.
Binary file removed new_especies_catalogo.xlsx
Binary file not shown.
Binary file removed species_text_list.rda
Binary file not shown.
Binary file removed ~$especies_cat.xlsx
Binary file not shown.
