Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
219 changes: 219 additions & 0 deletions PERFORMANCE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,219 @@
# Performance Recommendations

This document provides guidance on optimizing performance when using the hanwoo package.

## API Call Optimization

### Batch Processing with `hanwoo_info()`

When retrieving data for multiple cattle, avoid sequential calls in a simple loop. Instead, use parallel processing or batch your requests efficiently:

**Not Recommended:**
```r
# Sequential processing - slow for large datasets
results <- list()
for (i in seq_along(cattle_ids)) {
results[[i]] <- hanwoo_info(cattle_ids[i], key_encoding, key_decoding)
}
```

**Recommended:**
```r
# Use purrr::map with error handling for cleaner code
library(purrr)

# Option 1: Sequential with progress bar (requires pbapply package)
# install.packages("pbapply")
library(pbapply)
results <- pblapply(cattle_ids, function(x) {
tryCatch(
hanwoo_info(x, key_encoding, key_decoding),
error = function(e) NULL
)
})

# Option 2: Parallel processing with furrr (use with caution to avoid API rate limits)
# install.packages("furrr")
library(furrr)
plan(multisession, workers = 4) # Adjust workers based on API limits
results <- future_map(cattle_ids, function(x) {
tryCatch(
hanwoo_info(x, key_encoding, key_decoding),
error = function(e) NULL
)
}, .progress = TRUE)
```

### Rate Limiting

When making many API calls, implement rate limiting to avoid overloading the server:

```r
# Add delays between requests
results <- map(cattle_ids, function(x) {
result <- tryCatch(
hanwoo_info(x, key_encoding, key_decoding),
error = function(e) NULL
)
Sys.sleep(0.5) # 500ms delay between requests
return(result)
})
```

### Caching Results

For frequently accessed data, cache results to avoid repeated API calls:

```r
# Simple file-based caching
cache_dir <- "cache"
if (!dir.exists(cache_dir)) dir.create(cache_dir)

get_cached_info <- function(cattle_id, key_encoding, key_decoding) {
cache_file <- file.path(cache_dir, paste0(cattle_id, ".rds"))

if (file.exists(cache_file)) {
# Check if cache is less than 24 hours old
if (difftime(Sys.time(), file.mtime(cache_file), units = "hours") < 24) {
return(readRDS(cache_file))
}
}

# Fetch fresh data
result <- hanwoo_info(cattle_id, key_encoding, key_decoding)
saveRDS(result, cache_file)
return(result)
}
```

## Data Processing Optimization

### Efficient Data Extraction

When you only need specific fields from the results, extract them early:

```r
# Extract only quality_info from multiple cattle
quality_data <- map_df(cattle_ids, function(x) {
result <- tryCatch(
hanwoo_info(x, key_encoding, key_decoding)$quality_info,
error = function(e) NULL
)
return(result)
})
```

### Vectorized Operations

Use vectorized operations with `req_steer()` and `req_bull()`:

```r
# Instead of loops, use map functions
library(purrr)
library(dplyr)

# Vectorized nutrient requirements calculation
df <- data.frame(
month = 6:15,
weight = c(160, 184, 208, 233, 258, 284, 310, 337, 366, 395),
daily_gain = c(0.8, 0.8, 0.8, 0.9, 0.9, 0.9, 0.9, 1, 1, 1)
)

# Efficient calculation
requirements <- map2_df(df$weight, df$daily_gain, req_steer)
```

## Memory Management

### Process Data in Chunks

For large datasets, process data in chunks to manage memory:

```r
# Process cattle IDs in batches
chunk_size <- 100
cattle_chunks <- split(cattle_ids, ceiling(seq_along(cattle_ids) / chunk_size))

all_results <- list()
for (i in seq_along(cattle_chunks)) {
chunk_results <- map(cattle_chunks[[i]], function(x) {
tryCatch(
hanwoo_info(x, key_encoding, key_decoding),
error = function(e) NULL
)
})

# Extract and save only needed data, then clear memory
all_results[[i]] <- map_df(chunk_results, ~ .x$quality_info)
rm(chunk_results)
gc() # Garbage collection

Sys.sleep(5) # Pause between chunks
}

final_result <- bind_rows(all_results)
```

## Performance Monitoring

### Use Time Checks

The `hanwoo_info()` function includes a built-in timing feature:

```r
# Enable time checking
result <- hanwoo_info(
cattle = "002083191603",
key_encoding = key_encoding,
key_decoding = key_decoding,
time_check = TRUE
)
# Outputs: "서버 응답 시간: X.XX secs"
```

### Benchmark Your Code

Use `microbenchmark` to compare different approaches:

```r
library(microbenchmark)

# Compare different methods
microbenchmark(
sequential = map(sample_ids, ~ hanwoo_info(.x, key_encoding, key_decoding)),
with_cache = map(sample_ids, ~ get_cached_info(.x, key_encoding, key_decoding)),
times = 5
)
```

## API Best Practices

1. **Respect Rate Limits**: Add appropriate delays between API calls
2. **Handle Errors Gracefully**: Always use `tryCatch()` for API calls
3. **Cache When Possible**: Store results locally for frequently accessed data
4. **Filter Early**: Request only the data you need
5. **Process in Parallel**: Use parallel processing for independent operations (with caution)
6. **Monitor Response Times**: Use the `time_check` parameter to identify slow operations
7. **Clean Up Resources**: Use `gc()` to free memory when processing large datasets

## Code Optimizations Already Implemented

The following optimizations have been implemented in version 0.2.1+:

1. **Reduced String Concatenation**: Base URL patterns are cached in `hanwoo_info()`
2. **Consolidated Type Conversions**: Type conversions are performed once instead of multiple times
3. **Early Returns**: Functions return early when no data is available, avoiding unnecessary processing
4. **Efficient Data Binding**: Uses `bind_rows()` instead of deprecated `plyr` functions

## Troubleshooting Performance Issues

If you experience slow performance:

1. Check your internet connection and API server response times
2. Verify API key validity and permissions
3. Reduce the number of concurrent requests
4. Use caching for repeated queries
5. Process data in smaller batches
6. Monitor memory usage with `pryr::mem_used()`

For additional help, please submit an issue at: https://github.com/adatalab/hanwoo/issues
2 changes: 1 addition & 1 deletion R/hanwoo_bull.R
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,6 @@ hanwoo_bull <- function(KPN, type = "list") {
}

if(type == "selected") {
return(df$EVB_selected)
return(df$EPD)
}
}
65 changes: 22 additions & 43 deletions R/hanwoo_info.R
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,12 @@ hanwoo_info <- function(cattle, key_encoding, key_decoding, time_check = FALSE)
}
)
}

# Base URL pattern for traceNoSearch API to reduce string concatenation
base_url <- paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=")

# 기본 정보 파싱 및 오류 확인
basic_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 1))
basic_info <- safe_xml_parse(paste0(base_url, 1))
if (is.null(basic_info) || xmlToDataFrame(basic_info)$resultCode[1] == 99) {
return(xmlToDataFrame(basic_info)$resultMsg[1])
}
Expand All @@ -62,15 +65,15 @@ hanwoo_info <- function(cattle, key_encoding, key_decoding, time_check = FALSE)
mutate(birthYmd = ymd(birthYmd))

# 농장 정보 추출 및 처리
farm_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 2)) %>%
farm_info <- safe_xml_parse(paste0(base_url, 2)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble() %>%
mutate(cattleNo = cattle, regYmd = ymd(regYmd)) %>%
select(cattleNo, everything())

# 도축 정보 추출 및 처리
butchery_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 3)) %>%
butchery_info <- safe_xml_parse(paste0(base_url, 3)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble()
Expand All @@ -79,26 +82,26 @@ hanwoo_info <- function(cattle, key_encoding, key_decoding, time_check = FALSE)
}

# 가공 정보 추출 및 처리
process_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 4)) %>%
process_info <- safe_xml_parse(paste0(base_url, 4)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble()

# 백신 정보 추출 및 처리
vaccine_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 5)) %>%
vaccine_info <- safe_xml_parse(paste0(base_url, 5)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble() %>%
mutate(injectionYmd = ymd(injectionYmd))

# 검사 정보 추출 및 처리
inspect_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 6)) %>%
inspect_info <- safe_xml_parse(paste0(base_url, 6)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble()

# 브루셀라 정보 추출 및 처리
brucella_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 7)) %>%
brucella_info <- safe_xml_parse(paste0(base_url, 7)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble()
Expand All @@ -107,13 +110,13 @@ hanwoo_info <- function(cattle, key_encoding, key_decoding, time_check = FALSE)
}

# 로트 정보 추출 및 처리
lot_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 8)) %>%
lot_info <- safe_xml_parse(paste0(base_url, 8)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble()

# 판매자 정보 추출 및 처리
seller_info <- safe_xml_parse(paste0("http://data.ekape.or.kr/openapi-data/service/user/animalTrace/traceNoSearch?ServiceKey=", key_encoding, "&traceNo=", cattle, "&optionNo=", 9)) %>%
seller_info <- safe_xml_parse(paste0(base_url, 9)) %>%
getNodeSet("//item") %>%
xmlToDataFrame(stringsAsFactors = FALSE) %>%
as_tibble()
Expand Down Expand Up @@ -159,13 +162,15 @@ hanwoo_info <- function(cattle, key_encoding, key_decoding, time_check = FALSE)
quality_info[[col]] <- NA
}

quality_info <- quality_info %>% add_row(quality_info_add) %>%
# Add new row and perform type conversions once
quality_info <- quality_info %>%
add_row(quality_info_add) %>%
mutate(
qgrade = factor(qgrade, levels = c("D", "3", "2", "1", "1+", "1++")),
issueDate = ymd(issueDate),
abattDate = lubridate::ymd(abattDate),
birthmonth = as.numeric(birthmonth),
costAmt = as.integer(costAmt),
costAmt = if("costAmt" %in% names(.)) as.integer(costAmt) else NA_integer_,
weight = as.integer(weight),
rea = as.integer(rea),
backfat = as.integer(backfat),
Expand All @@ -175,43 +180,17 @@ hanwoo_info <- function(cattle, key_encoding, key_decoding, time_check = FALSE)
yuksak = as.integer(yuksak),
fatsak = as.integer(fatsak),
growth = as.integer(growth)
)
) %>%
filter(!is.na(cattleNo))

# Reorder columns based on whether costAmt exists
if ("costAmt" %in% names(quality_info)) {
quality_info <- quality_info %>%
select(cattleNo, abattDate, judgeSexNm, birthmonth, qgrade, wgrade, costAmt, weight, rea, backfat, insfat, windex, tissue, yuksak, fatsak, growth, everything()) %>%
mutate(
abattDate = lubridate::ymd(abattDate),
birthmonth = as.numeric(birthmonth),
costAmt = as.integer(costAmt),
weight = as.integer(weight),
rea = as.integer(rea),
backfat = as.integer(backfat),
insfat = as.integer(insfat),
windex = as.numeric(windex),
tissue = as.integer(tissue),
yuksak = as.integer(yuksak),
fatsak = as.integer(fatsak),
growth = as.integer(growth)
)
select(cattleNo, abattDate, judgeSexNm, birthmonth, qgrade, wgrade, costAmt, weight, rea, backfat, insfat, windex, tissue, yuksak, fatsak, growth, everything())
} else {
quality_info <- quality_info %>%
select(cattleNo, abattDate, judgeSexNm, birthmonth, qgrade, wgrade, weight, rea, backfat, insfat, windex, tissue, yuksak, fatsak, growth, everything()) %>%
mutate(
abattDate = lubridate::ymd(abattDate),
birthmonth = as.numeric(birthmonth),
costAmt = NA,
weight = as.integer(weight),
rea = as.integer(rea),
backfat = as.integer(backfat),
insfat = as.integer(insfat),
windex = as.numeric(windex),
tissue = as.integer(tissue),
yuksak = as.integer(yuksak),
fatsak = as.integer(fatsak),
growth = as.integer(growth)
)
select(cattleNo, abattDate, judgeSexNm, birthmonth, qgrade, wgrade, weight, rea, backfat, insfat, windex, tissue, yuksak, fatsak, growth, everything())
}
quality_info <- quality_info %>% filter(!is.na(cattleNo))
}
}

Expand Down
Loading