This repository has been archived by the owner on Feb 11, 2024. It is now read-only.

Update README ref #27 (#32)
chainsawriot authored Nov 17, 2023
1 parent af071cd commit dbd414c
Showing 3 changed files with 12 additions and 4 deletions.
2 changes: 1 addition & 1 deletion R/get_dist.R
@@ -46,7 +46,7 @@ resolve_keywords <- function(keywords, features, valuetype) {
#' @param get_min logical, whether to return only the minimum distance or raw distance information; it is more relevant when `keywords` have more than one word. See details.
#' @param valuetype See [quanteda::valuetype]
#' @param count_from numeric, how proximity is counted from when `get_min` is `TRUE`. The keyword is assigned with this proximity. Default to 1 (not zero) to prevent division by 0 with the default behaviour of [dfm.tokens_with_proximity()].
-#' @details Proximity is measured by the number of tokens away from the keyword. Given a tokenized sentence: \["I", "eat", "this", "apple"\] and suppose "eat" is the target. The vector of minimum proximity for each word from "eat" is \[2, 1, 2, 3\], if `count_from` is 1. In another case: \["I", "wash", "and", "eat", "this", "apple"\] and \["wash", "eat"\] are the keywords. The minimal distance vector is \[2, 1, 2, 1, 2, 3\]. If `get_min` is `FALSE`, the output is a list of two vectors. For "wash", the distance vector is \[1, 0, 1, 2, 3\]. For "eat", \[3, 2, 1, 0, 1, 2\].
+#' @details Proximity is measured by the number of tokens away from the keyword. Given a tokenized sentence: \["I", "eat", "this", "apple"\] and suppose "eat" is the keyword. The vector of minimum proximity for each word from "eat" is \[2, 1, 2, 3\], if `count_from` is 1. In another case: \["I", "wash", "and", "eat", "this", "apple"\] and \["wash", "eat"\] are the keywords. The minimal distance vector is \[2, 1, 2, 1, 2, 3\]. If `get_min` is `FALSE`, the output is a list of two vectors. For "wash", the distance vector is \[1, 0, 1, 2, 3\]. For "eat", \[3, 2, 1, 0, 1, 2\].
#' It is recommended conducting all text maniputation tasks with `tokens_*()` functions before calling this function.
#' @return a `tokens_with_proximity` object. It is a derivative of [quanteda::tokens()], i.e. all `token_*` functions still work. A `tokens_with_proximity` has a modified [print()] method. Also, additional data slots are included
#' * a document variation `dist`
4 changes: 3 additions & 1 deletion README.Rmd
@@ -19,7 +19,9 @@ knitr::opts_chunk$set(
[![R-CMD-check](https://github.com/gesistsa/quanteda.proximity/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/quanteda.proximity/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

-The goal of quanteda.proximity is to add a hacky layer of proximityp vectors into the `tokens` object of `quanteda`.
+The goal of quanteda.proximity is to add proximity vectors into the `tokens` object of `quanteda`.
+
+Proximity is measured by the number of tokens away from the keyword. Given a tokenized sentence: ["I", "wash", "this", "apple"] and suppose "eat" is the keyword. The proximity vector is a vector with the same length as the tokenized sentence and the values (using the default settings) are [2, 1, 2, 3].

## Installation

10 changes: 8 additions & 2 deletions README.md
@@ -8,8 +8,14 @@
[![R-CMD-check](https://github.com/gesistsa/quanteda.proximity/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/quanteda.proximity/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

-The goal of quanteda.proximity is to add a hacky layer of proximityp
-vectors into the `tokens` object of `quanteda`.
+The goal of quanteda.proximity is to add proximity vectors into the
+`tokens` object of `quanteda`.
+
+Proximity is measured by the number of tokens away from the keyword.
+Given a tokenized sentence: \[“I”, “wash”, “this”, “apple”\] and suppose
+“eat” is the keyword. The proximity vector is a vector with the same
+length as the tokenized sentence and the values (using the default
+settings) are \[2, 1, 2, 3\].

## Installation

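
The proximity definition touched by this commit can be checked by hand. The following is a minimal base R sketch of the documented arithmetic, not the package's implementation: `min_proximity()` is a hypothetical helper, it matches keywords exactly with `%in%` (the package resolves keywords through `valuetype`), and raising an error when no keyword is found is an assumption made here for illustration.

```r
# Hypothetical helper, NOT part of quanteda.proximity; base R only.
# Assumption: keywords are matched exactly (no glob/regex as with `valuetype`).
min_proximity <- function(tokens, keywords, count_from = 1) {
  # Positions of every token that is one of the keywords
  pos <- which(tokens %in% keywords)
  if (length(pos) == 0) {
    # Assumption for this sketch: fail loudly when no keyword occurs
    stop("none of the keywords occur in `tokens`")
  }
  # One raw distance vector per keyword occurrence, one value per token
  dists <- lapply(pos, function(p) abs(seq_along(tokens) - p))
  # Element-wise minimum across occurrences, then shift by `count_from`
  do.call(pmin, dists) + count_from
}

min_proximity(c("I", "eat", "this", "apple"), "eat")
# [1] 2 1 2 3

min_proximity(c("I", "wash", "and", "eat", "this", "apple"), c("wash", "eat"))
# [1] 2 1 2 1 2 3
```

Both results agree with the vectors given in the `@details` text for `count_from = 1`. The intermediate `dists` list holds the raw per-keyword distances, each with the same length as the tokenized sentence.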
