An R-package for analyzing natural language with transformer-based
large language models. The talk package is part of the R Language
Analysis Suite, which includes talk, text and topics.
talk transforms voice recordings into text, audio features, or embeddings.
text provides many language tasks, such as converting digital text into word embeddings.
talk and text provide access to Large Language Models from Hugging Face.
topics visualizes language patterns as topics to generate psychological insights.
The R Language Analysis Suite is created through a collaboration
between psychology and computer science to address research needs and
ensure state-of-the-art techniques. The suite is continuously tested on
Ubuntu, Mac OS and Windows using the latest stable R version.
Most users only need to run the installation code below. If you run into problems, please see the Extended Installation Guide.
For the talk-package to work, you first install the talk-package in R and then install and initialize the Python packages that talk requires.
- Install talk (at the moment, the second step only works with the development version of talk from GitHub).
GitHub development version:
# install.packages("devtools")
devtools::install_github("theharmonylab/talk")
- Install and initialize the Python packages that talk requires:
library(talk)
# Install the Python packages required by talk in a conda environment (using default settings).
talkrpp_install()
# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run talkrpp_initialize() after restarting R.
talkrpp_initialize(save_profile = TRUE)
Recent significant advances in NLP research have resulted in improved representations of human language (i.e., language models). These language models have produced large performance gains in tasks related to understanding human language. talk makes these state-of-the-art (SOTA) models easily accessible through an interface to Hugging Face in Python.
See Hugging Face for a more comprehensive list of models.
The talkText() function performs speech-to-text, transcribing audio
input into text. talkEmbed() transforms audio input into numeric
representations (embeddings) that can be used for downstream tasks, such
as training predictive models with the text-package (see the text
train functions).
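To illustrate the kind of downstream task described above, here is a minimal base-R sketch of how embeddings, one numeric row per recording, can feed a predictive model. It does not call talk or text: the embedding matrix and the outcome are simulated stand-ins, and the column names (Dim1, Dim2, ...) are hypothetical, not necessarily what talkEmbed() returns.

```r
# Hypothetical sketch: simulated numbers stand in for talkEmbed() output,
# one row per recording and one column per embedding dimension.
set.seed(42)
n_recordings <- 50
n_dims <- 5
dim_names <- paste0("Dim", seq_len(n_dims))  # hypothetical column names

embeddings <- matrix(rnorm(n_recordings * n_dims), nrow = n_recordings,
                     dimnames = list(NULL, dim_names))

# Simulated outcome (e.g., a questionnaire score per speaker) that depends
# on the first two embedding dimensions plus noise.
outcome <- 2 * embeddings[, "Dim1"] - embeddings[, "Dim2"] +
  rnorm(n_recordings, sd = 0.5)

# Fit an ordinary linear regression of the outcome on all dimensions.
train_df <- data.frame(outcome = outcome, embeddings)
fit <- lm(outcome ~ ., data = train_df)

# Predict the outcome for three new (simulated) recordings.
new_embeddings <- matrix(rnorm(3 * n_dims), nrow = 3,
                         dimnames = list(NULL, dim_names))
predictions <- predict(fit, newdata = as.data.frame(new_embeddings))
print(round(predictions, 2))
```

Plain lm() is used here only to show the shape of the data. Real embeddings usually have far more dimensions than there are recordings, so the text-package's train functions (for example textTrainRegression()) are the intended route for actual analyses.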
library(talk)
# Transcribe the example audio to text and transform it into embeddings
# Get file path to example audio from the package example data
wav_path <- system.file("extdata/",
"test_short.wav",
package = "talk")
# Get transcription
talk_text <- talkText(
  wav_path
)
talk_text
# Get embeddings using the default settings
talk_embeddings <- talkEmbed(
  wav_path
)
talk_embeddings
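When you have several recordings, the same talkText() call can be applied to each file in turn. The sketch below assumes, as in the example above, that talkText() accepts a single file path; transcribe_folder() is a hypothetical helper, not part of the talk package.

```r
# Hypothetical helper (not part of the talk API): transcribe every .wav
# file in a folder. `transcribe` defaults to talk::talkText; R evaluates the
# default lazily, so the helper can be defined without talk being loaded.
transcribe_folder <- function(dir, transcribe = talk::talkText) {
  wav_files <- list.files(dir, pattern = "\\.wav$", full.names = TRUE,
                          ignore.case = TRUE)
  # One transcription per file, returned as a list named by file name.
  results <- lapply(wav_files, transcribe)
  names(results) <- basename(wav_files)
  results
}
```

For example, transcribe_folder(system.file("extdata", package = "talk")) would transcribe every .wav file in the package's example-data folder, which includes test_short.wav. Passing the transcription function as an argument keeps the helper easy to test with a stub that does not need the Python environment.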