Closed
28 commits
5b67516
initiating the packge
Arshammik Apr 30, 2025
d08f7c4
initiating the package built
Arshammik May 1, 2025
7fbfa25
Adding the logo of the package and also delete the old description
Arshammik May 2, 2025
0ef521a
uploading the logo for package
Arshammik May 2, 2025
060c3b2
Update the description in the README
Arshammik May 2, 2025
ec2d941
Update the description in the README
Arshammik May 2, 2025
9491ec6
Update the description in the README
Arshammik May 2, 2025
f9edd0b
Update the description in the README
Arshammik May 2, 2025
8c4b73f
Adding the new logo
Arshammik May 2, 2025
bb364c6
Adding the new logo and relocate the figures
Arshammik May 2, 2025
0f6149b
Update the README
Arshammik May 2, 2025
dc67f69
Update README.md
Arshammik May 2, 2025
dab3276
Add user-facing R wrappers for silhouette, pseudo-correlation, and ro…
Arshammik May 2, 2025
39b852d
feat(core): add pseudo-correlation, silhouette, variance, and devianc…
Arshammik May 2, 2025
26dd30f
clean up
Arshammik May 2, 2025
32f0635
Refine package startup message in zzz.R
Arshammik May 2, 2025
e1b0eea
feat: add comprehensive STARsolo splicing and expression processing p…
Arshammik May 2, 2025
575f437
docs: add man pages for new splikit functions
Arshammik May 2, 2025
5a1665b
chore(src): add C++ sources and Makevars; ignore build artifacts
Arshammik May 2, 2025
01c03c4
chore: update package metadata and add RcppExports/globals
Arshammik May 2, 2025
491f94b
Adding the test
Arshammik May 2, 2025
438dca1
Create r.yml
Arshammik May 2, 2025
d1dd14c
Update r.yml
Arshammik May 2, 2025
fb8bc69
Update r.yml
Arshammik May 3, 2025
5e70157
Update r.yml
Arshammik May 3, 2025
6002ff1
Update r.yml
Arshammik May 3, 2025
d2a8199
Update r.yml
Arshammik May 3, 2025
b1f7cd5
Update r.yml
Arshammik May 3, 2025
3 changes: 3 additions & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
^splikit\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
43 changes: 43 additions & 0 deletions .github/workflows/r.yml
@@ -0,0 +1,43 @@
name: R-CMD-check

on:
  push:
    branches: [pkg]
  pull_request:
    branches: [pkg]

jobs:
  R-CMD-check:
    runs-on: ubuntu-latest

    name: Check on ${{ matrix.config.os }} (R ${{ matrix.config.r }})

    strategy:
      matrix:
        config:
          - { os: ubuntu-latest, r: 'release' }

    steps:
      - uses: actions/checkout@v4

      - name: Set up R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}

      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y libcurl4-openssl-dev libssl-dev libxml2-dev

      - name: Install dependencies
        uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: |
            rcmdcheck
            testthat
            devtools
          needs: true

      - name: Check package
        uses: r-lib/actions/check-r-package@v2
5 changes: 5 additions & 0 deletions .gitignore
@@ -46,3 +46,8 @@ po/*~
# RStudio Connect folder
rsconnect/
.ipynb_checkpoints
.Rproj.user

# build artifacts
*.o
*.so
30 changes: 30 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,30 @@
Package: splikit
Title: A toolkit for analysing RNA splicing in scRNA-seq data
Version: 1.0.0
Authors@R:
    person("Arsham", "Mikaeili Namini", , "arsham.mikaeilinamini@mail.mcgill.ca", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-9453-6951"))
Description:
    Splikit /ˈsplaɪ.kɪt/ is a toolkit designed for the analysis of high-dimensional single-cell
    splicing data. It provides a framework to extract and work with ratio-based data structures
    derived from single-cell RNA sequencing experiments. The package avoids the need for bulky
    S4 objects by offering direct and efficient manipulation of matrices. Core functionalities
    are implemented in C++ via Rcpp to ensure high performance and scalability on large datasets.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Imports:
    Matrix,
    data.table,
    methods,
    stats,
    Rcpp,
    RcppArmadillo
LinkingTo:
    Rcpp,
    RcppArmadillo,
    RcppEigen
Suggests:
    testthat (>= 3.0.0)
Config/testthat/edition: 3
23 changes: 2 additions & 21 deletions LICENSE
@@ -1,21 +1,2 @@
MIT License

Copyright (c) 2025 Arsham Mikaeili Namini

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
YEAR: 2025
COPYRIGHT HOLDER: splikit authors
21 changes: 21 additions & 0 deletions LICENSE.md
@@ -0,0 +1,21 @@
# MIT License

Copyright (c) 2025 splikit authors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
31 changes: 31 additions & 0 deletions NAMESPACE
@@ -0,0 +1,31 @@
# Generated by roxygen2: do not edit by hand

export(find_variable_events)
export(find_variable_genes)
export(get_pseudo_correlation)
export(get_rowVar)
export(get_silhouette_mean)
export(load_toy_SJ_object)
export(make_eventdata_plus)
export(make_gene_count)
export(make_junction_ab)
export(make_m1)
export(make_m2)
export(make_velo_count)
import(Matrix)
import(data.table)
importFrom(Matrix,Matrix)
importFrom(Matrix,readMM)
importFrom(Matrix,sparseMatrix)
importFrom(Rcpp,evalCpp)
importFrom(data.table,":=")
importFrom(data.table,.GRP)
importFrom(data.table,.N)
importFrom(data.table,as.data.table)
importFrom(data.table,copy)
importFrom(data.table,data.table)
importFrom(data.table,fread)
importFrom(data.table,is.data.table)
importFrom(data.table,setDT)
importFrom(data.table,setnames)
useDynLib(splikit, .registration = TRUE)
27 changes: 27 additions & 0 deletions R/RcppExports.R
@@ -0,0 +1,27 @@
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

silhouette_avg <- function(X, cluster_assignments, n_threads = 1L) {
    .Call(`_splikit_silhouette_avg`, X, cluster_assignments, n_threads)
}

calcDeviances_ratio <- function(M1, M2) {
    .Call(`_splikit_calcDeviances_ratio`, M1, M2)
}

cppBetabinPseudoR2 <- function(Z, m1, m2) {
    .Call(`_splikit_cppBetabinPseudoR2`, Z, m1, m2)
}

calcNBDeviancesWithThetaEstimation <- function(gene_expression) {
    .Call(`_splikit_calcNBDeviancesWithThetaEstimation`, gene_expression)
}

standardizeSparse_variance_vst <- function(matSEXP, display_progress = FALSE) {
    .Call(`_splikit_standardizeSparse_variance_vst`, matSEXP, display_progress)
}

rowVariance_cpp <- function(mat) {
    .Call(`_splikit_rowVariance_cpp`, mat)
}

185 changes: 185 additions & 0 deletions R/feature_selection.R
@@ -0,0 +1,185 @@
#' Calculate the Sum Deviance for Inclusion and Exclusion Matrices
#'
#' @param m1_matrix A matrix representing the inclusion matrix. Rows are events, columns are barcodes.
#' @param m2_matrix A matrix representing the exclusion matrix. Rows are events, columns are barcodes.
#' @param min_row_sum A numeric value specifying the minimum row sum threshold for filtering events. Defaults to 50.
#' @param verbose Logical. If \code{TRUE} (default), prints progress and informational messages.
#' @param ... Additional arguments to be passed.
#'
#' @return A \code{data.table} containing the events and their corresponding sum deviance values.
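#'
#' @examples
#' # A minimal sketch on made-up data. The barcodes, event names, and counts are
#' # illustrative assumptions, not package data; the barcodes follow the
#' # "<16 characters>-<library ID>" pattern that the function uses to split libraries.
#' set.seed(1)
#' barcodes <- paste0(
#'   vapply(1:10, function(i) {
#'     paste(sample(c("A", "C", "G", "T"), 16, replace = TRUE), collapse = "")
#'   }, character(1)),
#'   "-", rep(c("1", "2"), each = 5)
#' )
#' dn <- list(paste0("event", 1:5), barcodes)
#' m1 <- Matrix::Matrix(rpois(50, lambda = 20), nrow = 5, sparse = TRUE, dimnames = dn)
#' m2 <- Matrix::Matrix(rpois(50, lambda = 20), nrow = 5, sparse = TRUE, dimnames = dn)
#' \dontrun{
#' # Not run: whether such a tiny input suits the C++ deviance routine is untested.
#' hv_events <- find_variable_events(m1, m2, min_row_sum = 50, verbose = FALSE)
#' hv_events[order(-sum_deviance)]
#' }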
#' @export
find_variable_events <- function(m1_matrix, m2_matrix, min_row_sum = 50, verbose = TRUE, ...) {

  # Check that the required packages are installed
  if (!requireNamespace("data.table", quietly = TRUE)) {
    stop("The 'data.table' package is required but not installed.")
  }

  if (!requireNamespace("Rcpp", quietly = TRUE)) {
    stop("The 'Rcpp' package is required but not installed.")
  }

  if (!requireNamespace("Matrix", quietly = TRUE)) {
    stop("The 'Matrix' package is required but not installed.")
  }

  # Check that both inputs inherit from the 'Matrix' class
  if (!(inherits(m1_matrix, "Matrix") && inherits(m2_matrix, "Matrix"))) {
    stop("Both 'm1_matrix' and 'm2_matrix' must be sparse matrices of class 'Matrix'.")
  }

  # Check matrix compatibility
  if (!identical(colnames(m1_matrix), colnames(m2_matrix))) {
    stop("The colnames (barcodes) of inclusion and exclusion matrices are not identical.")
  }

  if (!identical(rownames(m1_matrix), rownames(m2_matrix))) {
    stop("The rownames (junction events) of inclusion and exclusion matrices are not identical.")
  }

  # Keep only events whose inclusion and exclusion row sums both exceed min_row_sum
  to_keep_events <- which(rowSums(m1_matrix) > min_row_sum & rowSums(m2_matrix) > min_row_sum)
  m1_matrix <- m1_matrix[to_keep_events, , drop = FALSE]
  m2_matrix <- m2_matrix[to_keep_events, , drop = FALSE]

  # Create metadata table: the library ID is whatever follows the
  # 16-character barcode and the "-" separator
  temp_current_barcodes <- data.table::data.table(brc = colnames(m1_matrix))
  temp_current_barcodes$ID <- sub("^.{16}-(.*$)", "\\1", temp_current_barcodes$brc)
  meta <- temp_current_barcodes

  libraries <- unique(meta$ID)
  if (verbose) cat("There are", length(libraries), "libraries detected...\n")

  # Initialize deviance sum vector
  sum_deviances <- numeric(nrow(m1_matrix))
  names(sum_deviances) <- rownames(m1_matrix)

  # Accumulate per-event deviances library by library
  for (lib in libraries) {
    filter <- which(meta[, ID] == lib)
    M1_sub <- m1_matrix[, filter, drop = FALSE]
    M2_sub <- m2_matrix[, filter, drop = FALSE]

    # Calculate deviances using the C++ function
    deviance_values <- tryCatch({
      calcDeviances_ratio(M1_sub, M2_sub)
    }, error = function(e) {
      stop("Error in calcDeviances_ratio function: ", e$message)
    })

    deviance_values <- c(deviance_values)
    names(deviance_values) <- rownames(M1_sub)
    sum_deviances <- sum_deviances + deviance_values
    if (verbose) cat("Calculating the deviances for sample", lib, "has been completed!\n")
  }

  rez <- data.table::data.table(events = names(sum_deviances), sum_deviance = as.numeric(sum_deviances))
  # Print the completion message before returning; after return() it would never run
  if (verbose) cat("All Done!\n")
  return(rez)
}

#' Find Variable Genes Using Variance or Deviance-Based Metrics
#'
#' @description
#' Identifies highly variable genes from a sparse gene expression matrix using one of two methods:
#' variance-stabilizing transformation (VST) or deviance-based modeling. The VST method uses a C++-accelerated
#' approach to compute standardized variance, while the deviance-based method models gene variability
#' across libraries using negative binomial deviances.
#'
#' @param gene_expression_matrix A sparse gene expression matrix (of class \code{Matrix}) with gene names as row names.
#' @param method Character string, either \code{"vst"} or \code{"sum_deviance"}. The default is \code{"sum_deviance"}.
#' \code{"vst"} uses a variance-stabilizing transformation to identify variable genes.
#' \code{"sum_deviance"} computes per-library deviances and combines them with a row variance metric.
#' @param ... Additional arguments (currently unused).
#'
#' @return A \code{data.table} containing gene names (column \code{events}) and computed metrics.
#' For the deviance method, this includes \code{sum_deviance} and \code{variance} columns.
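#'
#' @examples
#' # A minimal sketch of how the "sum_deviance" method assigns cells to libraries:
#' # the regular expression used in the function body strips the 16-character
#' # barcode and the "-" separator. The barcodes below are made up for illustration.
#' toy_barcodes <- c("AAACCCAAGAAACACT-1", "TTTGTTGCAATCGCAC-2", "AAACCCAAGAAACACT")
#' sub("^.{16}-(.*$)", "\\1", toy_barcodes)
#' # A barcode without a "-<library>" suffix does not match and is returned
#' # unchanged, so it would be treated as its own library.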
#'
#' @export
find_variable_genes <- function(gene_expression_matrix, method = c("sum_deviance", "vst"), ...) {

  # Resolve 'method' to a single value; the first choice ("sum_deviance") is the
  # documented default. A length-2 'method' would make `if (method == "vst")`
  # an error in R >= 4.2.0.
  method <- match.arg(method)

  # Check that the required packages are installed
  if (!requireNamespace("data.table", quietly = TRUE)) {
    stop("The 'data.table' package is required but not installed.")
  }
  if (!requireNamespace("Rcpp", quietly = TRUE)) {
    stop("The 'Rcpp' package is required but not installed.")
  }
  if (!requireNamespace("Matrix", quietly = TRUE)) {
    stop("The 'Matrix' package is required but not installed.")
  }

  # Verify that gene_expression_matrix inherits from the 'Matrix' class
  if (!inherits(gene_expression_matrix, "Matrix")) {
    stop("The 'gene_expression_matrix' must be a sparse matrix of class 'Matrix'.")
  }

  if (method == "vst") {
    cat("The method we are using is vst (Seurat)...\n")
    if (!exists("standardizeSparse_variance_vst")) {
      stop("The function 'standardizeSparse_variance_vst' is not available. Check your C++ source files.")
    }
    rez_vector <- tryCatch({
      standardizeSparse_variance_vst(matSEXP = gene_expression_matrix)
    }, error = function(e) {
      stop("Error in standardizeSparse_variance_vst: ", e$message)
    })
    rez <- data.table::data.table(events = rownames(gene_expression_matrix),
                                  standardize_variance = rez_vector)
  } else {
    cat("The method we are using is deviance summation per library...\n")

    # Drop genes with zero total counts
    to_keep_features <- which(rowSums(gene_expression_matrix) > 0)
    if (length(to_keep_features) == 0) {
      stop("No genes with a positive row sum were found.")
    }
    gene_expression_matrix <- gene_expression_matrix[to_keep_features, , drop = FALSE]

    # Create metadata table using column names: the library ID is whatever
    # follows the 16-character barcode and the "-" separator
    temp_current_barcodes <- data.table::data.table(brc = colnames(gene_expression_matrix))
    temp_current_barcodes$ID <- sub("^.{16}-(.*$)", "\\1", temp_current_barcodes$brc)
    meta <- temp_current_barcodes

    libraries <- unique(meta$ID)
    cat("There are", length(libraries), "libraries detected...\n")

    # Initialize deviance sum vector with gene names
    sum_deviances <- numeric(nrow(gene_expression_matrix))
    names(sum_deviances) <- rownames(gene_expression_matrix)

    # Loop over each library to compute deviances
    for (lib in libraries) {
      filter <- which(meta[, ID] == lib)
      gene_expression_matrix_sub <- gene_expression_matrix[, filter, drop = FALSE]

      # Calculate deviances using the C++ function
      deviance_values <- tryCatch({
        calcNBDeviancesWithThetaEstimation(gene_expression_matrix_sub)
      }, error = function(e) {
        stop("Error in calcNBDeviancesWithThetaEstimation function: ", e$message)
      })

      deviance_values <- c(deviance_values)
      names(deviance_values) <- rownames(gene_expression_matrix_sub)
      sum_deviances <- sum_deviances + deviance_values
      cat("Calculating the deviances for sample", lib, "has been completed!\n")
    }

    # Compute the per-gene variance across all cells.
    # Assumption: 'rowVariance_cpp' (R/RcppExports.R) is the package's
    # row-variance helper and accepts a sparse matrix.
    row_var <- tryCatch({
      rowVariance_cpp(gene_expression_matrix)
    }, error = function(e) {
      stop("Error in rowVariance_cpp: ", e$message)
    })

    row_var_cpp_dt <- data.table::data.table(events = rownames(gene_expression_matrix),
                                             variance = as.numeric(row_var))

    rez <- data.table::data.table(events = names(sum_deviances),
                                  sum_deviance = as.numeric(sum_deviances))
    rez <- base::merge(rez, row_var_cpp_dt, by = "events")
    # Remove the key that merge() sets on the result
    data.table::setkey(x = rez, NULL)
  }

  return(rez)
}