Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merging into devel #117

Merged
merged 33 commits into from
Aug 27, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
e5ee9bf
silence some tests
ncborcherding Mar 25, 2024
336f7ee
Merge remote-tracking branch 'upstream/devel'
ncborcherding Mar 25, 2024
9c631ec
update hex sticker location
ncborcherding Apr 4, 2024
d8b75d1
update r-cmd-check after PR
ncborcherding Apr 4, 2024
c067057
Update test-coverage.yaml
ncborcherding Apr 4, 2024
d5baeae
fix break caused by GSVA version
ncborcherding Apr 4, 2024
ed8df64
Update test-coverage.yaml
ncborcherding Apr 4, 2024
7fbb458
Update test-coverage.yaml
ncborcherding Apr 25, 2024
f825958
Update R-CMD-check.yaml
ncborcherding Apr 25, 2024
42b4c20
update testthat
ncborcherding May 1, 2024
30c551e
modulate testthat
ncborcherding May 1, 2024
4b75c16
3.19 release readme update
ncborcherding May 3, 2024
d4ecb0b
Update README.md
ncborcherding May 12, 2024
bc244da
Update README.md
ncborcherding May 12, 2024
1487b1c
Update README.md
ncborcherding May 12, 2024
8d4ec7a
Merge branch 'master' of https://github.com/ncborcherding/escape
ncborcherding Jul 26, 2024
774905b
Pass parallel to runEscape
ncborcherding Jul 26, 2024
c301476
News and Version update
ncborcherding Jul 26, 2024
7e4b065
Normalization fix
ncborcherding Jul 31, 2024
600bee8
pushing updated testthat
ncborcherding Aug 2, 2024
20a23d2
Merge branch 'master' into dev
ncborcherding Aug 2, 2024
44f083e
Merge pull request #109 from ncborcherding/dev
ncborcherding Aug 2, 2024
77ffb9c
Update performNormalization.R
LinearParadox Aug 13, 2024
685a12c
Update performNormalization.R
LinearParadox Aug 13, 2024
41d0798
Update performNormalization.R
ncborcherding Aug 14, 2024
26e935c
Update performNormalization.R
LinearParadox Aug 15, 2024
c4d3996
rewrote normalize to be vectorized.
LinearParadox Aug 17, 2024
1245b1d
added contributor name
LinearParadox Aug 17, 2024
44e7faf
updated to pass chunking, fixing bug with scale factors
LinearParadox Aug 19, 2024
46541f2
documentation. minor code cleanup
LinearParadox Aug 20, 2024
cb97c24
minor refactor
LinearParadox Aug 20, 2024
a3116ff
Update densityEnrichment.R
LinearParadox Aug 22, 2024
b5307d0
Merge pull request #111 from LinearParadox/LinearParadox-patch-1
ncborcherding Aug 22, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
^\.github$
^www$
^codecov\.yml$
^.*\.Rproj$
^\.Rproj\.user$
4 changes: 1 addition & 3 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,6 @@ jobs:
fail-fast: false
matrix:
config:
- {os: macos-latest, r: 'release'}
- {os: windows-latest, r: 'release'}
- {os: ubuntu-latest, r: 'release'}

env:
Expand All @@ -46,7 +44,7 @@ jobs:
needs: check

- name: Add GSVA repo
run: Rscript -e 'remotes::install_github("rcastelo/GSVA")'
run: Rscript -e 'remotes::install_github("rcastelo/GSVA@27d70c068f12f922e5ca2f363626089310dc2a2b")'

- uses: r-lib/actions/check-r-package@v2
with:
Expand Down
3 changes: 2 additions & 1 deletion .github/workflows/test-coverage.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -30,13 +30,14 @@ jobs:
run: Rscript -e 'install.packages("remotes")'

- name: Add GSVA repo
run: Rscript -e 'remotes::install_github("rcastelo/GSVA")'
run: Rscript -e 'remotes::install_github("rcastelo/GSVA@27d70c068f12f922e5ca2f363626089310dc2a2b")'

- name: Test coverage
run: |
covr::codecov(
quiet = FALSE,
clean = FALSE,
exclusions = "R/global.R",
install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
)
shell: Rscript {0}
Expand Down
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,6 @@

.Rbuildignore
.DS_Store
.RHistory
escape.Rproj
.Rproj*
.RData
13 changes: 8 additions & 5 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,15 +1,17 @@
Package: escape
Title: Easy single cell analysis platform for enrichment
Version: 1.99.1
Date: 2024-02-29
Version: 2.0.1
Date: 2024-07-26
Authors@R: c(
person(given = "Nick", family = "Borcherding", role = c("aut", "cre"), email = "[email protected]"),
person(given = "Jared", family = "Andrews", role = c("aut"), email = "[email protected]"))
person(given = "Jared", family = "Andrews", role = c("aut"), email = "[email protected]"),
person(given = "Alexei", family = "Martsinkovskiy", role = c("ctb"), email = "[email protected]")
)
Description: A bridging R package to facilitate gene set enrichment analysis (GSEA) in the context of single-cell RNA sequencing. Using raw count information, Seurat objects, or SingleCellExperiment format, users can perform and visualize ssGSEA, GSVA, AUCell, and UCell-based enrichment calculations across individual cells.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: false
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
biocViews: Software, SingleCell, Classification, Annotation, GeneSetEnrichment, Sequencing, GeneSignaling, Pathways
Depends: R (>= 4.1)
Imports:
Expand All @@ -34,7 +36,8 @@ Imports:
UCell,
stringr,
methods,
SeuratObject
SeuratObject,
Matrix
Suggests:
Seurat,
hexbin,
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ importFrom(GSVA,gsvaParam)
importFrom(GSVA,ssgseaParam)
importFrom(MatrixGenerics,rowSds)
importFrom(MatrixGenerics,rowSums2)
importFrom(SeuratObject,Assays)
importFrom(SeuratObject,CreateAssayObject)
importFrom(SeuratObject,CreateDimReducObject)
importFrom(SeuratObject,Idents)
Expand Down
7 changes: 7 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,10 @@
# escape VERSION 2.0.1 (2024-07-26)

## UNDERLYING CHANGES

* fixed ```performNormalziation()``` errors when input.data was a matrix, now requires single-cell object and enrichment data
* passing parallel processing properly to ```runEscape()``` function.

# escape VERSION 1.99.1 (2024-02-29)

## UNDERLYING CHANGES
Expand Down
128 changes: 72 additions & 56 deletions R/performNormalization.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,15 +7,23 @@
#' evaluation for log2-fold change, but will alter the original
#' enrichment values.
#'
#' @param input.data Enrichment output from \code{\link{escape.matrix}} or
#' \code{\link{runEscape}}.
#' @param assay Name of the assay to plot if data is a single-cell object.
#' @param sc.data Single-cell object or matrix used in the gene set enrichment calculation in
#' \code{\link{escape.matrix}} or \code{\link{runEscape}}.
#' @param enrichment.data The enrichment results from \code{\link{escape.matrix}}
#' or \code{\link{runEscape}} (optional)
#' @param assay Name of the assay to normalize if using a single-cell object
#' @param gene.sets The gene set library to use to extract
#' the individual gene set information from.
#' the individual gene set information from
#' @param scale.factor A vector to use for normalizing enrichment scores per cell.
#' @param make.positive Shift enrichment values to a positive range \strong{TRUE}
#' for downstream analysis or not \strong{TRUE} (default).
#'
#' @param groups the number of cells to calculate normalization on at once.
#' chunks matrix into groups sized chunks. Useful in case of memory issues.
#' @importFrom stringr str_replace_all
#' @importFrom SeuratObject Assays
#' @importFrom SummarizedExperiment assays
#' @importFrom Matrix colSums

#' @examples
#' GS <- list(Bcells = c("MS4A1", "CD79B", "CD79A", "IGH1", "IGH2"),
#' Tcells = c("CD3E", "CD3D", "CD3G", "CD7","CD8A"))
Expand All @@ -29,23 +37,37 @@
#' gene.sets = GS)
#'
#' @export
#'
#' @return Single-cell object or matrix of normalized enrichment scores

performNormalization <- function(input.data,
assay = NULL,
gene.sets = NULL,
make.positive = FALSE,
scale.factor = NULL) {




performNormalization <- function(sc.data,
enrichment.data = NULL,
assay = "escape",
gene.sets = NULL,
make.positive = FALSE,
scale.factor = NULL,
groups = NULL) {
if(!is.null(assay)) {
if(is_seurat_object(sc.data)) {
assay.present <- assay %in% Assays(sc.data)
} else if (is_se_object(sc.data)) {
assay.present <- assay %in% assays(sc.data)
}
} else {
assay.present <- FALSE
}


if(is_seurat_or_se_object(input.data)) {
enriched <- .pull.Enrich(input.data, assay)
if(is_seurat_or_se_object(sc.data) & !is.null(assay) & assay.present) {
enriched <- .pull.Enrich(sc.data, assay)
} else {
enriched <- input.data
enriched <- enrichment.data
}

if(!is.null(scale.factor) & length(scale.factor) != dim(input.data)[2]) {
if(!is.null(scale.factor) & length(scale.factor) != dim(sc.data)[2]) {
stop("If using a vector as a scale factor, please ensure the length matches the number of cells.")
}

Expand All @@ -55,54 +77,48 @@ performNormalization <- function(input.data,
egc <- egc[names(egc) %in% colnames(enriched)]

#Isolating the number of genes per cell expressed
cnts <- .cntEval(input.data, assay = "RNA", type = "counts")
if(is.null(groups)){
chunks <- dim(enriched)[[1]]
}
else{
chunks <- min(groups, dim(enriched)[[1]])
}

if(is.null(scale.factor)) {
if (is.null(scale.factor)) {
cnts <- .cntEval(sc.data, assay = "RNA", type = "counts")
print("Calculating features per cell...")

# Pre-compute which genes are non-zero in each sample
non_zero_indices <- lapply(seq_len(ncol(cnts)), function(y) {
which(cnts[, y] != 0)
})

# Convert gene sets to a list of indices
egc_indices <- lapply(egc, function(x) {
which(rownames(cnts) %in% x)
})

egc.size <- lapply(egc_indices, function(gene_set_indices) {
sapply(non_zero_indices, function(sample_indices) {
length(intersect(sample_indices, gene_set_indices))
})
egc.sizes <- lapply(egc, function(x){
scales<-unname(Matrix::colSums(cnts[which(rownames(cnts) %in% x),]!=0))
scales[scales==0] <- 1
scales
})
egc.sizes <- split_rows(do.call(cbind,egc.sizes), chunk.size=chunks)
rm(cnts)
}
else{
egc.sizes <- split_vector(scale.factor, chunk.size=chunks)
}
enriched <- split_rows(enriched, chunk.size=chunks)

print("Normalizing enrichment scores per cell...")
#Dividing the enrichment score by number of genes expressed
lapply(seq_len(ncol(enriched)), function(x) {
if (!is.null(scale.factor)) {
enriched[,x] <- enriched[,x]/scale.factor
} else {
gene.set <- unlist(egc.size[colnames(enriched)[x]])
if(any(gene.set == 0)) {
gene.set[which(gene.set == 0)] <- 1
}
enriched[,x] <- enriched[,x]/gene.set
}
if(any(enriched[,x] < 0) & make.positive) {
enriched[,x] <- enriched[,x] + abs(min(enriched[,x]))

enriched<-mapply(function(scores, scales){
scores/scales
}, enriched, egc.sizes, SIMPLIFY = FALSE)
enriched <- do.call(rbind, enriched)
if(make.positive){
enriched <- apply(enriched, 2, function(x){
x+max(0, -min(x))
})
}
if(is_seurat_or_se_object(sc.data)) {
if(is.null(assay)) {
assay <- "escape"
}
enriched[,x]
}) -> normalized.values

normalized.enriched <- do.call(cbind, normalized.values)
colnames(normalized.enriched) <- colnames(enriched)

if(is_seurat_or_se_object(input.data)) {
input.data <- .adding.Enrich(input.data, normalized.enriched, paste0(assay, "_normalized"))
return(input.data)
sc.data <- .adding.Enrich(sc.data, enriched, paste0(assay, "_normalized"))
return(sc.data)
} else {
return(normalized.enriched)
return(enriched)
}

}
}
17 changes: 6 additions & 11 deletions R/runEscape.R
Original file line number Diff line number Diff line change
Expand Up @@ -100,14 +100,6 @@ escape.matrix <- function(input.data,
aucMaxRank = ceiling(0.2 * nrow(split.data[[i]])),
verbose = FALSE,
...))

# a <- t(assay(suppressWarnings(
# AUCell_run(exprMat = split.data[[i]],
# geneSets = egc,
# normAUC = TRUE,
# BPPARAM = BPPARAM,
# aucMaxRank = ceiling(0.2 * nrow(split.data[[i]])),
# ...))))

}
scores[[i]] <- a
Expand All @@ -117,10 +109,12 @@ escape.matrix <- function(input.data,

#Normalize based on dropout
if(normalize) {
output <- performNormalization(output,
output <- performNormalization(sc.data = input.data,
enrichment.data = output,
assay = NULL,
gene.sets = gene.sets,
make.positive = make.positive)
make.positive = make.positive,
groups = groups)
}
return(output)
}
Expand Down Expand Up @@ -175,7 +169,8 @@ runEscape <- function(input.data,
gene.sets = gene.sets,
method = method,
groups = groups,
min.size = min.size)
min.size = min.size,
BPPARAM = BPPARAM)

input.data <- .adding.Enrich(input.data, enrichment, new.assay.name)
return(input.data)
Expand Down
49 changes: 48 additions & 1 deletion R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -183,7 +183,7 @@ is_seurat_or_se_object <- function(obj) {
new.assay <- suppressWarnings(CreateAssayObject(
data = as.matrix(t(enrichment))))

sc[[enrichment.name]] <- new.assay
suppressWarnings(sc[[enrichment.name]] <- new.assay)
} else if (inherits(sc, "SingleCellExperiment")) {
altExp(sc, enrichment.name) <- SummarizedExperiment(assays = t(enrichment))
names(assays(altExp(sc, enrichment.name))) <- enrichment.name
Expand Down Expand Up @@ -254,4 +254,51 @@ is_seurat_or_se_object <- function(obj) {
return(values)
}

#function to split matrices by row
#adopted from ucells split_data.matrix
split_rows <- function (matrix, chunk.size = 1000)
{
nrows <- dim(matrix)[1]
if(is.vector(matrix)){
nrows <- length(matrix)
}
nchunks <- (nrows - 1)%/%chunk.size + 1
split.data <- list()
min <- 1
for (i in seq_len(nchunks)) {
if (i == nchunks - 1) {
left <- nrows - (i - 1) * chunk.size
max <- min + round(left/2) - 1
}
else {
max <- min(i * chunk.size, nrows)
}
split.data[[i]] <- matrix[min:max,]
min <- max + 1
}
return(split.data)
}
#function to split vector
#adopted from ucells split_data.matrix
split_vector <- function (vector, chunk.size = 1000)
{

nrows <- length(vector)
nchunks <- (nrows - 1)%/%chunk.size + 1
split.data <- list()
min <- 1
for (i in seq_len(nchunks)) {
if (i == nchunks - 1) {
left <- nrows - (i - 1) * chunk.size
max <- min + round(left/2) - 1
}
else {
max <- min(i * chunk.size, nrows)
}
split.data[[i]] <- vector[min:max]
min <- max + 1
}
return(split.data)
}


Loading
Loading