Micah's RNA+ADT+HTO update using UM5 and Michael projects as templates #14

micahpf · 2024-08-23T11:49:53Z

Note that I wasn't able to finish all the planned changes before I left Emory.

I left off partway through the integration and annotation runfile. To finish it, use the UM5 subsetting runfiles/formatfiles and the Michael integration+annotation runfiles/formatfiles as references.

The next step would be to add the pseudobulk script from the Michael project. Fingers crossed this slots in seamlessly with few changes required.

Also note that there is a bug in the processing runfile at Seurat object creation due to package conflicts. I think this could be resolved by updating R to the latest version with a Bioconductor base Docker image.

Good luck!

akcorut · 2024-10-29T19:28:32Z

analysis_scripts/01-processing-template_ADT+soupX.runfile.Rmd

+s.split.unfilt <- lapply(names(s.split.unfilt), function(capID) { #s.split.unfilt 
+  s.split.unfilt[[capID]]@assays$RNA@layers$counts <- 
+    sc[[capID]]$adjusted_counts[,colnames(s.split.unfilt[[capID]])]
+  return(s.split.unfilt[[capID]])
+})


This fails with `Error in .subscript.2ary(x, , j, drop = TRUE) : subscript out of bounds`.

scRNAseq_template/analysis_scripts/01-processing-template_ADT+soupX.runfile.Rmd, lines 513 to 517 in 8484950:

    # Add capID to front of cell barcode to prevent collisions when merging across clusters
    sc <- lapply(sc, function(x) {
      x$adjusted_counts@Dimnames[[2]] <- paste0(capID, "_", x$adjusted_counts@Dimnames[[2]])
      return(x)
    })

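This error seems to be caused by a bug in the snippet above: `capID` is not an argument of the anonymous function, so it is looked up in the enclosing environment, where it presumably still holds the last value from an earlier loop. As a result, every capture's barcodes get prefixed with the last capID, and the column lookup in the hunk above fails for all other captures.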

Below is a fix for this issue.

Suggested change

-s.split.unfilt <- lapply(names(s.split.unfilt), function(capID) { #s.split.unfilt
-  s.split.unfilt[[capID]]@assays$RNA@layers$counts <-
-    sc[[capID]]$adjusted_counts[,colnames(s.split.unfilt[[capID]])]
-  return(s.split.unfilt[[capID]])
-})
+# Add capID to front of cell barcode to prevent collisions when merging across clusters
+for (capID in names(sc)) {
+  sc[[capID]]$adjusted_counts@Dimnames[[2]] <- paste0(capID, "_", sc[[capID]]$adjusted_counts@Dimnames[[2]])
+}
+## Initialize a flag to track if all columns are present
+all_present <- TRUE
+## Check for missing columns in each capID
+for (capID in names(sc)) {
+  missing_cols <- setdiff(colnames(s.split.unfilt[[capID]]), colnames(sc[[capID]]$adjusted_counts))
+  if (length(missing_cols) > 0) {
+    warning(paste0("The following columns are missing in adjusted_counts for ", capID, ": ", paste(missing_cols, collapse = ", ")))
+    all_present <- FALSE  # Set flag to FALSE if any missing columns are found
+  } else {
+    message(paste0("All columns present for ", capID))
+  }
+}
+# Replace cellranger filtered counts with soupx adjusted counts of filtered cells
+if (all_present) {
+  ## Execute lapply only if all columns are present in all capIDs
+  message("All columns are present. Replacing cellranger filtered counts with soupx adjusted counts.")
+  s.split.unfilt <- lapply(names(s.split.unfilt), function(capID) {
+    s.split.unfilt[[capID]]@assays$RNA@layers$counts <-
+      sc[[capID]]$adjusted_counts[, colnames(s.split.unfilt[[capID]])]
+    return(s.split.unfilt[[capID]])
+  })
+  names(s.split.unfilt) <- names(sc)
+} else {
+  # if all columns are not present, stop the script
+  stop("Not all columns are present. The replacement was not executed.")
+}
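To make the scoping problem concrete, here is a minimal sketch that reproduces it on toy data and shows one alternative way to pass the capture ID explicitly. The objects are hypothetical stand-ins: plain base matrices replace the SoupX sparse matrices, `colnames<-` is used instead of the `@Dimnames` slot, and `Map()` is just one option, not the approach adopted in this PR.

```r
# Toy stand-ins for the per-capture SoupX results (hypothetical data).
sc <- list(
  cap1 = list(adjusted_counts = matrix(1:4, nrow = 2,
    dimnames = list(c("g1", "g2"), c("AAAC", "AAAG")))),
  cap2 = list(adjusted_counts = matrix(5:8, nrow = 2,
    dimnames = list(c("g1", "g2"), c("AAAC", "AAAT"))))
)

# Simulate an earlier loop that leaves `capID` set to the last capture ID.
for (capID in names(sc)) {}

# Buggy pattern: `capID` is a free variable inside the function, so R
# resolves it in the enclosing environment and every capture gets "cap2".
buggy <- lapply(sc, function(x) {
  colnames(x$adjusted_counts) <- paste0(capID, "_", colnames(x$adjusted_counts))
  x
})

# Alternative: iterate over elements and their names together, so the ID
# is an explicit argument. Map() keeps the names of its first list argument.
fixed <- Map(function(x, id) {
  colnames(x$adjusted_counts) <- paste0(id, "_", colnames(x$adjusted_counts))
  x
}, sc, names(sc))

print(colnames(buggy$cap1$adjusted_counts))
print(colnames(fixed$cap1$adjusted_counts))
```

With the buggy version, asking `cap1` for a column named `cap1_AAAC` fails with a subscript error, because its columns were all prefixed with `cap2_`. The `for` loop in the suggested change avoids this for the same reason `Map()` does: the ID used for the prefix is always the one belonging to the element being modified.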

akcorut · 2024-10-30T15:34:30Z

analysis_scripts/01-processing-template_ADT+soupX.runfile.Rmd

+
+write_rds(s.split.unfilt, here(paste0("saved_rds/report-", report_number), "s.split.unfilt-", getDate(), ".rds")) # for filtering plots
+write_rds(s.demux, here(paste0("saved_rds/report-", report_number), "s.demux-", getDate(), ".rds"))
+write_rds(s.filtADT.split, here(paste0("saved_rds/report-", report_number), "s.filtADT.split-", getDate(), ".rds"))
+write_rds(s.filtADT, here(paste0("saved_rds/report-", report_number), "s.filtADT-", getDate(), ".rds")) # For downstream processing
+```


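There are bugs in the write_rds() calls: the filename pieces around getDate() are not wrapped in paste0(). here() treats each argument as a separate path component, so the prefix, the date, and ".rds" end up as nested directories instead of a single filename.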

Suggested change

-write_rds(s.split.unfilt, here(paste0("saved_rds/report-", report_number), "s.split.unfilt-", getDate(), ".rds")) # for filtering plots
-write_rds(s.demux, here(paste0("saved_rds/report-", report_number), "s.demux-", getDate(), ".rds"))
-write_rds(s.filtADT.split, here(paste0("saved_rds/report-", report_number), "s.filtADT.split-", getDate(), ".rds"))
-write_rds(s.filtADT, here(paste0("saved_rds/report-", report_number), "s.filtADT-", getDate(), ".rds")) # For downstream processing
-```
+write_rds(singler_out, here(paste0("saved_rds/report-", report_number), paste0("singler_out-", getDate(), ".rds")))
+write_rds(s.split.unfilt, here(paste0("saved_rds/report-", report_number), paste0("s.split.unfilt-", getDate(), ".rds"))) # for filtering plots
+write_rds(s.demux, here(paste0("saved_rds/report-", report_number), paste0("s.demux-", getDate(), ".rds")))
+write_rds(s.filtADT.split, here(paste0("saved_rds/report-", report_number), paste0("s.filtADT.split-", getDate(), ".rds")))
+write_rds(s.filtADT, here(paste0("saved_rds/report-", report_number), paste0("s.filtADT-", getDate(), ".rds"))) # For downstream processing
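The difference between the two forms can be seen with a small base-R sketch. It uses `file.path()` as a stand-in, since `here()` likewise joins its arguments as path components below the project root. `report_number` and `get_date()` are hypothetical placeholders for values the runfile defines elsewhere; the date format in particular is an assumption.

```r
# Hypothetical stand-ins for values defined elsewhere in the runfile.
report_number <- "01"
get_date <- function() format(Sys.Date(), "%Y-%m-%d")  # assumed stand-in for getDate()

out_dir <- paste0("saved_rds/report-", report_number)

# Without paste0(): each argument becomes its own path component,
# so the "file" is a hidden file named ".rds" inside nested directories.
wrong <- file.path(out_dir, "s.demux-", get_date(), ".rds")

# With paste0(): the pieces are glued into a single filename first.
right <- file.path(out_dir, paste0("s.demux-", get_date(), ".rds"))

print(wrong)
print(right)
```

Since the same directory and naming pattern recurs in every `write_rds()` call, a small helper that builds the path from an object name could reduce the chance of this slipping in again; that would be a follow-up refactor rather than part of this fix.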

micahpf added 17 commits August 12, 2024 18:32

Add .nfs* to .gitignore

9460460

Move deprecated or optional runfile and format files to helper_scripts

3469258

Renamed report 01-QC to 00-CR (because we rarely show this to clients and it is just parroting cellranger outputs) and updated formatting.

615692b

Init RNA-only example of 00-CR scripts

6b53e79

Delete deprecated example samplesheet

fd96d15

Reinstate config yaml because both reports 00 and 01 use it and it's easier to switch between RNA vs RNA+HTO+ADT example datasets.

054c353

Update capture_cellranger_multi_qc_metrics() to handle cellranger multi or count output structure.

390fd25

Fixed some typos and simplified file naming conventions to `template_RNA` and `template_ADT` instead of the overly verbose `template_RNA+HTO+ADT`

f424682

Simplified renv_setup.R and re-initialized renv for template with Seurat 5.

f58cfcf

Init RNA+ADT report 1 processing runfile. Overhauled the order of the workflow to allow us to plot the UMAPs before and after removing doublets in the doublet section of the formatted report.

2a81853

Revise report 01-processing format file to work with overhaul of runfile.

904ad04

Remove deprecated analysis scripts

ac767d9

Populating 02-integration_cell_annotation-template_ADT runfile and format file with `20-APC-subset+recluster` scripts from p21242_Satish_UM5_Analysis

5a480c5

Add date timestamp to rds filenames.

c0f316c

First pass at running WNN integration using the Seurat v5 IntegrateLayers() workflow used in Michael project.

73b0d43

Add capID to prefix of cell names so they don't collide when merging. Also add sample metadata to seurat object.

6707b5e

Adding composite factor to DimPlots in integration script

8484950

micahpf requested a review from akcorut August 23, 2024 11:50

akcorut requested a review from gktharp1 October 24, 2024 15:28

akcorut reviewed Oct 29, 2024

akcorut reviewed Oct 30, 2024

akcorut added 3 commits November 25, 2024 14:33

Update renv.lock

266257e

Update renv_setup.R to install additional necessary packages

050a71c

fix: refactor processing script to prevent cell barcode collisions and validate column presence before replacing counts * fix: output file paths in section 6

201b577
