Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Micah's RNA+ADT+HTO update using UM5 and Michael porjects as templates #14

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

micahpf
Copy link
Contributor

@micahpf micahpf commented Aug 23, 2024

Note that I wasn't able to finish all the planned changes before I left Emory.

I left off partway through the integration and annotation runfile. You can use the UM5 subsetting runfiles/formatfiles and michael integration+annotation runfiles/formatfiles as references to finish that.

The next step would be to add the pseudobulk script from the Michael project. Fingers crossed this slots in seamlessly with little changes required.

Also note that there is a bug in the processing runfile at seurat object creation due to package conflicts. I think this could be resolved by updating R to the latest version with a bioconductor base docker image.

Good luck!

micahpf added 17 commits August 12, 2024 18:32
… and it is just parrotting cellranger outputs) and updated formatting.
…easier to switch between RNA vs RNA+HTO+ADT example datasets.
…RNA` and `template_ADT` instead of the overly verbose `template_RNA+HTO+ADT`
… workflow to allow us to plot the UMAPs before and after removing doublets in the doublet section of the formatted report.
…format file with `20-APC-subset+recluster` scripts from p21242_Satish_UM5_Analysis
@micahpf micahpf requested a review from akcorut August 23, 2024 11:50
@akcorut akcorut requested a review from gktharp1 October 24, 2024 15:28
Comment on lines 520 to 524
s.split.unfilt <- lapply(names(s.split.unfilt), function(capID) { #s.split.unfilt
s.split.unfilt[[capID]]@assays$RNA@layers$counts <-
sc[[capID]]$adjusted_counts[,colnames(s.split.unfilt[[capID]])]
return(s.split.unfilt[[capID]])
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails with Error in .subscript.2ary(x, , j, drop = TRUE) : subscript out of bounds

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Add capID to front of cell barcode to prevent collisions when merging across clusters
sc <- lapply(sc, function(x) {
x$adjusted_counts@Dimnames[[2]] <- paste0(capID, "_", x$adjusted_counts@Dimnames[[2]])
return(x)
})

This error seems to be caused by a bug in the above part where all capIDs are replaced by the last capID.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is a fix for this issue.

Suggested change
s.split.unfilt <- lapply(names(s.split.unfilt), function(capID) { #s.split.unfilt
s.split.unfilt[[capID]]@assays$RNA@layers$counts <-
sc[[capID]]$adjusted_counts[,colnames(s.split.unfilt[[capID]])]
return(s.split.unfilt[[capID]])
})
# Add capID to front of cell barcode to prevent collisions when merging across clusters
for (capID in names(sc)) {
sc[[capID]]$adjusted_counts@Dimnames[[2]] <- paste0(capID, "_", sc[[capID]]$adjusted_counts@Dimnames[[2]])
}
## Initialize a flag to track if all columns are present
all_present <- TRUE
## Check for missing columns in each capID
for (capID in names(sc)) {
missing_cols <- setdiff(colnames(s.split.unfilt[[capID]]), colnames(sc[[capID]]$adjusted_counts))
if (length(missing_cols) > 0) {
warning(paste0("The following columns are missing in adjusted_counts for ", capID, ": ", paste(missing_cols, collapse = ", ")))
all_present <- FALSE # Set flag to FALSE if any missing columns are found
} else {
message(paste0("All columns present for ", capID))
}
}
# Replace cellranger filtered counts with soupx adjusted counts of filtered cells
if (all_present) {
## Execute lapply only if all columns are present in all capIDs
message("All columns are present. Replacing cellranger filtered counts with soupx adjusted counts.")
s.split.unfilt <- lapply(names(s.split.unfilt), function(capID) {
s.split.unfilt[[capID]]@assays$RNA@layers$counts <-
sc[[capID]]$adjusted_counts[, colnames(s.split.unfilt[[capID]])]
return(s.split.unfilt[[capID]])
})
names(s.split.unfilt) <- names(sc)
} else {
# if all columns are not present, stop the script
stop("Not all columns are present. The replacement was not executed.")
}

Comment on lines 1112 to 1117

write_rds(s.split.unfilt, here(paste0("saved_rds/report-", report_number), "s.split.unfilt-", getDate(), ".rds")) # for filtering plots
write_rds(s.demux, here(paste0("saved_rds/report-", report_number), "s.demux-", getDate(), ".rds"))
write_rds(s.filtADT.split, here(paste0("saved_rds/report-", report_number), "s.filtADT.split-", getDate(), ".rds"))
write_rds(s.filtADT, here(paste0("saved_rds/report-", report_number), "s.filtADT-", getDate(), ".rds")) # For downstream processing
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are bugs in write_rds. paste0() is missing when getDate() is used.

Suggested change
write_rds(s.split.unfilt, here(paste0("saved_rds/report-", report_number), "s.split.unfilt-", getDate(), ".rds")) # for filtering plots
write_rds(s.demux, here(paste0("saved_rds/report-", report_number), "s.demux-", getDate(), ".rds"))
write_rds(s.filtADT.split, here(paste0("saved_rds/report-", report_number), "s.filtADT.split-", getDate(), ".rds"))
write_rds(s.filtADT, here(paste0("saved_rds/report-", report_number), "s.filtADT-", getDate(), ".rds")) # For downstream processing
```
write_rds(singler_out, here(paste0("saved_rds/report-", report_number), paste0("singler_out-", getDate(), ".rds")))
write_rds(s.split.unfilt, here(paste0("saved_rds/report-", report_number), paste0("s.split.unfilt-", getDate(), ".rds"))) # for filtering plots
write_rds(s.demux, here(paste0("saved_rds/report-", report_number), paste0("s.demux-", getDate(), ".rds")))
write_rds(s.filtADT.split, here(paste0("saved_rds/report-", report_number), paste0("s.filtADT.split-", getDate(), ".rds")))
write_rds(s.filtADT, here(paste0("saved_rds/report-", report_number), paste0("s.filtADT-", getDate(), ".rds"))) # For downstream processing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants