Add info on library size filtering (#649)

microbiome · Dec 13, 2024 · 0670fb8
1 parent 42bae04

Showing 2 changed files with 11 additions and 7 deletions.
diff --git a/inst/pages/quality_control.qmd b/inst/pages/quality_control.qmd
@@ -345,6 +345,8 @@ plotColData(tse,"sum","SampleType", colour_by = "SampleType") +
     theme(axis.text.x = element_text(angle = 45, hjust=1))
 ```
 
+If you want to subset data based on library size, see [@sec-subset-library-size].
+
 In addition, data can be rarefied with
 [rarefyAssay](https://microbiome.github.io/mia/reference/rarefyAssay.html),
 which normalizes the samples to an equal number of reads.

diff --git a/inst/pages/subsetting.qmd b/inst/pages/subsetting.qmd
@@ -178,11 +178,14 @@ If a study was to consider and quantify the presence of Actinobacteria
 as well as Chlamydiae in different sites of the human body,
 `tse_sub` might be a suitable subset to start with.
 
-## Filtering out empty samples
-
-Sometimes data might contain, e.g., features that are not present in any of
-the samples. This can occur, for example, after data subsetting. In certain
-analyses, we might want to remove those instances. In this example, we are
+## Filtering based on library size {#sec-subset-library-size}
+
+As a preprocessing step, one might want to remove samples whose library size,
+i.e., total number of counts, does not exceed a certain threshold.
+Additionally, data might sometimes contain samples that do not contain any of
+the features present in the dataset. This can occur, for example, after data
+subsetting. To focus only on samples containing sufficient information, we
+might want to remove those instances. In this example, we are
 interested only in those features that belong to Species _Achromatiumoxaliferum_.
 
 ```{r}
@@ -193,8 +196,7 @@ ind[is.na(ind)] <- FALSE
 tse_sub <- tse[ind, ]
 ```
 
-Then we can calculate, how many times each feature was detected in each
-samples.
+Then we can calculate the total number of counts in each sample.
 
 ```{r}
 #| label: subset_empty
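The library-size filtering described in the new section can be sketched in base R. This is a minimal illustration under stated assumptions. It is not the code of the book's `subset_empty` chunk, which is not shown in this diff. The toy count matrix `counts` and the cutoff `min_lib_size` are hypothetical stand-ins for `assay(tse, "counts")` and a study-specific threshold. Library size is computed as the column sums of the counts. Samples that do not exceed the threshold are dropped, and this also removes empty samples.

```r
# Hypothetical toy count matrix standing in for assay(tse, "counts"):
# rows are features, columns are samples.
counts <- matrix(
  c(120, 0, 3, 40,
     80, 0, 1, 10,
     50, 0, 0,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("feature", 1:3), paste0("sample", 1:4))
)

# Library size = total number of counts in each sample (column sums)
lib_size <- colSums(counts)

# Hypothetical threshold; choose a value appropriate for your data
min_lib_size <- 10

# Keep only samples whose library size exceeds the threshold;
# empty samples (library size 0) are removed as well
keep <- lib_size > min_lib_size
counts_filt <- counts[, keep, drop = FALSE]

print(lib_size)
print(colnames(counts_filt))
```

With a `TreeSummarizedExperiment`, columns correspond to samples. The same kind of logical vector can therefore subset the object directly, e.g. `tse[, colSums(assay(tse, "counts")) > min_lib_size]`. To drop only empty samples, use a threshold of `0`.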