Add info on library size filtering (#649)
TuomasBorman authored Dec 13, 2024
1 parent 42bae04 commit 0670fb8
Showing 2 changed files with 11 additions and 7 deletions.
2 changes: 2 additions & 0 deletions inst/pages/quality_control.qmd
@@ -345,6 +345,8 @@ plotColData(tse,"sum","SampleType", colour_by = "SampleType") +
theme(axis.text.x = element_text(angle = 45, hjust=1))
```

+If you want to subset data based on library size, see [@sec-subset-library-size].
+
In addition, data can be rarefied with
[rarefyAssay](https://microbiome.github.io/mia/reference/rarefyAssay.html),
which normalizes the samples to an equal number of reads.
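
As a rough illustration of what subsetting by library size involves, here is a minimal base R sketch. It is not the code from the linked section: the `counts` matrix, its values, and the `min_lib_size` threshold are hypothetical. With a `TreeSummarizedExperiment`, the count matrix would typically come from `assay(tse, "counts")` instead.

```r
# Hypothetical toy count matrix: rows are features, columns are samples.
counts <- matrix(
    c(10, 0, 5,
       0, 0, 2,
      30, 1, 8),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("feature", 1:3), paste0("sample", 1:3))
)

# Library size is the total number of counts in each sample (column).
lib_size <- colSums(counts)

# Keep only samples whose library size exceeds an arbitrary threshold.
# The value 5 is an assumption chosen for this toy example.
min_lib_size <- 5
counts_sub <- counts[, lib_size > min_lib_size, drop = FALSE]
```

Setting `drop = FALSE` keeps the result a matrix even if only one sample passes the threshold.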
16 changes: 9 additions & 7 deletions inst/pages/subsetting.qmd
@@ -178,11 +178,14 @@ If a study was to consider and quantify the presence of Actinobacteria
as well as Chlamydiae in different sites of the human body,
`tse_sub` might be a suitable subset to start with.

-## Filtering out empty samples
-
-Sometimes data might contain, e.g., features that are not present in any of
-the samples. This can occur, for example, after data subsetting. In certain
-analyses, we might want to remove those instances. In this example, we are
+## Filtering based on library size {#sec-subset-library-size}
+
+As a preprocessing step, one might want to remove samples that do not
+exceed a certain library size, i.e., the total number of counts. Additionally,
+data might sometimes contain samples that do not contain any of the
+features present in the dataset. This can occur, for example, after data
+subsetting. To focus only on samples containing sufficient information, we
+might want to remove those instances. In this example, we are
interested only in those features that belong to Species _Achromatiumoxaliferum_.

```{r}
@@ -193,8 +196,7 @@ ind[is.na(ind)] <- FALSE
tse_sub <- tse[ind, ]
```

-Then we can calculate, how many times each feature was detected in each
-samples.
+Then we can calculate the total number of counts in each sample.

```{r}
#| label: subset_empty