Rmarkdown/methylation_array_qc_report.Rmd

---
title: '`r targets::tar_read(project)`'
subtitle: "Array Quality-Control"
author: '`r targets::tar_read(author)`'
institute: "_Inserm U1283 / CNRS UMR8199 / Institut Pasteur de Lille / Université de Lille_"
date: '`r format(Sys.time(), "%A, the %d of %B, %Y")`'
output:
  xaringan::moon_reader:
    self_contained: true
    css: [assets/umr1283_8199.css]
    includes:
      in_header: assets/_scripts.html
    nature:
      highlightStyle: github
      highlightLines: true
      ratio: "16:9"
      countIncrementalSlides: false
params:
  output_directory: "/EPIC_QC"
  csv_file: "sample_sheet_clean.csv"
  data_directory: "/idats"
  array: "Illumina EPIC"
  annotation: "ilm10b4.hg19"
  filter_snps: FALSE
  filter_non_cpg: TRUE
  filter_xy: TRUE
  filter_multihit: TRUE
  filter_beads: TRUE
  population: NULL
  bead_cutoff: 0.05
  detection_pvalues: 0.01
  filter_callrate: TRUE
  callrate_samples: 0.99
  callrate_probes: 1
  sex_threshold: NULL
  sex_colname: NULL
  norm_background: "oob"
  norm_dye: "RELIC"
  norm_quantile: "quantile1"
  cell_tissue: NULL
  pca_vars: !r c("Sample_Plate", "Sentrix_ID")
  max_labels: 15
---

class: part-slide

```{r setup, include = FALSE}
options("width" = 110)
options(htmltools.dir.version = FALSE)


### Load Packages ==================================================================================
suppressPackageStartupMessages({
  library(here)
  library(knitr)
  # library(ragg)
  library(ggplot2)
  library(ggtext)
  library(patchwork)
  library(data.table)
  library(gt)
  library(scales)

  library(targets)
})
```

```{r setup-knitr, include = FALSE}
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  eval = TRUE, # Default: TRUE
  include = TRUE, # Default: TRUE
  echo = FALSE, # Default: TRUE
  width = getOption("width"),
  comment = "#>",
  fig.align = "center",
  fig.width = 11.5, # Default: 7
  fig.height = 5.75,  # Default: 7
  dpi = 150, # Default: 72
  # dev = "ragg_png"
  dev = "svg"
)
```

```{r setup-ggplot2, file = "assets/setup-ggplot2.R", include = FALSE}
```

```{r setup-gt, file = "assets/setup-gt.R", include = FALSE}
```

```{r setup-functions, include = FALSE}
sub_chunk <- function(code, chunk_name = NULL, fig_width = 11.5, fig_height = 5.75) {
  sub_chunk_txt <- sprintf(
    "\n```{r %s, fig.width = %s, fig.height = %s, echo = FALSE}\n(%s)()\n```\n\n",
    chunk_name, fig_width, fig_height, paste0(deparse(function() code), collapse = "")
  )
  cat(knitr::knit(text = knitr::knit_expand(text = sub_chunk_txt), quiet = TRUE))
}
```

```{r tar-reads, include = FALSE}
phenotypes <- tar_read(ma_phenotypes)
sex_threshold <- tar_read(ma_sex_threshold)
pca_mset <- tar_read(ma_pca_mset_plots)
```

# Methods and Parameters

Array: ` `r params[["array"]]` `
Annotation package from Bioconductor: ` `r params[["annotation"]]` `

---

# Call Rate

* `filter_callrate`: ` `r params[["filter_callrate"]]` `

* The threshold for the detection p-values: ` `r percent_format(accuracy = 0.01, suffix = " %")(params[["detection_pvalues"]])` `

* The call rate threshold for samples: ` `r percent_format(accuracy = 0.01, suffix = " %")(params[["callrate_samples"]])` `  
    &rArr; Should samples with less than the specified call rate (` `r percent_format(accuracy = 0.01, suffix = " %")(params[["callrate_samples"]])` `) for detection p-values below ` α=`r params[["detection_pvalues"]]` ` be removed?

* The call rate threshold for probes: ` `r percent_format(accuracy = 0.01, suffix = " %")(params[["callrate_probes"]])` `  
    &rArr; Should probes with less than the specified call rate (` `r percent_format(accuracy = 0.01, suffix = " %")(params[["callrate_probes"]])` `) for detection p-values below ` α=`r params[["detection_pvalues"]]` ` be removed?

---

# Pre-Processing

* The method to estimate background normal distribution parameters: ` `r params[["norm_background"]]` `  
    &rArr; Method from [*ENmix*](https://doi.org/doi:10.18129/B9.bioc.ENmix)

* The dye bias correction: ` `r params[["norm_dye"]]` `  
    &rArr; Method from [*ENmix*](https://doi.org/doi:10.18129/B9.bioc.ENmix)

* The quantile normalisation: ` `r params[["norm_quantile"]]` `  
    &rArr; Method from [*ENmix*](https://doi.org/doi:10.18129/B9.bioc.ENmix)

---

# Probes Filtering

* `filter_snps`: ` `r params[["filter_snps"]]` `  
    &rArr; Should probes in which the probed CpG falls near a SNP be removed? ([Zhou et al., 2016](https://www.doi.org/10.1093/nar/gkw967))
    * Name of the ethnicity group: ` `r if (is.null(params[["population"]])) "Not defined" else params[["population"]]` `

* `filter_non_cpg`: ` `r params[["filter_non_cpg"]]` `  
    &rArr; Should non-cg probes be removed?

* `filter_xy`: ` `r params[["filter_xy"]]` `  
    &rArr; Should probes from X and Y chromosomes be removed?

* `filter_multihit`: ` `r params[["filter_multihit"]]` ` ([Nordlund et al., 2013](https://www.doi.org/10.1186/gb-2013-14-9-r105))  
    &rArr; Should probes which align to multiple locations be removed?

* `filter_beads`: ` `r params[["filter_beads"]]` `  
    &rArr; Should probes with less than three beadcount in at least ` `r percent(params[["bead_cutoff"]], suffix = " %")` ` of the samples be removed?

---

# Sex Check

* The threshold value to discrimate sex: ` `r if (is.null(params[["sex_threshold"]])) "auto" else params[["sex_threshold"]]` `  
    &rArr; Flag samples with sex discrepancy with the phenotype based on X/Y chromosomes methylation.

---

# Cell Composition

* The cell tissue: ` `r params[["cell_tissue"]]` `
    &rArr; Using a reference panel (*i.e.*,&nbsp;blood and cord blood) or `refFreeCellMix` method from [*RefFreeEWAS*](https://cran.r-project.org/package=RefFreeEWAS)

---

# Final Processing

* Probe design type bias correction: `rcp`  
    &rArr; Method from [*ENmix*](https://doi.org/doi:10.18129/B9.bioc.ENmix)

* Batch effect normalisation: `ComBat`  
    &rArr; Method from [*sva*](https://doi.org/doi:10.18129/B9.bioc.sva)

---
class: part-slide

# Quality Control

---

# Overview

```{r, results = "asis"}
cat(paste("+", tar_read(ma_data_idats)[["log"]]), sep = "\n")
```

```{css}
.sex-table { position: absolute; top: 12%; left: 75%;}
```

.sex-table[

```{r cohort-overview-table, results = "asis"}
if ("qc_observed_sex" %in% names(phenotypes)) {
  phenotypes[
    j = sex_fct := factor(qc_observed_sex, levels = c("Male", "Female", NA), labels = c("Male", "Female", "Unspecified"), exclude = NULL)
  ]

  phenotypes[j = .N, by = sex_fct][j = list(sex_fct, N)] %>%
    gt() %>%
    cols_align(align = "center") %>%
    tab_header(
      title = "Samples Available",
      subtitle = md("*EPIC Array*")
    ) %>%
    fmt_number(columns = "N", decimals = 0) %>%
    grand_summary_rows(
      columns = "N",
      fns = list(Total = ~ sum(.)),
      formatter = fmt_number,
      decimals = 0
    ) %>%
    cols_label(sex_fct = "Sex") %>%
    umr() %>%
    opt_row_striping() %>%
    opt_all_caps() %>%
    print()
}
```

]

---

# Sample Call Rate

.pull-left[

```{r callrate-samples-tab, results = "asis"}
callrate_thresholds <- sort(unique(c(params[["callrate_samples"]], 0.90, 0.95, 0.97, 0.98, 0.99, 1)), decreasing = TRUE)
data.frame(
  X1 = scales::percent_format(accuracy = 0.01, suffix = " %")(callrate_thresholds),
  X2 = rowSums(sapply(phenotypes[["call_rate"]], "<", callrate_thresholds)),
  X3 = scales::percent_format(accuracy = 0.01, suffix = " %")(
    rowSums(sapply(phenotypes[["call_rate"]], "<", callrate_thresholds)) / nrow(phenotypes)
  )
) %>%
  gt() %>%
  cols_align(align = "center") %>%
  tab_header("Number Of Samples To Exclude Based On Call Rate Thresholds") %>%
  tab_style(
    style = cell_fill(color = "#2222b2", alpha = 0.5),
    locations = cells_body(
      columns = everything(),
      rows = X1 == scales::percent_format(accuracy = 0.01, suffix = " %")(params[["callrate_samples"]])
    )
  ) %>%
  cols_label(
    X1 = "Call Rate Threshold",
    X2 = "Samples to Exclude (N)",
    X3 = "Samples to Exclude (%)"
  ) %>%
  umr() %>%
  opt_row_striping() %>%
  opt_all_caps() %>%
  print()
```

]

.pull-right[

```{r call-rate-samples-fig, fig.height = 7.5}
tar_read(ma_callrate_plot)
```

]

---

# Sex Check

```{r sex-check-fig}
if (is.null(params[["sex_colname"]])) {
  cat("No phenotypes for sex was provided.\n")
} else {
  tar_read(ma_sex_plot)
}
```

```{r discrepancy-samples-table, results = "asis"}
if (!is.null(params[["sex_colname"]]) & any(phenotypes[["qc_sex_discrepancy"]])) {
  cat("---\n\n# Sex Check\n\n")
  phenotypes[
    i = (qc_sex_discrepancy),
    j = list(Sample_Name, Sample_ID, qc_observed_sex, qc_predicted_sex)
  ][
    j = `:=`(
      qc_observed_sex = fifelse(is.na(qc_observed_sex), "Unspecified", as.character(qc_observed_sex)),
      qc_predicted_sex = fifelse(is.na(qc_predicted_sex), "Undetermined", as.character(qc_predicted_sex))
    )
  ] %>%
    gt() %>%
    cols_align(align = "center") %>%
    cols_label(
      Sample_Name = "Name",
      Sample_ID = "ID",
      qc_observed_sex = "Observed Sex",
      qc_predicted_sex = "Predicted Sex"
    ) %>%
    data_color(
      columns = c(qc_observed_sex, qc_predicted_sex),
      colors = col_factor(
        palette = c("#2222b2", "#b22222", "#808080", "#808080"),
        levels = c("Male", "Female", "Unspecified", "Undetermined")
      )
    ) %>%
    umr() %>%
    opt_row_striping() %>%
    opt_all_caps() %>%
    print()
}
```

---

# Cell Composition

```{r cell-composition-fig, results = "asis"}
if (
  is.null(params[["cell_tissue"]]) &
    any(grepl("^CellT_", names(phenotypes)))
) {
  cat("No cell tissue was provided or no available reference set (R packages).\n")
} else {
  p <- tar_read(ma_cell_plot)
  if (inherits(p[[1]]$theme$axis.text.y.right, "element_text")) {
    p[[1]] <- p[[1]] + theme(
      axis.text.y.right = ggtext::element_markdown()
    )
  }
  print(p)
}
```

---

# Principal Component Analysis: Association Tests

.pull-left[
## `r names(pca_mset)[[1]]`

```{r, fig.height = 10}
pca_mset[[1]][["p_association"]]
```

]

.pull-right[
## `r names(pca_mset)[[2]]`

```{r, fig.height = 10}
pca_mset[[2]][["p_association"]]
```

]

```{r planes, results = "asis"}
for (iplot in 2:unique(sapply(pca_mset, length))) {
  cat("\n---\n\n# Principal Component Analysis: Factorial Planes\n\n")
  cat(sprintf(".pull-left[\n## %s\n\n", names(pca_mset)[[1]]))
  sub_chunk(pca_mset[[1]][[iplot]], chunk_name = paste0("planes-1-", iplot), fig_width = 11.5, fig_height = 10)
  cat("\n\n]\n\n")
  cat(sprintf(".pull-right[\n## %s\n\n", names(pca_mset)[[2]]))
  sub_chunk(pca_mset[[2]][[iplot]], chunk_name = paste0("planes-2-", iplot), fig_width = 11.5, fig_height = 10)
  cat("\n\n]\n\n")
}
```

---
class: part-slide

.center[
<a href="https://www.good.cnrs.fr/" target="_blank"><img src="https://raw.githubusercontent.com/mcanouil/hex-stickers/master/SVG/umr1283_8199.svg" width = "200px"/></br>
<i style="font-size: 200%">www.good.cnrs.fr</i>
</a>
]