add notes regarding the analysis reference

uconn-scs · Jul 9, 2024 · 206c495
1 parent ba09582

Showing 8 changed files with 234 additions and 134 deletions.
diff --git a/docs/articles/scaffold.html b/docs/articles/scaffold.html
diff --git a/docs/articles/usage_template.html b/docs/articles/usage_template.html
diff --git a/docs/articles/usage_template_files/figure-html/unnamed-chunk-33-1.png b/docs/articles/usage_template_files/figure-html/unnamed-chunk-33-1.png
diff --git a/docs/articles/usage_template_files/figure-html/unnamed-chunk-41-1.png b/docs/articles/usage_template_files/figure-html/unnamed-chunk-41-1.png
diff --git a/docs/articles/usage_template_files/figure-html/unnamed-chunk-44-1.png b/docs/articles/usage_template_files/figure-html/unnamed-chunk-44-1.png
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
@@ -4,7 +4,7 @@ pkgdown_sha: ~
 articles:
   scaffold: scaffold.html
   usage_template: usage_template.html
-last_built: 2024-07-07T16:34Z
+last_built: 2024-07-09T02:13Z
 urls:
   reference: https://uconn-scs.github.io/msDiaLogue/reference
   article: https://uconn-scs.github.io/msDiaLogue/articles

diff --git a/docs/search.json b/docs/search.json
diff --git a/vignettes/usage_template.Rmd b/vignettes/usage_template.Rmd
@@ -219,26 +219,31 @@ knitr::kable(
 **Extension**
 
 Besides protein names, the function `filterProtein()` provides a similar function to
-filter proteins by additional protein information: gene, accession, and description.
+filter proteins by additional protein information.
+
++ For **Spectronaut**: "PG.Genes", "PG.ProteinAccessions", "PG.ProteinDescriptions", and
+"PG.ProteinNames".
+
++ For **Scaffold**: "ProteinDescriptions", "AccessionNumber", and "AlternateID".
 
 ```{r eval=FALSE}
 filterProtein(dataTran, proteinInformation = "preprocess_protein_information.csv",
               text = c("Ras-related protein Rab-3D", "Alcohol dehydrogenase 1"),
-              by = "description",
+              by = "PG.ProteinDescriptions",
               removeList = FALSE)
 ```
 
 where `proteinInformation` is the file name for protein information, automatically
-generated by `preprocessing()`. In this case, the proteins with descriptions
-"Ras-related protein Rab-3D" or "Alcohol dehydrogenase 1" will be kept. Note that the
-search value `text` is used for exact equality search.
+generated by `preprocessing()`. In this case, the proteins whose `"PG.ProteinDescriptions"`
+match "Ras-related protein Rab-3D" or "Alcohol dehydrogenase 1" will be kept.
+Note that the values in `text` are matched by exact equality.
 
 <div style="overflow-x: auto;">
 ```{r echo=FALSE}
 knitr::kable(
   filterProtein(dataTran, proteinInformation = "preprocess_protein_information.csv",
                 text = c("Ras-related protein Rab-3D", "Alcohol dehydrogenase 1"),
-                by = "description",
+                by = "PG.ProteinDescriptions",
                 removeList = FALSE))
 ```
 </div>
@@ -461,49 +466,65 @@ knitr::kable(dataSumm)
 
 The column "Stat" in the generated result includes the following statistics:
 
-+ n: number.
-+ mean: mean.
-+ sd: standard deviation.
-+ median: median.
-+ trimmed: trimmed mean with a trim of 0.1.
-+ mad: median absolute deviation (from the median).
-+ min: minimum.
-+ max: maximum.
-+ range: the difference between the maximum and minimum value.
-+ skew: skewness.
-+ kurtosis: kurtosis.
-+ se: standard error.
++ n: Number.
++ mean: Mean.
++ sd: Standard deviation.
++ median: Median.
++ trimmed: Trimmed mean with a trim of 0.1.
++ mad: Median absolute deviation (from the median).
++ min: Minimum.
++ max: Maximum.
++ range: The difference between the maximum and minimum value.
++ skew: Skewness.
++ kurtosis: Kurtosis.
++ se: Standard error.
 
 ## Analysis
 
 The function `analyze()` calculates the results that can be used in subsequent
-visualizations. If more than two conditions exist in the data, precisely two conditions
-for comparison must be specified via the argument `conditions`.
+visualizations.
+
+<div class="note">
+**Note:** The analyses listed below compare data between two conditions. The **order**
+of `conditions` affects downstream analysis because the **second condition** serves as
+the reference of comparison.
+
++ If only two conditions exist in the data and `conditions` is not specified, `conditions`
+is generated automatically by sorting the unique condition values in ascending
+alphabetical order.
+
++ If more than two conditions exist in the data, precisely two conditions for comparison
+must be specified via the argument `conditions`.
+</div>
 
 ```{r}
-cond <- c("50fmol", "100fmol")
+cond <- c("100fmol", "50fmol")
 ```
 
 ### Student's t-test
 
 The Student's t-test is used to compare the means between two conditions for each protein,
-reporting both the difference in means between the conditions (calculated as Condition 1 -
-Condition 2) and the P-value of the test.
+reporting both the difference in means between the conditions and the P-value of the test.
+
+<div class="note">
+**Note:** The difference is calculated by subtracting the mean of the second condition
+from the mean of the first condition (Condition 1 - Condition 2). </div>
 
 ```{r}
 anlys_t <- analyze(dataImput, conditions = cond, testType = "t-test")
 ```
 
-Oops! The warning message shows "Data are essentially constant," which means that the data
-contain proteins with the same value in all samples. In this case, the P-value of t-test
-returns NA.
-
 <div style="overflow-x: auto;">
 ```{r echo=FALSE}
 knitr::kable(anlys_t)
 ```
 </div>
 
+<div class="note">
+**Note:** In the Student's t-test, a warning message might appear, stating
+"**Data are essentially constant**," which means that the data contain proteins with the
+same value in all samples. In this case, the t-test P-value for those proteins is NA. </div>
+
 
 ### Moderated t-test
 
@@ -516,29 +537,36 @@ from all the chosen proteins to calculate variance.
 anlys_mod.t <- analyze(dataImput, conditions = cond, testType = "mod.t-test")
 ```
 
-In the moderated t-test, a warning message might occur stating, "Zero sample variances
-detected, have been offset away from zero." This warning corresponds to examples of
-proteins that exhibited identical quant values, either pre- or post-imputation, and
-therefore no variance is present across conditions for those proteins. This does not
-impede downstream analysis; it merely serves to alert users to its occurrence.
-
-<!-- This just means that for at least one protein the log ratio is identical for all samples. -->
-<!-- Since this will give a zero variance (which will end up in the denominator of your -->
-<!-- statistic and could possibly result in an infinite value for your test statistic) it has -->
-<!-- been offset to a small value to prevent that possibility. -->
-
 <div style="overflow-x: auto;">
 ```{r echo=FALSE}
 knitr::kable(anlys_mod.t)
 ```
 </div>
 
+<div class="note">
+**Note:** In the moderated t-test, a warning message might appear, stating
+"**Zero sample variances detected, have been offset away from zero.**"
+This warning arises from proteins that exhibit identical quant values,
+either pre- or post-imputation, so that no variance is present across conditions
+for those proteins. It does not impede downstream analysis; it merely alerts
+users to its occurrence. </div>
+
+<!-- This just means that for at least one protein the log ratio is identical for all samples. -->
+<!-- Since this will give a zero variance (which will end up in the denominator of your -->
+<!-- statistic and could possibly result in an infinite value for your test statistic) it has -->
+<!-- been offset to a small value to prevent that possibility. -->
+
 
 ### MA
 
 The result of `testType = "MA"` is to generate the data for plotting an MA plot, which
 represents the protein-wise averages within each condition.
 
+<div class="note">
+**Note:** The rows of the output follow the order of `conditions`, which affects the
+subsequent MA plot. Specifically, the first row contains the protein-wise averages of
+the first condition, and the second row those of the second condition. </div>
+
 ```{r}
 anlys_MA <- analyze(dataImput, conditions = cond, testType = "MA")
 ```
@@ -612,15 +640,21 @@ $$A = \frac{1}{2}log_2(XY) = \frac{1}{2}\left[log_2(X)+log_2(Y)\right]$$
 Most proteins are expected to exhibit little variation, leading to the majority of points
 concentrating around the line M = 0 (indicating no difference between group means).
 
+<div class="note">
+**Note:** Again, the **order** of `conditions` in `analyze()` determines
+how the MA plot is visualized. The second row of `anlys_MA` acts as the comparison
+reference: the first and second rows refer to variables $log_2 X$ and $log_2 Y$,
+respectively. </div>
+
 ```{r}
 visualize(anlys_MA, graphType = "MA", M.thres = 1, transformLabel = "Log2")
 ```
 
 where `M.thres = 1` means the M thresholds are set to −1 and 1. The scatters are split
 into three parts: significant up (M > 1), no significant (-1 $\leq$ M $\leq$ 1), and
-significant down (M < -1). And `transformLabel = "Log2"` is used to label the title.
-Additionally, the warning message "Removed 16 rows containing missing values" indicates
-that there are 16 proteins with no significance.
+significant down (M < -1). The argument `transformLabel = "Log2"` adds a prefix to the title,
+x-axis, and y-axis labels. Additionally, the warning message "Removed 16 rows containing
+missing values" indicates that 16 proteins are not significant.
 
 
 ### Normalize
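The notes added in this commit hinge on the order of `conditions`. The base-R sketch below uses made-up toy data rather than msDiaLogue itself, so the column name `Condition`, the protein names, and all values are hypothetical. It illustrates, under stated assumptions, that alphabetical (character-wise) sorting places "100fmol" before "50fmol", that the Student's t-test difference is Condition 1 minus Condition 2, and how the MA coordinates follow the same order, assuming the usual MA-plot definition M = log2 X - log2 Y alongside the A formula in the vignette.

```r
# Toy log2-scale intensities: 3 replicates per condition, 2 proteins.
# Column names and values are hypothetical, not msDiaLogue output.
dat <- data.frame(
  Condition = rep(c("50fmol", "100fmol"), each = 3),
  ProtA = c(10, 11, 12, 13, 14, 15),
  ProtB = c(20, 20, 20, 18, 18, 18),  # constant within each condition
  stringsAsFactors = FALSE
)

# Default order described in the note: unique values sorted alphabetically.
# Sorting is character-wise, so "100fmol" comes before "50fmol".
default_cond <- sort(unique(dat$Condition))

# The second condition is the reference of comparison.
cond <- c("100fmol", "50fmol")
grp1 <- dat[dat$Condition == cond[1], -1]
grp2 <- dat[dat$Condition == cond[2], -1]

# Difference in means: Condition 1 - Condition 2.
diff_means <- colMeans(grp1) - colMeans(grp2)

# Student's t-test per protein (equal variances).
# Base R's t.test() stops with "data are essentially constant" when both
# groups have zero variance; this sketch maps that case to NA, an assumption
# that mirrors the vignette's note rather than msDiaLogue's internals.
student_p <- function(x, y) {
  tryCatch(t.test(x, y, var.equal = TRUE)$p.value,
           error = function(e) NA_real_)
}
p_values <- sapply(names(grp1), function(p) student_p(grp1[[p]], grp2[[p]]))

# MA coordinates: log2 X = condition-1 average, log2 Y = condition-2 average
# (assumes M = log2 X - log2 Y and A = (log2 X + log2 Y) / 2).
log2X <- colMeans(grp1)
log2Y <- colMeans(grp2)
M <- log2X - log2Y
A <- (log2X + log2Y) / 2

print(default_cond)  # "100fmol" "50fmol"
print(diff_means)    # ProtA 3, ProtB -2
print(p_values)      # ProtA below 0.05, ProtB NA
print(rbind(M = M, A = A))
```

Swapping the order to `c("50fmol", "100fmol")` would flip the sign of both the difference in means and M while leaving A unchanged, which is why the second condition acts as the reference.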