add notes regarding the analysis reference
Carol-seven committed Jul 9, 2024
1 parent ba09582 commit 206c495
Showing 8 changed files with 234 additions and 134 deletions.
2 changes: 1 addition & 1 deletion docs/articles/scaffold.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

246 changes: 156 additions & 90 deletions docs/articles/usage_template.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/pkgdown.yml
@@ -4,7 +4,7 @@ pkgdown_sha: ~
articles:
scaffold: scaffold.html
usage_template: usage_template.html
last_built: 2024-07-07T16:34Z
last_built: 2024-07-09T02:13Z
urls:
reference: https://uconn-scs.github.io/msDiaLogue/reference
article: https://uconn-scs.github.io/msDiaLogue/articles
2 changes: 1 addition & 1 deletion docs/search.json

Large diffs are not rendered by default.

116 changes: 75 additions & 41 deletions vignettes/usage_template.Rmd
@@ -219,26 +219,31 @@ knitr::kable(
**Extension**

Besides protein names, the function `filterProtein()` provides a similar function to
filter proteins by additional protein information: gene, accession, and description.
filter proteins by additional protein information.

+ For **Spectronaut**: "PG.Genes", "PG.ProteinAccessions", "PG.ProteinDescriptions", and
"PG.ProteinNames".

+ For **Scaffold**: "ProteinDescriptions", "AccessionNumber", and "AlternateID".

```{r eval=FALSE}
filterProtein(dataTran, proteinInformation = "preprocess_protein_information.csv",
text = c("Ras-related protein Rab-3D", "Alcohol dehydrogenase 1"),
by = "description",
by = "PG.ProteinDescriptions",
removeList = FALSE)
```

where `proteinInformation` is the file name for protein information, automatically
generated by `preprocessing()`. In this case, the proteins with descriptions
"Ras-related protein Rab-3D" or "Alcohol dehydrogenase 1" will be kept. Note that the
search value `text` is used for exact equality search.
generated by `preprocessing()`. In this case, the proteins whose `"PG.ProteinDescriptions"`
match "Ras-related protein Rab-3D" or "Alcohol dehydrogenase 1" will be kept.
Note that the search value `text` is matched by exact equality, not by partial match.
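
To make the exact-equality behavior concrete, here is a minimal base R sketch. It does not call `filterProtein()`; the table `info` and its protein names are hypothetical stand-ins for the protein information file, and the partial-match variant is shown only for contrast.

```r
# Hypothetical protein information table (illustrative values only);
# the column names mirror the Spectronaut fields listed above.
info <- data.frame(
  PG.ProteinNames = c("RAB3D_HUMAN", "ADH1_YEAST", "ADH2_YEAST"),
  PG.ProteinDescriptions = c("Ras-related protein Rab-3D",
                             "Alcohol dehydrogenase 1",
                             "Alcohol dehydrogenase 2"),
  stringsAsFactors = FALSE
)

# Note the second search value is incomplete on purpose.
text <- c("Ras-related protein Rab-3D", "Alcohol dehydrogenase")

# Exact equality: the incomplete description matches no row.
exact <- info$PG.ProteinNames[info$PG.ProteinDescriptions %in% text]

# Partial (regular expression) matching, for contrast: it matches both ADH rows.
partial <- info$PG.ProteinNames[
  grepl(paste(text, collapse = "|"), info$PG.ProteinDescriptions)
]

print(exact)
print(partial)
```

Under exact equality, a search value has to reproduce the full description string, including any trailing number, to select a protein.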

<div style="overflow-x: auto;">
```{r echo=FALSE}
knitr::kable(
filterProtein(dataTran, proteinInformation = "preprocess_protein_information.csv",
text = c("Ras-related protein Rab-3D", "Alcohol dehydrogenase 1"),
by = "description",
by = "PG.ProteinDescriptions",
removeList = FALSE))
```
</div>
@@ -461,49 +466,65 @@ knitr::kable(dataSumm)

The column "Stat" in the generated result includes the following statistics:

+ n: number.
+ mean: mean.
+ sd: standard deviation.
+ median: median.
+ trimmed: trimmed mean with a trim of 0.1.
+ mad: median absolute deviation (from the median).
+ min: minimum.
+ max: maximum.
+ range: the difference between the maximum and minimum value.
+ skew: skewness.
+ kurtosis: kurtosis.
+ se: standard error.
+ n: Number.
+ mean: Mean.
+ sd: Standard deviation.
+ median: Median.
+ trimmed: Trimmed mean with a trim of 0.1.
+ mad: Median absolute deviation (from the median).
+ min: Minimum.
+ max: Maximum.
+ range: The difference between the maximum and minimum value.
+ skew: Skewness.
+ kurtosis: Kurtosis.
+ se: Standard error.
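
As a rough illustration of what most of these statistics mean, the sketch below computes them for one made-up numeric vector using base R only. This is not necessarily how the package computes them. Skewness and kurtosis are left out because their definitions differ between implementations.

```r
# Made-up intensities for a single protein (illustrative values only).
x <- c(2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.3, 6.1, 7.7, 12.5)

stats <- c(
  n       = length(x),
  mean    = mean(x),
  sd      = sd(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.1),        # drops the lowest and highest 10% first
  mad     = mad(x),                     # base R scales by 1.4826 by default
  min     = min(x),
  max     = max(x),
  range   = max(x) - min(x),
  se      = sd(x) / sqrt(length(x))     # standard error of the mean
)

print(round(stats, 3))
```

With ten values and a trim of 0.1, the trimmed mean discards one value at each end, which is why it is less affected by the outlying 12.5 than the plain mean.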

## Analysis

The function `analyze()` calculates the results that can be used in subsequent
visualizations. If more than two conditions exist in the data, precisely two conditions
for comparison must be specified via the argument `conditions`.
visualizations.

<div class="note">
**Note:** The analyses listed below compare data under two conditions. The **order**
of `conditions` affects downstream analysis, as the **second condition** serves as
the reference for comparison.

+ If only two conditions exist in the data and `conditions` is not specified, `conditions`
will automatically be generated by sorting the unique values alphabetically and in
ascending order.

+ If more than two conditions exist in the data, precisely two conditions for comparison
must be specified via the argument `conditions`.
</div>
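
The default ordering described in the note can be sketched in base R as follows. The vector `condition` is a hypothetical stand-in for the condition column of the data, and whether `analyze()` uses exactly these calls internally is an assumption.

```r
# Hypothetical condition labels, one per sample.
condition <- c("50fmol", "100fmol", "50fmol", "100fmol")

# Alphabetical, ascending sort of the unique labels.
# method = "radix" makes the ordering independent of the session locale.
cond_default <- sort(unique(condition), method = "radix")

# The second entry acts as the reference of the comparison.
reference <- cond_default[2]

print(cond_default)
print(reference)
```

Because the labels are sorted as text rather than as numbers, "100fmol" comes before "50fmol", so "50fmol" ends up as the reference.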

```{r}
cond <- c("50fmol", "100fmol")
cond <- c("100fmol", "50fmol")
```

### Student's t-test

The Student's t-test is used to compare the means between two conditions for each protein,
reporting both the difference in means between the conditions (calculated as Condition 1 -
Condition 2) and the P-value of the test.
reporting both the difference in means between the conditions and the P-value of the test.

<div class="note">
**Note:** The difference is calculated by subtracting the mean of the second condition
from the mean of the first condition (Condition 1 - Condition 2). </div>
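
The direction of the difference can be checked with a small base R sketch for a single protein. The values are made up, and the use of `t.test()` with its default (Welch) settings is an assumption; the exact test variant used by `analyze()` is not stated here.

```r
# Made-up log2 intensities for one protein, three samples per condition.
log2_int  <- c(10.2, 10.8, 10.5, 9.1, 9.4, 9.0)
condition <- rep(c("100fmol", "50fmol"), each = 3)

cond <- c("100fmol", "50fmol")   # the second entry is the reference
x <- log2_int[condition == cond[1]]
y <- log2_int[condition == cond[2]]

# Condition 1 - Condition 2
difference <- mean(x) - mean(y)
p_value    <- t.test(x, y)$p.value

# Swapping the order of the conditions flips the sign of the difference.
difference_swapped <- mean(y) - mean(x)

print(c(difference = difference, swapped = difference_swapped, p = p_value))
```

The P-value of a two-sided test does not depend on the order of the conditions; only the sign of the difference changes.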

```{r}
anlys_t <- analyze(dataImput, conditions = cond, testType = "t-test")
```

Oops! The warning message shows "Data are essentially constant," which means that the data
contain proteins with the same value in all samples. In this case, the P-value of t-test
returns NA.

<div style="overflow-x: auto;">
```{r echo=FALSE}
knitr::kable(anlys_t)
```
</div>

<div class="note">
**Note:** In the Student's t-test, a warning message might appear, stating
"**Data are essentially constant**," which means that the data contain proteins with the
same value in all samples. In this case, the P-value of t-test returns NA. </div>
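
The situation can be reproduced with base R alone. In base R, `t.test()` stops with the condition "data are essentially constant" when the values do not vary. The helper `safe_p()` below is hypothetical and only sketches one way to turn that condition into an `NA` P-value; how the package handles it internally is not shown here.

```r
# Hypothetical helper: return NA when the t-test cannot be computed.
safe_p <- function(x, y) {
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

# A protein with the same value in every sample: no P-value can be computed.
p_const <- safe_p(c(8, 8, 8), c(8, 8, 8))

# A protein with varying values: the test runs as usual.
p_ok <- safe_p(c(8.1, 8.4, 7.9), c(9.0, 9.3, 9.1))

print(c(constant = p_const, varying = p_ok))
```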


### Moderated t-test

@@ -516,29 +537,36 @@ from all the chosen proteins to calculate variance.
anlys_mod.t <- analyze(dataImput, conditions = cond, testType = "mod.t-test")
```

In the moderated t-test, a warning message might occur stating, "Zero sample variances
detected, have been offset away from zero." This warning corresponds to examples of
proteins that exhibited identical quant values, either pre- or post-imputation, and
therefore no variance is present across conditions for those proteins. This does not
impede downstream analysis; it merely serves to alert users to its occurrence.

<!-- This just means that for at least one protein the log ratio is identical for all samples. -->
<!-- Since this will give a zero variance (which will end up in the denominator of your -->
<!-- statistic and could possibly result in an infinite value for your test statistic) it has -->
<!-- been offset to a small value to prevent that possibility. -->

<div style="overflow-x: auto;">
```{r echo=FALSE}
knitr::kable(anlys_mod.t)
```
</div>

<div class="note">
**Note:** In the moderated t-test, a warning message might occur stating,
"**Zero sample variances detected, have been offset away from zero.**"
This warning corresponds to examples of proteins that exhibited identical quant values,
either pre- or post-imputation, and therefore no variance is present across conditions
for those proteins. This does not impede downstream analysis; it merely serves to alert
users to its occurrence. </div>

<!-- This just means that for at least one protein the log ratio is identical for all samples. -->
<!-- Since this will give a zero variance (which will end up in the denominator of your -->
<!-- statistic and could possibly result in an infinite value for your test statistic) it has -->
<!-- been offset to a small value to prevent that possibility. -->
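
To see which proteins would trigger this warning, one can look for zero sample variance before running the test. The sketch below uses a small made-up matrix with proteins in rows and samples in columns; this layout is an assumption for illustration and may differ from the package's own data format.

```r
# Made-up intensities: proteins in rows, samples in columns.
mat <- rbind(
  P1 = c(10.1, 10.4, 9.8, 11.0),
  P2 = c(7.0, 7.0, 7.0, 7.0),     # identical values, e.g. after imputation
  P3 = c(12.3, 11.9, 12.8, 12.1)
)

# Sample variance per protein.
row_var <- apply(mat, 1, var)

# Proteins whose variance is exactly zero.
zero_var <- names(row_var)[row_var == 0]

print(zero_var)
```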


### MA

Setting `testType = "MA"` generates the data for plotting an MA plot, which
represents the protein-wise averages within each condition.

<div class="note">
**Note:** The rows of the output are ordered by conditions, impacting the subsequent
MA plot visualization. Specifically, the first row represents the protein-wise average of
the first condition, and the second row represents the second condition. </div>

```{r}
anlys_MA <- analyze(dataImput, conditions = cond, testType = "MA")
```
@@ -612,15 +640,21 @@ $$A = \frac{1}{2}log_2(XY) = \frac{1}{2}\left[log_2(X)+log_2(Y)\right]$$
Most proteins are expected to exhibit little variation, leading to the majority of points
concentrating around the line M = 0 (indicating no difference between group means).

<div class="note">
**Note:** Again, the **order** of `conditions` in `analyze()` determines
how the MA plot is visualized. The second row of `anlys_MA` acts as the comparison
reference: the first and second rows refer to variables $log_2 X$ and $log_2 Y$,
respectively. </div>
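
The roles of the two rows can be sketched with a small made-up matrix shaped like the description above: one row of protein-wise log2 averages per condition. The object `avg` and its values are hypothetical, and M is taken here as the usual log ratio $log_2 X - log_2 Y$, which is an assumption about the plotted quantity.

```r
# Made-up protein-wise log2 averages: row 1 is log2(X), row 2 is log2(Y).
avg <- rbind(
  "100fmol" = c(P1 = 12.0, P2 = 10.0, P3 = 9.0),
  "50fmol"  = c(P1 = 10.5, P2 = 10.0, P3 = 10.5)
)

# M: log ratio of the first condition against the second (the reference).
M <- avg[1, ] - avg[2, ]

# A: average log intensity, (1/2) * [log2(X) + log2(Y)].
A <- (avg[1, ] + avg[2, ]) / 2

print(rbind(M = M, A = A))
```

Swapping the two rows would flip the sign of every M value while leaving A unchanged, which mirrors the plot around the line M = 0.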

```{r}
visualize(anlys_MA, graphType = "MA", M.thres = 1, transformLabel = "Log2")
```

where `M.thres = 1` means the M thresholds are set to −1 and 1. The points are split
into three parts: significant up (M > 1), not significant (-1 $\leq$ M $\leq$ 1), and
significant down (M < -1). And `transformLabel = "Log2"` is used to label the title.
Additionally, the warning message "Removed 16 rows containing missing values" indicates
that there are 16 proteins with no significance.
significant down (M < -1). And `transformLabel = "Log2"` is used to prefix the title,
x-axis, and y-axis labels. Additionally, the warning message "Removed 16 rows containing
missing values" indicates that there are 16 proteins with no significance.
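
The three-way split by the M threshold can be written out in base R as below. The M values and the group labels are made up for illustration and are not the labels used by `visualize()`.

```r
# Made-up M values for four proteins.
M <- c(P1 = 1.5, P2 = 0.2, P3 = -1.0, P4 = -2.3)
M.thres <- 1

# Values exactly on a threshold fall into the middle group (-1 <= M <= 1).
status <- ifelse(M > M.thres, "up",
                 ifelse(M < -M.thres, "down", "not significant"))

print(status)
```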


### Normalize