For any testing problem, there are 3 types of errors:

- A false positive (`Type I error`): occurs when we declare an effect when none exists.
- A false negative (`Type II error`): occurs when we fail to declare a truly existing effect.
- The correct rejection of \(H_0\) coupled with a wrong directional decision is called a `Type III error`.

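When \(m\) null hypotheses are tested, of which \(m_0\) are true, the possible outcomes are:

| Null hypotheses | Not rejected | Rejected | Total |
|---|---|---|---|
| True | $U$ | $V$ (Type I errors) | $m_0$ |
| False | $T$ (Type II errors) | $S$ | $m-m_0$ |
| Total | $W$ | $R$ | $m$ |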

- Per-comparison error rate (`PCER`): the expected proportion of Type I errors among the \(m\) decisions. \[ PCER = \frac{E(V)}{m} \]

- Family-wise error rate (`FWER`): the probability of committing at least one Type I error. \[ FWER = P(V\geq1) \]

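- False discovery rate (`FDR`): the expected proportion of `discoveries` (significant results) that are actually false positives, with \(V/R\) taken to be 0 when \(R = 0\). \[ FDR = E\left(\frac{V}{R}\right) \]

In general, \[ PCER \leq FDR \leq FWER \] because \(V/m \leq V/R \leq \mathbf{1}\{V \geq 1\}\) holds in every experiment (since \(V \leq R \leq m\)).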
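To make the three definitions and their ordering concrete, the following Monte Carlo sketch estimates `PCER`, `FDR` and `FWER` when every test is run at level α with no multiplicity adjustment. It assumes \(m\) independent one-sided z-tests in which the first \(m_0\) nulls are true (z ~ N(0, 1)) and the rest are false (z ~ N(δ, 1)); the function name `simulate_error_rates` and all parameter values are hypothetical choices for illustration, not part of the software.

```python
import random
from statistics import NormalDist


def simulate_error_rates(m=20, m0=15, delta=3.0, alpha=0.05, n_rep=5000, seed=1):
    """Estimate PCER, FDR and FWER for m unadjusted one-sided z-tests.

    Hypotheses 0..m0-1 are true nulls (z ~ N(0, 1)); the remaining m - m0
    are false nulls (z ~ N(delta, 1)). Each test rejects when p <= alpha.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    sum_pcer = sum_fdp = n_any_false_positive = 0.0
    for _ in range(n_rep):
        v = r = 0  # V = false positives, R = total rejections
        for i in range(m):
            z = rng.gauss(0.0 if i < m0 else delta, 1.0)
            p = 1.0 - std_normal.cdf(z)  # one-sided p-value
            if p <= alpha:
                r += 1
                if i < m0:
                    v += 1  # a true null was rejected: Type I error
        sum_pcer += v / m
        sum_fdp += v / r if r > 0 else 0.0  # convention: V/R = 0 when R = 0
        if v >= 1:
            n_any_false_positive += 1
    return {
        "PCER": sum_pcer / n_rep,              # estimates E(V)/m
        "FDR": sum_fdp / n_rep,                # estimates E(V/R)
        "FWER": n_any_false_positive / n_rep,  # estimates P(V >= 1)
    }


rates = simulate_error_rates()
print(rates)
```

With these settings `PCER` should land near \(m_0\alpha/m = 0.0375\), while `FWER` should be near \(1-(1-\alpha)^{15} \approx 0.54\), which shows how quickly unadjusted testing inflates the family-wise error.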

For any of the error concepts above:

- Error control is `weak` if the Type I error rate is controlled under the `global null hypothesis` that all the null hypotheses \(H_1,\dots,H_m\) are true.
  - e.g. `SNK`, `Duncan`, and `LSD` control the `FWER` in the weak sense.
- Error control is `strong` if the Type I error rate is controlled under any partial configuration of true and false null hypotheses.
  - e.g. `TukeyHSD`, `Bonferroni`, `Holm`, `Hochberg` and `Hommel` control the `FWER` in the strong sense.
  - The `BH` and `BY` methods control the `FDR`.
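The adjusted p-values below sketch three of the methods listed above (`Bonferroni`, `Holm` and `BH`); a hypothesis is rejected when its adjusted p-value is at most α. The function names and the example p-values are hypothetical. The formulas are written to follow the standard definitions (the same ones R's `p.adjust` uses for `"bonferroni"`, `"holm"` and `"BH"`), but they are not taken from the software itself.

```python
def bonferroni_adjust(pvals):
    """Single-step Bonferroni: multiply every p-value by m (capped at 1)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm_adjust(pvals):
    """Holm step-down: the j-th smallest p-value (j = 0, 1, ...) is multiplied
    by m - j, and a running maximum keeps the adjusted values monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for j, i in enumerate(order):
        running_max = max(running_max, (m - j) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up: the p-value of rank r (1 = smallest) is
    multiplied by m / r, then a running minimum is taken from the largest down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # the largest p-value has rank m
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
for name, adjust in [("Bonferroni", bonferroni_adjust), ("Holm", holm_adjust), ("BH", bh_adjust)]:
    print(name, [round(q, 4) for q in adjust(p)])
```

At α = 0.05, Bonferroni and Holm both reject only the hypotheses with p = 0.01 and p = 0.005, while BH, which controls the `FDR` rather than the `FWER`, rejects all four.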

`Multiple comparison procedures (MCP)`: any statistical test procedure designed to account for and properly control the multiplicity effect through a suitable error rate (e.g. `FWER`, `FDR`).

- More `conservative`: generates larger P-values (and hence leads to fewer rejected hypotheses).
  - While this reduces the number of false positives, it also reduces the number of true discoveries.
- More `powerful`: an `MCP` is more powerful than a competing procedure if it rejects hypotheses more often than its competitor (assuming that both methods use the same α).
  - While it is more likely to identify true positives, it may also increase the number of false positives.
- Goal of an `MCP`: find the most powerful method possible that is subject to global (family-wise) Type I error control.
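The trade-off between a conservative and a powerful procedure shows up in a small simulation. This sketch compares unadjusted testing at α with the Bonferroni cutoff α/m on the same simulated data. It assumes \(m\) independent one-sided z-tests in which the first \(m_0\) nulls are true (z ~ N(0, 1)) and the rest are false (z ~ N(δ, 1)); the function name `average_discoveries` and the parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def average_discoveries(threshold, m=20, m0=15, delta=3.0, n_rep=2000, seed=2):
    """Average numbers of false positives (V) and true discoveries (S) when
    each of m independent one-sided z-tests rejects at p <= threshold.

    Hypotheses 0..m0-1 are true nulls (z ~ N(0, 1)); the rest are false
    nulls (z ~ N(delta, 1)).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    total_v = total_s = 0
    for _ in range(n_rep):
        for i in range(m):
            true_null = i < m0
            z = rng.gauss(0.0 if true_null else delta, 1.0)
            if 1.0 - std_normal.cdf(z) <= threshold:
                if true_null:
                    total_v += 1
                else:
                    total_s += 1
    return total_v / n_rep, total_s / n_rep


alpha, m = 0.05, 20
unadjusted = average_discoveries(alpha, m=m)
bonferroni = average_discoveries(alpha / m, m=m)  # each test at alpha/m
print("unadjusted (V, S):", unadjusted)
print("Bonferroni (V, S):", bonferroni)
```

Because both calls use the same seed, they see identical test statistics, so every Bonferroni rejection is also an unadjusted rejection. The more conservative procedure therefore reports fewer false positives (V) but also fewer true discoveries (S).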

`MCP` can be classified into 2 categories:

- `Single-step` tests:
  - The rejection or non-rejection of a null hypothesis does not take the decision on any other hypothesis into account.
  - The order of testing doesn't matter (e.g. `Bonferroni` and `TukeyHSD`).
- `Stepwise` tests:
  - The rejection or non-rejection of a null hypothesis may depend on the decisions on other hypotheses (e.g. the `Holm` test, which is an extension of Bonferroni).

`Stepwise` extensions of `single-step` tests are often available and lead to more `powerful` methods. This comes at the cost of losing the ability to construct confidence intervals that correspond to the tests.
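The difference between single-step and stepwise testing can be seen by comparing Bonferroni with its step-down extension, Holm, on one set of p-values. In this sketch (hypothetical function names and p-values), Bonferroni compares every p-value with the same cutoff α/m, whereas Holm compares the j-th smallest p-value (counting from j = 0) with α/(m − j) and stops at the first non-rejection, so a hypothesis's fate depends on the decisions already made for smaller p-values.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Single-step: each p-value is compared with the same cutoff alpha/m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm_reject(pvals, alpha=0.05):
    """Step-down: walk the sorted p-values, comparing the j-th smallest
    (j = 0, 1, ...) with alpha/(m - j); stop at the first non-rejection."""
    m = len(pvals)
    reject = [False] * m
    for j, i in enumerate(sorted(range(m), key=lambda k: pvals[k])):
        if pvals[i] > alpha / (m - j):
            break  # this and all larger p-values are retained
        reject[i] = True
    return reject


p = [0.010, 0.013, 0.020, 0.300]
print("Bonferroni:", bonferroni_reject(p))  # Bonferroni: [True, False, False, False]
print("Holm:      ", holm_reject(p))        # Holm:       [True, True, True, False]
```

Holm rejects the second and third hypotheses only because the smaller p-values were rejected before them, which is the dependence on other decisions described above; by construction it never rejects fewer hypotheses than Bonferroni at the same α.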

Common types of multiple comparisons:

- All `pairwise` comparisons in the `ANOVA`.
- All `pairwise` comparisons with the `control`.
- Multiple comparisons with the (unknown) `best`: compare treatment means with the unknown `best` or `worst`.
- Comparisons with the average mean (`Analysis of Means` or `ANOM`): identify treatments that differ significantly from the overall average.
- `Dose-response` contrasts: find the minimum effective dose.

In the software, we mainly focus on the first two types.
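The first two types differ mainly in how many comparisons make up the family, which matters because the multiplicity adjustment grows with that number. This sketch (hypothetical group labels and function names) enumerates both families for four groups and prints the corresponding Bonferroni per-comparison cutoff at α = 0.05.

```python
from itertools import combinations


def all_pairwise(groups):
    """All pairwise comparisons among k groups: k(k-1)/2 pairs."""
    return list(combinations(groups, 2))


def versus_control(groups, control):
    """Each treatment compared with a single control group: k - 1 pairs."""
    return [(g, control) for g in groups if g != control]


groups = ["control", "A", "B", "C"]  # hypothetical labels
alpha = 0.05
for name, family in [("all pairwise", all_pairwise(groups)),
                     ("versus control", versus_control(groups, "control"))]:
    print(f"{name}: {len(family)} comparisons, Bonferroni cutoff {alpha / len(family):.4f}")
    print("   ", family)
```

With four groups this gives 6 pairwise comparisons (cutoff about 0.0083) against 3 comparisons with the control (cutoff about 0.0167), so restricting attention to the control leaves each comparison with a less stringent threshold.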

- The general recommendation (not included in the software) is to use a `correlation-based logically constrained` method for stepwise comparisons (e.g. the Shaffer-Royen method).
- `LSD` should only be used with exactly 3 treatment groups, the one case in which it controls the `FWER` in the strong sense.

- P. Westfall, R. Tobias, R. Wolfinger (2011). *Multiple Comparisons and Multiple Tests Using SAS*, second edition.
- F. Bretz, T. Hothorn, P. Westfall (2011). *Multiple Comparisons Using R*.

- `knitr`: Yihui Xie. Elegant, flexible and fast dynamic report generation with R.
- `shiny`: RStudio, Inc. Web application framework for R.
- `Markdown`. A great place to learn it is here.
- `slidify`: Ramnath Vaidyanathan. Create elegant, interactive presentations from R with Slidify.
  - Several examples are available on Ramnath's YouTube channel.