## General concepts

For any testing problem, there are 3 types of errors:

• A false positive (Type I error): Occurs when we declare an effect when none exists.
• A false negative (Type II error): Occurs when we fail to declare a truly existing effect.
• A directional error (Type III error): Occurs when we correctly reject $H_0$ but make the wrong decision about the direction of the effect.

## Type I Errors in Multiple Testing

Type I and II errors in multiple hypotheses testing.

| Hypotheses | Not Rejected | Rejected | Total |
|---|---|---|---|
| True | $U$ | $V$ (Type I errors) | $m_0$ |
| False | $T$ (Type II errors) | $S$ | $m-m_0$ |
| Total | $W$ | $R$ | $m$ |

• Per-comparison error rate (PCER): The expected proportion of Type I errors among the $m$ decisions. $PCER = \frac{E(V)}{m}$

• Family-wise error rate (FWER): The probability of committing at least one Type I error. $FWER = P(V\geq1)$

• False discovery rate (FDR): The expected proportion of discoveries (significant results) that are actually false positives. $FDR = E\left(\frac{V}{R}\right)$, where $\frac{V}{R}$ is taken to be 0 when $R = 0$.

In general, $PCER \leq FDR \leq FWER$.
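
As a rough illustration of these three rates, the following Python sketch simulates many experiments and estimates PCER, FWER and FDR from the counts $V$ and $R$. The values of `m`, `m0`, `alpha`, and the way p-values are generated under false nulls (`rng.random() ** 4`) are arbitrary assumptions chosen for illustration; they are not part of the software.

```python
import random


def error_rates(outcomes, m):
    """Empirical PCER, FWER and FDR from (V, R) pairs, one pair per experiment."""
    n = len(outcomes)
    pcer = sum(v for v, _ in outcomes) / (n * m)            # E(V) / m
    fwer = sum(1 for v, _ in outcomes if v >= 1) / n        # P(V >= 1)
    fdr = sum(v / r if r > 0 else 0.0 for v, r in outcomes) / n  # E(V / R), 0 if R = 0
    return pcer, fwer, fdr


def simulate(m=20, m0=15, alpha=0.05, n_experiments=2000, seed=1):
    """Simulate unadjusted testing of m hypotheses, m0 of which are true nulls."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_experiments):
        # True nulls: p-values are uniform on (0, 1).
        v = sum(rng.random() < alpha for _ in range(m0))
        # False nulls: p-values pushed toward 0 (illustrative assumption).
        s = sum(rng.random() ** 4 < alpha for _ in range(m - m0))
        outcomes.append((v, v + s))  # R = V + S
    return error_rates(outcomes, m)


pcer, fwer, fdr = simulate()
print(f"PCER={pcer:.3f}  FDR={fdr:.3f}  FWER={fwer:.3f}")
```

Because $V/m \leq V/R \leq 1$ whenever $V \geq 1$, and all three quantities are 0 when $V = 0$, the ordering holds within every single experiment, so the estimated rates always satisfy it as well.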

## Weak vs strong error control

For any of the error concepts above:

• Error control is weak if the Type I error rate is controlled only under the global null hypothesis, i.e. when all the null hypotheses $H_1,...,H_m$ are true.
  • e.g. SNK, Duncan, and LSD control the FWER in the weak sense.
• Error control is strong if the Type I error rate is controlled under any configuration of true and false null hypotheses.
  • TukeyHSD, Bonferroni, Holm, Hochberg and Hommel control the FWER in the strong sense.
  • The BH (Benjamini-Hochberg) and BY (Benjamini-Yekutieli) methods control the FDR.
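
To make the difference between FWER and FDR control concrete, here is a minimal Python sketch of two of the adjustments named above: Bonferroni (FWER) and BH (FDR). The function names and the example p-values are hypothetical and only serve as an illustration; the software may compute these quantities differently.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values: min(1, m * p)."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvalues = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
print("Bonferroni:", bonferroni_adjust(pvalues))
print("BH:        ", bh_adjust(pvalues))
```

A hypothesis is rejected when its adjusted p-value is at most $\alpha$. The BH adjusted values are never larger than the Bonferroni ones, which reflects that FDR control is a less stringent requirement than FWER control.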

## Multiple Comparisons Procedures

• Multiple comparisons procedures (MCP): Any statistical test procedure designed to account for and properly control the multiplicity effect through a suitable error rate (e.g. FWER, FDR).
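
The multiplicity effect can be quantified in a simple special case. Assuming $m$ independent tests, all with true null hypotheses and each carried out at level $\alpha$, the probability of at least one false positive is $1-(1-\alpha)^m$. The Python sketch below evaluates this, together with the same quantity when each test uses the Bonferroni level $\alpha/m$. The independence assumption and the function names are illustrative only.

```python
def unadjusted_fwer(m, alpha=0.05):
    """FWER for m independent true nulls, each tested at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_fwer(m, alpha=0.05):
    """FWER for m independent true nulls, each tested at level alpha / m."""
    return 1 - (1 - alpha / m) ** m


for m in (1, 5, 10, 20):
    print(f"m={m:2d}  unadjusted={unadjusted_fwer(m):.3f}  "
          f"Bonferroni={bonferroni_fwer(m):.3f}")
```

Under these assumptions the unadjusted FWER is already about 0.40 for $m = 10$, while the Bonferroni version stays at or below $\alpha$.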

## The Dilemma

• More conservative: An MCP is more conservative than a competing procedure if it generates larger p-values, and hence rejects fewer hypotheses.
  • While this reduces the number of false positives, it also reduces the number of true discoveries.
• More powerful: An MCP is more powerful than a competing procedure if it rejects hypotheses more often than its competitor (assuming that both methods use the same $\alpha$).
  • While it is more likely to identify true positives, it will also increase the number of false positives.
• Goal of an MCP: Find the most powerful method possible that is subject to global (family-wise) Type I error control.

## Single Step vs Stepwise tests

MCPs can be classified into two categories:

• Single-step tests:
  • The rejection or non-rejection of a null hypothesis does not take the decision on any other hypothesis into account.
  • The order of testing doesn't matter (e.g. Bonferroni and TukeyHSD).
• Stepwise tests:
  • The rejection or non-rejection of a null hypothesis may depend on the decisions on other hypotheses (e.g. the Holm test, which is a stepwise extension of Bonferroni).
  • Stepwise extensions of single-step tests are often available and lead to more powerful methods. This comes at the cost of losing the ability to construct confidence intervals that correspond to the tests.
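
The distinction can be sketched in Python by comparing Bonferroni (single-step) with Holm (step-down). In the single-step version every p-value is compared with the same threshold $\alpha/m$; in the step-down version the threshold is relaxed after each rejection and testing stops at the first non-rejection. The function names and example p-values are hypothetical.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Single-step: each hypothesis is judged on its own against alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm_reject(pvalues, alpha=0.05):
    """Step-down: test p-values from smallest to largest, stop at first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    reject = [False] * m
    for step, i in enumerate(order):
        # After `step` rejections only m - step hypotheses remain.
        if pvalues[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values are not rejected
    return reject


pvalues = [0.01, 0.04, 0.02, 0.005]  # hypothetical raw p-values
print("Bonferroni:", bonferroni_reject(pvalues))
print("Holm:      ", holm_reject(pvalues))
```

With these p-values and $\alpha = 0.05$, Bonferroni rejects two hypotheses while Holm rejects all four, which illustrates the gain in power from the stepwise extension. Holm always rejects every hypothesis that Bonferroni rejects.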

## The set (family) of elements to be tested

• All pairwise comparisons in the ANOVA
• All pairwise comparisons with the control
• Multiple comparisons with the (unknown) best: compare treatment means with the unknown best or worst.
• Comparisons with the average mean (Analysis of Means or ANOM): Identify treatments that differ significantly from the overall average.
• Dose-response contrasts: To find the minimum effective dose.

In the software, we mainly focus on the first two types.
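
The first two families differ mainly in size, which affects how strongly each comparison must be adjusted. With $k$ groups there are $k(k-1)/2$ pairwise comparisons but only $k-1$ comparisons with the control. A small Python sketch with hypothetical group labels:

```python
from itertools import combinations


def all_pairwise(groups):
    """Every unordered pair of groups: k * (k - 1) / 2 comparisons."""
    return list(combinations(groups, 2))


def versus_control(groups, control):
    """Each non-control group against the control: k - 1 comparisons."""
    return [(g, control) for g in groups if g != control]


groups = ["control", "A", "B", "C"]  # hypothetical treatment labels
print(len(all_pairwise(groups)), all_pairwise(groups))
print(len(versus_control(groups, "control")), versus_control(groups, "control"))
```

With four groups the pairwise family contains 6 comparisons and the comparison-with-control family contains 3.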

## Which Test should I use?

- The general recommendation (not included in the software) is to use a correlation-based logically constrained method for stepwise comparisons (e.g. Shaffer-Royen method).

- LSD should only be used when there are 3 treatment groups, since this is the case in which it controls the FWER in the strong sense.