Rapid Publication-Ready MS Word Tables Using One/Two-Way ANOVA

Brief introduction (My story)

• (Sep, 2006): Graduated with a BS in statistics from Lebanese University, Beirut

• (Jan, 2008): Joined the statistics PhD program at UTDallas.

• (June, 2008): Got my first citation (didn't even publish any paper yet!) and learned how to respect stop signs.
• (Aug, 2012) Got a PhD in statistics.

• (Sep, 2012) Started working at TAMU

My thesis

• $X_{ij}=j^{th}$ measurement on $i^{th}$ subject, ${j=1,\ldots,k_{i}},$ ${i=1,\ldots,n}.$ ${N=}{\displaystyle {\sum_{i=1}^{n}k_{i}}}$ is the total number of measurements.

• $X_{ij}\sim F$, with $F$ being absolutely continuous strictly increasing c.d.f, with probability density $f$ and finite second moment.

• $F$ is estimated by ${F_{n}(x)={\displaystyle \sum_{i=1}^{n}w_{i}{\displaystyle \sum_{j=1}^{k_{i}}I\left(X_{ij}\leq x\right)}},\qquad{\displaystyle \sum_{i=1}^{n}}k_{i}w_{i}=1}$

• Many important statistics may be expressed as an $L-$ statistic $\sum_{i=1}^{N}c_{s,N}X_{(s)}\label{eq:L-stat} \qquad (1)$

• A subclass of (1) is given by $T\left(F_{n}\right)=\int_{0}^{1}F_{n}^{-1}(x)m\left(x\right)dx+\sum_{l=1}^{r}a_{l}F_{n}^{-1}\left(p_{l}\right)=:T_{1}(F_{n})+T_{2}(F_{n})$ where $m(x)$ is a weight generating function, $r$ is a pre-specified integer and $a_{l},l=1,...,r$ are known constants, not all of which are equal to zero.

• $T\left(F_{n}\right)$ is an L-statistic $\sum_{s=1}^{N}c_{s,N}X_{(s)}$ with $c_{s,N}= \int_{q_{s-1,N}}^{q_{s,N}}m(x)dx+a_{l}$ for $l$ such that $q_{s-1,N}\le p_{l}\le q_{s,N}$ where $q_{s,N}$ is the total weight of the $s$ smallest observations.

• Main Result: Under certain assumptions, we have: $\sqrt{n}\left[T\left(F_{n}\right)-T(F)\right]\overset{d}{\to}N(0,\sigma^{2})$

Current job at TAMU (TPBBNC program)

Two main duties:

• Pursue my own statistical research
• interested in Longitudinal and Functional data analysis, biostatistics and multivariate statistics/Machine Learning)

Typical data I deal with in repeated functional data analysis

• Work with biologists
• Consulting, author and co-author biology papers, refreeing biology papers etc.
• This is where I got inspired to build the software

Idea behind the software

• MS Word is the preferred software to write papers for many scientific fields
• ANOVA results with post-hoc multiple comparisons are usually summarized in tables
• Pairwise test results are represented using a letter-based algorithm where treatment means in a row without a common superscript are significantly different ($Pval<\alpha$).
• At the biology lab, I was asked to construct few tables like this every other week

Summary of features of Software 1

• Generate publication-ready MS word tables with post-hoc results included therein.
• The letter-based algorithm also allows ranking of group means
• Can handle multiple data-sets in an Excel Workbook at once.
• Can handle multiple summary data at once. This is very useful for journal reviewers or whenever there is a need to analyze data with no access to the original observations.
• Automatic informative caption are also generated.
• Support two of the most widely used formats for tables in bio-medical/agricultural journals.
• Allows the user to separately control the number of significant digits for means and SEM.
• Perform a process that usually takes few hours (depending on the size of the table) in few seconds while eliminating errors introduced by human input.

Software directory tree

+---/ShinyAppFolder
|   +---www
|       +---img1
|       +---img2
|   +---server.R      # controls which parts of your app run/update when
|   +---ui.R          # user-interface
|   +---Functions.R
|   +---Legal.R
|   +---Update.R
|


Structure of the shiny-server

+---/srv/shiny-server
|   +---shinyApp1
|       +---server.R
|       +---ui.R
|   +---shinyApp2
|       +---server.R
|       +---ui.R
|   +---assets
|       +---style.css
|       +---script.js


Workhorse function behind Software 1

SumAOV = function(n, Mean, S, method, SEM = F, alpha = 0.05, Pooled = T,

# 1. if statements to ensure valid inputs and produce informative error messages

# 2. compute all ANOVA related quatities e.g. SST, MSE, omnibus test pval etc.

# 3. generate mean ± SEM or mean and pooled SEM labels

# 4. Perform all pairwise comparisons based on the method selected
if (method == 'tukey'){
center <- outer(Mean, Mean, "-")
keep <- lower.tri(center)
center <- center[keep]
width <- qtukey(1-alpha, a, N-a) *sqrt((MSE/2) * outer(1/n, 1/n, "+"))[keep]

Res  = abs(center) >= width
}
# 5. represent pairwise comparisons results using a letter-based algorithm as in Piepho (2004)

# 6. add the superscript letter coming from step 5 to labels from step 3 to generate a row in the table.
}


Scenario 1: Single data

## for single full data

f1 = function(data, Factor, f1.method, f1.Pooled = T, ...) {
...
# Ensure clean and short variable names
Var.names = gsub("[^[:alnum:]]", " ", Var.names)
Var.names[which(nchar(Var.names) > 10)] = abbreviate(Var.names[which(nchar(Var.names) >
10)])
...
# Calculate sample sizes, means and sd's
n = as.vector(table(fct))
Mean = aggregate(data, by = list(fct), FUN = "mean", na.rm = T)
S = aggregate(data, by = list(fct), FUN = "sd", na.rm = T)
...
# Calling the main function SumAOV() on every column of our data
for (i in 1:N) Tablerows = rbind(Tablerows, SumAOV(n, Mean[, i + 1], S[,
i + 1], method = f1.method, Pooled = f1.Pooled, ...)$tableRow) ... # Generate informative captions }  Scenario 2: Single summary data ## for single summary data f2 = function(Table, n, sem = T, f2.method, f2.Pooled, ...) { ... # Seperate mean from sem at ± Mean_SEM = apply(Table[, -1], 2, strsplit, "\\'b1") # ± = \\'b1 ... # construct the mean and sem (character) matrix (could have used unlist) for (i in 1:s) { for (j in 1:r) { Mean[j, i] = Mean_SEM[[i]][[j]][1] S[j, i] = Mean_SEM[[i]][[j]][2] } } ... # for (i in 1:N) Tablerows = rbind(Tablerows, SumAOV(n, Mean[i, ], S[i, ], method = f2.method, SEM = T, Pooled = f2.Pooled, ...)$tableRow)
...
# Generate informative captions
}


Scenario 4: Multiple summary data in Excel workbook

## for workbook (multiple sheets)

f3 = function(workbook, data = T, n.f3 = NULL, f3.method, sem.f3 = F, f3.Pooled,
...) {
if ((data == F) && (is.null(n.f3)))
stop("n must be specified")
N = length(workbook)
Res = list()
if (data == T) {

for (k in 1:N) Res[[k]] = f1(workbook[[k]], names(workbook[[k]])[1],
f1.method = f3.method, f1.Pooled = f3.Pooled, ...)

} else {
if (sem.f3 == T) {
for (k in 1:N) {
Res[[k]] = f2(workbook[[k]], n.f3, f2.method = f3.method, sem = T,
f2.Pooled = f3.Pooled, ...)
}
} else {
for (k in 1:N) {
Res[[k]] = f2(workbook[[k]], n.f3, f2.method = f3.method, sem = F,
f2.Pooled = f3.Pooled, ...)
}
}
}
return(Res)
}


Two-way analysis software

Main challenges

• Tables are by construction more complicated than one-way ANOVA tables

• Some cells span multiple rows, others span multiple columns

• Perform Post-hoc analysis with significant and non-significant interaction terms while keeping the design of the table standard (as required by most journals)

Table header function (multiple row and column spanning)

add.merged.table.row1 <- function(col.data = c("c1", "c2", "c3"), a, b, pooled.mrg,
col.widths = c(1, 4.5, 1), justify = "CENTER", font.size = 10, last.row = FALSE,
indent = 0, border.top = T, border.bottom = T) {

header <- paste("\\trowd\\trgaph100\\trleft", indent, sep = "")  # trqc for centered

justify.q <- "\\ql"
if (justify == "LEFT")
justify.q <- "\\ql"  # Justify cell content accordingly
...

bright = "\\clbrdrr\\brdrs\\brdrw15"  # Draw cell left and right borders
bleft = "\\clbrdrl\\brdrs\\brdrw15"

...
if (pooled.mrg == T) {
# For pooled SEM table, one extra column
merged <- c("\\clvmgf", rep(c("\\clmgf", rep("\\clmrg", b - 1)), a),
"\\clvmgf", "\\clmgf", rep("\\clmrg", 2))
} else {
merged <- c("\\clvmgf", rep(c("\\clmgf", rep("\\clmrg", b - 1)), a),
"\\clmgf", rep("\\clmrg", 2))
}

cols.prefix <- paste("\\clvertalc \\clshdrawnil \\clwWidth", round(col.widths *
btop, " ", bbottom, " ", bright, " ", bleft, merged, "\\cellx", c(1:length(col.widths)),
"\n", sep = "", collapse = "")
cols <- paste("\\pard", justify.q, "\\widctlpar\\intbl\\fi0\\f2\\fs", font.size *
2, " ", .convert(col.data), "\\cell\n", sep = "", collapse = "")
end.row <- "\\widctlpar\\intbl \\row \n\n"
paste(header, cols.prefix, cols, end.row, sep = "")
}


What's next ?

• If interaction term is significant, post hoc-analysis is usually conducted on treatment means to test $H_0: \mu_{ij}=\mu_{i^\prime j^\prime}$, the software can handle this situation -- No problem here

• Often times (e.g. in nutritional studies), if interaction is not significant, it is still interesting and important to compare treatment means. Two possible approaches:

• Let the software handle it by performing all pairwise comparisons
• Need only compare those that are not covered by main effect analysis -- may achieve higher power (see next for details).
• For simplicity, assume we have 2 factors with 2 levels each.
• The only comparisons not covered by the main-effect analysis are $\mu_{12}$ vs $\mu_{21}$ (shown in blue) or $\mu_{11}$ vs $\mu_{22}$ (shown in black)

Case $\mu_{11}$ $\mu_{21}$ $\mu_{12}$ $\mu_{22}$ Main
effects
I $a$ $a$ $a$ $a$ No main effect in A, no main effect in B; $\mu_{11} = \mu_{21}=\mu_{12}=\mu_{22}$
II $a$ $a$ $b$ $b$ No main effect in A, main effect in B; $\mu_{11} \neq \mu_{22}$, $\mu_{21} \neq \mu_{12}$
III $a$ $b$ $a$ $b$ Main effect in A, no main effect in B; $\mu_{11} \neq \mu_{22}$, $\mu_{21} \neq \mu_{12}$
IV $a$ $b$ $b$ or $c$
$a$ or $d$
Main effect in A, main effect in B; we do not know whether $\mu_{11} = \mu_{22}$ or $\mu_{21} = \mu_{12}$
• No need to preform any detailed comparisons in cases I, II and III.
• In case IV, we need to only test $H_0:\mu_{11} = \mu_{22}$ or $H_0:\mu_{21} = \mu_{12}$, which can be done using a t-test
• Need to generalize the previous approach to $a \times b$ factorial designs