Rapid Publication-Ready MS Word Tables Using One/Two-Way ANOVA


By Houssein I. Assaad, Ph.D.

Research Assistant Professor

www.stat.tamu.edu/~hassaad

[TAMU logo]

Brief introduction (My story)

  • (Sep, 2006): Graduated with a BS in statistics from Lebanese University, Beirut

  • (Jan, 2008): Joined the statistics PhD program at UT Dallas.

    • (June, 2008): Got my first citation (hadn't even published a paper yet!) and learned how to respect stop signs.

  • (Aug, 2012): Got a PhD in statistics.

  • (Sep, 2012): Started working at TAMU.

My thesis

  • \(X_{ij}=j^{th}\) measurement on the \(i^{th}\) subject, \(j=1,\ldots,k_{i}\), \(i=1,\ldots,n\). \(N=\sum_{i=1}^{n}k_{i}\) is the total number of measurements.

  • \(X_{ij}\sim F\), with \(F\) being an absolutely continuous, strictly increasing c.d.f. with probability density \(f\) and finite second moment.

  • \(F\) is estimated by \[ F_{n}(x)=\sum_{i=1}^{n}w_{i}\sum_{j=1}^{k_{i}}I\left(X_{ij}\leq x\right),\qquad \sum_{i=1}^{n}k_{i}w_{i}=1 \]

  • Many important statistics may be expressed as an \(L\)-statistic \[ \sum_{s=1}^{N}c_{s,N}X_{(s)}, \qquad (1)\] where \(X_{(s)}\) is the \(s^{th}\) order statistic of the pooled sample.

  • A subclass of (1) is given by \[ T\left(F_{n}\right)=\int_{0}^{1}F_{n}^{-1}(x)m\left(x\right)dx+\sum_{l=1}^{r}a_{l}F_{n}^{-1}\left(p_{l}\right)=:T_{1}(F_{n})+T_{2}(F_{n}) \] where \(m(x)\) is a weight-generating function, \(r\) is a pre-specified integer, and \(a_{l},\,l=1,\ldots,r\) are known constants, not all of which are equal to zero (two standard examples are given at the end of this list).

  • \(T\left(F_{n}\right)\) is an L-statistic \(\sum_{s=1}^{N}c_{s,N}X_{(s)}\) with \(c_{s,N}= \int_{q_{s-1,N}}^{q_{s,N}}m(x)dx+a_{l}\) for \(l\) such that \(q_{s-1,N}\le p_{l}\le q_{s,N}\) where \(q_{s,N}\) is the total weight of the \(s\) smallest observations.

  • Main Result: Under certain assumptions, we have: \[ \sqrt{n}\left[T\left(F_{n}\right)-T(F)\right]\overset{d}{\to}N(0,\sigma^{2}) \]
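
  • Two standard examples of this subclass (stated here only as an illustration, using the notation above): the sample median corresponds to \(m\equiv 0\), \(r=1\), \(a_{1}=1\), \(p_{1}=1/2\), and the \(\alpha\)-trimmed mean is the \(T_{1}\) term with \(m(x)=\frac{1}{1-2\alpha}I\left(\alpha<x<1-\alpha\right)\): \[ T\left(F_{n}\right)=F_{n}^{-1}\left(\tfrac{1}{2}\right),\qquad T_{1}\left(F_{n}\right)=\frac{1}{1-2\alpha}\int_{\alpha}^{1-\alpha}F_{n}^{-1}(x)\,dx. \]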

Current job at TAMU (TPBBNC program)

Two main duties:

  • Pursue my own statistical research
    • Interested in longitudinal and functional data analysis, biostatistics, and multivariate statistics/machine learning

Typical data I deal with in repeated functional data analysis

  • Work with biologists
    • Consulting, authoring and co-authoring biology papers, refereeing biology papers, etc.
    • This is where I got inspired to build the software

Idea behind the software

[Figure: classical MCP summary table]
  • MS Word is the preferred software for writing papers in many scientific fields
  • ANOVA results with post-hoc multiple comparisons are usually summarized in tables
  • Pairwise test results are represented using a letter-based algorithm: treatment means in a row without a common superscript are significantly different (\(P\)-value \(<\alpha\)). For example, in a row reading \(5.2^{a}\), \(6.8^{b}\), \(6.1^{ab}\), the first two means differ significantly, while the third differs from neither.
  • At the biology lab, I was asked to construct a few tables like this every other week

...sometimes in Pooled SEM format

[Figure: classical MCP summary table]

After 3 weeks... (a bit exaggerated)

[Figure: classical MCP summary table]

Software 1

Software 2

Summary of features of Software 1

  • Generates publication-ready MS Word tables with the post-hoc results included.
  • The letter-based algorithm also allows ranking of group means.
  • Can handle multiple data sets in an Excel workbook at once.
  • Can handle multiple summary data sets at once. This is very useful for journal reviewers, or whenever data must be analyzed without access to the original observations.
  • Automatic, informative captions are also generated.
  • Supports two of the most widely used table formats in biomedical/agricultural journals.
  • Allows the user to separately control the number of significant digits for the means and the SEM.
  • Performs a process that usually takes a few hours (depending on the size of the table) in a few seconds, while eliminating errors introduced by manual input.

Software directory tree

+---/ShinyAppFolder
|   +---www
|       +---img1
|       +---img2
|   +---server.R      # server logic: controls which parts of the app run/update and when
|   +---ui.R          # user-interface definition
|   +---Functions.R   # statistical workhorse functions (see below)
|   +---About.R
|   +---Legal.R
|   +---Update.R
|       
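
For readers unfamiliar with Shiny, a minimal ui.R/server.R pair looks roughly like the sketch below. This is a generic illustration only, not the actual app code; the input/output names (datafile, anovaTable) are made up.

## ui.R -- user-interface definition (hypothetical)
library(shiny)
fluidPage(
  fileInput("datafile", "Upload an Excel workbook"),   # made-up input id
  tableOutput("anovaTable")                            # made-up output id
)

## server.R -- reactive server logic (hypothetical)
library(shiny)
function(input, output) {
  output$anovaTable <- renderTable({
    req(input$datafile)   # wait for an upload before rendering
    # the real app would read the workbook here and call the routines in Functions.R
    head(iris)            # placeholder output
  })
}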

Structure of the shiny-server

+---/srv/shiny-server
|   +---shinyApp1
|       +---server.R
|       +---ui.R
|   +---shinyApp2
|       +---server.R
|       +---ui.R
|   +---assets
|       +---style.css
|       +---script.js

The file Functions.R

Workhorse function behind Software 1

SumAOV = function(n, Mean, S, method, SEM = F, alpha = 0.05, Pooled = T, 
                  p.adj = "none", mean.sig, sem.sig){

  # 1. if statements to ensure valid inputs and produce informative error messages

  # 2. compute all ANOVA-related quantities, e.g. SST, MSE, the omnibus test p-value, etc.

  # 3. generate mean ± SEM or mean and pooled SEM labels

  # 4. Perform all pairwise comparisons based on the method selected
  # (a, N and MSE below come from step 2; a standalone runnable version of this step follows the function)
  if (method == 'tukey'){
      center <- outer(Mean, Mean, "-")   # all pairwise mean differences
      keep <- lower.tri(center)          # keep each pair once
      center <- center[keep]
      # Tukey-Kramer critical half-widths (studentized range quantile; handles unequal n)
      width <- qtukey(1-alpha, a, N-a) * sqrt((MSE/2) * outer(1/n, 1/n, "+"))[keep]

      Res  = abs(center) >= width        # TRUE = significantly different pair
    }
  # 5. represent pairwise comparison results using a letter-based algorithm as in Piepho (2004)

  # 6. add the superscript letters from step 5 to the labels from step 3 to generate a row of the table. 
  } 
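
The Tukey step in isolation, as a runnable sketch starting from summary statistics (all numbers hypothetical; a, N and MSE stand in for the quantities computed in step 2):

n     <- c(10, 12, 11)                  # group sizes (assumed)
Mean  <- c(5.2, 6.8, 6.1)               # group means (assumed)
S     <- c(1.1, 1.3, 1.0)               # group standard deviations (assumed)
alpha <- 0.05

a   <- length(n)                        # number of groups
N   <- sum(n)                           # total sample size
MSE <- sum((n - 1) * S^2) / (N - a)     # pooled error mean square

center <- outer(Mean, Mean, "-")        # all pairwise differences
keep   <- lower.tri(center)
center <- center[keep]
# Tukey-Kramer half-widths (studentized range quantile; handles unequal n)
width  <- qtukey(1 - alpha, a, N - a) * sqrt((MSE / 2) * outer(1/n, 1/n, "+"))[keep]

abs(center) >= width                    # TRUE = pair of means differs significantly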

Auxiliary functions for different data scenarios

Scenario 1: Single data

## for single full data

f1 = function(data, Factor, f1.method, f1.Pooled = T, ...) {
    ...
    # Ensure clean and short variable names
    Var.names = gsub("[^[:alnum:]]", " ", Var.names)
    Var.names[which(nchar(Var.names) > 10)] = abbreviate(Var.names[which(nchar(Var.names) > 
        10)])
    ...
    # Calculate sample sizes, means and sd's
    n = as.vector(table(fct))
    Mean = aggregate(data, by = list(fct), FUN = "mean", na.rm = T)
    S = aggregate(data, by = list(fct), FUN = "sd", na.rm = T)
    ...
    # Calling the main function SumAOV() on every column of our data
    for (i in 1:N) Tablerows = rbind(Tablerows, SumAOV(n, Mean[, i + 1], S[, 
        i + 1], method = f1.method, Pooled = f1.Pooled, ...)$tableRow)
    ...
    # Generate informative captions
}
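
The summary step of f1() can be checked in isolation on a built-in data set (iris is used purely as a stand-in for an uploaded data set):

fct  <- iris$Species
n    <- as.vector(table(fct))                                    # group sizes
Mean <- aggregate(iris[, 1:4], by = list(fct), FUN = mean, na.rm = TRUE)
S    <- aggregate(iris[, 1:4], by = list(fct), FUN = sd,   na.rm = TRUE)
# each response column i would then be passed to SumAOV(n, Mean[, i + 1], S[, i + 1], ...)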

Scenario 2: Single summary data

## for single summary data

f2 = function(Table, n, sem = T, f2.method, f2.Pooled, ...) {

    ...
    # Separate mean from SEM at the ± sign (a toy example follows this function)
    Mean_SEM = apply(Table[, -1], 2, strsplit, "\\'b1")  # ± is encoded as \'b1 in RTF
    ...
    # construct the mean and sem (character) matrix (could have used unlist)
    for (i in 1:s) {
        for (j in 1:r) {
            Mean[j, i] = Mean_SEM[[i]][[j]][1]
            S[j, i] = Mean_SEM[[i]][[j]][2]
        }
    }
    ...
    # Call the main function SumAOV() on every row of the summary table
    for (i in 1:N) Tablerows = rbind(Tablerows, SumAOV(n, Mean[i, ], S[i, ], 
        method = f2.method, SEM = T, Pooled = f2.Pooled, ...)$tableRow)
    ...
    # Generate informative captions
}
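
The splitting step, shown on a single plain-text cell (the app itself splits on the RTF escape \'b1 because the tables come from Word; this toy version assumes a literal ± character):

cell  <- "5.23 ± 0.41"                    # a typical "mean ± SEM" table entry (made up)
parts <- strsplit(cell, "±", fixed = TRUE)[[1]]
Mean  <- as.numeric(trimws(parts[1]))     # 5.23
SEM   <- as.numeric(trimws(parts[2]))     # 0.41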

Scenario 3: Multiple data in Excel workbook

Scenario 4: Multiple summary data in Excel workbook

## for workbook (multiple sheets)

f3 = function(workbook, data = T, n.f3 = NULL, f3.method, sem.f3 = F, f3.Pooled, 
    ...) {
    if ((data == F) && (is.null(n.f3))) 
        stop("n must be specified")
    N = length(workbook)
    Res = list()
    if (data == T) {

        for (k in 1:N) Res[[k]] = f1(workbook[[k]], names(workbook[[k]])[1], 
            f1.method = f3.method, f1.Pooled = f3.Pooled, ...)

    } else {
        if (sem.f3 == T) {
            for (k in 1:N) {
                Res[[k]] = f2(workbook[[k]], n.f3, f2.method = f3.method, sem = T, 
                  f2.Pooled = f3.Pooled, ...)
            }
        } else {
            for (k in 1:N) {
                Res[[k]] = f2(workbook[[k]], n.f3, f2.method = f3.method, sem = F, 
                  f2.Pooled = f3.Pooled, ...)
            }
        }
    }
    return(Res)
}
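
One way to build the workbook list that f3() expects, assuming the readxl package and a hypothetical file data.xlsx with one data set per sheet (the actual app does this server-side on the uploaded file):

library(readxl)
path     <- "data.xlsx"                        # hypothetical workbook
sheets   <- excel_sheets(path)                 # sheet names, one data set each
workbook <- lapply(sheets, function(s) as.data.frame(read_excel(path, sheet = s)))
names(workbook) <- sheets

# full data sets, Tukey post-hoc, pooled SEM (illustrative call; extra arguments go through ...)
# Res <- f3(workbook, data = TRUE, f3.method = "tukey", f3.Pooled = TRUE)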

Two-way analysis software

Goal: Create tables like this (Per group SEM)

[Figure: classical MCP summary table]

...and this (pooled SEM)

[Figure: classical MCP summary table]

Main challenges

  • Tables are by construction more complicated than one-way ANOVA tables

  • Some cells span multiple rows, others span multiple columns

  • Performing post-hoc analysis with both significant and non-significant interaction terms, while keeping the table design standard (as required by most journals)

Two-way ANOVA software

Table header function (multiple row and column spanning)

add.merged.table.row1 <- function(col.data = c("c1", "c2", "c3"), a, b, pooled.mrg, 
    col.widths = c(1, 4.5, 1), justify = "CENTER", font.size = 10, last.row = FALSE, 
    indent = 0, border.top = T, border.bottom = T) {

    header <- paste("\\trowd\\trgaph100\\trleft", indent, sep = "")  # row start; \\trqc would center the row

    justify.q <- "\\ql"
    if (justify == "LEFT") 
        justify.q <- "\\ql"  # Justify cell content accordingly 
    ...

    bright = "\\clbrdrr\\brdrs\\brdrw15"  # Draw cell left and right borders 
    bleft = "\\clbrdrl\\brdrs\\brdrw15"

    ...
    if (pooled.mrg == T) {
        # For pooled SEM table, one extra column
        merged <- c("\\clvmgf", rep(c("\\clmgf", rep("\\clmrg", b - 1)), a), 
            "\\clvmgf", "\\clmgf", rep("\\clmrg", 2))
    } else {
        merged <- c("\\clvmgf", rep(c("\\clmgf", rep("\\clmrg", b - 1)), a), 
            "\\clmgf", rep("\\clmrg", 2))
    }

    cols.prefix <- paste("\\clvertalc \\clshdrawnil \\clwWidth", round(col.widths * 
        1440, 0), "\\clftsWidth3 \\clheight260 \\clpadl100 \\clpadr100 \\gaph", 
        btop, " ", bbottom, " ", bright, " ", bleft, merged, "\\cellx", c(1:length(col.widths)), 
        "\n", sep = "", collapse = "")
    cols <- paste("\\pard", justify.q, "\\widctlpar\\intbl\\fi0\\f2\\fs", font.size * 
        2, " ", .convert(col.data), "\\cell\n", sep = "", collapse = "")
    end.row <- "\\widctlpar\\intbl \\row \n\n"
    paste(header, cols.prefix, cols, end.row, sep = "")
}
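
For context, rows built this way (each a \trowd ... \row fragment) only become a Word table once wrapped in an RTF document shell. A minimal, purely illustrative wrapper is sketched below; it is not the package's actual output code, and a real file would carry a fuller font table and document-level settings:

row.strings <- "..."   # concatenated output of the row-builder functions (placeholder)
doc <- paste0("{\\rtf1\\ansi\\deff0",
              "{\\fonttbl{\\f2\\froman Times New Roman;}}\n",  # \f2 is referenced by the rows
              row.strings,
              "\n}")
writeLines(doc, "table.rtf")   # .rtf files open directly in MS Word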

What's next ?

  • If the interaction term is significant, post-hoc analysis is usually conducted on treatment means to test \(H_0: \mu_{ij}=\mu_{i^\prime j^\prime}\); the software can handle this situation -- no problem here

  • Oftentimes (e.g. in nutritional studies), even when the interaction is not significant, it is still interesting and important to compare treatment means. Two possible approaches:

    • Let the software handle it by performing all pairwise comparisons
    • Compare only those pairs not covered by the main-effect analysis -- this may achieve higher power (see next for details).
  • For simplicity, assume we have 2 factors with 2 levels each.
  • The only comparisons not covered by the main-effect analysis are \(\mu_{12}\) vs \(\mu_{21}\) (shown in blue) or \(\mu_{11}\) vs \(\mu_{22}\) (shown in black)

Case   $\mu_{11}$   $\mu_{21}$   $\mu_{12}$   $\mu_{22}$    Main effects
I      $a$          $a$          $a$          $a$           No main effect in A, no main effect in B; $\mu_{11} = \mu_{21} = \mu_{12} = \mu_{22}$
II     $a$          $a$          $b$          $b$           No main effect in A, main effect in B; $\mu_{11} \neq \mu_{22}$, $\mu_{21} \neq \mu_{12}$
III    $a$          $b$          $a$          $b$           Main effect in A, no main effect in B; $\mu_{11} \neq \mu_{22}$, $\mu_{21} \neq \mu_{12}$
IV     $a$          $b$          $b$ or $c$   $a$ or $d$    Main effect in A, main effect in B; we do not know whether $\mu_{11} = \mu_{22}$ or $\mu_{21} = \mu_{12}$
  • No need to perform any detailed comparisons in cases I, II and III.
  • In case IV, we only need to test $H_0:\mu_{11} = \mu_{22}$ or $H_0:\mu_{21} = \mu_{12}$, which can be done with a t-test (a sketch follows this list)
  • Need to generalize the previous approach to $a \times b$ factorial designs
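
A minimal sketch of the case IV follow-up test (all numbers hypothetical; MSE and the error degrees of freedom would come from the fitted two-way ANOVA):

# H0: mu_11 = mu_22, tested with a t statistic built on the ANOVA error mean square
m11 <- 5.8;  m22 <- 6.4         # observed cell means (assumed)
n11 <- 8;    n22 <- 8           # cell sample sizes (assumed)
MSE <- 1.10; df.error <- 28     # from the two-way ANOVA table (assumed)

t.stat <- (m11 - m22) / sqrt(MSE * (1/n11 + 1/n22))
p.val  <- 2 * pt(-abs(t.stat), df.error)
c(t = t.stat, p = p.val)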

References

Thank you