(Sep, 2006): Graduated with a BS in Statistics from Lebanese University, Beirut.

(Jan, 2008): Joined the Statistics PhD program at UT Dallas.

(June, 2008): Got my first citation (without having published a single paper yet!) and learned how to respect stop signs.

(Aug, 2012): Got a PhD in Statistics.

(Sep, 2012): Started working at TAMU.

\(X_{ij}\) = \(j^{\text{th}}\) measurement on the \(i^{\text{th}}\) subject, \(j=1,\ldots,k_{i}\), \(i=1,\ldots,n\). \({N=}{\displaystyle {\sum_{i=1}^{n}k_{i}}}\) is the total number of measurements.

\(X_{ij}\sim F\), where \(F\) is an absolutely continuous, strictly increasing c.d.f. with probability density \(f\) and finite second moment.

\(F\) is estimated by \[ {F_{n}(x)={\displaystyle \sum_{i=1}^{n}w_{i}{\displaystyle \sum_{j=1}^{k_{i}}I\left(X_{ij}\leq x\right)}},\qquad{\displaystyle \sum_{i=1}^{n}}k_{i}w_{i}=1} \]
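A minimal sketch of \(F_{n}\) in R, assuming the common choice of equal per-subject weights \(w_{i}=1/(nk_{i})\), which satisfies the constraint above; `Fn` and `X` are illustrative names, not from the paper:

```r
# Weighted e.c.d.f. for unbalanced repeated measurements.
# Assumes w_i = 1/(n * k_i) (one choice satisfying sum_i k_i w_i = 1).
Fn <- function(x, X) {
  # X: a list of length n; X[[i]] holds the k_i measurements on subject i
  w <- 1 / (length(X) * lengths(X))                     # w_i = 1/(n k_i)
  sum(mapply(function(xi, wi) wi * sum(xi <= x), X, w))
}

X <- list(c(1, 3), c(2, 4, 6))   # n = 2 subjects, k_1 = 2, k_2 = 3
Fn(2, X)                         # 1/4 + 1/6 = 5/12
```

Each subject contributes its own empirical c.d.f., downweighted so that subjects with many measurements do not dominate.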

Many important statistics may be expressed as an \(L\)-statistic \[ \sum_{s=1}^{N}c_{s,N}X_{(s)}\label{eq:L-stat} \qquad (1)\]

A subclass of (1) is given by \[ T\left(F_{n}\right)=\int_{0}^{1}F_{n}^{-1}(x)m\left(x\right)dx+\sum_{l=1}^{r}a_{l}F_{n}^{-1}\left(p_{l}\right)=:T_{1}(F_{n})+T_{2}(F_{n}) \] where \(m(x)\) is a weight-generating function, \(r\) is a pre-specified integer, and \(a_{l},\,l=1,\ldots,r,\) are known constants, not all of which are zero.

\(T\left(F_{n}\right)\) is an \(L\)-statistic \(\sum_{s=1}^{N}c_{s,N}X_{(s)}\) with \(c_{s,N}=\int_{q_{s-1,N}}^{q_{s,N}}m(x)\,dx+a_{l}\) for each \(l\) such that \(q_{s-1,N}\le p_{l}\le q_{s,N}\), where \(q_{s,N}\) is the total weight of the \(s\) smallest observations.
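As a concrete instance (my illustration, not from the paper): with equal weights, \(q_{s,N}=s/N\), and taking \(m(x)=1/(1-2\gamma)\) on \((\gamma,1-\gamma)\) with all \(a_{l}=0\) recovers the \(\gamma\)-trimmed mean. `L_trim` is an illustrative name:

```r
# The gamma-trimmed mean written as sum_s c_{s,N} X_(s), where
# c_{s,N} = integral of m(x) over (q_{s-1,N}, q_{s,N}] and q_{s,N} = s/N.
L_trim <- function(x, g = 0.1) {
  N  <- length(x)
  q  <- (0:N) / N                          # q_{0,N}, ..., q_{N,N}
  lo <- pmax(q[-(N + 1)], g)               # clip each interval to (g, 1 - g)
  hi <- pmin(q[-1], 1 - g)
  c_s <- pmax(hi - lo, 0) / (1 - 2 * g)    # c_{s,N}
  sum(c_s * sort(x))
}

x <- rnorm(20)
all.equal(L_trim(x), mean(x, trim = 0.1))  # TRUE here (N * g is an integer)
```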

__Main Result:__ Under certain assumptions, we have: \[ \sqrt{n}\left[T\left(F_{n}\right)-T(F)\right]\overset{d}{\to}N(0,\sigma^{2}) \]

Two main duties:

- Pursue my own statistical research
  - Interested in longitudinal and functional data analysis, biostatistics, and multivariate statistics/machine learning

Typical data I deal with in repeated functional data analysis

- Work with biologists
- Consulting, authoring and co-authoring biology papers, refereeing biology papers, etc.
- This is where I got inspired to build the software

- MS Word is the preferred software for writing papers in many scientific fields
- ANOVA results with post-hoc multiple comparisons are usually summarized in tables
- Pairwise test results are represented using a letter-based algorithm: treatment means in a row without a common superscript are significantly different ($Pval<\alpha$)
- At the biology lab, I was asked to construct a few tables like this every other week

- Generates publication-ready MS Word tables with post-hoc results included therein
- The letter-based algorithm also allows ranking of group means
- Can handle multiple data sets in an Excel workbook at once
- Can handle multiple summary data sets at once; very useful for journal reviewers or whenever data must be analyzed without access to the original observations
- Automatic informative captions are also generated
- Supports two of the most widely used table formats in biomedical/agricultural journals
- Allows the user to separately control the number of significant digits for means and SEM
- Performs a process that usually takes a few hours (depending on the size of the table) in a few seconds, while eliminating errors introduced by manual input

    +---/ShinyAppFolder
    |   +---www
    |   |   +---img1
    |   |   +---img2
    |   +---server.R      # controls which parts of your app run/update when
    |   +---ui.R          # user interface
    |   +---Functions.R
    |   +---About.R
    |   +---Legal.R
    |   +---Update.R

    +---/srv/shiny-server
    |   +---shinyApp1
    |   |   +---server.R
    |   |   +---ui.R
    |   +---shinyApp2
    |   |   +---server.R
    |   |   +---ui.R
    |   +---assets
    |   |   +---style.css
    |   |   +---script.js

`Functions.R`

```
SumAOV = function(n, Mean, S, method, SEM = FALSE, alpha = 0.05, Pooled = TRUE,
                  p.adj = "none", mean.sig, sem.sig) {
    # 1. `if` statements to ensure valid inputs and produce informative error messages
    # 2. compute all ANOVA-related quantities, e.g. SST, MSE, a (number of groups),
    #    N (total sample size), omnibus test p-value, etc.
    # 3. generate "mean ± SEM" or "mean" and pooled SEM labels
    # 4. perform all pairwise comparisons based on the selected method
    if (method == "tukey") {
        center <- outer(Mean, Mean, "-")        # all pairwise mean differences
        keep <- lower.tri(center)
        center <- center[keep]
        width <- qtukey(1 - alpha, a, N - a) *
            sqrt((MSE/2) * outer(1/n, 1/n, "+"))[keep]
        Res = abs(center) >= width              # TRUE = significantly different
    }
    # 5. represent pairwise comparison results using a letter-based algorithm as in Piepho (2004)
    # 6. add the superscript letter from step 5 to the labels from step 3 to form a table row
}
```
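Step 5 can be sketched as follows. This is my simplified rendering of the insert-and-absorb idea in Piepho (2004), not the package's actual implementation, and `letter_display` is an illustrative name:

```r
# Simplified insert-and-absorb letter display (after Piepho, 2004).
# sig: symmetric logical matrix; sig[i, j] = TRUE if means i and j differ.
# Returns one letter string per mean; means sharing a letter do not differ.
letter_display <- function(sig) {
  k <- nrow(sig)
  cols <- matrix(TRUE, nrow = k, ncol = 1)      # start: all means share one letter
  for (i in 1:(k - 1)) for (j in (i + 1):k) {
    if (!sig[i, j]) next
    repeat {
      h <- which(cols[i, ] & cols[j, ])[1]      # a column still connecting i and j
      if (is.na(h)) break
      n1 <- n2 <- cols[, h]
      n1[i] <- FALSE; n2[j] <- FALSE            # insert: split the column in two
      cols <- unique(cbind(cols[, -h, drop = FALSE], n1, n2), MARGIN = 2)
      # absorb: drop any column whose mean set is contained in another's
      contained <- sapply(seq_len(ncol(cols)), function(a)
        any(sapply(seq_len(ncol(cols)), function(b)
          a != b && all(cols[, a] <= cols[, b]))))
      cols <- cols[, !contained, drop = FALSE]
    }
  }
  # order columns so the smallest mean gets letter "a"
  cols <- cols[, order(apply(cols, 2, function(cc) which(cc)[1])), drop = FALSE]
  apply(cols, 1, function(r) paste(letters[which(r)], collapse = ""))
}

sig <- matrix(FALSE, 3, 3)
sig[1, 3] <- sig[3, 1] <- TRUE                  # only means 1 and 3 differ
letter_display(sig)                             # "a" "ab" "b"
```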

```
## for a single full data set
f1 = function(data, Factor, f1.method, f1.Pooled = TRUE, ...) {
    ...
    # Ensure clean and short variable names
    Var.names = gsub("[^[:alnum:]]", " ", Var.names)
    long = which(nchar(Var.names) > 10)
    Var.names[long] = abbreviate(Var.names[long])
    ...
    # Calculate sample sizes, means and sd's
    n = as.vector(table(fct))
    Mean = aggregate(data, by = list(fct), FUN = "mean", na.rm = TRUE)
    S = aggregate(data, by = list(fct), FUN = "sd", na.rm = TRUE)
    ...
    # Call the main function SumAOV() on every column of the data
    for (i in 1:N) Tablerows = rbind(Tablerows,
        SumAOV(n, Mean[, i + 1], S[, i + 1],
               method = f1.method, Pooled = f1.Pooled, ...)$tableRow)
    ...
    # Generate informative captions
}
```

```
## for a single summary data set
f2 = function(Table, n, sem = TRUE, f2.method, f2.Pooled, ...) {
    ...
    # Separate mean from sem at the ± sign (± is encoded as \'b1 in RTF)
    Mean_SEM = apply(Table[, -1], 2, strsplit, "\\'b1")
    ...
    # Construct the mean and sem (character) matrices (could have used unlist)
    for (i in 1:s) {
        for (j in 1:r) {
            Mean[j, i] = Mean_SEM[[i]][[j]][1]
            S[j, i] = Mean_SEM[[i]][[j]][2]
        }
    }
    ...
    # Call SumAOV() on every row of the summary table
    for (i in 1:N) Tablerows = rbind(Tablerows,
        SumAOV(n, Mean[i, ], S[i, ],
               method = f2.method, SEM = TRUE, Pooled = f2.Pooled, ...)$tableRow)
    ...
    # Generate informative captions
}
```
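A quick illustration of the split performed in `f2()`, using the plain-text "±" for readability (the package itself matches the RTF escape `\'b1`); the names here are mine:

```r
# Parse "mean ± SEM" cells into numeric mean and SEM vectors.
cells <- c("12.3 ± 0.4", "9.8 ± 0.2")
parts <- do.call(rbind, strsplit(cells, "±"))  # 2-column character matrix
Mean  <- as.numeric(trimws(parts[, 1]))        # 12.3  9.8
SEM   <- as.numeric(trimws(parts[, 2]))        #  0.4  0.2
```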

```
## for a workbook (multiple sheets)
f3 = function(workbook, data = TRUE, n.f3 = NULL, f3.method, sem.f3 = FALSE,
              f3.Pooled, ...) {
    if (!data && is.null(n.f3))
        stop("n must be specified")
    N = length(workbook)
    Res = list()
    if (data) {
        # full data: run f1() on every sheet
        for (k in 1:N) Res[[k]] = f1(workbook[[k]], names(workbook[[k]])[1],
                                     f1.method = f3.method, f1.Pooled = f3.Pooled, ...)
    } else {
        # summary data: run f2() on every sheet, passing sem.f3 through
        for (k in 1:N) Res[[k]] = f2(workbook[[k]], n.f3, f2.method = f3.method,
                                     sem = sem.f3, f2.Pooled = f3.Pooled, ...)
    }
    return(Res)
}
```

Two-way ANOVA tables are, by construction, more complicated than one-way tables

Some cells span multiple rows, others span multiple columns

Perform post-hoc analysis with both significant and non-significant interaction terms while keeping the table design standard (as required by most journals)

```
add.merged.table.row1 <- function(col.data = c("c1", "c2", "c3"), a, b, pooled.mrg,
                                  col.widths = c(1, 4.5, 1), justify = "CENTER",
                                  font.size = 10, last.row = FALSE, indent = 0,
                                  border.top = TRUE, border.bottom = TRUE) {
    header <- paste("\\trowd\\trgaph100\\trleft", indent, sep = "")
    justify.q <- "\\qc"                      # \qc centers cell content (default)
    if (justify == "LEFT")
        justify.q <- "\\ql"                  # \ql left-justifies
    ...
    bright = "\\clbrdrr\\brdrs\\brdrw15"     # draw cell right and left borders
    bleft = "\\clbrdrl\\brdrs\\brdrw15"
    ...
    if (pooled.mrg) {
        # for a pooled-SEM table, one extra column
        merged <- c("\\clvmgf", rep(c("\\clmgf", rep("\\clmrg", b - 1)), a),
                    "\\clvmgf", "\\clmgf", rep("\\clmrg", 2))
    } else {
        merged <- c("\\clvmgf", rep(c("\\clmgf", rep("\\clmrg", b - 1)), a),
                    "\\clmgf", rep("\\clmrg", 2))
    }
    cols.prefix <- paste("\\clvertalc \\clshdrawnil \\clwWidth",
                         round(col.widths * 1440, 0),
                         "\\clftsWidth3 \\clheight260 \\clpadl100 \\clpadr100 \\gaph",
                         btop, " ", bbottom, " ", bright, " ", bleft, merged,
                         "\\cellx", 1:length(col.widths), "\n",
                         sep = "", collapse = "")
    cols <- paste("\\pard", justify.q, "\\widctlpar\\intbl\\fi0\\f2\\fs",
                  font.size * 2, " ", .convert(col.data), "\\cell\n",
                  sep = "", collapse = "")
    end.row <- "\\widctlpar\\intbl \\row \n\n"
    paste(header, cols.prefix, cols, end.row, sep = "")
}
```
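For readers unfamiliar with RTF, here is a minimal standalone file built from the same control words (`\trowd`, `\clmgf`/`\clmrg` for horizontal merges, `\cellx`, `\cell`, `\row`). This is my own illustration, independent of the package code; open the resulting file in Word to inspect it:

```r
# Minimal RTF document: one table row of three cell definitions, with the
# first two cells merged horizontally (\clmgf starts the merge, \clmrg
# continues it). Illustration only; not the package's actual output.
rtf <- paste0(
  "{\\rtf1\\ansi\n",
  "\\trowd\\trgaph100\n",
  "\\clmgf\\cellx2000\n",    # first cell of the horizontal merge
  "\\clmrg\\cellx4000\n",    # merged into the previous cell
  "\\cellx6000\n",           # ordinary third cell
  "\\pard\\intbl Merged\\cell\\cell SEM\\cell\\row\n",
  "}\n"
)
writeLines(rtf, tf <- tempfile(fileext = ".rtf"))
```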

If the interaction term is significant, post-hoc analysis is usually conducted on treatment means to test \(H_0: \mu_{ij}=\mu_{i^\prime j^\prime}\); the software can handle this situation -- no problem here

Oftentimes (e.g. in nutritional studies), even when the interaction is not significant, it is still interesting and important to compare treatment means. Two possible approaches:

- Let the software handle it by performing all pairwise comparisons
- Compare only those pairs not covered by the main-effect analysis -- this may achieve higher power (see next for details).

- For simplicity, assume we have 2 factors with 2 levels each.
- The only comparisons not covered by the main-effect analysis are \(\mu_{12}\) vs \(\mu_{21}\) (shown in blue) or \(\mu_{11}\) vs \(\mu_{22}\) (shown in black)

Case | $\mu_{11}$ | $\mu_{21}$ | $\mu_{12}$ | $\mu_{22}$ | Main effects |
---|---|---|---|---|---|
I | $a$ | $a$ | $a$ | $a$ | No main effect in A, no main effect in B; $\mu_{11} = \mu_{21}=\mu_{12}=\mu_{22}$ |
II | $a$ | $a$ | $b$ | $b$ | No main effect in A, main effect in B; $\mu_{11} \neq \mu_{22}$, $\mu_{21} \neq \mu_{12}$ |
III | $a$ | $b$ | $a$ | $b$ | Main effect in A, no main effect in B; $\mu_{11} \neq \mu_{22}$, $\mu_{21} \neq \mu_{12}$ |
IV | $a$ | $b$ | $b$ or $c$ | $a$ or $d$ | Main effect in A, main effect in B; we do not know whether $\mu_{11} = \mu_{22}$ or $\mu_{21} = \mu_{12}$ |

- No need to perform any detailed comparisons in cases I, II and III.
- In case IV, we only need to test $H_0:\mu_{11} = \mu_{22}$ or $H_0:\mu_{21} = \mu_{12}$, which can be done using a t-test
- Need to generalize the previous approach to $a \times b$ factorial designs
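The single extra t-test in case IV can be sketched as follows, pooling the MSE and error degrees of freedom from the two-way ANOVA. The function name and arguments are illustrative, not from the package:

```r
# Extra comparison in Case IV, e.g. H0: mu_11 = mu_22, as a t-test using
# the ANOVA's pooled MSE and error df.
cell_t_test <- function(m1, m2, n1, n2, MSE, df_error) {
  t_stat <- (m1 - m2) / sqrt(MSE * (1 / n1 + 1 / n2))
  2 * pt(-abs(t_stat), df_error)   # two-sided p-value; equals 1 when m1 == m2
}

cell_t_test(6.1, 5.0, n1 = 10, n2 = 10, MSE = 2, df_error = 36)
```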

__One-way ANOVA Software:__

Assaad et al. (2014): Rapid publication-ready MS-Word tables for one-way ANOVA. *SpringerPlus*, 3:474.

__Work from Thesis:__

HI Assaad & PK Choudhary (2013): L-statistics for repeated measurements data. *Journal of Nonparametric Statistics*, 25:2, 499-521.