# Statistical Tools For Biologists

## The Power of R's Dynamic Reporting and Visualization

Research Assistant Professor
Texas A&M University

## Five Things Biologists Should Know About Statistics

Ewan Birney (Head of Nucleotide Data at the European Bioinformatics Institute) talks about the importance of statistics to biologists.
He lists the "Five statistical things I wished I had been taught 20 years ago":

1. Non-parametric statistics
2. R
3. The problem of multiple testing
4. The relationship between P-value, effect size, and sample size, and
5. Linear models and PCA
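Points 3 and 4 can be illustrated in a few lines of base R. The sketch below uses hypothetical p-values to show how `p.adjust()` corrects for multiple testing. It then holds a standardized effect size (mean/SD) fixed at 0.3 and shows the P-value shrinking as the sample size grows. The function name `p_for_n()` is only for illustration.

```r
# 3. Multiple testing: hypothetical raw p-values from 5 tests
p <- c(0.01, 0.02, 0.03, 0.04, 0.05)
p_bonf <- p.adjust(p, method = "bonferroni")  # controls the family-wise error rate
p_bh   <- p.adjust(p, method = "BH")          # controls the false discovery rate
p_bonf
p_bh

# 4. P-value vs effect size vs sample size.
# scale() forces each simulated sample to have mean 0 and SD exactly 1,
# so adding 'effect' gives a standardized effect size of exactly 0.3,
# whatever the random draw was.
effect <- 0.3
p_for_n <- function(n) {
  x <- as.numeric(scale(rnorm(n))) + effect
  t.test(x, mu = 0)$p.value                   # one-sample t-test against 0
}
sizes  <- c(10, 30, 100, 300)
p_by_n <- sapply(sizes, p_for_n)
data.frame(n = sizes, p.value = signif(p_by_n, 3))
```

The effect is identical in every row. Only the sample size changes, yet the P-value moves from clearly non-significant to tiny. A P-value by itself therefore says little about how large an effect is.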

## Why Learn R?

• R offers a vast array of analytical methods.
• New methods often appear in R sooner, sometimes years before they reach SAS, SPSS, etc.
• R is rapidly becoming a universal language for data analysis.
• R's procedures, called functions, are open for you to inspect and modify.
• R's graphics are extremely flexible and of publication quality.
• R is free.

... and Reproducible Research (RR)

## Why Should Research Be Reproducible?

• Reproducibility is crucial for evaluating scientific claims.
• Using RR tools can make your research more efficient and, in the long run, easier.
• Achieve higher research impact: reproducible work tends to be cited more, because other researchers can reuse your data and code to answer questions you did not anticipate (and should cite your work when they do).

Tools for RR:

• R: a programming language designed primarily for statistics and graphics.
• R package knitr: lets you combine your statistical analysis code and its results in one document.
• Markup languages: plain-text instructions for formatting a document; Markdown is one example (you can learn the basics in about 10 minutes).
• RStudio: an integrated development environment that brings together R, knitr and markup languages.
• Cloud storage: e.g. Dropbox or GitHub, to store data, code and previous versions of these files, and to make them widely accessible.
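The following is a minimal knitr sketch. It assumes the knitr package is installed. The report source is held in a character vector as a stand-in for an `.Rmd` file. `knit(text = ...)` runs the R chunk and returns the finished Markdown, including the code, its printed output, and an inline result. The report text and numbers are hypothetical.

```r
# Requires the knitr package: install.packages("knitr")
library(knitr)

fence <- strrep("`", 3)  # build the chunk delimiter so backticks are not nested here
rmd <- c(
  "# A tiny reproducible report",
  "",
  paste0(fence, "{r}"),
  "x <- c(2, 4, 6)  # hypothetical measurements",
  "mean(x)",
  fence,
  "",
  "The mean of the measurements is `r mean(x)`."
)

# knit() evaluates the chunk and the inline `r ...` expression,
# and returns the resulting Markdown as a character string
report <- knit(text = rmd, quiet = TRUE)
cat(report)
```

If the data change, re-knitting updates both the printed output and the sentence. This is the core idea behind reproducible reports.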

## Part I

Some Cool Animated Examples

## rCharts example

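hair_eye_male <- subset(as.data.frame(HairEyeColor), Sex == "Male")  # built-in data, males only
n1 <- rCharts::nPlot(Freq ~ Hair, group = "Eye", type = "multiBarChart", data = hair_eye_male)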
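For comparison, a static grouped bar chart of the same counts can be drawn with base R's `barplot()`, with no extra packages needed. This is a sketch; the axis labels and legend placement are arbitrary choices.

```r
# Same data as the rCharts example: the built-in HairEyeColor table, males only
hair_eye_male <- subset(as.data.frame(HairEyeColor), Sex == "Male")

# Cross-tabulate into a matrix: rows = eye colour, columns = hair colour
counts <- xtabs(Freq ~ Eye + Hair, data = hair_eye_male)

# beside = TRUE draws grouped (not stacked) bars, one group per hair colour
barplot(counts, beside = TRUE, legend.text = TRUE,
        args.legend = list(title = "Eye", x = "topright"),
        xlab = "Hair colour", ylab = "Count")
```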

## The Fruit Data

|   | Fruit   | Year | Location | Sales | Expenses | Profit | Date       |
|---|---------|------|----------|-------|----------|--------|------------|
| 1 | Apples  | 2008 | West     | 98    | 78       | 20     | 2008-12-31 |
| 2 | Apples  | 2009 | West     | 111   | 79       | 32     | 2009-12-31 |
| 3 | Apples  | 2010 | West     | 89    | 76       | 13     | 2010-12-31 |
| 4 | Oranges | 2008 | East     | 96    | 81       | 15     | 2008-12-31 |
| 5 | Bananas | 2008 | East     | 85    | 76       | 9      | 2008-12-31 |
| 6 | Oranges | 2009 | East     | 93    | 80       | 13     | 2009-12-31 |
| 7 | Bananas | 2009 | East     | 94    | 78       | 16     | 2009-12-31 |
| 8 | Oranges | 2010 | East     | 98    | 91       | 7      | 2010-12-31 |
| 9 | Bananas | 2010 | East     | 81    | 71       | 10     | 2010-12-31 |
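The Fruit data above is the `Fruits` data set that comes with googleVis. The sketch below retypes the table in base R so it runs without googleVis. It checks that Profit equals Sales minus Expenses in every row and computes the mean profit per fruit. The variable name `fruits` is only for illustration.

```r
# Retyped copy of the table above (googleVis ships the original as `Fruits`)
fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East",
               "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10)
)
fruits$Date <- as.Date(paste0(fruits$Year, "-12-31"))

# Sanity check: Profit should equal Sales minus Expenses in every row
stopifnot(all(fruits$Profit == fruits$Sales - fruits$Expenses))

# Mean profit per fruit over 2008-2010
mean_profit <- tapply(fruits$Profit, fruits$Fruit, mean)
round(mean_profit, 2)
```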

## Motion Chart for the Fruit data

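library(googleVis)  # provides gvisMotionChart() and the Fruits data
M1 <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year",
                      options = list(width = 500, height = 400))
print(M1, tag = "chart")  # Google's Motion Chart relied on Flash, so it may not display in current browsers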


## Part II

My Two Software Tools (Built Using R)

• One-Way ANOVA From Summary Data with Post-Hoc Analysis
• Rapid Publication-Ready MS Word Tables for Biologists Using One-Way ANOVA
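The first tool runs a one-way ANOVA from summary data. Its source is not shown here, so what follows is only an independent sketch of the standard calculation, not the author's implementation. The F test uses the between-group sum of squares, Σ nᵢ(x̄ᵢ − x̄)², and the pooled within-group sum of squares, Σ (nᵢ − 1)sᵢ². The post-hoc step uses Tukey-Kramer comparisons via base R's `ptukey()`. The function names `anova_from_summary()` and `tukey_from_summary()` and the example numbers are hypothetical.

```r
# Sketch: one-way ANOVA and Tukey-Kramer post-hoc tests computed only from
# group summaries (means, SDs, sample sizes). Not the author's software.
anova_from_summary <- function(means, sds, ns) {
  k <- length(means)                       # number of groups
  N <- sum(ns)                             # total sample size
  grand_mean <- sum(ns * means) / N
  ss_between <- sum(ns * (means - grand_mean)^2)
  ss_within  <- sum((ns - 1) * sds^2)      # pooled from the group variances
  df_between <- k - 1
  df_within  <- N - k
  mse <- ss_within / df_within             # mean square error
  f <- (ss_between / df_between) / mse
  p <- pf(f, df_between, df_within, lower.tail = FALSE)
  list(F = f, df = c(df_between, df_within), p.value = p, MSE = mse)
}

tukey_from_summary <- function(means, ns, mse, df_within) {
  k <- length(means)
  pairs <- combn(k, 2)                     # all pairs (i, j) with i < j
  i <- pairs[1, ]
  j <- pairs[2, ]
  diff <- means[j] - means[i]
  se <- sqrt(mse / 2 * (1 / ns[i] + 1 / ns[j]))
  q <- abs(diff) / se                      # studentized range statistic
  p_adj <- ptukey(q, nmeans = k, df = df_within, lower.tail = FALSE)
  data.frame(i = i, j = j, diff = diff, q = q, p.adj = p_adj)
}

# Usage with hypothetical summary data for three treatments
means <- c(5.1, 6.3, 5.8)
sds   <- c(0.9, 1.1, 1.0)
ns    <- c(8, 8, 8)
res <- anova_from_summary(means, sds, ns)
res
tukey_from_summary(means, ns, res$MSE, res$df[2])
```

The summaries contain everything the one-way F test needs, so the results should match `aov()` and `TukeyHSD()` run on the raw data.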

## My software 2

1. R Language Resources: A great place to start learning R is here.
2. rCharts: Ramnath Vaidyanathan. Create, customize and publish interactive JavaScript visualizations.
3. googleVis: M. Gesmann and D. de Castillo. Interface to the Google Visualization API (tutorial and demo).
4. knitr: Yihui Xie. Elegant, flexible and fast dynamic report generation with R.
5. shiny: RStudio, Inc. Web application framework for R.
6. Markdown: a great place to learn it is here.
7. slidify: Ramnath Vaidyanathan. Create elegant, interactive presentations from R with Slidify.

## To do list

• I will show you how to build slides with interactive content and publish them on the web using the R package slidify.

• I will introduce the R package ggplot2 for producing elegant, publication-quality graphs.

• A good simple example can be found here.
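Here is a minimal ggplot2 sketch. It assumes ggplot2 is installed and uses R's built-in `iris` measurements rather than the linked example, which is not reproduced here. Each layer is added with `+`: points, then a separate least-squares line per species, then axis labels and a plain theme.

```r
# Requires ggplot2: install.packages("ggplot2")
library(ggplot2)

# Petal length vs sepal length, coloured by species,
# with one linear fit (and its confidence band) per species
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Sepal length (cm)", y = "Petal length (cm)") +
  theme_bw()

print(p)
# To save a vector file for a manuscript (file name is arbitrary):
# ggsave("iris_petal_vs_sepal.pdf", p, width = 5, height = 4)
```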

## Part III

A Quick Introduction to R

1. Basic R syntax
2. Installing R Packages
3. The R functions help() and example()
4. Input (reading data) and output (exporting results) (if time allows)
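Topics 1, 3 and 4 can be previewed in a few lines of base R. The values below are hypothetical. The CSV file goes to a temporary location, so the sketch leaves nothing behind.

```r
# 1. Basic syntax: assignment with <-, vectors, vectorised arithmetic
weights   <- c(12.1, 15.3, 11.8, 14.6)              # hypothetical body weights (g)
treatment <- c("control", "drug", "control", "drug")
weights * 1000                                      # grams to milligrams, element by element

# 3. Help (most useful in an interactive session):
# help("mean"); ?mean; example(mean)

# 4. Output: write a data frame to a CSV file
dat <- data.frame(treatment, weights)
out_file <- tempfile(fileext = ".csv")
write.csv(dat, out_file, row.names = FALSE)

# 4. Input: read it back and confirm the round trip preserved the data
dat2 <- read.csv(out_file)
all.equal(dat, dat2)
```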

## R functions covered

• Help functions
• help(), ?, example()
• Math functions
• sqrt(), length(), dim(), rep(), sum(), seq() and :
• Stat functions
• mean(), median(), sd(), cor.test(), lm()
• Graphing functions
• plot(), persp(), barplot(), abline(), qplot() {ggplot2}
• Data functions
• c(), matrix(), factor(), data.frame(), list(), merge()
• Input functions
• read.csv(), list.files(), read.table()
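Most of the functions listed above can be tried in one short base-R session. This sketch uses the built-in `trees` data set (31 felled black cherry trees) plus a few made-up values. `persp()`, `qplot()` and the file-reading functions are left out.

```r
# Math functions
x <- seq(2, 20, by = 2)           # 2, 4, ..., 20
length(x); sum(x); sqrt(16)
rep(c("a", "b"), times = 3)
1:5                               # the ':' operator, same as seq(1, 5)

# Data functions (hypothetical values)
m <- matrix(1:6, nrow = 2); dim(m)
site    <- factor(c("north", "south", "north"))
samples <- data.frame(id = 1:3, site = site)
traits  <- data.frame(id = c(1, 3), mass = c(4.2, 5.0))
merged  <- merge(samples, traits, by = "id")   # keeps ids present in both tables
info    <- list(name = "pilot", n = nrow(samples))

# Stat functions on the built-in trees data
mean(trees$Volume); median(trees$Volume); sd(trees$Volume)
cor.test(trees$Girth, trees$Volume)           # Pearson correlation test
fit <- lm(Volume ~ Girth, data = trees)        # simple linear regression
coef(fit)

# Graphing functions: scatter plot with the fitted regression line
plot(Volume ~ Girth, data = trees)
abline(fit, col = "red")
```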

## R functions covered (continued)

• R packages functions
• install.packages(), install_github() {devtools}, require()
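A common pattern combines these functions: install a CRAN package only if it is missing, then load it. The helper `use_package()` below is hypothetical. `require()` works like `library()` but returns `FALSE` with a warning instead of stopping with an error, which is why it often appears in scripts like this.

```r
# Hypothetical helper: load a CRAN package, installing it first if it is missing
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                 # downloads from CRAN only when needed
  }
  library(pkg, character.only = TRUE)     # pkg is a string, not a bare name
  invisible(TRUE)
}

use_package("stats")   # a base package, always installed: nothing is downloaded

# Packages hosted only on GitHub (such as slidify) go through devtools.
# The repository name below is an assumption; check the package's own docs:
# install.packages("devtools")
# devtools::install_github("ramnathv/slidify")
```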

More useful resources are available here: