procunivariate        package:procunivariate        R Documentation

A procedure to assist in carrying out univariate data analysis.
Includes functions that can be used independently of the overall
procedure.

Description:

     Takes a vector and returns 5 different graphs, and also outputs
     3 different types of numerical summaries.

Usage:

     procunivariate(x, name="data", type=1)

Outline:

     The 5 graphs consist of:

        Raw quantile plot with superimposed box plot
        QiQ plot against the uniform quantiles
        Quantile-based histogram
        Quantile plot of the mean assuming Normality
        Quantile plot of the standard deviation assuming Normality

     In addition there are 3 numerical outputs.

Graph functions:

     rqplot(x)

     The raw quantile plot consists of the sample quantiles graphed
     against the uniform quantiles. A box plot (without the whiskers)
     is superimposed for ease in examining a 5-number summary.

     qiqp(x)

     The QiQ plot is a graph of the quantiles of the sample data,
     centered at the mid-quartile and scaled by 2 times the
     interquartile range, against the uniform quantiles. When the
     mid-distribution function is used to define the quantile
     function, as described by Parzen (2004), the qiq-values of the
     1st and 3rd quartiles are -0.25 and 0.25 respectively,
     regardless of the distribution. Because of this property, qiq
     plots are well suited both for identifying possible bimodality
     and for examining the type of skewness or kurtosis exhibited by
     a sample. Any points that lie above a qiq of 0.5 indicate the
     presence of a right tail, and likewise any points below -0.5
     indicate the presence of a left tail. A good way to compare two
     distributions that a sample might come from is to plot the
     theoretical qiq curves on the same graph and determine by visual
     inspection which distribution the sample more likely came from.
     To demonstrate this method, this particular qiq plot
     superimposes both the standard normal and the exponential(1)
     onto the plot.
     One may notice that a sample from an exponential distribution
     will rarely have any data points below a qiq of -0.5, and will
     almost always have at least a few data points above a qiq of
     0.5.

     qhist(x)

     A histogram whose bin cut points are determined by starting at
     the mid-quartile point and expanding in both directions in
     increments of 0.5 times the IQR. Sample points that happen to
     land on a cut point are distributed to the bins by the
     mid-distribution function (defined by Parzen 2004). The
     usefulness of this method lies in the fact that all histograms
     built this way use the same quartile-based bins and are
     therefore directly comparable across samples.

     qmean(x)

     The quantile function of the sample mean, assuming the parent
     distribution is approximately Normal. The 95% confidence
     interval for the mean is superimposed for a graphical
     interpretation of the interval.

     qsig(x)

     The quantile function of the sample standard deviation, assuming
     the parent distribution is approximately Normal. The 95%
     confidence interval for the standard deviation is superimposed
     for a graphical interpretation of the interval.

Numerical Summaries:

     EQA Summary: The 5-number summary along with the minimum and
     maximum qiq-values, the qiq-value of the median, the mid-quartile
     value, and twice the interquartile range (the scale parameter
     for qiq-values). As a matter of notation, the qiq of the median
     (or of any other specified quantile) is listed as QI(.5).

     Normal comparison chart: 7 prespecified quantiles that are
     useful for comparing sample quantiles to standard normal
     quantiles.

     Exponential comparison chart: 5 prespecified quantiles that are
     useful for comparing sample quantiles to standard exponential
     quantiles.

Arguments:

       x: the data (a numeric vector).

    name: the name of the data, written explicitly on the output
          plots.

    type: takes either 1 or 2. With type 2, the raw quantile plot
          includes an extra plot with 95% confidence intervals for
          the quantiles, and all numerical summaries are excluded.
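The qiq standardization used by qiqp and the cut points used by qhist can be sketched in a few lines of base R. This is only a minimal sketch and not the package's implementation: the helper names qiq and qhist_breaks are hypothetical, and the sketch assumes the default quantile() definition (type 7) in place of the mid-distribution quantile function of Parzen (2004), so results may differ slightly for small samples or samples with ties.

```r
# Hypothetical helpers, not functions from the procunivariate package.
# Assumption: quantile() with its default type stands in for the
# mid-distribution quantile function described by Parzen (2004).

# Standardize values by the sample's mid-quartile and twice its IQR.
qiq <- function(values, x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  mid_quartile <- (q[1] + q[2]) / 2
  scale <- 2 * (q[2] - q[1])
  stopifnot(scale > 0)  # the sketch requires a non-zero IQR
  (values - mid_quartile) / scale
}

# Cut points that start at the mid-quartile and extend in both
# directions in steps of 0.5 * IQR until the data range is covered.
qhist_breaks <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  mid_quartile <- (q[1] + q[2]) / 2
  step <- 0.5 * (q[2] - q[1])
  stopifnot(step > 0)
  k_lo <- floor((min(x) - mid_quartile) / step)
  k_hi <- ceiling((max(x) - mid_quartile) / step)
  mid_quartile + step * (k_lo:k_hi)
}

set.seed(1)
x <- rexp(40)
quartiles <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
print(qiq(quartiles, x))   # the quartiles map to -0.25 and 0.25 by construction
print(range(qiq(x, x)))    # minimum and maximum qiq-values of the sample
print(qhist_breaks(x))     # candidate bin cut points
```

On the qiq scale the cut points are spaced 0.25 apart, which is why bins built this way line up across samples.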
References:

     Abhishek Gupta (Department of Industrial Engineering, Texas A&M
     University, College Station, TX 77843, U.S.A.) and Emanuel
     Parzen (Department of Statistics, Texas A&M University, College
     Station, TX 77843, U.S.A.). Input Modeling Using Quantile
     Statistical Methods. Proceedings of the 2004 Winter Simulation
     Conference, R. G. Ingalls, M. D. Rossetti, J. S. Smith, and
     B. A. Peters, eds.

     Statist. Sci. 19, no. 4 (2004), 652–662.
     http://projecteuclid.org/Dienst/UI/1.0/Summarize/euclid.ss/1113832730

Example:

     # (Make sure procunivariate is loaded first)

     # Analyzing a random sample from a N(0,1) distribution
     x <- rnorm(40)
     procunivariate(x, name="40 random normals")

     # Analyzing a random sample from an exponential(1) distribution
     x <- rexp(40)
     procunivariate(x, name="40 random exponentials")
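As a further illustration, the kind of curves that qmean and qsig display can be approximated with base R alone. The sketch below uses one standard construction under Normality (a Student t pivot for the mean and a chi-square pivot for the standard deviation); whether the package computes its curves in exactly this way is an assumption, and the helper names mean_quantile and sd_quantile are hypothetical.

```r
# Hypothetical helpers, not functions from the procunivariate package.
# Both assume the parent distribution is approximately Normal.

# Quantile function for the mean: xbar + t_{n-1}(u) * s / sqrt(n)
mean_quantile <- function(u, x) {
  n <- length(x)
  stopifnot(n >= 2)
  mean(x) + qt(u, df = n - 1) * sd(x) / sqrt(n)
}

# Quantile function for the standard deviation, based on
# (n - 1) * s^2 / sigma^2 following a chi-square(n - 1) distribution.
sd_quantile <- function(u, x) {
  n <- length(x)
  stopifnot(n >= 2)
  sd(x) * sqrt((n - 1) / qchisq(1 - u, df = n - 1))
}

set.seed(1)
x <- rnorm(40)
u <- c(0.025, 0.975)            # endpoints of a 95% interval
print(mean_quantile(u, x))      # 95% confidence interval for the mean
print(sd_quantile(u, x))        # 95% confidence interval for the standard deviation
```

Evaluating either function on a grid such as seq(0.01, 0.99, by = 0.01) gives a curve that can be drawn with plot(), with the two interval endpoints marked on it.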