Overview

We wrote a function me.test() for a nonparametric test of equality of two distributions when the observed, contaminated data follow the classical additive measurement error model. We want to test the hypothesis \(H_0: F_x=F_y\) when the observed data are \(D_w=\{\mathbf{W}_1, \dots, \mathbf{W}_{n_x}\}\) and \(D_v=\{\mathbf{V}_1, \dots, \mathbf{V}_{n_y}\}\), where \(\mathbf{W}^T_j=(W_{j1}, \dots, W_{jm_x})\) and \(\mathbf{V}^T_k=(V_{k1}, \dots, V_{km_y})\) for \(j=1, \dots, n_x\) and \(k=1, \dots, n_y\), and \(F_x\) and \(F_y\) are the cumulative distribution functions (CDFs) of \(X\) and \(Y\). The observed \(W\)’s and \(V\)’s are surrogates of the unobserved \(X\)’s and \(Y\)’s, and we assume that \(m_x\geq 2\) and \(m_y\geq 2\), that is, we have multiple replicates for each unobservable true value. By the classical additive measurement error model we mean
\[ W_{jl}=X_j+U_{x, jl} \mbox{ and } V_{kl'}=Y_k+U_{y, kl'}, \]

for \(j=1, \dots, n_x\), \(l=1, \dots, m_x\), \(k=1, \dots, n_y\), and \(l'=1, \dots, m_y\). The measurement errors \(U_{x, jl}\) are assumed to be iid, independent of \(X_j\), and to follow a distribution \(F_{u_x}\) that is symmetric around \(0\). Similarly, the measurement errors \(U_{y, kl'}\) are assumed to be iid, independent of \(Y_k\), and to follow a distribution \(F_{u_y}\) that is symmetric around \(0\). Further, \(X\), \(Y\), \(U_x\), and \(U_y\) are assumed to be mutually independent. It is important to note that \(\{X_1, \dots, X_{n_x}\}\) and \(\{Y_1, \dots, Y_{n_y}\}\) are never observed.
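Because each subject has at least two replicates, the error distribution is identifiable from within-subject differences: for example, \(W_{j1}-W_{j2}=U_{x,j1}-U_{x,j2}\) does not involve the unobserved \(X_j\). The following minimal sketch (not part of MEtest; the function name is ours) illustrates how the measurement error variance could be estimated from two replicates.

## Illustrative sketch (not part of MEtest): with two replicates per subject,
## W[, 1] - W[, 2] = U[, 1] - U[, 2] is free of the unobserved X and has
## variance 2 * Var(U_x), so half the sample variance of the differences
## estimates the measurement error variance.
estimate.me.var <- function(W) {
  d <- W[, 1] - W[, 2]
  var(d) / 2
}

For the design of Example 1 below, estimate.me.var(W) should be close to \(2\times 0.35^2 = 0.245\).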

Example 1

We will simulate true values and measurement errors from the following design: \[ X \sim {\rm Normal}(0, 1), \; Y \sim {\rm Normal}(0.2, 1), \mbox{ and } U_x, U_y \sim DE(0, 0.35), \] where \(DE(a, b)\) stands for the double exponential (Laplace) distribution with mean \(a\) and variance \(2b^2\). Note that the standard deviation of the measurement errors \(U_x\) and \(U_y\) is \(\sqrt{2}\times 0.35 \approx 0.49\), roughly half the standard deviation of the true signals.

library(smoothmest) ## To generate double exponential random variables
## Loading required package: MASS
set.seed(1234)
n <- 200
mx <- my <- 2
X <- rnorm(n, mean = 0, sd = 1)    ## unobserved true values
Y <- rnorm(n, mean = 0.2, sd = 1)  ## unobserved true values
Ux <- matrix(rdoublex(n*mx, mu = 0, lambda = 0.35), ncol = mx)  ## measurement errors
Uy <- matrix(rdoublex(n*my, mu = 0, lambda = 0.35), ncol = my)  ## measurement errors
W <- X + Ux  ## observed surrogates of X
V <- Y + Uy  ## observed surrogates of Y

In this example, \(n_x = n_y = 200\) and \(m_x = m_y = 2\).
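As a quick sanity check on the design (not required for the test), we can compare the empirical standard deviations of the simulated true signals and measurement errors; with \(b = 0.35\) the error standard deviation is \(\sqrt{2}\times 0.35\approx 0.49\).

sd(X)      ## should be close to 1
sd(c(Ux))  ## should be close to sqrt(2) * 0.35 = 0.495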

par(mfrow = c(1,2))
boxplot(W, main = "W")
boxplot(V, main = "V")

Now we load the required packages into the current workspace.

library(statmod)
library(MEtest) # This package conducts the hypothesis test.

The package statmod is needed for the numerical integration used internally. The function me.test() in MEtest takes the observed data matrices W and V, the choice of weight function (wt = "Uniform" or wt = "Normal"), and an optional wt.prob argument (used with the uniform weight in the examples below).

Once we have W and V, it is straightforward to compute the \(p\)-value for \(H_0\).

# Let's view the observed data
head(W)
##            [,1]       [,2]
## [1,] -0.3544622 -1.7772733
## [2,]  0.5700091 -0.2169628
## [3,]  0.7204120  1.6900299
## [4,] -2.2969413 -2.3486976
## [5,] -0.2219204  0.5253668
## [6,]  0.8506322  0.2758073
head(V)
##             [,1]        [,2]
## [1,]  0.49508086 -0.08912287
## [2,]  0.77135540  0.92884779
## [3,] -0.05501657  0.46879317
## [4,]  1.75113131  1.57563234
## [5,]  0.67080298  0.10767074
## [6,]  0.98045858  0.67460554
me.test(W, V, wt = "Uniform")
## 
##  Homogeneity test under Measurement Error with Uniform weight
## 
## data:  W and V
## T (Uniform) = 56.606, p-value = 0.006
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Normal")
## 
##  Homogeneity test under Measurement Error with Normal weight
## 
## data:  W and V
## T (Normal) = 18.232, p-value < 2.2e-16
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Uniform", wt.prob = 0.95)
## 
##  Homogeneity test under Measurement Error with Uniform weight
## 
## data:  W and V
## T (Uniform) = 46.34, p-value < 2.2e-16
## alternative hypothesis: True signals come from different distributions

The output of me.test() contains the test statistic, the \(p\)-value, and information on the weight function used.
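The printed output follows R's standard hypothesis-test format, which suggests (though we have not verified it here) that me.test() returns an object of class "htest"; if so, the statistic and \(p\)-value can also be stored and extracted programmatically, as in the sketch below.

## Assumes me.test() returns a standard "htest"-style list; if not, inspect
## the returned object with str() first.
res <- me.test(W, V, wt = "Uniform")
res$statistic  ## test statistic
res$p.value    ## p-value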

Example 2

In this example, we set \(n_x = 200\), \(n_y = 300\), \(m_x = 2\), and \(m_y = 3\), and simulate from \[ X \sim {\rm Normal}(0, 1), \; Y \sim {\rm Normal}(0, 1), \mbox{ and } U_x, U_y \sim DE(0, 0.35), \] so that the null hypothesis \(H_0: F_x = F_y\) holds.

set.seed(12345)
nx <- 200; ny <- 300
mx <- 2; my <- 3
X <- rnorm(nx, mean = 0, sd = 1)
Y <- rnorm(ny, mean = 0, sd = 1)
Ux <- matrix(rdoublex(nx*mx, mu = 0, lambda = 0.35), ncol = mx)
Uy <- matrix(rdoublex(ny*my, mu = 0, lambda = 0.35), ncol = my)
W <- X + Ux
V <- Y + Uy
## Observed data are W and V
head(W)
##             [,1]        [,2]
## [1,] -0.22771326  0.91312650
## [2,]  0.46886646 -0.09680206
## [3,]  1.67410510  0.06787347
## [4,] -0.70181978  0.80573355
## [5,] -0.01687766 -0.41074833
## [6,] -1.96293518 -1.65193027
head(V)
##             [,1]       [,2]       [,3]
## [1,] -0.94299160 -1.1055595 -1.8662883
## [2,] -0.11082930 -0.5856071 -0.7795955
## [3,] -0.09263611  0.8909308 -0.7878444
## [4,]  1.18335963  1.4697445  1.5616781
## [5,]  1.62380283  2.1007738  1.1184517
## [6,]  0.10053012  0.3129272  0.6286603
me.test(W, V, wt = "Uniform")
## 
##  Homogeneity test under Measurement Error with Uniform weight
## 
## data:  W and V
## T (Uniform) = 13.165, p-value = 0.199
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Normal")
## 
##  Homogeneity test under Measurement Error with Normal weight
## 
## data:  W and V
## T (Normal) = 3.4635, p-value = 0.209
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Uniform", wt.prob = 0.95)
## 
##  Homogeneity test under Measurement Error with Uniform weight
## 
## data:  W and V
## T (Uniform) = 9.6725, p-value = 0.141
## alternative hypothesis: True signals come from different distributions
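Because Example 2 is a null design (\(F_x = F_y\)), the \(p\)-values above are expected to be non-significant. A small Monte Carlo experiment can estimate the empirical rejection rate at the 5% level; the sketch below is illustrative only (it uses few replications and, as above, assumes the returned object exposes a p.value component) and is not part of the MEtest package.

## Illustrative Monte Carlo sketch: empirical rejection rate of me.test()
## at the 5% level under the null design of Example 2.
B <- 100                        ## small number of replications, for speed
reject <- logical(B)
for (b in seq_len(B)) {
  Xb  <- rnorm(nx, mean = 0, sd = 1)
  Yb  <- rnorm(ny, mean = 0, sd = 1)
  Uxb <- matrix(rdoublex(nx*mx, mu = 0, lambda = 0.35), ncol = mx)
  Uyb <- matrix(rdoublex(ny*my, mu = 0, lambda = 0.35), ncol = my)
  reject[b] <- me.test(Xb + Uxb, Yb + Uyb, wt = "Uniform")$p.value < 0.05
}
mean(reject)                    ## should be roughly 0.05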

Example 3

In this example, we again set \(n_x = 200\), \(n_y = 300\), \(m_x = 2\), and \(m_y = 3\), and simulate from \[ X \sim {\rm Normal}(0, 10^2), \; Y \sim {\rm Normal}(0, 10^2), \mbox{ and } U_x, U_y \sim DE(0, 3.5). \] This is the design of Example 2 with both the true signals and the measurement errors scaled by a factor of 10; with the same seed, the simulated data are exactly the Example 2 data multiplied by 10, and the null hypothesis again holds.

set.seed(12345)
nx <- 200; ny <- 300
mx <- 2; my <- 3
X <- rnorm(nx, mean = 0, sd = 10)
Y <- rnorm(ny, mean = 0, sd = 10)
Ux <- matrix(rdoublex(nx*mx, mu = 0, lambda = 3.5), ncol = mx)
Uy <- matrix(rdoublex(ny*my, mu = 0, lambda = 3.5), ncol = my)
W <- X + Ux
V <- Y + Uy
## Observed data are W and V
head(W)
##             [,1]        [,2]
## [1,]  -2.2771326   9.1312650
## [2,]   4.6886646  -0.9680206
## [3,]  16.7410510   0.6787347
## [4,]  -7.0181978   8.0573355
## [5,]  -0.1687766  -4.1074833
## [6,] -19.6293518 -16.5193027
head(V)
##            [,1]       [,2]       [,3]
## [1,] -9.4299160 -11.055595 -18.662883
## [2,] -1.1082930  -5.856071  -7.795955
## [3,] -0.9263611   8.909308  -7.878444
## [4,] 11.8335963  14.697445  15.616781
## [5,] 16.2380283  21.007738  11.184517
## [6,]  1.0053012   3.129272   6.286603
me.test(W, V, wt = "Uniform")
## 
##  Homogeneity test under Measurement Error with Uniform weight
## 
## data:  W and V
## T (Uniform) = 642770905, p-value = 0.713
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Normal")
## 
##  Homogeneity test under Measurement Error with Normal weight
## 
## data:  W and V
## T (Normal) = 15933, p-value = 0.767
## alternative hypothesis: True signals come from different distributions
me.test(W, V, wt = "Uniform", wt.prob = 0.95)
## 
##  Homogeneity test under Measurement Error with Uniform weight
## 
## data:  W and V
## T (Uniform) = 2.27e+09, p-value = 0.517
## alternative hypothesis: True signals come from different distributions

Reference

Lee, D., Lahiri, S. and Sinha, S. A test of homogeneity of distributions when observations are subject to measurement errors. Under revision.