program define disc
    version 4.0
    preserve    /* Save all the current stata stuff */
    clear       /* Clear everything so no conflicts */
    quietly {
        set obs 101
        gen x1=.
        gen x11=.
        gen x12=.
        gen x2=.
        gen x21=.
        gen x22=.
        gen y1=.
        gen y2=.
        gen zero=.
    }
    global D_sm11 " Consider a population consisting of N elements, each of which can be one of two types. If we assign value 1 to elements of the first type and value 0 to elements of the other type, we call the population a 0-1 population. We denote by pi the proportion of elements that have value 1."
    wdctl static D_sm11 5 5 290 32
    global D_sm12 " There are two natural ways to get a random sample of n of the elements in the population: 1) Sampling with replacement, where after selecting one element for the sample, you put it back in the population before selecting the next element, and 2) sampling without replacement, where after selecting an item you do not put it back before getting the next."
    wdctl static D_sm12 5 40 290 65
    global D_sm13 " We denote by X the number of 1's in the sample of n elements. You select N, pi, and n, and the lab graphs how likely each possible value of X is for both sampling schemes."
    wdctl static D_sm13 5 85 290 35
    global D_sm14 "Notes: 1) X is called a binomial (hypergeometric) random variable if the sampling is done with (without) replacement, 2) The larger N is, the closer the probabilities should be for the two sampling schemes since putting an element back means little for a large population."
    wdctl static D_sm14 75 110 200 40
    global D_sm6 "N"
    global D_sm7 "50 100 200 300 400 500 1000"
    wdctl static D_sm6 5 105 10 10
    global D_var1 "50"
    wdctl ssimple D_var1 D_sm7 5 115 25 70
    global D_sm8 "pi"
    global D_sm9 ".02 .10 .20 .30 .40 .50 .60 .70 .80 .90 .98"
    wdctl static D_sm8 40 105 20 10
    global D_var2 ".50"
    wdctl ssimple D_var2 D_sm9 40 115 25 65
    global D_sm3 "n"
    global HJN_V3=10
    wdctl static D_sm3 75 162 8 10
    wdctl edit HJN_V3 85 162 20 10
    wdctl button "Run" 115 160 30 14 D_b1
    wdctl button "Close" 150 160 30 14 D_b2
    wdctl button "Help" 185 160 30 14 D_b3 help
    global D_b1 "discdr"
    global D_b2 "exit 1234"
    global D_b3 "whelp disc"
    discdr
    cap noi wdlg "Sampling With and Without Replacement" $D_dlgx $D_dlgy 300 240
    restore
end

program define discdr
    version 4.0
    gph open
    gph pen 1
    local N = $D_var1
    local D = int($D_var2*`N')
    local n = $HJN_V3
    local p = `D'/`N'
    local nobs = `n'+1
    if `n' < 4 | `n' >= 51 {
        sstopbox stop "n must be between 4 and 50"
        exit
    }
    replace x1=.
    replace x2=.
    replace y1=.
    replace y2=.
    * Get binomial probabilities:
    * P(X=x) = C(n,x) p^x (1-p)^(n-x), computed on the log scale via lngamma.
    replace x1 = _n-1 in 1/`nobs'
    replace y1 = exp(lngamma(`n'+1)-lngamma(x1+1)-lngamma(`n'-x1+1)+x1*log(`p') + (`n'-x1)*log(1-`p'))
    summarize y1
    local y1max = _result(6)
    * Get hypergeometric probabilities (note how x values are obtained):
    * P(X=x) = C(D,x) C(N-D,n-x) / C(N,n), for max(0,n-N+D) <= x <= min(n,D).
    replace x2 = x1 if x1 >= max(0,`n'-`N'+`D') & x1 <= min(`n',`D')
    replace y2 = exp(lngamma(`D'+1)-lngamma(x2+1)-lngamma(`D'-x2+1)+lngamma(`N'-`D'+1)-lngamma(`n'-x2+1)-lngamma(`N'-`D'-`n'+x2+1)+lngamma(`N'-`n'+1)+lngamma(`n'+1)-lngamma(`N'+1))
    summarize y2
    local y2max = _result(6)
    local ymax = max(`y1max',`y2max')
    * Set up axes:
    local np1=`n'+1
    graph y1 x1, s(i) xlab(-1,`np1') ylab yscale(0,`ymax') l1(" ") l2(" ") b1(" ") b2(" ") /*
        */ bbox(3500,0,23000,32000,850,400,0)
    gphconv
    * Draw x axis tic mark labels:
    local yy1=-`ymax'/10.
    local yy2=1.5*`yy1'
    gph pen 2
    local i=0
    while `i' <= `n' {
        if 2*int(`i'/2)==`i' {
            drtext `i' `yy1' 0 `i'
        }
        else {
            drtext `i' `yy2' 0 `i'
        }
        local i = `i'+1
    }
    * Draw wide bars:
    replace x11=x1-.5
    replace x12=x1+.5
    replace zero=0
    gph pen 1
    segments x11 zero x11 y1
    segments x12 zero x12 y1
    segments x11 y1 x12 y1
    segments x11 zero x12 zero
    * Draw narrow bars:
    replace x21=x2-.25
    replace x22=x2+.25
    replace zero=0
    gph pen 3
    segments x21 zero x21 y2
    segments x22 zero x22 y2
    segments x21 y2 x22 y2
    segments x21 zero x22 zero
    * Draw title:
    local pi: display %3.2f `p'
    gph text 1000 16000 0 0 Sampling with replacement (wide bars) and without replacement (thin bars)
    gph text 2000 16000 0 0 (N=`N', n=`n', pi=D/N=`pi')
end
exit