OPTIMIZATION
Click the Run button below when the applet has completed loading.
A project explaining the optimization techniques
is also available.
This applet illustrates the calculation of maximum likelihood estimates
for the parameters of a Gamma(log(a),log(b)) distribution. The applet
generates a set (100 observations) of random numbers from a Gamma
distribution. It then draws a contour plot of the log-likelihood
surface. In addition, profile likelihoods are drawn for each of the
two parameters. The two parameters a and b are reparameterized on the
log scale so that the optimization is unconstrained while a and b
themselves remain positive.
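The setup described above can be sketched in a few lines of Python. This is only an illustration, not the applet's code: it assumes a is the shape and b is the scale (the text does not say which), and the function name `gamma_loglik`, the seed, and the generating values a = 2 and b = 3 are hypothetical choices.

```python
import math
import random

def gamma_loglik(beta0, beta1, data):
    """Gamma log-likelihood at beta0 = log(a), beta1 = log(b).

    Assumption: a is the shape and b is the scale parameter.
    """
    a = math.exp(beta0)  # exp() keeps a > 0 for any real beta0
    b = math.exp(beta1)  # exp() keeps b > 0 for any real beta1
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)
    sum_x = sum(data)
    # Per observation: (a - 1) log x - x / b - log Gamma(a) - a log b
    return (a - 1.0) * sum_log_x - sum_x / b - n * math.lgamma(a) - n * a * beta1

random.seed(0)                 # illustrative seed
true_a, true_b = 2.0, 3.0      # illustrative generating values
# random.gammavariate(alpha, beta) takes shape alpha and scale beta.
data = [random.gammavariate(true_a, true_b) for _ in range(100)]
loglik_at_truth = gamma_loglik(math.log(true_a), math.log(true_b), data)
```

Because the optimizer only ever sees log(a) and log(b), any real-valued step maps back to positive a and b.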
The profile likelihood for log(a) is calculated constraining the
log(b) parameter at the generating population parameter value
while varying the log(a) parameter. Likewise, the profile likelihood
for log(b) is calculated constraining the log(a) parameter at the
generating population parameter value while varying the log(b) parameter.
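The two curves described above can be sketched by evaluating the log-likelihood along a grid while the other parameter stays fixed at its generating value. The grid range, the seed, the generating values, and the shape/scale reading of a and b are assumptions made for this example only.

```python
import math
import random

def gamma_loglik(beta0, beta1, data):
    """Gamma log-likelihood at (log(a), log(b)); assumes shape a, scale b."""
    a, b = math.exp(beta0), math.exp(beta1)
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)
    return (a - 1.0) * sum_log_x - sum(data) / b - n * math.lgamma(a) - n * a * beta1

random.seed(1)                 # illustrative seed
true_a, true_b = 2.0, 3.0      # illustrative generating values
data = [random.gammavariate(true_a, true_b) for _ in range(100)]

# 61 grid points spanning 1.5 log units either side of each generating value.
offsets = [-1.5 + 0.05 * i for i in range(61)]
grid_a = [math.log(true_a) + d for d in offsets]
grid_b = [math.log(true_b) + d for d in offsets]

# Curve for log(a): log(b) held at its generating value.
curve_a = [gamma_loglik(b0, math.log(true_b), data) for b0 in grid_a]
# Curve for log(b): log(a) held at its generating value.
curve_b = [gamma_loglik(math.log(true_a), b1, data) for b1 in grid_b]

# Grid points where each curve peaks.
peak_a = grid_a[curve_a.index(max(curve_a))]
peak_b = grid_b[curve_b.index(max(curve_b))]
```

With 100 observations, each peak should fall close to the corresponding generating value, although not exactly on it.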
Click on the log-likelihood surface to indicate a starting position
for the estimation. Alternatively, click on the Restart button to
set the starting value to the method of moments (MOM) estimates.
Once optimization is started, the Restart button can be used to start
again at the initial values (however they were chosen).
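For reference, method of moments starting values can be computed as in the sketch below. It assumes a is the shape and b is the scale, so that the mean is a*b and the variance is a*b^2. Whether the applet divides the variance by n or by n - 1 is not stated; the divisor n is used here as an arbitrary choice, and `mom_start` is a hypothetical name.

```python
import math
import statistics

def mom_start(data):
    """Method of moments start on the log scale: (log(a), log(b)).

    Assumption: shape a, scale b, so mean = a*b and variance = a*b**2.
    """
    m = statistics.mean(data)
    v = statistics.pvariance(data)  # divisor n; an assumption
    a = m * m / v                   # solves mean = a*b, variance = a*b**2
    b = v / m
    return math.log(a), math.log(b)

# Small worked example: mean 3 and variance 2 give a = 4.5 and b = 2/3.
beta0, beta1 = mom_start([1.0, 2.0, 3.0, 4.0, 5.0])
```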
There are several techniques that can
be used to update the estimated parameter vector. Some techniques
will reach a final solution in fewer steps than
others. Estimation can be restarted with different starting values at any
time by clicking on the contour plot. New samples can also be generated.
When using Newton-Raphson or Fisher scoring, you can also try to
optimize the stepsize. With these techniques, optimizing the stepsize
overcomes bad steps where the algorithm attempts to step too far or not
far enough toward the solution. Whether or not the stepsize is
optimized, both methods automatically apply Marquardt's modification,
in which the diagonals of the negative Hessian are increased if the
matrix is singular.
Optimizing the stepsize has no effect
on the other techniques, because those techniques always
optimize the stepsize.
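A rough sketch of a Newton-Raphson update with both ideas is given here. It is not the applet's implementation. Derivatives are approximated by central differences instead of analytic formulas, the diagonal is inflated whenever the negative Hessian is not positive definite (a slightly broader trigger than the singularity check mentioned above), and the stepsize is chosen from a fixed list of multiplicative factors. All names, the factor list, the stopping rule in the loop, and the shape/scale reading of a and b are assumptions.

```python
import math
import random

def make_loglik(data):
    """Return loglik(p) for p = [log(a), log(b)]; assumes shape a, scale b."""
    n = len(data)
    s_log = sum(math.log(x) for x in data)
    s_x = sum(data)
    def loglik(p):
        a, b = math.exp(p[0]), math.exp(p[1])
        return (a - 1.0) * s_log - s_x / b - n * math.lgamma(a) - n * a * p[1]
    return loglik

def shifted(p, i, h):
    """Copy of p with component i moved by h."""
    q = list(p)
    q[i] += h
    return q

def gradient(f, p, h=1e-5):
    """Central-difference gradient (stand-in for analytic derivatives)."""
    return [(f(shifted(p, i, h)) - f(shifted(p, i, -h))) / (2 * h) for i in range(2)]

def hessian(f, p, h=1e-4):
    """Central-difference Hessian for two parameters."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            H[i][j] = (f(shifted(shifted(p, i, h), j, h))
                       - f(shifted(shifted(p, i, h), j, -h))
                       - f(shifted(shifted(p, i, -h), j, h))
                       + f(shifted(shifted(p, i, -h), j, -h))) / (4 * h * h)
    return H

def newton_step(f, p, optimize_step=True):
    """One update; returns the new point and the stepsize factor used."""
    g = gradient(f, p)
    H = hessian(f, p)
    A = [[-H[0][0], -H[0][1]], [-H[1][0], -H[1][1]]]  # negative Hessian
    lam = 1e-3
    # Marquardt-style fix: inflate the diagonal until A is positive definite.
    while A[0][0] <= 0 or A[0][0] * A[1][1] - A[0][1] * A[1][0] <= 1e-8:
        A[0][0] += lam
        A[1][1] += lam
        lam *= 10.0
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # Solve A d = g directly for the 2x2 case.
    d = [(A[1][1] * g[0] - A[0][1] * g[1]) / det,
         (A[0][0] * g[1] - A[1][0] * g[0]) / det]
    # Stepsize optimization: keep the factor giving the largest log-likelihood.
    factors = [2.0, 1.0, 0.5, 0.25, 0.125, 0.0625] if optimize_step else [1.0]
    t = max(factors, key=lambda s: f([p[0] + s * d[0], p[1] + s * d[1]]))
    return [p[0] + t * d[0], p[1] + t * d[1]], t

random.seed(2)                                   # illustrative seed
data = [random.gammavariate(2.0, 3.0) for _ in range(100)]
loglik = make_loglik(data)
start = [0.0, 0.0]                               # a = b = 1, a poor start
p = list(start)
factors_used = []
for _ in range(50):
    p, t = newton_step(loglik, p)
    factors_used.append(t)
    # Simplified stopping rule for this sketch: largest gradient component.
    if max(abs(x) for x in gradient(loglik, p)) < 1e-4:
        break
```

The list `factors_used` plays the role of the multiplicative stepsize factor reported in the output. Values other than 1.0 mark iterations where the plain Newton step was too long or too short.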
In theory, the PR (Polak-Ribiere) and FR (Fletcher-Reeves) techniques
are different. For this particular problem you will likely see no
difference. Likewise, DFP (Davidon-Fletcher-Powell) and BFGS
(Broyden-Fletcher-Goldfarb-Shanno) are different, but you will see no
difference in their output either. For optimizing the log-likelihood,
the two techniques in each pair build up slightly different curvature
information, but then use a line optimizer to find the next step. This
usually means that the slight difference in the accumulated curvature
information is dominated by the line optimizer.
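To make the PR and FR comparison concrete, the sketch below computes the two update coefficients, assuming PR and FR refer to the standard Polak-Ribiere and Fletcher-Reeves conjugate gradient formulas. The gradient vectors are made-up numbers. The two coefficients agree whenever the new gradient is orthogonal to the old one, which is roughly what an accurate line search produces on a near-quadratic surface.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def beta_fr(g_new, g_old):
    """Fletcher-Reeves coefficient: |g_new|^2 / |g_old|^2."""
    return dot(g_new, g_new) / dot(g_old, g_old)

def beta_pr(g_new, g_old):
    """Polak-Ribiere coefficient: g_new . (g_new - g_old) / |g_old|^2."""
    diff = [x - y for x, y in zip(g_new, g_old)]
    return dot(g_new, diff) / dot(g_old, g_old)

g_old = [1.0, 0.0]
g_orthogonal = [0.0, 2.0]   # orthogonal to g_old
g_oblique = [0.5, 2.0]      # not orthogonal to g_old

same_fr = beta_fr(g_orthogonal, g_old)  # 4.0
same_pr = beta_pr(g_orthogonal, g_old)  # 4.0
diff_fr = beta_fr(g_oblique, g_old)     # 4.25
diff_pr = beta_pr(g_oblique, g_old)     # 3.75
```

The two formulas differ only by the term involving the inner product of the new and old gradients, so they separate only when the line search leaves that inner product noticeably different from zero.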
Convergence is declared if the absolute value of the sum of the
gradient is less than 1e-4 or if the maximum relative change in the
estimated parameter vector is less than 1e-4.
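A literal reading of this rule can be written as a small check. The way relative change is measured (absolute change divided by the absolute old value, with a guard against division by zero) is an assumption, as is the name `has_converged`.

```python
def has_converged(gradient, old_params, new_params, tol=1e-4):
    """True if either stated criterion is met.

    Criterion 1: absolute value of the summed gradient is below tol.
    Criterion 2: largest relative parameter change is below tol.
    """
    gradient_ok = abs(sum(gradient)) < tol
    rel_changes = [abs(new - old) / max(abs(old), 1e-12)  # guard is an assumption
                   for old, new in zip(old_params, new_params)]
    change_ok = max(rel_changes) < tol
    return gradient_ok or change_ok

small_gradient = has_converged([2e-5, 3e-5], [1.0, 2.0], [1.5, 2.5])
small_change = has_converged([0.5, 0.5], [1.0, 2.0], [1.00001, 2.00001])
neither = has_converged([0.5, 0.5], [1.0, 2.0], [1.5, 2.5])
```

Read literally, the first criterion is also met when large gradient components of opposite sign cancel in the sum, so the relative change criterion is a useful companion.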
Several statistics are shown for the optimization process. The main
difference among the techniques is how the negative H matrix is
calculated. Note that this matrix is included in the output. When
the stepsize is optimized, the multiplicative factor of the stepsize
is also included in the output.
The log(a) parameter is labeled beta_0 and the log(b) parameter is
labeled beta_1 in the output.