OPTIMIZATION


Click the Run button below when the applet has completed loading.

A project explaining the optimization techniques is also available.



This applet illustrates the calculation of maximum likelihood estimates for the parameters of a Gamma(log(a),log(b)) distribution. The applet generates a sample of 100 random numbers from a Gamma distribution and draws a contour plot of the log-likelihood surface. In addition, profile likelihoods are drawn for each of the two parameters. The two parameters a and b are reparameterized on the log scale so that the optimization automatically respects the constraint that both parameters be positive.
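As a rough illustration of the log reparameterization, the sketch below evaluates a Gamma log-likelihood as a function of log(a) and log(b). It assumes that a is the shape and b is the scale (the page does not say whether b is a scale or a rate), and the names gamma_loglik and sample, the seed, and the generating values 2.0 and 3.0 are all hypothetical.

```python
import math
import random

def gamma_loglik(log_a, log_b, data):
    """Gamma log-likelihood in terms of log(a) and log(b).

    Assumption: a is the shape and b is the scale, so the density is
    x**(a-1) * exp(-x/b) / (Gamma(a) * b**a).
    """
    a, b = math.exp(log_a), math.exp(log_b)   # always positive
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)
    sum_x = sum(data)
    return ((a - 1.0) * sum_log_x - sum_x / b
            - n * a * log_b - n * math.lgamma(a))

# Hypothetical sample of 100 observations (shape 2, scale 3).
random.seed(0)
sample = [random.gammavariate(2.0, 3.0) for _ in range(100)]
value_at_truth = gamma_loglik(math.log(2.0), math.log(3.0), sample)
```

Because the optimizer works with log(a) and log(b), any real-valued step maps back to positive a and b, so no explicit constraint handling is needed.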

The profile likelihood for log(a) is calculated by holding the log(b) parameter fixed at its generating population value while varying the log(a) parameter. Likewise, the profile likelihood for log(b) is calculated by holding the log(a) parameter fixed at its generating population value while varying the log(b) parameter.
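A minimal sketch of the curve for log(a) described above: log(b) is held at the generating value and the log-likelihood is evaluated over a grid of log(a) values. The shape/scale convention, the generating values, the grid spacing, and all names are assumptions made for illustration.

```python
import math
import random

def loglik(log_a, log_b, data):
    """Gamma log-likelihood; a is assumed to be the shape, b the scale."""
    a, b = math.exp(log_a), math.exp(log_b)
    n = len(data)
    return ((a - 1.0) * sum(math.log(x) for x in data)
            - sum(data) / b - n * a * log_b - n * math.lgamma(a))

random.seed(1)
true_a, true_b = 2.0, 3.0   # hypothetical generating values
data = [random.gammavariate(true_a, true_b) for _ in range(100)]

# Hold log(b) at its generating value and vary log(a) over a grid.
grid = [math.log(true_a) + 0.05 * k for k in range(-20, 21)]
curve_a = [loglik(t, math.log(true_b), data) for t in grid]

# Grid point at which the curve is highest.
best_log_a = grid[curve_a.index(max(curve_a))]
```

The curve for log(b) would be obtained the same way, with the roles of the two arguments swapped.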

Click on the log-likelihood surface to indicate a starting position for the estimation. Alternatively, click on the Restart button to set the starting value to the method of moments (MOM) estimates. Once optimization is started, the Restart button can be used to start again at the initial values (however they were chosen).
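For reference, method of moments starting values could be computed along the following lines. This is a sketch under the assumption that a is the shape and b is the scale, so that the mean is a*b and the variance is a*b^2; the function name mom_start and the use of the divide-by-n variance are illustrative choices, not taken from the applet.

```python
import math

def mom_start(data):
    """Method of moments estimates, returned as (log a, log b).

    Assumes shape a and scale b: mean = a*b, variance = a*b**2,
    hence a = mean**2 / variance and b = variance / mean.
    """
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n   # divide-by-n variance
    a = mean * mean / var
    b = var / mean
    return math.log(a), math.log(b)

log_a0, log_b0 = mom_start([1.0, 2.0, 3.0, 6.0])
```

By construction the product of the two estimates reproduces the sample mean.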

There are several techniques that can be used to update the estimated parameter vector. Some techniques will reach a final solution in fewer steps than others. Estimation can be restarted with different starting values at any time by clicking on the contour plot. New samples can also be generated.

When using Newton-Raphson or Fisher scoring, you can also try to optimize the stepsize. With these techniques, optimizing the stepsize overcomes bad steps in which the algorithm attempts to step too far, or not far enough, toward the solution. In either case, these methods automatically employ Marquardt's modification, in which the diagonals of the negative Hessian are increased if the matrix is singular. The stepsize option has no effect on the other techniques, because they always optimize the stepsize.
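The following sketch shows one way a Newton-Raphson update with a Marquardt-style diagonal increase and an optional stepsize search might look. It is not the applet's implementation: derivatives are approximated by central differences, the stepsize search is a crude comparison over a few multiplicative factors, the diagonal is increased whenever the matrix fails to be positive definite, and every name and constant is hypothetical. The Gamma parameters are assumed to be shape a and scale b.

```python
import math
import random

def loglik(theta, data):
    """Gamma log-likelihood at theta = (log a, log b); b assumed a scale."""
    a, b = math.exp(theta[0]), math.exp(theta[1])
    n = len(data)
    return ((a - 1.0) * sum(math.log(x) for x in data)
            - sum(data) / b - n * a * theta[1] - n * math.lgamma(a))

def gradient(theta, data, h=1e-5):
    """Central-difference gradient (stand-in for analytic derivatives)."""
    g = []
    for i in range(2):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        g.append((loglik(up, data) - loglik(dn, data)) / (2.0 * h))
    return g

def neg_hessian(theta, data, h=1e-4):
    """Negative Hessian from central differences of the gradient."""
    rows = []
    for i in range(2):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        gu, gd = gradient(up, data), gradient(dn, data)
        rows.append([-(gu[j] - gd[j]) / (2.0 * h) for j in range(2)])
    off = 0.5 * (rows[0][1] + rows[1][0])   # enforce symmetry
    return [[rows[0][0], off], [off, rows[1][1]]]

def newton_step(theta, data, optimize_step=True):
    """One update; returns the new theta and the stepsize factor used."""
    g = gradient(theta, data)
    H = neg_hessian(theta, data)
    lam = 0.0
    while True:
        # Marquardt-style fix: add lam to the diagonals until the
        # matrix is safely positive definite.
        h00, h11, h01 = H[0][0] + lam, H[1][1] + lam, H[0][1]
        det = h00 * h11 - h01 * h01
        if h00 > 0.0 and det > 1e-8:
            break
        lam = 1.0 if lam == 0.0 else 10.0 * lam
    # Direction d solves (H + lam*I) d = g, via the 2x2 inverse.
    d = [(h11 * g[0] - h01 * g[1]) / det,
         (h00 * g[1] - h01 * g[0]) / det]
    # Candidate multiplicative factors 2, 1, 1/2, ..., 2**-11.
    factors = [2.0 ** k for k in range(1, -12, -1)] if optimize_step else [1.0]

    def value(s):
        return loglik([theta[0] + s * d[0], theta[1] + s * d[1]], data)

    s = max(factors, key=value)
    return [theta[0] + s * d[0], theta[1] + s * d[1]], s

random.seed(2)
data = [random.gammavariate(2.0, 3.0) for _ in range(100)]
start = [math.log(1.5), math.log(4.0)]   # hypothetical starting point
theta = list(start)
for _ in range(30):
    theta, factor = newton_step(theta, data)
```

Calling newton_step with optimize_step=False takes the full step every time, which corresponds to leaving the stepsize option off.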

In theory, the Polak-Ribiere (PR) and Fletcher-Reeves (FR) techniques are different, but for this particular problem you will likely see no difference. Likewise, the Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) techniques are different, but you will see no difference in their output either. When optimizing the log-likelihood, the techniques in each pair build up slightly different curvature information, but then use a line optimizer to find the next step. The slight difference in the accumulated curvature information is therefore usually dominated by the line optimizer.
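To make the PR/FR comparison concrete, the sketch below computes the two standard conjugate gradient coefficients, assuming PR and FR denote the Polak-Ribiere and Fletcher-Reeves formulas. The function names and the example gradient vectors are hypothetical. When successive gradients are orthogonal, as an exact line search on a quadratic surface produces, the two coefficients coincide.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def beta_fr(g_new, g_old):
    """Fletcher-Reeves coefficient: |g_new|^2 / |g_old|^2."""
    return dot(g_new, g_new) / dot(g_old, g_old)

def beta_pr(g_new, g_old):
    """Polak-Ribiere coefficient: g_new . (g_new - g_old) / |g_old|^2."""
    diff = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, diff) / dot(g_old, g_old)

# Orthogonal successive gradients: the two formulas agree.
same = (beta_fr([0.0, 2.0], [1.0, 0.0]), beta_pr([0.0, 2.0], [1.0, 0.0]))

# Non-orthogonal successive gradients: the two formulas differ.
differ = (beta_fr([1.0, 2.0], [1.0, 0.0]), beta_pr([1.0, 2.0], [1.0, 0.0]))
```

The two formulas differ only through the inner product of the new and old gradients, which a good line search drives toward zero.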

Convergence is declared if the absolute value of the sum of the gradient elements is less than 1e-4 or if the maximum relative change in the estimated parameter vector is less than 1e-4.
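The convergence rule can be sketched as a small predicate. The page does not define relative change precisely, so the version below assumes it means |new - old| divided by |old| (with a small floor to avoid division by zero); the function name, the floor value, and the argument names are hypothetical.

```python
def converged(grad, theta_new, theta_old, tol=1e-4, floor=1e-12):
    """Return True if either stopping criterion is met.

    Criterion 1: absolute value of the sum of the gradient elements < tol.
    Criterion 2: maximum relative parameter change < tol, where relative
    change is assumed to be |new - old| / max(|old|, floor).
    """
    gradient_small = abs(sum(grad)) < tol
    max_rel_change = max(
        abs(new - old) / max(abs(old), floor)
        for new, old in zip(theta_new, theta_old)
    )
    return gradient_small or max_rel_change < tol

by_gradient = converged([1e-5, -2e-5], [1.0, 1.0], [0.5, 0.5])
by_change = converged([1.0, 1.0], [1.0, 2.0], [1.0, 2.00001])
not_yet = converged([1.0, 1.0], [1.0, 2.0], [1.5, 2.0])
```

Read literally, the first criterion sums the gradient elements before taking the absolute value, so elements of opposite sign can cancel; the sketch keeps that literal reading.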

Several statistics are shown for the optimization process. The main difference among the techniques is how the negative H matrix is calculated; note that this matrix is included in the output. If the stepsize is optimized, the multiplicative stepsize factor is also included in the output.

The log(a) parameter is labeled beta_0 and the log(b) parameter is labeled beta_1 in the output.