Solutions and Applications Manual
Econometric Analysis Sixth Edition
William H. Greene New York University
Prentice Hall, Upper Saddle River, New Jersey 07458
Contents and Notation

This book presents solutions to the end of chapter exercises and applications in Econometric Analysis. There are no exercises in the text for Appendices A – E. For the instructor or student who is interested in exercises for this material, I have included a number of them, with solutions, in this book. The various computations in the solutions and exercises are done with the NLOGIT Version 4.0 computer package (Econometric Software, Inc., Plainview, New York, www.nlogit.com). In order to control the length of this document, only the solutions and not the questions from the exercises and applications are shown here. In some cases, the numerical solutions for the in-text examples shown here differ slightly from the values given in the text. This occurs because, in general, the derivative computations in the text are done using the digits shown in the text, which are rounded to a few digits, while the results shown here are based on internal computations by the computer that use all digits.

Chapter 1 Introduction
Chapter 2 The Classical Multiple Linear Regression Model
Chapter 3 Least Squares
Chapter 4 Statistical Properties of the Least Squares Estimator
Chapter 5 Inference and Prediction
Chapter 6 Functional Form and Structural Change
Chapter 7 Specification Analysis and Model Selection
Chapter 8 The Generalized Regression Model and Heteroscedasticity
Chapter 9 Models for Panel Data
Chapter 10 Systems of Regression Equations
Chapter 11 Nonlinear Regressions and Nonlinear Least Squares
Chapter 12 Instrumental Variables Estimation
Chapter 13 Simultaneous-Equations Models
Chapter 14 Estimation Frameworks in Econometrics
Chapter 15 Minimum Distance Estimation and The Generalized Method of Moments
Chapter 16 Maximum Likelihood Estimation
Chapter 17 Simulation Based Estimation and Inference
Chapter 18 Bayesian Estimation and Inference
Chapter 19 Serial Correlation
Chapter 20 Models with Lagged Variables
Chapter 21 Time-Series Models
Chapter 22 Nonstationary Data
Chapter 23 Models for Discrete Choice
Chapter 24 Truncation, Censoring and Sample Selection
Chapter 25 Models for Event Counts and Duration
Appendix A Matrix Algebra
Appendix B Probability and Distribution Theory
Appendix C Estimation and Inference
Appendix D Large Sample Distribution Theory
Appendix E Computation and Optimization
In the solutions, we denote:
• scalar values with italic, lower case letters, as in a,
• column vectors with boldface lower case letters, as in b,
• row vectors as transposed column vectors, as in b′,
• matrices with boldface upper case letters, as in M or Σ,
• single population parameters with Greek letters, as in θ,
• sample estimates of parameters with Roman letters, as in b as an estimate of β,
• sample estimates of population parameters with a caret, as in $\hat\alpha$ or $\hat\beta$,
• cross section observations with subscript i, as in $y_i$, time series observations with subscript t, as in $z_t$, and panel data observations with $x_{it}$ or $x_{i,t-1}$ when the comma is needed to remove ambiguity.
Observations that are vectors are denoted likewise, for example, $x_{it}$ to denote a column vector of observations. These are consistent with the notation used in the text.
Chapter 1 Introduction There are no exercises or applications in Chapter 1.
Chapter 2 The Classical Multiple Linear Regression Model There are no exercises or applications in Chapter 2.
Chapter 3 Least Squares

Exercises

1. Let $X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$.
(a) The normal equations are given by (3-12), $X'e = 0$ (we drop the minus sign), hence for each of the columns of $X$, $x_k$, we know that $x_k'e = 0$. This implies that $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n x_i e_i = 0$.
(b) Use $\sum_{i=1}^n e_i = 0$ to conclude from the first normal equation that $a = \bar y - b\bar x$.
(c) We know that $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n x_i e_i = 0$. It follows then that $\sum_{i=1}^n (x_i - \bar x)e_i = 0$ because $\sum_{i=1}^n \bar x e_i = \bar x\sum_{i=1}^n e_i = 0$. Substitute $e_i$ to obtain $\sum_{i=1}^n (x_i - \bar x)(y_i - a - bx_i) = 0$ or $\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y - b(x_i - \bar x)) = 0$. Then, $\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y) = b\sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)$, so
$$b = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}.$$
(d) The first derivative vector of $e'e$ is $-2X'e$. (The normal equations.) The second derivative matrix is $\partial^2(e'e)/\partial b\,\partial b' = 2X'X$. We need to show that this matrix is positive definite. The diagonal elements are $2n$ and $2\sum_{i=1}^n x_i^2$, which are clearly both positive. The determinant is $(2n)(2\sum_i x_i^2) - (2\sum_i x_i)^2 = 4n\sum_i x_i^2 - 4(n\bar x)^2 = 4n[\sum_i x_i^2 - n\bar x^2] = 4n\sum_i (x_i - \bar x)^2$, which is positive as long as the $x_i$ are not all equal. Note that a much simpler proof appears after (3-6).

2. Write $c$ as $b + (c - b)$. Then, the sum of squared residuals based on $c$ is
$(y - Xc)'(y - Xc) = [y - X(b + (c - b))]'[y - X(b + (c - b))] = [(y - Xb) - X(c - b)]'[(y - Xb) - X(c - b)] = (y - Xb)'(y - Xb) + (c - b)'X'X(c - b) - 2(c - b)'X'(y - Xb)$.
But, the third term is zero, as $2(c - b)'X'(y - Xb) = 2(c - b)'X'e = 0$. Therefore, $(y - Xc)'(y - Xc) = e'e + (c - b)'X'X(c - b)$ or $(y - Xc)'(y - Xc) - e'e = (c - b)'X'X(c - b)$. The right hand side can be written as $d'd$ where $d = X(c - b)$, so it is necessarily nonnegative, and it is strictly positive when $c \neq b$ and $X$ has full column rank. This confirms what we knew at the outset, least squares is least squares.

3. The residual vector in the regression of $y$ on $X$ is $M_Xy = [I - X(X'X)^{-1}X']y$. The residual vector in the regression of $y$ on $Z = XP$ is
$M_Zy = [I - Z(Z'Z)^{-1}Z']y = [I - XP((XP)'(XP))^{-1}(XP)']y = [I - XPP^{-1}(X'X)^{-1}(P')^{-1}P'X']y = M_Xy$.
Since the residual vectors are identical, the fits must be as well. Changing the units of measurement of the regressors is equivalent to postmultiplying by a diagonal $P$ matrix whose kth diagonal element is the scale factor to be applied to the kth variable (1 if it is to be unchanged). It follows from the result above that this will not change the fit of the regression.

4. In the regression of $y$ on $i$ and $X$, the coefficients on $X$ are $b = (X'M^0X)^{-1}X'M^0y$. $M^0 = I - i(i'i)^{-1}i'$ is the matrix which transforms observations into deviations from their column means. Since $M^0$ is idempotent and symmetric we may also write the preceding as $[(X'M^{0\prime})(M^0X)]^{-1}(X'M^{0\prime})(M^0y)$ which implies that the
regression of $M^0y$ on $M^0X$ produces the least squares slopes. If only $X$ is transformed to deviations, we would compute $[(X'M^{0\prime})(M^0X)]^{-1}(X'M^{0\prime})y$ but, of course, this is identical. However, if only $y$ is transformed, the result is $(X'X)^{-1}X'M^0y$ which is likely to be quite different.

5. What is the result of the matrix product $M_1M$ where $M_1$ is defined in (3-19) and $M$ is defined in (3-14)?
$M_1M = (I - X_1(X_1'X_1)^{-1}X_1')(I - X(X'X)^{-1}X') = M - X_1(X_1'X_1)^{-1}X_1'M$.
There is no need to multiply out the second term. Each column of $MX_1$ is the vector of residuals in the regression of the corresponding column of $X_1$ on all of the columns in $X$. Since that column is one of the columns in $X$, this regression provides a perfect fit, so the residuals are zero. Thus, $MX_1$ is a matrix of zeroes, so $X_1'M = (MX_1)' = 0$, which implies that $M_1M = M$.

6. The original $X$ matrix has $n$ rows. We add an additional row, $x_s'$. The new $y$ vector likewise has an additional element. Thus, $X_{n,s} = \begin{bmatrix} X_n \\ x_s' \end{bmatrix}$ and $y_{n,s} = \begin{bmatrix} y_n \\ y_s \end{bmatrix}$. The new coefficient vector is $b_{n,s} = (X_{n,s}'X_{n,s})^{-1}(X_{n,s}'y_{n,s})$. The matrix is $X_{n,s}'X_{n,s} = X_n'X_n + x_sx_s'$. To invert this, use (A-66):
$$(X_{n,s}'X_{n,s})^{-1} = (X_n'X_n)^{-1} - \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_sx_s'(X_n'X_n)^{-1}.$$
The vector is $X_{n,s}'y_{n,s} = X_n'y_n + x_sy_s$. Multiply out the four terms to get
$$\begin{aligned}
(X_{n,s}'X_{n,s})^{-1}(X_{n,s}'y_{n,s}) &= b_n - \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_sx_s'b_n + (X_n'X_n)^{-1}x_sy_s - \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_sx_s'(X_n'X_n)^{-1}x_sy_s\\
&= b_n + \left[1 - \frac{x_s'(X_n'X_n)^{-1}x_s}{1 + x_s'(X_n'X_n)^{-1}x_s}\right](X_n'X_n)^{-1}x_sy_s - \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_sx_s'b_n\\
&= b_n + \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_sy_s - \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_sx_s'b_n\\
&= b_n + \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s(y_s - x_s'b_n).
\end{aligned}$$
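The updating formula in Exercise 6 can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data; the helper name `update_ols` and all data values are hypothetical and are not part of the text or of NLOGIT.

```python
import numpy as np

def update_ols(XtX_inv, b, x_s, y_s):
    """Coefficients after appending one observation (x_s, y_s).

    Implements b_n + (X'X)^{-1} x_s (y_s - x_s'b_n) / (1 + x_s'(X'X)^{-1} x_s).
    """
    v = XtX_inv @ x_s
    return b + v * (y_s - x_s @ b) / (1.0 + x_s @ v)

# Simulated data (illustrative values only)
rng = np.random.default_rng(0)
X_n = rng.normal(size=(20, 3))
y_n = rng.normal(size=20)
x_s = rng.normal(size=3)
y_s = 0.7

XtX_inv = np.linalg.inv(X_n.T @ X_n)
b_n = XtX_inv @ X_n.T @ y_n
b_updated = update_ols(XtX_inv, b_n, x_s, y_s)

# Direct recomputation on the n+1 observations, for comparison
X_full = np.vstack([X_n, x_s])
y_full = np.append(y_n, y_s)
b_direct = np.linalg.lstsq(X_full, y_full, rcond=None)[0]

print(np.allclose(b_updated, b_direct))
```

The update needs only the existing inverse and the prediction error for the new observation, so no new matrix inversion is required.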
7. Define the data matrix as follows: $X = \begin{bmatrix} i & x & 0 \\ 1 & 0 & 1 \end{bmatrix} = \left[X_1, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right] = [X_1\ X_2]$ and $y = \begin{bmatrix} y_o \\ y_m \end{bmatrix}$. (The subscripts on the parts of $y$ refer to the "observed" and "missing" rows of $X$.) We will use Frisch-Waugh to obtain the first two elements of the least squares coefficient vector, $b_1 = (X_1'M_2X_1)^{-1}(X_1'M_2y)$. Multiplying it out, we find that $M_2$ is an identity matrix save for the last diagonal element, which is equal to 0. Then, $X_1'M_2X_1 = X_1'X_1 - X_1'\begin{bmatrix} 0 & 0 \\ 0' & 1 \end{bmatrix}X_1$. This just drops the last observation. $X_1'M_2y$ is computed likewise. Thus, the coefficients on the first two columns are the same as if $y_o$ had been linearly regressed on the observed rows of $X_1$. The denominator of $R^2$ is different for the two cases (drop the observation or keep it with zero fill and the dummy variable). For the first strategy, the mean of the $n-1$ observations should be different from the mean of the full $n$ unless the last observation happens to equal the mean of the first $n-1$. For the second strategy, replacing the missing value with the mean of the other $n-1$ observations, we can deduce the new slope vector logically. Using Frisch-Waugh, we can replace the column of x's with deviations from the means, which then turns the last observation to zero. Thus, once again, the coefficient on the x equals what it is using the earlier strategy. The constant term will be the same as well.
8. For convenience, reorder the variables so that $X = [i, P_d, P_n, P_s, Y]$. The three dependent variables are $E_d$, $E_n$, and $E_s$, and $Y = E_d + E_n + E_s$. The coefficient vectors are $b_d = (X'X)^{-1}X'E_d$, $b_n = (X'X)^{-1}X'E_n$, and $b_s = (X'X)^{-1}X'E_s$. The sum of the three vectors is $b = (X'X)^{-1}X'[E_d + E_n + E_s] = (X'X)^{-1}X'Y$. Now, $Y$ is the last column of $X$, so the preceding sum is the vector of least squares coefficients in the regression of the last column of $X$ on all of the columns of $X$, including the last. Of course, we get a perfect fit. In addition, $X'[E_d + E_n + E_s]$ is the last column of $X'X$, so the matrix product is equal to the last column of an identity matrix. Thus, the sum of the coefficients on all variables except income is 0, while that on income is 1.
9. Let $\bar R_K^2$ denote the adjusted $R^2$ in the full regression on $K$ variables including $x_k$, and let $\bar R_1^2$ denote the adjusted $R^2$ in the short regression on $K-1$ variables when $x_k$ is omitted. Let $R_K^2$ and $R_1^2$ denote their unadjusted counterparts. Then,
$R_K^2 = 1 - e'e/y'M^0y$
$R_1^2 = 1 - e_1'e_1/y'M^0y$
where $e'e$ is the sum of squared residuals in the full regression, $e_1'e_1$ is the (larger) sum of squared residuals in the regression which omits $x_k$, and $y'M^0y = \sum_i (y_i - \bar y)^2$. Then,
$\bar R_K^2 = 1 - [(n-1)/(n-K)](1 - R_K^2)$
and
$\bar R_1^2 = 1 - [(n-1)/(n-(K-1))](1 - R_1^2)$.
The difference is the change in the adjusted $R^2$ when $x_k$ is added to the regression,
$\bar R_K^2 - \bar R_1^2 = [(n-1)/(n-K+1)][e_1'e_1/y'M^0y] - [(n-1)/(n-K)][e'e/y'M^0y]$.
The difference is positive if and only if the ratio of the first term to the second is greater than 1. After cancelling terms, we require for the adjusted $R^2$ to increase that $[e_1'e_1/(n-K+1)]/[e'e/(n-K)] > 1$. From the previous problem, we have that $e_1'e_1 = e'e + b_K^2(x_k'M_1x_k)$, where $M_1$ is defined above and $b_K$ is the least squares coefficient in the full regression of $y$ on $X_1$ and $x_k$. Making the substitution, we require $[(e'e + b_K^2(x_k'M_1x_k))(n-K)]/[(n-K)e'e + e'e] > 1$. Since $e'e = (n-K)s^2$, this simplifies to $[e'e + b_K^2(x_k'M_1x_k)]/[e'e + s^2] > 1$. Since all terms are positive, the fraction is greater than one if and only if $b_K^2(x_k'M_1x_k) > s^2$ or $b_K^2/[s^2/(x_k'M_1x_k)] > 1$. The denominator is the estimated variance of $b_K$, so the condition is that the squared t ratio on $x_k$ exceeds 1, and the result is proved.
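The result of Exercise 9 lends itself to a quick simulation check. The sketch below assumes NumPy and artificial data; the function name `adj_r2_and_t` and the data generating values are hypothetical choices made only for illustration.

```python
import numpy as np

def adj_r2_and_t(X, y):
    """Return (adjusted R-squared, t ratio on the last column of X)."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - K)
    tss = np.sum((y - y.mean()) ** 2)          # y'M0y
    adj = 1.0 - (n - 1) / (n - K) * (e @ e) / tss
    t_last = b[-1] / np.sqrt(s2 * XtX_inv[-1, -1])
    return adj, t_last

rng = np.random.default_rng(1)
agreements = []
for _ in range(200):
    n = 30
    X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    xk = rng.normal(size=n)
    y = X1 @ np.array([1.0, 0.5, -0.5]) + 0.1 * xk + rng.normal(size=n)
    adj_short, _ = adj_r2_and_t(X1, y)
    adj_full, t_k = adj_r2_and_t(np.column_stack([X1, xk]), y)
    # Adjusted R-squared rises exactly when the squared t ratio exceeds 1
    agreements.append((adj_full > adj_short) == (t_k ** 2 > 1.0))

all_agree = all(agreements)
print(all_agree)
```

In each replication the direction of change in the adjusted R-squared is compared with whether the squared t ratio on the added variable exceeds one.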
10. This $R^2$ must be lower. The sum of squares associated with the coefficient vector which omits the constant term must be higher than the one which includes it. We can write the coefficient vector in the regression without a constant as $c = (0, b^*)$ where $b^* = (W'W)^{-1}W'y$, with $W$ being the other $K-1$ columns of $X$. Then, the result of the previous exercise applies directly.

11. We use the notation 'Var[.]' and 'Cov[.]' to indicate the sample variances and covariances. Our information is Var[N] = 1, Var[D] = 1, Var[Y] = 1. Since C = N + D, Var[C] = Var[N] + Var[D] + 2Cov[N,D] = 2(1 + Cov[N,D]). From the regressions, we have Cov[C,Y]/Var[Y] = Cov[C,Y] = .8. But, Cov[C,Y] = Cov[N,Y] + Cov[D,Y]. Also, Cov[C,N]/Var[N] = Cov[C,N] = .5, but, Cov[C,N] = Var[N] + Cov[N,D] = 1 + Cov[N,D], so Cov[N,D] = -.5, so that Var[C] = 2(1 + (-.5)) = 1. And, Cov[D,Y]/Var[Y] = Cov[D,Y] = .4. Since Cov[C,Y] = .8 = Cov[N,Y] + Cov[D,Y], Cov[N,Y] = .4. Finally, Cov[C,D] = Cov[N,D] + Var[D] = -.5 + 1 = .5. Now, in the regression of C on D, the sum of squared residuals is $(n-1)\{\text{Var}[C] - (\text{Cov}[C,D]/\text{Var}[D])^2\text{Var}[D]\}$
based on the general regression result $\sum e^2 = \sum (y_i - \bar y)^2 - b^2\sum (x_i - \bar x)^2$. All of the necessary figures were obtained above. Inserting these and $n - 1 = 20$ produces a sum of squared residuals of $20(1 - .5^2) = 15$.

12. The relevant submatrices to be used in the calculations are (lower triangle shown)

              Investment   Constant   GNP       Interest
Investment    *
Constant      3.0500       15
GNP           3.9926       19.310     25.218
Interest      23.521       111.79     148.98    943.86

The inverse of the lower right 3×3 block is $(X'X)^{-1}$ (upper triangle shown),

(X′X)^{-1} =   7.5874    -7.41859     .27313
                          7.84078    -.598953
                                      .06254637
The coefficient vector is $b = (X'X)^{-1}X'y = (-.0727985, .235622, -.00364866)'$. The total sum of squares is $y'y = .63652$, so we can obtain $e'e = y'y - b'X'y$. $X'y$ is given in the top row of the matrix. Making the substitution, we obtain $e'e = .63652 - .63291 = .00361$. To compute $R^2$, we require $\sum_i (y_i - \bar y)^2 = .63652 - 15(3.05/15)^2 = .01635333$, so $R^2 = 1 - .00361/.0163533 = .77925$.

13. The results cannot be correct. Since log S/N = log S/Y + log Y/N by simple, exact algebra, the same result must apply to the least squares regression results. That means that the second equation estimated must equal the first one plus log Y/N. Looking at the equations, that means that all of the coefficients would have to be identical save for the second, which would have to equal its counterpart in the first equation, plus 1. Therefore, the results cannot be correct. In an exchange between Leff and Arthur Goldberger that appeared later in the same journal, Leff argued that the difference was simple rounding error. You can see that the results in the second equation resemble those in the first, but not enough so that the explanation is credible. Further discussion about the data themselves appeared in subsequent discussion. [See Goldberger (1973) and Leff (1973).]

14. A proof of Theorem 3.1 provides a general statement of the observation made after (3-8). The counterpart for a multiple regression to the normal equations preceding (3-7) is
$$\begin{aligned}
b_1 n + b_2\sum_i x_{i2} + b_3\sum_i x_{i3} + \cdots + b_K\sum_i x_{iK} &= \sum_i y_i\\
b_1\sum_i x_{i2} + b_2\sum_i x_{i2}^2 + b_3\sum_i x_{i2}x_{i3} + \cdots + b_K\sum_i x_{i2}x_{iK} &= \sum_i x_{i2}y_i\\
&\ \ \vdots\\
b_1\sum_i x_{iK} + b_2\sum_i x_{iK}x_{i2} + b_3\sum_i x_{iK}x_{i3} + \cdots + b_K\sum_i x_{iK}^2 &= \sum_i x_{iK}y_i.
\end{aligned}$$
As before, divide the first equation by $n$, and manipulate to obtain the solution for the constant term, $b_1 = \bar y - b_2\bar x_2 - \cdots - b_K\bar x_K$. Substitute this into the equations above, and rearrange once again to obtain the equations for the slopes,
$$\begin{aligned}
b_2\sum_i (x_{i2} - \bar x_2)^2 + b_3\sum_i (x_{i2} - \bar x_2)(x_{i3} - \bar x_3) + \cdots + b_K\sum_i (x_{i2} - \bar x_2)(x_{iK} - \bar x_K) &= \sum_i (x_{i2} - \bar x_2)(y_i - \bar y)\\
b_2\sum_i (x_{i3} - \bar x_3)(x_{i2} - \bar x_2) + b_3\sum_i (x_{i3} - \bar x_3)^2 + \cdots + b_K\sum_i (x_{i3} - \bar x_3)(x_{iK} - \bar x_K) &= \sum_i (x_{i3} - \bar x_3)(y_i - \bar y)\\
&\ \ \vdots\\
b_2\sum_i (x_{iK} - \bar x_K)(x_{i2} - \bar x_2) + b_3\sum_i (x_{iK} - \bar x_K)(x_{i3} - \bar x_3) + \cdots + b_K\sum_i (x_{iK} - \bar x_K)^2 &= \sum_i (x_{iK} - \bar x_K)(y_i - \bar y).
\end{aligned}$$
If the variables are uncorrelated, then all cross product terms of the form $\sum_i (x_{ij} - \bar x_j)(x_{ik} - \bar x_k)$ will equal zero. This leaves the solution,
$$\begin{aligned}
b_2\sum_i (x_{i2} - \bar x_2)^2 &= \sum_i (x_{i2} - \bar x_2)(y_i - \bar y)\\
b_3\sum_i (x_{i3} - \bar x_3)^2 &= \sum_i (x_{i3} - \bar x_3)(y_i - \bar y)\\
&\ \ \vdots\\
b_K\sum_i (x_{iK} - \bar x_K)^2 &= \sum_i (x_{iK} - \bar x_K)(y_i - \bar y),
\end{aligned}$$
which can be solved one equation at a time for
$$b_k = \frac{\sum_i (x_{ik} - \bar x_k)(y_i - \bar y)}{\sum_i (x_{ik} - \bar x_k)^2}, \quad k = 2,\ldots,K.$$
Each of these is the slope coefficient in the simple regression of y on the respective variable.
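The result of Exercise 14 can be illustrated with regressors constructed to have exactly zero sample covariances. This is a minimal sketch assuming NumPy; the construction through a QR factorization and all numerical values are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50

# Build three regressors whose deviations from means are mutually orthogonal:
# the columns of Q lie in the span of the demeaned columns of Z, so each has
# mean zero, and they are orthonormal by construction.
Z = rng.normal(size=(n, 3))
Q, _ = np.linalg.qr(Z - Z.mean(axis=0))
X = Q * np.array([2.0, 0.5, 3.0]) + np.array([1.0, -2.0, 4.0])
y = 1.0 + X @ np.array([0.3, -1.2, 0.8]) + rng.normal(size=n)

# Slopes from the multiple regression of y on a constant and X
X_const = np.column_stack([np.ones(n), X])
b_multiple = np.linalg.lstsq(X_const, y, rcond=None)[0][1:]

# Slopes from K-1 separate simple regressions, in deviation form
xd = X - X.mean(axis=0)
b_simple = xd.T @ (y - y.mean()) / np.sum(xd ** 2, axis=0)

print(np.allclose(b_multiple, b_simple))
```

With correlated regressors the two sets of slopes would generally differ, which is why the orthogonality is imposed exactly here rather than left to sampling.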
Application

?=======================================================================
? Chapter 3 Application 1
?=======================================================================
Read $
(Data appear in the text.)
Namelist ; X1 = one,educ,exp,ability$
Namelist ; X2 = mothered,fathered,sibs$
?=======================================================================
? a.
?=======================================================================
Regress ; Lhs = wage ; Rhs = x1$
+----------------------------------------------------+
| Ordinary least squares regression
| LHS=WAGE     Mean                 = 2.059333
|              Standard deviation   = .2583869
| WTS=none     Number of observs.   = 15
| Model size   Parameters           = 4
|              Degrees of freedom   = 11
| Residuals    Sum of squares       = .7633163
|              Standard error of e  = .2634244
| Fit          R-squared            = .1833511
|              Adjusted R-squared   = -.3937136E-01
| Model test   F[ 3, 11] (prob)     = .82 (.5080)
+----------------------------------------------------+
Variable   Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
Constant   1.66364000    .61855318        2.690     .0210
EDUC       .01453897     .04902149        .297      .7723      12.8666667
EXP        .07103002     .04803415        1.479     .1673      2.80000000
ABILITY    .02661537     .09911731        .269      .7933      .36600000
?=======================================================================
? b.
?=======================================================================
Regress ; Lhs = wage ; Rhs = x1,x2$
+----------------------------------------------------+
| Ordinary least squares regression
| LHS=WAGE     Mean                 = 2.059333
|              Standard deviation   = .2583869
| WTS=none     Number of observs.   = 15
| Model size   Parameters           = 7
|              Degrees of freedom   = 8
| Residuals    Sum of squares       = .4522662
|              Standard error of e  = .2377673
| Fit          R-squared            = .5161341
|              Adjusted R-squared   = .1532347
| Model test   F[ 6, 8] (prob)      = 1.42 (.3140)
+----------------------------------------------------+
Variable   Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
Constant   .04899633     .94880761        .052      .9601
EDUC       .02582213     .04468592        .578      .5793      12.8666667
EXP        .10339125     .04734541        2.184     .0605      2.80000000
ABILITY    .03074355     .12120133        .254      .8062      .36600000
MOTHERED   .10163069     .07017502        1.448     .1856      12.0666667
FATHERED   .00164437     .04464910        .037      .9715      12.6666667
SIBS       .05916922     .06901801        .857      .4162      2.20000000
?=======================================================================
? c.
?=======================================================================
Regress ; Lhs = mothered ; Rhs = x1 ; Res = meds $
Regress ; Lhs = fathered ; Rhs = x1 ; Res = feds $
Regress ; Lhs = sibs ; Rhs = x1 ; Res = sibss $
Namelist ; X2S = meds,feds,sibss $
Matrix ; list ; Mean(X2S) $
Matrix Result has 3 rows and 1 columns.
         1
   +--------------
  1|  .1184238D-14
  2|  .1657933D-14
  3|  .5921189D-16
The means are (essentially) zero. The sums must be zero, as these new variables are orthogonal to the columns of X1. The first column in X1 is a column of ones, so this means that these residuals must sum to zero.
?=======================================================================
? d.
?=======================================================================
Namelist ; X = X1,X2 $
Matrix ; i = init(n,1,1) $
Matrix ; M0 = iden(n) - 1/n*i*i' $
Matrix ; b12 = <X'X>*X'wage$
Calc ; list ; ym0y =(N-1)*var(wage) $
Matrix ; list ; cod = 1/ym0y * b12'*X'*M0*X*b12 $
Matrix COD has 1 rows and 1 columns.
         1
   +--------------
  1|  .51613
Matrix ; e = wage - X*b12 $
Calc ; list ; cod = 1 - 1/ym0y * e'e $
COD = .516134
The R squared is the same using either method of computation.
Calc ; list ; RsqAd = 1 - (n-1)/(n-col(x))*(1-cod)$
RSQAD = .153235
? Now drop the constant
Namelist ; X0 = educ,exp,ability,X2 $
Matrix ; i = init(n,1,1) $
Matrix ; M0 = iden(n) - 1/n*i*i' $
Matrix ; b120 = <X0'X0>*X0'wage$
Matrix ; list ; cod = 1/ym0y * b120'*X0'*M0*X0*b120 $
Matrix COD has 1 rows and 1 columns.
         1
   +--------------
  1|  .52953
Matrix ; e0 = wage - X0*b120 $
Calc ; list ; cod = 1 - 1/ym0y * e0'e0 $
Listed Calculator Results
COD = .515973
The R squared now changes depending on how it is computed. It also goes up, completely artificially.
?=======================================================================
? e.
?=======================================================================
The R squared for the full regression appears immediately below.
? f.
Regress ; Lhs = wage ; Rhs = X1,X2 $
+----------------------------------------------------+
| Ordinary least squares regression
| WTS=none     Number of observs.   = 15
| Model size   Parameters           = 7
|              Degrees of freedom   = 8
| Fit          R-squared            = .5161341
+----------------------------------------------------+
Variable   Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
Constant   .04899633     .94880761        .052      .9601
EDUC       .02582213     .04468592        .578      .5793      12.8666667
EXP        .10339125     .04734541        2.184     .0605      2.80000000
ABILITY    .03074355     .12120133        .254      .8062      .36600000
MOTHERED   .10163069     .07017502        1.448     .1856      12.0666667
FATHERED   .00164437     .04464910        .037      .9715      12.6666667
SIBS       .05916922     .06901801        .857      .4162      2.20000000
Regress ; Lhs = wage ; Rhs = X1,X2S $
+----------------------------------------------------+
| Ordinary least squares regression
| WTS=none     Number of observs.   = 15
| Model size   Parameters           = 7
|              Degrees of freedom   = 8
| Fit          R-squared            = .5161341
|              Adjusted R-squared   = .1532347
+----------------------------------------------------+
Variable   Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
Constant   1.66364000    .55830716        2.980     .0176
EDUC       .01453897     .04424689        .329      .7509      12.8666667
EXP        .07103002     .04335571        1.638     .1400      2.80000000
ABILITY    .02661537     .08946345        .297      .7737      .36600000
MEDS       .10163069     .07017502        1.448     .1856      .118424D-14
FEDS       .00164437     .04464910        .037      .9715      .165793D-14
SIBSS      .05916922     .06901801        .857      .4162      .592119D-16
In the first set of results, the first coefficient vector is $b_1 = (X_1'M_2X_1)^{-1}X_1'M_2y$ and $b_2 = (X_2'M_1X_2)^{-1}X_2'M_1y$. In the second regression, the second set of regressors is $M_1X_2$, so $b_1 = (X_1'M_{12}X_1)^{-1}X_1'M_{12}y$ where $M_{12} = I - (M_1X_2)[(M_1X_2)'(M_1X_2)]^{-1}(M_1X_2)'$. Thus, because the "M" matrix is different, the coefficient vector is different. The second set of coefficients in the second regression is $b_2 = [(M_1X_2)'M_1(M_1X_2)]^{-1}(M_1X_2)'M_1y = (X_2'M_1X_2)^{-1}X_2'M_1y$ because $M_1$ is idempotent.
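The pattern in the two sets of regression results can be reproduced with simulated data. The sketch below assumes NumPy; the helper `ols` and the data are hypothetical stand-ins for the wage data, which are not reproduced here.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients of y on X (y may have several columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 15
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
X2 = rng.normal(size=(n, 3)) + X1[:, 1:2]      # correlated with X1
y = rng.normal(size=n)

b_full = ols(np.column_stack([X1, X2]), y)     # y on [X1, X2]
M1X2 = X2 - X1 @ ols(X1, X2)                   # residuals of X2 on X1
b_resid = ols(np.column_stack([X1, M1X2]), y)  # y on [X1, M1 X2]
b_short = ols(X1, y)                           # y on X1 alone

# Coefficients on the second block agree with the full regression
print(np.allclose(b_resid[4:], b_full[4:]))
# Coefficients on X1 agree with the short regression, not the full one
print(np.allclose(b_resid[:4], b_short))
```

Because $M_1X_2$ is orthogonal to $X_1$, the coefficients on $X_1$ in the second regression revert to those of the regression on $X_1$ alone, which matches the constant, EDUC, EXP and ABILITY coefficients repeating from part a.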
Chapter 4 Statistical Properties of the Least Squares Estimator

Exercises

1. Consider the optimization problem of minimizing the variance of the weighted estimator. If the estimate is to be unbiased, it must be of the form $c_1\hat\theta_1 + c_2\hat\theta_2$ where $c_1$ and $c_2$ sum to 1. Thus, $c_2 = 1 - c_1$. The function to minimize is $\min_{c_1} L^* = c_1^2v_1 + (1 - c_1)^2v_2$. The necessary condition is $\partial L^*/\partial c_1 = 2c_1v_1 - 2(1 - c_1)v_2 = 0$ which implies $c_1 = v_2/(v_1 + v_2)$. A more intuitively appealing form is obtained by dividing numerator and denominator by $v_1v_2$ to obtain $c_1 = (1/v_1)/[1/v_1 + 1/v_2]$. Thus, the weight is proportional to the inverse of the variance. The estimator with the smaller variance gets the larger weight.

2. First, $\hat\beta = c'y = \beta c'x + c'\varepsilon$. So $E[\hat\beta] = \beta c'x$ and $\text{Var}[\hat\beta] = \sigma^2c'c$. Therefore, $\text{MSE}[\hat\beta] = \beta^2[c'x - 1]^2 + \sigma^2c'c$. To minimize this, we set $\partial\text{MSE}[\hat\beta]/\partial c = 2\beta^2[c'x - 1]x + 2\sigma^2c = 0$. Collecting terms, $\beta^2(c'x - 1)x = -\sigma^2c$. Premultiply by $x'$ to obtain $\beta^2(c'x - 1)x'x = -\sigma^2x'c$ or $c'x = \beta^2x'x/(\sigma^2 + \beta^2x'x)$. Then, $c = [(-\beta^2/\sigma^2)(c'x - 1)]x$, so $c = [1/(\sigma^2/\beta^2 + x'x)]x$. Then, $\hat\beta = c'y = x'y/(\sigma^2/\beta^2 + x'x)$. The expected value of this estimator is $E[\hat\beta] = \beta x'x/(\sigma^2/\beta^2 + x'x)$ so $E[\hat\beta] - \beta = -\beta(\sigma^2/\beta^2)/(\sigma^2/\beta^2 + x'x) = -(\sigma^2/\beta)/(\sigma^2/\beta^2 + x'x)$ while its variance is $\text{Var}[x'(x\beta + \varepsilon)/(\sigma^2/\beta^2 + x'x)] = \sigma^2x'x/(\sigma^2/\beta^2 + x'x)^2$. The mean squared error is the variance plus the squared bias, $\text{MSE}[\hat\beta] = [\sigma^4/\beta^2 + \sigma^2x'x]/[\sigma^2/\beta^2 + x'x]^2$. The ordinary least squares estimator is, as always, unbiased, and has variance and mean squared error $\text{MSE}(b) = \sigma^2/x'x$. The ratio is taken by dividing each term in the numerator by $\sigma^2/x'x$:
$$\frac{\text{MSE}[\hat\beta]}{\text{MSE}(b)} = \frac{(\sigma^4/\beta^2)/(\sigma^2/x'x) + \sigma^2x'x/(\sigma^2/x'x)}{(\sigma^2/\beta^2 + x'x)^2} = \frac{\sigma^2x'x/\beta^2 + (x'x)^2}{(\sigma^2/\beta^2 + x'x)^2} = \frac{x'x[\sigma^2/\beta^2 + x'x]}{(\sigma^2/\beta^2 + x'x)^2} = \frac{x'x}{\sigma^2/\beta^2 + x'x}.$$
Now, multiply numerator and denominator by $\beta^2/\sigma^2$ to obtain $\text{MSE}[\hat\beta]/\text{MSE}[b] = (\beta^2x'x/\sigma^2)/[1 + \beta^2x'x/\sigma^2] = \tau^2/[1 + \tau^2]$, where $\tau^2 = \beta^2x'x/\sigma^2$. As $\tau \to \infty$, the ratio goes to one. This would follow from the result that the biased estimator and the unbiased estimator are converging to the same thing, either as $\sigma^2$ goes to zero, in which case the MMSE estimator is the same as OLS, or as $x'x$ grows, in which case both estimators are consistent.
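The ratio $\tau^2/(1 + \tau^2)$ in Exercise 2 can be confirmed by evaluating both mean squared errors directly. This is a small sketch assuming NumPy; the values of `beta` and `sigma2` and the simulated regressor are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=25)        # a single regressor, illustrative values
beta, sigma2 = 0.8, 2.0        # hypothetical true parameters
xx = x @ x

# Minimum MSE weights c = x / (sigma^2/beta^2 + x'x)
c = x / (sigma2 / beta**2 + xx)

# MSE = squared bias + variance for the estimator c'y
mse_mmse = beta**2 * (c @ x - 1.0) ** 2 + sigma2 * (c @ c)
mse_ols = sigma2 / xx

tau2 = beta**2 * xx / sigma2
ratio = mse_mmse / mse_ols

print(np.isclose(ratio, tau2 / (1.0 + tau2)))
```

Note that the MMSE weights depend on the unknown $\beta$ and $\sigma^2$, so the comparison is a property of the two estimators rather than a feasible estimation recipe.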
3. The OLS estimator fit without a constant term is $b = x'y/x'x$. Assuming that the constant term is, in fact, zero, the variance of this estimator is $\text{Var}[b] = \sigma^2/x'x$. If a constant term is included in the regression, then, $b' = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)/\sum_{i=1}^n (x_i - \bar x)^2$. The appropriate variance is $\sigma^2/\sum_{i=1}^n (x_i - \bar x)^2$ as always. The ratio of these two is $\text{Var}[b]/\text{Var}[b'] = [\sigma^2/x'x]/[\sigma^2/\sum_{i=1}^n (x_i - \bar x)^2]$. But, $\sum_{i=1}^n (x_i - \bar x)^2 = x'x - n\bar x^2$, so the ratio is $\text{Var}[b]/\text{Var}[b'] = [x'x - n\bar x^2]/x'x = 1 - n\bar x^2/x'x = 1 - \{n\bar x^2/[S_{xx} + n\bar x^2]\} < 1$. It follows that fitting the constant term when it is unnecessary inflates the variance of the least squares estimator if the mean of the regressor is not zero.

4. We could write the regression as $y_i = (\alpha + \lambda) + \beta x_i + (\varepsilon_i - \lambda) = \alpha^* + \beta x_i + \varepsilon_i^*$. Then, we know that $E[\varepsilon_i^*] = 0$, and that it is independent of $x_i$. Therefore, the second form of the model satisfies all of our assumptions for the classical regression. Ordinary least squares will give unbiased estimators of $\alpha^*$ and $\beta$. As long as $\lambda$ is not zero, the constant term will differ from $\alpha$.

5. Let the constant term be written as $a = \sum_i d_iy_i = \sum_i d_i(\alpha + \beta x_i + \varepsilon_i) = \alpha\sum_i d_i + \beta\sum_i d_ix_i + \sum_i d_i\varepsilon_i$. In order for $a$ to be unbiased for all samples of $x_i$, we must have $\sum_i d_i = 1$ and $\sum_i d_ix_i = 0$. Consider, then, minimizing the variance of $a$ subject to these two constraints. The Lagrangean is $L^* = \text{Var}[a] + \lambda_1(\sum_i d_i - 1) + \lambda_2\sum_i d_ix_i$ where $\text{Var}[a] = \sum_i \sigma^2d_i^2$. Now, we minimize this with respect to $d_i$, $\lambda_1$, and $\lambda_2$. The $(n+2)$ necessary conditions, each set equal to zero, are
$\partial L^*/\partial d_i = 2\sigma^2d_i + \lambda_1 + \lambda_2x_i$, $\partial L^*/\partial\lambda_1 = \sum_i d_i - 1$, $\partial L^*/\partial\lambda_2 = \sum_i d_ix_i$.
The first equation implies that $d_i = [-1/(2\sigma^2)](\lambda_1 + \lambda_2x_i)$. Therefore, $\sum_i d_i = 1 = [-1/(2\sigma^2)][n\lambda_1 + (\sum_i x_i)\lambda_2]$ and $\sum_i d_ix_i = 0 = [-1/(2\sigma^2)][(\sum_i x_i)\lambda_1 + (\sum_i x_i^2)\lambda_2]$. We can solve these two equations for $\lambda_1$ and $\lambda_2$ by first multiplying both equations by $-2\sigma^2$ then writing the resulting equations as
$$\begin{bmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix}\begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix} = \begin{bmatrix} -2\sigma^2 \\ 0 \end{bmatrix}. \quad\text{The solution is}\quad \begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix} = \begin{bmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix}^{-1}\begin{bmatrix} -2\sigma^2 \\ 0 \end{bmatrix}.$$
Note, first, that $\sum_i x_i = n\bar x$. Thus, the determinant of the matrix is $n\sum_i x_i^2 - (n\bar x)^2 = n(\sum_i x_i^2 - n\bar x^2) = nS_{xx}$ where $S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2$. The solution is, therefore,
$$\begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix} = \frac{1}{nS_{xx}}\begin{bmatrix} \sum_i x_i^2 & -n\bar x \\ -n\bar x & n \end{bmatrix}\begin{bmatrix} -2\sigma^2 \\ 0 \end{bmatrix}$$
or
$\lambda_1 = (-2\sigma^2)(\sum_i x_i^2/n)/S_{xx}$
$\lambda_2 = (2\sigma^2\bar x)/S_{xx}$.
Then, $d_i = [\sum_i x_i^2/n - \bar xx_i]/S_{xx}$. This simplifies if we write $\sum_i x_i^2 = S_{xx} + n\bar x^2$, so $\sum_i x_i^2/n = S_{xx}/n + \bar x^2$. Then, $d_i = 1/n + \bar x(\bar x - x_i)/S_{xx}$, or, in a more familiar form, $d_i = 1/n - \bar x(x_i - \bar x)/S_{xx}$. This makes the intercept term $\sum_i d_iy_i = (1/n)\sum_i y_i - \bar x\sum_{i=1}^n (x_i - \bar x)y_i/S_{xx} = \bar y - b\bar x$ which was to be shown.
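The weights $d_i = 1/n - \bar x(x_i - \bar x)/S_{xx}$ derived in Exercise 5 can be verified against the two unbiasedness constraints and against the least squares intercept. A minimal sketch, assuming NumPy and simulated data with arbitrary illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 12
x = rng.normal(loc=3.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

# Minimum variance linear unbiased weights for the constant term
d = 1.0 / n - xbar * (x - xbar) / Sxx

a_weights = d @ y
a_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][0]

print(np.isclose(d.sum(), 1.0))       # constraint: sum of d_i equals 1
print(np.isclose(d @ x, 0.0))         # constraint: sum of d_i x_i equals 0
print(np.isclose(a_weights, a_ols))   # intercept equals ybar - b*xbar
```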
6. Let $q = E[Q]$. Then, $q = \alpha + \beta P$, or $P = (-\alpha/\beta) + (1/\beta)q$. Using a well known result, for a linear demand curve, marginal revenue is $MR = (-\alpha/\beta) + (2/\beta)q$. The profit maximizing output is that at which marginal revenue equals marginal cost, or 10. Equating MR to 10 and solving for $q$ produces $q = \alpha/2 + 5\beta$, so we require a confidence interval for this combination of the parameters. The least squares regression results are $\hat Q = 20.7691 - .840583P$. The estimated covariance matrix of the coefficients is $\begin{bmatrix} 7.96124 & -0.624559 \\ -0.624559 & 0.0564361 \end{bmatrix}$. The estimate of $q$ is 6.1816. The estimate of the variance of $\hat q$ is $(1/4)7.96124 + 25(.056436) + 5(-0.624559)$ or 0.278415, so the estimated standard error is 0.5276.
The 95% cutoff value for a t distribution with 13 degrees of freedom is 2.161, so the confidence interval is 6.1816 - 2.161(.5276) to 6.1816 + 2.161(.5276) or 5.041 to 7.322.

7. a. The sample means are (1/100) times the elements in the first column of $X'X$. The sample covariance matrix for the three regressors is obtained as $(1/99)[(X'X)_{ij} - 100\bar x_i\bar x_j]$.
$$\text{Sample Var}[x] = \begin{bmatrix} 1.01727 & 0.069899 & 0.555489 \\ 0.069899 & 0.755960 & 0.417778 \\ 0.555489 & 0.417778 & 0.496969 \end{bmatrix}$$
The simple correlation matrix is
$$\begin{bmatrix} 1 & .07971 & .78043 \\ .07971 & 1 & .68167 \\ .78043 & .68167 & 1 \end{bmatrix}$$
b. The vector of slopes is $(X'X)^{-1}X'y$ = [.4022, 6.123, 5.910, 7.525]′.
c. For the three short regressions, the coefficient vectors are
(1) one, x1, and x2: [.223, 2.28, 2.11]′
(2) one, x1, and x3: [.0696, .229, 4.025]′
(3) one, x2, and x3: [.0627, .0918, 4.358]′
d. The magnification factors are
for x1: [(1/(99(1.01727)) / 1.129]^2 = .094
for x2: [(1/99(.75596)) / 1.11]^2 = .109
for x3: [(1/99(.496969)) / 4.292]^2 = .068.
e. The problem variable appears to be x3 since it has the lowest magnification factor. In fact, all three are highly intercorrelated. Although the simple correlations are not excessively high, the three multiple correlations are .9912 for x1 on x2 and x3, .9881 for x2 on x1 and x3, and .9912 for x3 on x1 and x2.

8. We consider two regressions. In the first, $y$ is regressed on $K$ variables, $X$. The variance of the least squares estimator, $b = (X'X)^{-1}X'y$, is $\text{Var}[b] = \sigma^2(X'X)^{-1}$. In the second, $y$ is regressed on $X$ and an additional variable, $z$. Using results for the partitioned regression, the coefficients on $X$ when $y$ is regressed on $X$ and $z$ are $b_{.z} = (X'M_zX)^{-1}X'M_zy$ where $M_z = I - z(z'z)^{-1}z'$. The true variance of $b_{.z}$ is the upper left $K \times K$ matrix in $\text{Var}[b,c] = \sigma^2\begin{bmatrix} X'X & X'z \\ z'X & z'z \end{bmatrix}^{-1}$. But, we have already found this above. The submatrix is $\text{Var}[b_{.z}] = \sigma^2(X'M_zX)^{-1}$. We can show that the second matrix is larger than the first by showing that its inverse is smaller. (See (A-120).) Thus, as regards the true variance matrices, $(\text{Var}[b])^{-1} - (\text{Var}[b_{.z}])^{-1} = (1/\sigma^2)X'z(z'z)^{-1}z'X$ which is a nonnegative definite matrix. Therefore $\text{Var}[b]^{-1}$ is larger than $\text{Var}[b_{.z}]^{-1}$, which implies that $\text{Var}[b]$ is smaller.
Although the true variance of $b$ is smaller than the true variance of $b_{.z}$, it does not follow that the estimated variance will be. The estimated variances are based on $s^2$, not the true $\sigma^2$. The residual variance estimator based on the short regression is $s^2 = e'e/(n - K)$ while that based on the regression which includes $z$ is $s_z^2 = e_{.z}'e_{.z}/(n - K - 1)$. The numerator of the second is definitely smaller than the numerator of the first, but so is the denominator. It is uncertain which way the comparison will go. The result is derived in the previous problem. We can conclude, therefore, that if the t ratio on $c$ in the regression which includes $z$ is larger than one in absolute value, then $s_z^2$ will be smaller than $s^2$. Thus, in the comparison, $\text{Est.Var}[b] = s^2(X'X)^{-1}$ is based on a smaller matrix, but a larger scale factor than $\text{Est.Var}[b_{.z}] = s_z^2(X'M_zX)^{-1}$. Consequently, it is uncertain whether the estimated standard errors in the short regression will be smaller than those in the long one. Note that it is not sufficient merely for the result of the previous problem to hold, since the relative sizes of the matrices also play a role. But, to take a polar case, suppose $z$ and $X$ were uncorrelated. Then, $X'M_zX$ equals $X'X$. Then, the estimated variance of $b_{.z}$ would be less than that of $b$ without $z$ even though the true variance is the same (assuming the premise of the previous problem holds). Now, relax this assumption while holding the t ratio on $c$ constant. The matrix in $\text{Var}[b_{.z}]$ is now larger, but the leading scalar is now smaller. Which way the product will go is uncertain.

9. The F ratio is computed as $[b'X'Xb/K]/[e'e/(n - K)]$. We substitute $e = M\varepsilon$, and
b = β + (X′X)1X′ε = (X′X)1X′ε. Then, F = [ε′X(X′X)1X′X(X′X)1X′ε/K]/[ε ′Mε/(n  K)] = [ε′(I  M)ε/K]/[ε′Mε/(n  K)]. The exact expectation of F can be found as follows: F = [(nK)/K][ε′(I  M)ε]/[ε′Mε]. So, its exact expected value is (nK)/K times the expected value of the ratio. To find that, we note, first, that Mε and (I  M)ε are independent because M(I  M) = 0. Thus, E{[ε′(I  M)ε]/[ε′Mε]} = E[ε′(I M)ε]×E{1/[ε′Mε]}. The first of these was obtained above, E[ε′(I  M)ε] = Kσ2. The second is the expected value of the reciprocal of a chisquared variable. The exact result for the reciprocal of a chisquared variable is E[1/χ2(nK)] = 1/(n  K  2). Combining terms, the exact expectation is E[F] = (n  K) / (n  K  2). Notice that the mean does not involve the numerator degrees of freedom. 10. We write b = β + (X′X)1X′ε, so b′b = β′β + ε′X(X′X)1(X′X)1X′ε + 2β′(X′X)1X′ε. The expected value of the last term is zero, and the first is nonstochastic. To find the expectation of the second term, use the trace, and permute ε′X inside the trace operator. Thus, E[β′β] = β′β + E[ε′X(X′X)1(X′X)1X′ε] = β′β + E[tr{ε′X(X′X)1(X′X)1X′ε}] = β′β + E[tr{(X′X)1X′εε′X(X′X)1}] = β′β + tr[E{(X′X)1X′εε′X(X′X)1}] = β′β + tr[(X′X)1X′E[εε′]X(X′X)1] = β′β + tr[(X′X)1X′(σ2I)X(X′X)1] = β′β + σ2tr[(X′X)1X′X(X′X)1] = β′β + σ2tr[(X′X)1] = β′β + σ2Σk (1/λk ) The trace of the inverse equals the sum of the characteristic roots of the inverse, which are the reciprocals of the characteristic roots of X′X. 11. The F ratio is computed as [b′X′Xb/K]/[e′e/(n  K)]. We substitute e = M, and b = β + (X′X)1X′ε = (X′X)1X′ε. Then, F = [ε′X(X′X)1X′X(X′X)1X′ε/K]/[ε ′Mε/(n  K)] = [ε′(I  M)ε/K]/[ε′Mε/(n  K)]. The denominator converges to σ2 as we have seen before. The numerator is an idempotent quadratic form in a normal vector. The trace of (I  M) is K regardless of the sample size, so the numerator is always distributed as σ2 times a chisquared variable with K degrees of freedom. 
Therefore, the numerator of F does not converge to a constant; it is distributed as $\sigma^2/K$ times a chi-squared variable with K degrees of freedom. Since the denominator of F converges to a constant, $\sigma^2$, the statistic converges to a random variable, $(1/K)$ times a chi-squared variable with K degrees of freedom.

12. We can write $e_i$ as $e_i = y_i - b'x_i = (\beta'x_i + \varepsilon_i) - b'x_i = \varepsilon_i - (b - \beta)'x_i$. We know that plim $b = \beta$, and $x_i$ is unchanged as n increases, so as $n \to \infty$, $e_i$ is arbitrarily close to $\varepsilon_i$.

13. The estimator is $\bar{y} = (1/n)\sum_i y_i = (1/n)\sum_i(\mu + \varepsilon_i) = \mu + (1/n)\sum_i \varepsilon_i$. Then, $E[\bar{y}] = \mu + (1/n)\sum_i E[\varepsilon_i] = \mu$ and $\text{Var}[\bar{y}] = (1/n^2)\sum_i\sum_j \text{Cov}[\varepsilon_i, \varepsilon_j] = \sigma^2/n$. Since the mean equals $\mu$ and the variance vanishes as $n \to \infty$, $\bar{y}$ is mean square consistent. In addition, since $\bar{y}$ is a linear combination of normally distributed variables, $\bar{y}$ has a normal distribution with the mean and variance given above in every sample. Suppose that $\varepsilon_i$ were not normally distributed. Then, $\sqrt{n}(\bar{y} - \mu) = (1/\sqrt{n})(\sum_i \varepsilon_i)$ satisfies the requirements for the central limit theorem. Thus, the asymptotic normal distribution applies whether or not the disturbances have a normal distribution.

For the alternative estimator, $\hat{\mu} = \sum_i w_iy_i$, so $E[\hat{\mu}] = \sum_i w_iE[y_i] = \sum_i w_i\mu = \mu\sum_i w_i = \mu$ and $\text{Var}[\hat{\mu}] = \sum_i w_i^2\sigma^2 = \sigma^2\sum_i w_i^2$. The sum of squares of the weights is
$$\sum_i w_i^2 = \frac{\sum_i i^2}{[\sum_i i]^2} = \frac{n(n+1)(2n+1)/6}{[n(n+1)/2]^2} = \frac{2(n^2 + 3n/2 + 1/2)}{1.5n(n^2 + 2n + 1)}.$$
As $n \to \infty$, the fraction will be dominated by the term $(1/n)$ and will tend to zero. This establishes the consistency of this estimator. The last expression also provides the asymptotic variance. The large sample variance can be found as Asy.Var$[\hat{\mu}] = (1/n)\lim_{n\to\infty}\text{Var}[\sqrt{n}(\hat{\mu} - \mu)]$. For the estimator above, we can use
$$\text{Asy.Var}[\hat{\mu}] = (1/n)\lim_{n\to\infty} n\text{Var}[\hat{\mu} - \mu] = (1/n)\lim_{n\to\infty}\frac{\sigma^2[2(n^2 + 3n/2 + 1/2)]}{1.5(n^2 + 2n + 1)} = 1.3333\sigma^2/n.$$
Notice that this is unambiguously larger than the variance of the sample mean, $\sigma^2/n$, which is the ordinary least squares estimator.

14. To obtain the asymptotic distribution, write the result already in hand as $b = (\beta + Q^{-1}\gamma) + (X'X)^{-1}X'\varepsilon - Q^{-1}\gamma$. We have established that plim $b = \beta + Q^{-1}\gamma$. For convenience, let $\theta \neq \beta$ denote $\beta + Q^{-1}\gamma$ = plim b. Write the preceding in the form $b - \theta = (X'X/n)^{-1}(X'\varepsilon/n) - Q^{-1}\gamma$. Since plim$(X'X/n) = Q$, the large sample behavior of the right hand side is the same as that of $Q^{-1}(X'\varepsilon/n) - Q^{-1}\gamma$. That is, we may replace $(X'X/n)$ with Q in our derivation. Then, we seek the asymptotic distribution of $\sqrt{n}(b - \theta)$, which is the same as that of
$$\sqrt{n}\,[Q^{-1}(X'\varepsilon/n) - Q^{-1}\gamma] = Q^{-1}\sqrt{n}\left[\frac{1}{n}\sum_{i=1}^n (x_i\varepsilon_i - \gamma)\right].$$
From this point, the derivation is exactly the same as that when $\gamma = 0$, so there is no need to redevelop the result. We may proceed directly to the same asymptotic distribution we obtained before. The only difference is that the least squares estimator estimates $\theta$, not $\beta$.

15. a. To solve this, we will use an extension of Exercise 6 in Chapter 3 (adding one row of data), and the necessary matrix result, (A-66b), in which B will be $X_m$ and C will be I. Bypassing the matrix algebra, which will be essentially identical to the earlier exercise, we have
$$b_{c,m} = b_c + (X_c'X_c)^{-1}X_m'[I + X_m(X_c'X_c)^{-1}X_m']^{-1}(y_m - X_mb_c).$$
But, in this case, $y_m$ is precisely $X_mb_c$, so the ending vector is zero. Thus, the coefficient vector is the same.
b. The model applies to the first $n_c$ observations, so $b_c$ is the least squares estimator for those observations. Yes, it is unbiased.
c. The residuals at the second step are $(e_c', (X_mb_c - X_mb_c)')' = (e_c', 0')'$. Thus, the sum of squares is the same at both steps.
d. The numerator of $s^2$ is the same in both cases; however, for the second one, the degrees of freedom is larger. The first is unbiased, so the second one must be biased downward.
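The variance comparison in Exercise 13 can be checked numerically. The following is a minimal sketch in Python, not part of the original solutions; the function names are illustrative. It computes the sum of squared weights exactly for w_i = i/(1 + 2 + ... + n), compares it with 2(2n + 1)/[3n(n + 1)], which is a simplified form of the expression in the text, and evaluates the ratio of Var[mu-hat] to the variance of the sample mean, which should approach 4/3.

```python
from fractions import Fraction


def sum_sq_weights(n):
    """Exact sum of squared weights w_i = i / (1 + 2 + ... + n)."""
    total = n * (n + 1) // 2
    return sum(Fraction(i, total) ** 2 for i in range(1, n + 1))


def closed_form(n):
    """Simplified closed form of the same sum: 2(2n + 1) / [3n(n + 1)]."""
    return Fraction(2 * (2 * n + 1), 3 * n * (n + 1))


def relative_variance(n):
    """Var[mu_hat] / Var[ybar] = n * (sum of squared weights)."""
    return float(n * closed_form(n))


if __name__ == "__main__":
    # The ratio rises toward 4/3 = 1.3333... as n grows.
    for n in (10, 100, 10_000):
        print(n, relative_variance(n))
```

Exact rational arithmetic is used here only so that the closed form can be compared with the brute-force sum without rounding error.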
Applications
?=======================================================================
? Chapter 4 Application 1
?=======================================================================
Read $
Year GasExp Pop Gasp Income PNC PUC PPT PD PN PS
1953 7.4 159565 16.668 8883 47.2 26.7 16.8 37.7 29.7 19.4
...
2004 224.5 293951 123.901 27113 133.9 133.3 209.1 114.8 172.2 222.8
Sample ; 1 - 52 $
Create ; G = 1000000*gasexp/(gasp*pop)$
Create ; t = year - 1952 $
Namelist ; X = one,income,gasp,pnc,puc,ppt,pd,pn,ps,t$
?=======================================================================
? a. Basic regression
?=======================================================================
Regress ; Lhs = g ; Rhs = X $
+----------------------------------------------------+
| Ordinary least squares regression                  |
| LHS=G        Mean                 =   4.935619     |
|              Standard deviation   =   1.059105     |
| WTS=none     Number of observs.   =   52           |
| Model size   Parameters           =   10           |
|              Degrees of freedom   =   42           |
| Residuals    Sum of squares       =   .4985489     |
|              Standard error of e  =   .1089505     |
| Fit          R-squared            =   .9912852     |
|              Adjusted R-squared   =   .9894177     |
| Model test   F[  9,    42] (prob) = 530.82 (.0000) |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    1.10587817      .56937860       1.942    .0588
 INCOME       .00021575      .517619D-04     4.168    .0001   16805.0577
 GASP         .01108386      .00397812       2.786    .0080   51.3429615
 PNC          .00057735      .01284414        .045    .9644   87.5673077
 PUC          .00587463      .00487032       1.206    .2345   77.8000000
 PPT          .00690726      .00483613       1.428    .1606   89.3903846
 PD           .00122888      .01188175        .103    .9181   78.2692308
 PN           .01269051      .01259799       1.007    .3195   83.5980769
 PS           .02802781      .00799625       3.505    .0011   89.7769231
 T            .07250369      .01418280       5.112    .0000   26.5000000
?=======================================================================
? b. Hypothesis that b(NC) = b(UC) $
?=======================================================================
Calc ; list ; (b(4)-b(5))/sqr(varb(4,4)+varb(5,5)-2*varb(4,5)) $
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
 Result = .494883
?=======================================================================
? c. Elasticities. In each case, elasticity = b*xbar/ybar
?=======================================================================
Calc ; g2004 = g(52)$
Calc ; i2004 = income(52)$
Calc ; pg2004 = gasp(52)$
Calc ; ppt2004 = ppt(52)$
Calc ; list ; ei = b(2)*i2004/g2004 ; ep = b(3)*pg2004/g2004
     ; eppt = b(6)*ppt2004/g2004$
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
 EI   = .948988
 EP   = .222792
 EPPT = .234311
?=======================================================================
? d. Log regression
?=======================================================================
Create ; logg = log(g) ; logpg = log(gasp) ; logi = log(income)
       ; logpnc=log(pnc) ; logpuc = log(puc) ; logppt = log(ppt)
       ; logpd = log(pd) ; logpn = log(pn) ; logps = log(ps) $
Namelist ; LogX = one,logi,logpg,logpnc,logpuc,logppt,logpd,logpn,logps,t$
Regress ; lhs = logg ; rhs = logx $
+----------------------------------------------------+
| Ordinary least squares regression                  |
| LHS=LOGG     Mean                 =   1.570475     |
|              Standard deviation   =   .2388115     |
| WTS=none     Number of observs.   =   52           |
| Model size   Parameters           =   10           |
|              Degrees of freedom   =   42           |
| Residuals    Sum of squares       =   .3812817E-01 |
|              Standard error of e  =   .3012994E-01 |
| Fit          R-squared            =   .9868911     |
|              Adjusted R-squared   =   .9840821     |
| Model test   F[  9,    42] (prob) = 351.33 (.0000) |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    7.28719016     2.52056245       2.891    .0061
 LOGI         .99299135      .25037574       3.966    .0003   9.67214751
 LOGPG        .06051812      .05401018       1.120    .2689   3.72930296
 LOGPNC       .15471632      .26696298        .580    .5653   4.38036654
 LOGPUC       .48909058      .08519952       5.741    .0000   4.10544881
 LOGPPT       .01926966      .13644891        .141    .8884   4.14194132
 LOGPD       1.73205775      .25988611       6.665    .0000   4.23906603
 LOGPN        .72953933      .26506853       2.752    .0087   4.23689080
 LOGPS        .86798166      .35291106       2.459    .0181   4.17535768
 T            .03797198      .00751371       5.054    .0000   26.5000000
?=======================================================================
? e. Correlations of Price Variables
?=======================================================================
Namelist ; Prices = pnc,puc,ppt,pd,pn,ps$
Matrix ; list ; xcor(prices) $
Correlation Matrix for Listed Variables
         PNC      PUC      PPT      PD       PN       PS
PNC   1.00000   .99387   .98074   .99327   .98853   .97849
PUC    .99387  1.00000   .98242   .98783   .98220   .97685
PPT    .98074   .98242  1.00000   .95847   .98986   .99751
PD     .99327   .98783   .95847  1.00000   .97734   .95633
PN     .98853   .98220   .98986   .97734  1.00000   .99358
PS     .97849   .97685   .99751   .95633   .99358  1.00000
?=======================================================================
? f. Renormalizations of price variables
?=======================================================================
/* In the linear case, the coefficients would be divided by the same
scale factor, so that x*b would be unchanged, where x is a variable and
b is the coefficient. In the loglinear case, since log(k*x) = log(k) +
log(x), the renormalization would simply affect the constant term. The
price coefficients would be unchanged. */
?=======================================================================
? g. Oaxaca decomposition
?=======================================================================
Dates ; 1953 $
Period ; 1953 - 1973 $
Matrix ; xb0 = Mean(logx)$
Regress ; lhs = logg ; rhs = logx $
Matrix ; b0 = b ; v0 = varb $
Calc ; yb0 = ybar $
Period ; 1974 - 2004 $
Matrix ; xb1 = mean(logx) $
Regress ; lhs = logg ; rhs = logx $
Matrix ; b1 = b ; v1 = varb $
Calc ; yb1 = ybar $
? Now the decomposition
Calc ; list ; dybar = yb1 - yb0 $ Total
Calc ; list ; dy_dx = b1'xb1 - b1'xb0 $ Change due to change in x
Calc ; list ; dy_db = b1'xb0 - b0'xb0 $
Matrix ; vdb = v1+v0 ; vdb = xb0'[vdb]xb0 $
Calc ; sdb = sqr(vdb)
     ; list ; lower = dy_db - 1.96*sqr(vdb)
     ; upper = dy_db + 1.96*sqr(vdb) $
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
 DYBAR = .395377
 DY_DX = .122745
 DY_DB = .272631
 LOWER = .184844
 UPPER = .360419
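The decomposition in part g can be sketched outside NLOGIT as follows. This is an illustrative Python version under stated assumptions: the function name `oaxaca` and the small input vectors are hypothetical (the gasoline data are not reproduced here), and the variance of b1 - b0 is taken to be v0 + v1, as in the listing, which treats the two subperiod estimators as independent.

```python
import numpy as np


def oaxaca(b0, b1, xbar0, xbar1, v0, v1, z=1.96):
    """Split b1'xbar1 - b0'xbar0 into a part due to the change in x
    and a part due to the change in the coefficients, with an
    approximate confidence interval for the coefficient part."""
    dy_dx = b1 @ xbar1 - b1 @ xbar0      # change due to change in x
    dy_db = b1 @ xbar0 - b0 @ xbar0      # change due to change in b
    se_db = np.sqrt(xbar0 @ (v0 + v1) @ xbar0)
    return dy_dx, dy_db, (dy_db - z * se_db, dy_db + z * se_db)


# Hypothetical two-coefficient example (constant and one regressor).
b0 = np.array([1.0, 0.5])
b1 = np.array([1.2, 0.7])
xbar0 = np.array([1.0, 2.0])
xbar1 = np.array([1.0, 3.0])
v0 = 0.01 * np.eye(2)
v1 = 0.01 * np.eye(2)

dy_dx, dy_db, (lower, upper) = oaxaca(b0, b1, xbar0, xbar1, v0, v1)
total = b1 @ xbar1 - b0 @ xbar0
```

Because each regression includes a constant, b'xbar equals the subperiod mean of the dependent variable, so the two components sum to the total change. In the listed results, .122745 + .272631 = .395376, which matches DYBAR = .395377 up to rounding.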
?=======================================================================
? Chapter 4 Application 2
?=======================================================================
Create ; lc = log(cost/pf) ; lpl=log(pl/pf) ; lpk=log(pk/pf)$
Create ; lq = log(q) ; lqq = .5*lq*lq $
Namelist ; x = one,lq,lqq,lpk,lpl $
? a. Cost function
Regress; lhs = lc ; rhs = x ; printvc $
+----------------------------------------------------+
| Ordinary least squares regression                  |
| LHS=LC       Mean                 =   .3195570     |
|              Standard deviation   =   1.542364     |
| WTS=none     Number of observs.   =   158          |
| Model size   Parameters           =   5            |
|              Degrees of freedom   =   153          |
| Residuals    Sum of squares       =   2.904896     |
|              Standard error of e  =   .1377906     |
| Fit          R-squared            =   .9922222     |
|              Adjusted R-squared   =   .9920189     |
| Model test   F[  4,   153] (prob) =4879.59 (.0000) |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    6.81816332      .25243920      27.009    .0000
 LQ           .40274543      .03148312      12.792    .0000   8.26548908
 LQQ          .06089514      .00432530      14.079    .0000   35.7912728
 LPK          .16203385      .04040556       4.010    .0001    .85978893
 LPL          .15244470      .04659735       3.272    .0013   5.58162250
           1             2             3             4             5
 1      .06373        .00238        .00031        .00399        .01047
 2      .00238        .00099        .00013        .00010        .00020
 3      .00031        .00013   .1870819D-04  .1493338D-04  .2453652D-04
 4      .00399        .00010   .1493338D-04       .00163        .00102
 5      .01047        .00020   .2453652D-04       .00102        .00217
?=======================================================================
? b. capital price coefficient
?=======================================================================
Wald ; fn1 = 1 - b_lpk - b_lpl $
+-----------------------------------------------+
| WALD procedure. Estimates and standard errors |
| for nonlinear functions and joint test of     |
| nonlinear restrictions.                       |
| Wald Statistic             =    266.36109     |
| Prob. from Chi-squared[ 1] =       .00000     |
+-----------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Fncn(1)      .68552145      .04200352      16.321    .0000
?=======================================================================
? c. efficient scale
?=======================================================================
Wald ; fn1 = exp((1-b_lq)/b_lqq) $
+-----------------------------------------------+
| WALD procedure. Estimates and standard errors |
| for nonlinear functions and joint test of     |
| nonlinear restrictions.                       |
| Wald Statistic             =     21.74979     |
| Prob. from Chi-squared[ 1] =       .00000     |
+-----------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Fncn(1)    18177.1045      3897.59890       4.664    .0000
Calc ; qstar = waldfns(1) ; vqstar = varwald(1,1)
     ; list ; lower = qstar - 1.96*sqr(vqstar)
     ; upper = qstar + 1.96*sqr(vqstar) $
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
 LOWER = 10537.810653
 UPPER = 25816.398344
?=======================================================================
? d. Raw data
?=======================================================================
Create ; output = q $
Sort ; lhs = output $
/*
The estimated efficient scale is 18177. There are 25 firms in the sample
that have output larger than this. As noted in the problem, many of the
largest firms in the sample are aggregates of smaller ones, so it is
difficult to draw a conclusion here. However, some of the largest firms
(Southern, American Electric Power) are singly counted, and are much
larger than this scale. The important point is that much of the output
in the sample is produced by firms that are smaller than this efficient
scale. There are unexploited economies of scale in this industry.
*/
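The point estimate and interval in part c can be reproduced from the listed output. The sketch below is a Python illustration rather than the NLOGIT computation itself: the coefficient values and the delta-method standard error are copied from the listing, and the names `efficient_scale`, `qstar`, `lower`, and `upper` are chosen here for convenience.

```python
import math

# Coefficients on LQ and LQQ, and the standard error of the function,
# all taken from the regression and WALD output shown above.
b_lq = 0.40274543
b_lqq = 0.06089514
se_qstar = 3897.59890


def efficient_scale(b_lq, b_lqq):
    """Output at which the estimated cost elasticity equals one:
    Q* = exp((1 - b_lq) / b_lqq)."""
    return math.exp((1.0 - b_lq) / b_lqq)


qstar = efficient_scale(b_lq, b_lqq)

# Approximate 95% interval, using the reported standard error.
lower = qstar - 1.96 * se_qstar
upper = qstar + 1.96 * se_qstar
```

The standard error is not recomputed here because the printed covariance matrix is rounded to a few digits; only the function value and the interval arithmetic are checked.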
Chapter 5
Inference and Prediction
Exercises

1. The estimated covariance matrix for the least squares estimator is
$$s^2(X'X)^{-1} = \frac{20}{3900}\begin{bmatrix} 3900/29 & 0 & 0 \\ 0 & 80 & -10 \\ 0 & -10 & 50 \end{bmatrix} = \begin{bmatrix} .69 & 0 & 0 \\ 0 & .410 & -.051 \\ 0 & -.051 & .256 \end{bmatrix}$$
where $s^2 = 520/(29-3) = 20$. Then, the test may be based on $t = (.4 + .9 - 1)/[.410 + .256 - 2(.051)]^{1/2} = .399$. This is smaller than the critical value of 2.056, so we would not reject the hypothesis.

2. In order to compute the regression, we must recover the original sums of squares and cross products for y. These are $X'y = X'Xb = [116, 29, 76]'$. The total sum of squares is found using $R^2 = 1 - e'e/y'M^0y$, so $y'M^0y = 520/(52/60) = 600$. The means are $\bar{x}_1 = 0$, $\bar{x}_2 = 0$, $\bar{y} = 4$, so $y'y = 600 + 29(4^2) = 1064$. The slope in the regression of y on $x_2$ alone is $b_2 = 76/80$, so the regression sum of squares is $b_2^2(80) = 72.2$, and the residual sum of squares is $600 - 72.2 = 527.8$. The test based on the residual sum of squares is $F = [(527.8 - 520)/1]/[520/26] = .390$. In the regression of the previous problem, the t ratio for testing the same hypothesis would be $t = .4/(.410)^{1/2} = .624$, which is the square root of .39.

3. For the current problem, $R = [0, I]$ where I is the last $K_2$ columns. Therefore, $R(X'X)^{-1}R'$ is the lower right $K_2 \times K_2$ block of $(X'X)^{-1}$. As we have seen before, this is $(X_2'M_1X_2)^{-1}$. Also, $(X'X)^{-1}R'$ is the last $K_2$ columns of $(X'X)^{-1}$. These are
$$(X'X)^{-1}R' = \begin{bmatrix} -(X_1'X_1)^{-1}X_1'X_2(X_2'M_1X_2)^{-1} \\ (X_2'M_1X_2)^{-1} \end{bmatrix}.$$
Finally, since $q = 0$, $Rb - q = (0b_1 + Ib_2) - 0 = b_2$. Therefore, the constrained estimator is
$$b_* = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} - \begin{bmatrix} -(X_1'X_1)^{-1}X_1'X_2(X_2'M_1X_2)^{-1} \\ (X_2'M_1X_2)^{-1} \end{bmatrix}(X_2'M_1X_2)b_2,$$
where $b_1$ and $b_2$ are the multiple regression coefficients in the regression of y on both $X_1$ and $X_2$. Collecting terms, this produces
$$b_* = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} - \begin{bmatrix} -(X_1'X_1)^{-1}X_1'X_2b_2 \\ b_2 \end{bmatrix}.$$
But, we have from Section 6.3.4 that $b_1 = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2b_2$, so the preceding reduces to
$$b_* = \begin{bmatrix} (X_1'X_1)^{-1}X_1'y \\ 0 \end{bmatrix},$$
which was to be shown.

If, instead, the restriction is $\beta_2 = \beta_2^0$, then the preceding is changed by replacing $R\beta - q = 0$ with $R\beta - \beta_2^0 = 0$. Thus, $Rb - q = b_2 - \beta_2^0$. Then, the constrained estimator is
$$b_* = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} - \begin{bmatrix} -(X_1'X_1)^{-1}X_1'X_2(X_2'M_1X_2)^{-1} \\ (X_2'M_1X_2)^{-1} \end{bmatrix}(X_2'M_1X_2)(b_2 - \beta_2^0)$$
or
$$b_* = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} + \begin{bmatrix} (X_1'X_1)^{-1}X_1'X_2(b_2 - \beta_2^0) \\ \beta_2^0 - b_2 \end{bmatrix}.$$
Using the result of the previous paragraph, we can rewrite the first part as
$$b_{1*} = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2\beta_2^0 = (X_1'X_1)^{-1}X_1'(y - X_2\beta_2^0),$$
which was to be shown.
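The arithmetic in Exercises 1 and 2 can be verified with a short calculation. The Python sketch below is illustrative only. It assumes the moment matrix X'X = [29 0 0; 0 50 10; 0 10 80] and coefficient vector b = (4, .4, .9)', which are inferred from the quantities used in the solutions (X'y = [116, 29, 76]', the zero means of the regressors, and the covariance matrix shown in Exercise 1) rather than quoted from the exercise itself.

```python
import numpy as np

# Assumed inputs, inferred from the solutions above.
XtX = np.array([[29.0, 0.0, 0.0],
                [0.0, 50.0, 10.0],
                [0.0, 10.0, 80.0]])
b = np.array([4.0, 0.4, 0.9])
ee, n, K = 520.0, 29, 3

s2 = ee / (n - K)                    # 520 / 26
V = s2 * np.linalg.inv(XtX)          # estimated covariance matrix of b

# Exercise 1: t statistic for the hypothesis that the slopes sum to one.
r = np.array([0.0, 1.0, 1.0])
t_sum = (r @ b - 1.0) / np.sqrt(r @ V @ r)

# Exercise 2: F statistic from the regression of y on x2 alone.
Xty = XtX @ b                        # recovers X'y
tss = ee / (52.0 / 60.0)             # y'M0y, using 1 - R^2 = 52/60
b2_short = Xty[2] / XtX[2, 2]        # slope on x2 alone (means are zero)
ee_short = tss - b2_short ** 2 * XtX[2, 2]
F = ((ee_short - ee) / 1.0) / (ee / (n - K))
t_b1 = b[1] / np.sqrt(V[1, 1])       # t ratio on x1 in the long regression
```

With a single restriction, the squared t ratio on x1 reproduces the F statistic, which is the relationship noted at the end of Exercise 2.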
4. By factoring the result in (5-14), we obtain $b_* = [I - CR]b + w$ where $C = (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}$ and $w = Cq$. The covariance matrix of the restricted least squares estimator is
$$\text{Var}[b_*] = [I - CR]\sigma^2(X'X)^{-1}[I - CR]' = \sigma^2(X'X)^{-1} + \sigma^2CR(X'X)^{-1}R'C' - \sigma^2CR(X'X)^{-1} - \sigma^2(X'X)^{-1}R'C'.$$
By multiplying it out, we find that $CR(X'X)^{-1} = (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1} = CR(X'X)^{-1}R'C'$, so
$$\text{Var}[b_*] = \sigma^2(X'X)^{-1} - \sigma^2CR(X'X)^{-1}R'C' = \sigma^2(X'X)^{-1} - \sigma^2(X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}.$$
This may also be written as
$$\text{Var}[b_*] = \sigma^2(X'X)^{-1}\{I - R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}\} = \sigma^2(X'X)^{-1}\{[\sigma^2(X'X)^{-1}]^{-1} - R'[R\sigma^2(X'X)^{-1}R']^{-1}R\}\sigma^2(X'X)^{-1}.$$
Since $\text{Var}[Rb] = R\sigma^2(X'X)^{-1}R'$, this is the answer we seek.

5. The variance of the restricted least squares estimator is given in the second equation in the previous exercise. We know that this matrix is nonnegative definite, since it is derived in the form $B\sigma^2(X'X)^{-1}B'$, and $\sigma^2(X'X)^{-1}$ is positive definite. Therefore, it remains to show only that the matrix subtracted from Var[b] to obtain Var[$b_*$] is nonnegative definite. Consider, then, a quadratic form in Var[$b_*$]:
$$z'\text{Var}[b_*]z = z'\text{Var}[b]z - \sigma^2z'(X'X)^{-1}(R'[R(X'X)^{-1}R']^{-1}R)(X'X)^{-1}z = z'\text{Var}[b]z - w'[R(X'X)^{-1}R']^{-1}w$$
where $w = \sigma R(X'X)^{-1}z$. It remains to show, therefore, that the inverse matrix in brackets is positive definite. This is obvious since its inverse is positive definite. This shows that every quadratic form in Var[$b_*$] is no larger than the quadratic form in Var[b] in the same vector.

6. The result follows immediately from the result which precedes (5-19). Since the sum of squared residuals must be at least as large, the coefficient of determination, COD = 1 - sum of squares / $\sum_i (y_i - \bar{y})^2$, must be no larger.

7. For convenience, let $F = [R(X'X)^{-1}R']^{-1}$. Then, $\lambda = F(Rb - q)$ and the variance of the vector of Lagrange multipliers is $\text{Var}[\lambda] = FR\sigma^2(X'X)^{-1}R'F = \sigma^2F$. The estimated variance is obtained by replacing $\sigma^2$ with $s^2$.
Therefore, the chi-squared statistic is
$$\chi^2 = (Rb - q)'F'(s^2F)^{-1}F(Rb - q) = (Rb - q)'[(1/s^2)F](Rb - q) = \frac{(Rb - q)'[R(X'X)^{-1}R']^{-1}(Rb - q)}{e'e/(n - K)}.$$
This is exactly J times the F statistic defined in (5-19) and (5-20). Finally, J times the F statistic in (5-20) equals the expression given above.

8. We use (5-19) to find the new sum of squares. The change in the sum of squares is
$$e_*'e_* - e'e = (Rb - q)'[R(X'X)^{-1}R']^{-1}(Rb - q).$$
For this problem, $(Rb - q) = b_2 + b_3 - 1 = .3$. The matrix inside the brackets is the sum of the 4 elements in the lower right block of $(X'X)^{-1}$. These are given in Exercise 1, multiplied by $s^2 = 20$. Therefore, the required sum is $[R(X'X)^{-1}R'] = (1/20)(.410 + .256 - 2(.051)) = .028$. Then, the change in the sum of squares is $.3^2/.028 = 3.215$. Thus, $e'e = 520$, $e_*'e_* = 523.215$, and the chi-squared statistic is $26[523.215/520 - 1] = .16$. This is quite small, and would not lead to rejection of the hypothesis. Note that for a single restriction, the Lagrange multiplier statistic is equal to the F statistic which equals, in turn, the square of the t statistic used to test the restriction. Thus, we could have obtained this quantity by squaring the .399 found in the first problem (apart from some rounding error).

9. First, use (5-19) to write $e_*'e_* = e'e + (Rb - q)'[R(X'X)^{-1}R']^{-1}(Rb - q)$. Now, the result that $E[e'e] = (n - K)\sigma^2$ obtained in Chapter 6 must hold here, so $E[e_*'e_*] = (n - K)\sigma^2 + E[(Rb - q)'[R(X'X)^{-1}R']^{-1}(Rb - q)]$. Now, $b = \beta + (X'X)^{-1}X'\varepsilon$, so $Rb - q = R\beta - q + R(X'X)^{-1}X'\varepsilon$. But, $R\beta - q = 0$, so under the hypothesis, $Rb - q = R(X'X)^{-1}X'\varepsilon$. Insert this in the result above to obtain
$$E[e_*'e_*] = (n - K)\sigma^2 + E[\varepsilon'X(X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}X'\varepsilon].$$
The quantity in square brackets is a scalar, so it is equal to its trace. Permute $\varepsilon'X(X'X)^{-1}R'$ in the trace to obtain
$$E[e_*'e_*] = (n - K)\sigma^2 + E[\mathrm{tr}\{[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}R'\}].$$
We may now carry the expectation inside the trace and use $E[\varepsilon\varepsilon'] = \sigma^2I$ to obtain
$$E[e_*'e_*] = (n - K)\sigma^2 + \mathrm{tr}\{[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}X'\sigma^2IX(X'X)^{-1}R'\}.$$
Carry the $\sigma^2$ outside the trace operator, and after cancellation of the products of matrices times their inverses, we obtain $E[e_*'e_*] = (n - K)\sigma^2 + \sigma^2\mathrm{tr}[I_J] = (n - K + J)\sigma^2$.

10. We are to show that the multiple regression of y on a constant, $x_1$, and $x_2$, subject to the restriction $\beta_1 + \beta_2 = 1$, leads to the regression of $y - x_1$ on a constant and $x_2 - x_1$. For convenience, we put the constant term last instead of first in the parameter vector. The constraint is $R\beta - q = 0$ where $R = [1\ 1\ 0]$ and $q = 1$, so $R_1 = [1]$ and $R_2 = [1, 0]$. Then, $\beta_1 = [1]^{-1}[1 - \beta_2] = 1 - \beta_2$. Thus, $y = (1 - \beta_2)x_1 + \beta_2x_2 + \alpha i + \varepsilon$ or $y - x_1 = \beta_2(x_2 - x_1) + \alpha i + \varepsilon$.
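The equivalence in Exercise 10, and the sum of squares identity (5-19) used in Exercises 8 and 9, can be illustrated numerically. The following Python sketch uses simulated data, so the sample size, coefficients, and seed are hypothetical. It computes the restricted estimator from the general formula b* = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - q) and compares it with the regression of y - x1 on x2 - x1 and a constant.

```python
import numpy as np

# Simulated data (hypothetical values, seeded for reproducibility).
rng = np.random.default_rng(12345)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 + 0.3 * x1 + 0.7 * x2 + rng.normal(scale=0.5, size=n)

# Unrestricted least squares, with the constant placed last as in the text.
X = np.column_stack([x1, x2, np.ones(n)])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Restricted least squares from the general formula, R = [1 1 0], q = 1.
R = np.array([[1.0, 1.0, 0.0]])
q = np.array([1.0])
m = R @ b - q                                   # discrepancy Rb - q
A = R @ XtX_inv @ R.T                           # R (X'X)^{-1} R'
b_star = b - XtX_inv @ R.T @ np.linalg.solve(A, m)
e_star = y - X @ b_star

# The same restriction imposed by substitution: regress y - x1 on
# (x2 - x1) and a constant, then recover beta1 = 1 - beta2.
Z = np.column_stack([x2 - x1, np.ones(n)])
g, *_ = np.linalg.lstsq(Z, y - x1, rcond=None)
b_sub = np.array([1.0 - g[0], g[0], g[1]])

# Identity (5-19): e*'e* - e'e = (Rb - q)'[R(X'X)^{-1}R']^{-1}(Rb - q).
ss_change = float(e_star @ e_star - e @ e)
quad_form = float(m @ np.linalg.solve(A, m))
```

Both routes give the same coefficient vector, and the increase in the sum of squared residuals equals the quadratic form in the discrepancy vector.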
Applications ?======================================================================= ? Application 5.1 Wage Equation ?======================================================================= Read;File="F:\TextRevision\edition6\SolutionsandApplications\time_var.dat"; nvar=5;nobs=17919$ ? This creates the group count variable. Regress ; Lhs = one ; Rhs = one ; Str = ID ; Panel $ ? This READ merges the smaller file into the larger one. Read;File="F:\TextRevision\edition6\SolutionsandApplications\time_invar.dat"; names=ability,med,fed,bh,sibs? ; group=_groupti ;nvar=5;nobs=2178$ Names=id,educ,lwage,pexp,t; namelist ; x1=one,educ,pexp,ability$ namelist ; x2=med,fed,bh,sibs$ ?======================================================================= ? a. Long regression ?======================================================================= regress ; lhs= lwage ; rhs = x1,x2 $ ++  Ordinary least squares regression   LHS=LWAGE Mean = 2.296821   Standard deviation = .5282364   WTS=none Number of observs. = 17919   Model size Parameters = 8   Degrees of freedom = 17911   Residuals Sum of squares = 4119.734   Standard error of e = .4795950   Fit Rsquared = .1760081   Adjusted Rsquared = .1756861   Model test F[ 7, 17911] (prob) = 546.55 (.0000)  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ Constant .98965433 .03389449 29.198 .0000 EDUC  .07118866 .00225722 31.538 .0000 12.6760422 PEXP  .03951038 .00089858 43.970 .0000 8.36268765 ABILITY  .07736880 .00493359 15.682 .0000 .05237402 MED  .709887D04 .00169543 .042 .9666 11.4719013 FED  .00531681 .00133795 3.974 .0001 11.7092472 BH  .05286954 .00999042 5.292 .0000 .15385903 SIBS  .00487138 .00179116 2.720 .0065 3.15620291 ?======================================================================= ? b. 
F test ?======================================================================= Calc ; list ; fstat = Rsqrd/(kreg1)/((1rsqrd)/(nkreg)) $ ++ FSTAT = 14.025040 Calc ; r1 = rsqrd ; df1=nkreg$ Matrix ; b1 = b ; v1 = varb $ Matrix ; b1 =b1(5:8) ; v1=varb(5:8,5:8)$ Regress ; lhs = lwage ; rhs = x1 $ ++
 Ordinary least squares regression   LHS=LWAGE Mean = 2.296821   Standard deviation = .5282364   WTS=none Number of observs. = 17919   Model size Parameters = 4   Degrees of freedom = 17915   Residuals Sum of squares = 4132.637   Standard error of e = .4802919   Fit Rsquared = .1734272   Adjusted Rsquared = .1732888   Model test F[ 3, 17915] (prob) =1252.94 (.0000)  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ Constant 1.02722913 .03004146 34.194 .0000 EDUC  .07376210 .00221425 33.312 .0000 12.6760422 PEXP  .03948955 .00089835 43.958 .0000 8.36268765 ABILITY  .08289072 .00459996 18.020 .0000 .05237402 ?======================================================================= ? c. F test for hypothesis that coefficients on X2 are zero ?======================================================================= Calc ; list ; fstat = (r1rsqrd)/(col(x2))/((1r1)/(df1)) $ ++ FSTAT = 14.025040 ?======================================================================= ? c. Wald test for hypothesis that coefficients on X2 are zero ?======================================================================= Matrix ; List ; Wald = b1'b1 $ Matrix WALD has 1 rows and 1 columns. 1 +1 56.10016 Note Wald = 4*F, as expected. ?======================================================================= ? Application 5.2 Translog Cost Function ?======================================================================= ? First prepare the data ? 
Create ; lpk=log(pk);lpl=log(pl);lpf=log(pf)$ create ; lpk2=.5*lpk^2 ; lpl2=.5*lpl^2 ; lpf2=.5*lpf^2$ Create ; lpkf=lpk*lpf ; lplf=lpl*lpf ; lpkl=lpk*lpl $ Create ; lq = log(q) ; lq2 = .5*lq^2 $ Create ; lqk=lq*lpk ; lql=lq*lpl ; lqf=lq*lpf $ Create ; lc = log(cost) $ Create ; lcpf = log(cost/pf) $ Create ; lpkpf=log(pk/pf) ; lplpf=log(pl/pf) $ Create ; lpkpf2=.5*lpkpf^2 ; lplpf2=.5*lplpf^2 ; lplfpkf=lplpf*lpkpf $ Create ; lqlpkf=lq*lpkpf ; lqlplf=lq*lplf $ ?======================================================================= ? a. Beta is a,b,dk,dl,df,pkk,pll,pff,pkl,pkf,plf,c,tqk,tql,tqf ?======================================================================= Restrictions are 0,0,1,1,1,0,0,0,0,0,0,0,0,0,0 1 0,0,0,0,0,1,0,0,1,1,0,0,0,0,0 0 R = 0,0,0,0,0,0,1,0,1,0,1,0,0,0,0 q = 0 0,0,0,0,0,0,0,1,0,1,1,0,0,0,0 0 0,0,0,0,0,0,0,0,0,0,0,0,1,1,1 0 ?======================================================================= ? b. Testing the theory ?======================================================================= Namelist ; X1=one,lq,lpk,lpl,lpf,lpk2,lpl2,lpf2,lpkl,lpkf,lplf,lq2,lqk,lq... Namelist ; X0=one,lq,lpkf,lplf,lpkpf2,lplpf2,lplfpkf,lq2,lqlpkf,lqlplf$ Regress ; lhs = lc ; rhs=x0 $ ++
 Ordinary least squares regression   LHS=LC Mean = 3.071619   Standard deviation = 1.542734   WTS=none Number of observs. = 158   Model size Parameters = 10   Degrees of freedom = 148   Residuals Sum of squares = 2.634416   Standard error of e = .1334170   Fit Rsquared = .9929498   Adjusted Rsquared = .9925211   Model test F[ 9, 148] (prob) =2316.03 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 1.13340208 1.04296294 1.087 .2789 LQ  .02244828 .12717485 .177 .8601 8.26548908 LPKF  .02309567 .14153592 .163 .8706 14.4192992 LPLF  .01690697 .09185395 .184 .8542 30.4387314 LPKPF2  .04730093 .21017152 .225 .8222 .42211776 LPLPF2  .03419034 .06850142 .499 .6184 15.6173009 LPLFPKF  .00741233 .11649585 .064 .9494 4.84868706 LQ2  .05544306 .00446607 12.414 .0000 35.7912728 LQLPKF  .03562155 .02862683 1.244 .2153 7.15696461 LQLPLF  .01279036 .00375187 3.409 .0008 251.570118 Calc ; ee0 = sumsqdev $ Regress ; lhs = lcpf ; rhs = x1 $ ++  Ordinary least squares regression   LHS=LCPF Mean = .3195570   Standard deviation = 1.542364   WTS=none Number of observs. 
= 158   Model size Parameters = 15   Degrees of freedom = 143   Residuals Sum of squares = 2.464348   Standard error of e = .1312753   Fit Rsquared = .9934018   Adjusted Rsquared = .9927558   Model test F[ 14, 143] (prob) =1537.82 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 76.2592615 38.2800363 1.992 .0483 LQ  1.08042535 .37554512 2.877 .0046 8.26548908 LPK  6.38079702 4.52920686 1.409 .1611 4.25096457 LPL  14.7182926 7.08482345 2.077 .0395 8.97279814 LPF  1.89473291 2.84231282 .667 .5061 3.39117564 LPK2  .32741427 .44070869 .743 .4587 9.05539681 LPL2  1.53852735 .69240298 2.222 .0279 40.2700121 LPF2  .07350556 .18203881 .404 .6870 5.78602018 LPKL  .57205049 .37189026 1.538 .1262 38.1346773 LPKF  .02402470 .24632928 .098 .9224 14.4192992 LPLF  .16228289 .27007181 .601 .5489 30.4387314 LQ2  .05297849 .00471336 11.240 .0000 35.7912728 LQK  .04014440 .02979137 1.348 .1799 35.1677247 LQL  .13104059 .03828401 3.423 .0008 74.2063474 LQF  .05865220 .02554928 2.296 .0232 28.0107601 Calc ; ee1 = sumsqdev $ Calc ; list ; Fstat = ((ee0  ee1)/5)/(ee1/(15815))$ ++ FSTAT = 1.973714 > Calc ; list ; ftb(.95,5,143)$ ++ Result = 2.277490 The F statistic is small; the theory is not rejected.
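The F statistic in part b follows directly from the two residual sums of squares. As a minimal sketch in Python (the function name `f_from_ssr` is illustrative, and the inputs are the rounded values printed in the two regressions above):

```python
def f_from_ssr(ssr_restricted, ssr_unrestricted, j, n, k):
    """F statistic for J linear restrictions, computed from the restricted
    and unrestricted residual sums of squares, with n observations and
    k parameters in the unrestricted model."""
    numerator = (ssr_restricted - ssr_unrestricted) / j
    denominator = ssr_unrestricted / (n - k)
    return numerator / denominator


# Restricted (10 parameters) and unrestricted (15 parameters) translog models.
fstat = f_from_ssr(2.634416, 2.464348, j=5, n=158, k=15)
reject = fstat > 2.277490   # 95% critical value listed above for F[5,143]
```

The value differs from the listed 1.973714 only in the last digits, because the printed sums of squares are rounded.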
?======================================================================= ? c. Testing homotheticity ?======================================================================= ++  Ordinary least squares regression   LHS=LCPF Mean = .3195570   Standard deviation = 1.542364   WTS=none Number of observs. = 158   Model size Parameters = 10   Degrees of freedom = 148   Residuals Sum of squares = 2.634223   Standard error of e = .1334121   Fit Rsquared = .9929469   Adjusted Rsquared = .9925180   Model test F[ 9, 148] (prob) =2315.08 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 2.78239562 1.04292476 2.668 .0085 LQ  .01362521 .12717020 .107 .9148 8.26548908 LPKF  .06044098 .14153074 .427 .6700 14.4192992 LPLF  .07639000 .09185059 .832 .4069 30.4387314 LPKPF2  .10507269 .21016383 .500 .6178 .42211776 LPLPF2  .00146323 .06849891 .021 .9830 15.6173009 LPLFPKF  .01806822 .11649158 .155 .8770 4.84868706 LQ2  .05565578 .00446590 12.462 .0000 35.7912728 LQLPKF  .03824257 .02862578 1.336 .1836 7.15696461 LQLPLF  .01296202 .00375173 3.455 .0007 251.570118 Regress ; lhs = lcpf ; Rhs = x0 ; cls:b(9)=0,b(10)=0$ ++  Linearly restricted regression   Ordinary least squares regression   LHS=LCPF Mean = .3195570   Standard deviation = 1.542364   WTS=none Number of observs. = 158   Model size Parameters = 8   Degrees of freedom = 150   Residuals Sum of squares = 2.896172   Standard error of e = .1389526   Fit Rsquared = .9922456   Adjusted Rsquared = .9918837   Model test F[ 7, 150] (prob) =2741.96 (.0000)   Restrictns. F[ 2, 148] (prob) = 7.36 (.0009)   Not using OLS or no constant. Rsqd & F may be < 0.   Note, with restrictions imposed, Rsqd may be < 0.  
++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 6.20547247 .37175165 16.693 .0000 LQ  .40111764 .03208201 12.503 .0000 8.26548908 LPKF  .05918207 .14502101 .408 .6838 14.4192992 LPLF  .03234530 .08668866 .373 .7096 30.4387314 LPKPF2  .20340518 .21249945 .957 .3400 .42211776 LPLPF2  .00516132 .06888408 .075 .9404 15.6173009 LPLFPKF  .08684971 .10534811 .824 .4110 4.84868706 LQ2  .06103878 .00440807 13.847 .0000 35.7912728 LQLPKF  .138778D16 .517639D09 .000 1.0000 7.15696461 LQLPLF  .000000 .915064D10 .000 1.0000 251.570118 Calc ; list ; ftb(.95,2,148)$ ++ Result = 3.057197 The F statistic of 7.36 is larger than the critical value of 3.057. The hypothesis is rejected.
?=======================================================================
? d. Testing generalized Cobb-Douglas against full translog.
?=======================================================================
Regress ; lhs = lcpf ; rhs = x0
        ; cls:b(5)=0,b(6)=0,b(7)=0,b(9)=0,b(10)=0$
+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary least squares regression                  |
| LHS=LCPF     Mean                 =   .3195570     |
|              Standard deviation   =   1.542364     |
| WTS=none     Number of observs.   =   158          |
| Model size   Parameters           =   5            |
|              Degrees of freedom   =   153          |
| Residuals    Sum of squares       =   3.191949     |
|              Standard error of e  =   .1444383     |
| Fit          R-squared            =   .9914536     |
|              Adjusted R-squared   =   .9912302     |
| Model test   F[  4,   153] (prob) =4437.33 (.0000) |
| Restrictns.  F[  5,   148] (prob) =   6.27 (.0000) |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    5.07718678      .18072495      28.093    .0000
 LQ           .41724916      .03285950      12.698    .0000   8.26548908
 LPKF         .00903097      .01466874        .616    .5391   14.4192992
 LPLF         .03131901      .00770196       4.066    .0001   30.4387314
 LPKPF2       .582867D-15    .127559D-07      .000   1.0000    .42211776
 LPLPF2       .328730D-15    .986857D-08      .000   1.0000   15.6173009
 LPLFPKF      .461436D-15    .201473D-07      .000   1.0000   4.84868706
 LQ2          .05956626      .00452575      13.162    .0000   35.7912728
 LQLPKF       .555112D-16    .538074D-09      .000   1.0000   7.15696461
 LQLPLF       .693889D-17    .223074D-09      .000   1.0000   251.570118
Calc ; list ; ftb(.95,5,148)$
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
 Result = 2.275319
The F statistic of 6.27 is larger than the critical value of 2.275. The hypothesis is rejected.
?======================================================================= ? e. Testing CobbDouglas against full translog. ?======================================================================= Matrix ; b2=b(5:10) ; v2=varb(5:10,5:10) $ Matrix ; list ; Fcd = 1/6 * b2'b2 $ Matrix FCD has 1 rows and 1 columns. 1 +1 28.87144 Calc ; list ; ftb(.95,6,148)$ ++  Listed Calculator Results  ++ Result = 2.160352 The F statistic of 28.871 is larger than the critical value of 2.16. The hypothesis is rejected. ?======================================================================= ? f. Testing generalized CobbDouglas against homothetic translog. ?======================================================================= Regress ; Lhs = lcpf ; rhs = one,lq,lpkf,lplf,lpkpf2,lplpf2,lplfpkf,lq2 ; cls:b(5)=0,b(6)=0,b(7)=0$ ++  Linearly restricted regression 
 Ordinary least squares regression   LHS=LCPF Mean = .3195570   Standard deviation = 1.542364   WTS=none Number of observs. = 158   Model size Parameters = 5   Degrees of freedom = 153   Residuals Sum of squares = 3.191949   Standard error of e = .1444383   Fit Rsquared = .9914536   Adjusted Rsquared = .9912302   Model test F[ 4, 153] (prob) =4437.33 (.0000)   Restrictns. F[ 3, 150] (prob) = 5.11 (.0022)   Not using OLS or no constant. Rsqd & F may be < 0.   Note, with restrictions imposed, Rsqd may be < 0.  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 5.07718678 .18072495 28.093 .0000 LQ  .41724916 .03285950 12.698 .0000 8.26548908 LPKF  .00903097 .01466874 .616 .5391 14.4192992 LPLF  .03131901 .00770196 4.066 .0001 30.4387314 LPKPF2  .199840D14 .243505D07 .000 1.0000 .42211776 LPLPF2  .746798D15 .608762D08 .000 1.0000 15.6173009 LPLFPKF  .140166D14 .121752D07 .000 1.0000 4.84868706 LQ2  .05956626 .00452575 13.162 .0000 35.7912728 Calc ; list ; ftb(.95,3,150) $ ++  Listed Calculator Results  ++ Result = 2.664907 ? ?======================================================================= ? g. We have not rejected the theory, but we have rejected all the ? functional forms ? except the nonhomothetic translog. Just like Christensen and Greene. ?=======================================================================
?======================================================================= ? Application 5.3 Nonlinear restrictions ?======================================================================= sample;152$ name;x=one,logpg,logi,logpnc,logpuc,logppt,t,logpd,logpn,logps$ ?======================================================================= ? a. Simple hypothesis test ?======================================================================= Regr;lhs=logg;rhs=x$ ++  Ordinary least squares regression   LHS=LOGG Mean = 1.570475   Standard deviation = .2388115   WTS=none Number of observs. = 52   Model size Parameters = 10   Degrees of freedom = 42   Residuals Sum of squares = .3812817E01   Standard error of e = .3012994E01   Fit Rsquared = .9868911   Adjusted Rsquared = .9840821   Model test F[ 9, 42] (prob) = 351.33 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 7.28719016 2.52056245 2.891 .0061 LOGPG  .06051812 .05401018 1.120 .2689 3.72930296
LOGI  .99299135 .25037574 3.966 .0003 9.67214751 LOGPNC  .15471632 .26696298 .580 .5653 4.38036654 LOGPUC  .48909058 .08519952 5.741 .0000 4.10544881 LOGPPT  .01926966 .13644891 .141 .8884 4.14194132 T  .03797198 .00751371 5.054 .0000 26.5000000 LOGPD  1.73205775 .25988611 6.665 .0000 4.23906603 LOGPN  .72953933 .26506853 2.752 .0087 4.23689080 LOGPS  .86798166 .35291106 2.459 .0181 4.17535768 Calc;r1=rsqrd$ Regr;lhs=logg;rhs=one,logpg,logi,logpnc,logpuc,logppt,t$ ++  Ordinary least squares regression   LHS=LOGG Mean = 1.570475   Standard deviation = .2388115   WTS=none Number of observs. = 52   Model size Parameters = 7   Degrees of freedom = 45   Residuals Sum of squares = .1014368   Standard error of e = .4747790E01   Fit Rsquared = .9651249   Adjusted Rsquared = .9604749   Model test F[ 6, 45] (prob) = 207.55 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 13.1396625 2.09171186 6.282 .0000 LOGPG  .05373342 .04251099 1.264 .2127 3.72930296 LOGI  1.64909204 .20265477 8.137 .0000 9.67214751 LOGPNC  .03199098 .20574296 .155 .8771 4.38036654 LOGPUC  .07393002 .10548982 .701 .4870 4.10544881 LOGPPT  .06153395 .12343734 .499 .6206 4.14194132 T  .01287615 .00525340 2.451 .0182 26.5000000 Calc;r0=rsqrd$ Calc;list;f=((r1r0)/2)/((1r1)/(n10))$ ++  Listed Calculator Results  ++ F = 34.868735 The critical value from the F table is 2.827, so we would reject the hypothesis. ?======================================================================= ? b. Nonlinear restriction ?=======================================================================
Since the restricted model is quite nonlinear, it would be cumbersome to estimate it and examine the loss in fit. We can test the restriction using the unrestricted model instead. For this problem,
\[ f = [\gamma_{nc} - \gamma_{uc},\ \ \gamma_{nc}\delta_s - \gamma_{pt}\delta_d]'. \]
The matrix of derivatives, using the order given above and α to represent the entire parameter vector, is
\[ G = \begin{bmatrix} \partial f_1/\partial\alpha' \\ \partial f_2/\partial\alpha' \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \delta_s & 0 & -\delta_d & 0 & -\gamma_{pt} & 0 & \gamma_{nc} \end{bmatrix}. \]
Evaluated at the parameter estimates from the unrestricted regression, f = [.17399, .10091]′. The covariance matrix to use for the tests is \( G\,s^2(X'X)^{-1}G' \). The statistic for the joint test is
\[ \chi^2 = f'[G\,s^2(X'X)^{-1}G']^{-1}f = .4772. \]
This is less than the critical value for a chi-squared with two degrees of freedom, so we would not reject the joint hypothesis. For the individual hypotheses, we need only compute the equivalent of a t ratio for each element of f. Thus, z1 = .6053 and z2 = .2898. Neither is large, so neither hypothesis would be rejected. (Given the earlier result, this was to be expected.)
?======================================================================= ? c. Computations for nonlinear restriction ?======================================================================= sample;152$ name;x=one,logpg,logi,logpnc,logpuc,logppt,t,logpd,logpn,logps$ Regr;lhs=logg;rhs=x$ ++  Ordinary least squares regression   LHS=LOGG Mean = 1.570475   Standard deviation = .2388115   WTS=none Number of observs. = 52   Model size Parameters = 7   Degrees of freedom = 45   Residuals Sum of squares = .1014368   Standard error of e = .4747790E01   Fit Rsquared = .9651249   Adjusted Rsquared = .9604749   Model test F[ 6, 45] (prob) = 207.55 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 13.1396625 2.09171186 6.282 .0000 LOGPG  .05373342 .04251099 1.264 .2127 3.72930296 LOGI  1.64909204 .20265477 8.137 .0000 9.67214751 LOGPNC  .03199098 .20574296 .155 .8771 4.38036654 LOGPUC  .07393002 .10548982 .701 .4870 4.10544881 LOGPPT  .06153395 .12343734 .499 .6206 4.14194132 T  .01287615 .00525340 2.451 .0182 26.5000000 Calc;r1=rsqrd$ Regr;lhs=logg;rhs=one,logpg,logi,logpnc,logpuc,logppt,t$ ++  Ordinary least squares regression   LHS=LOGG Mean = 1.570475   Standard deviation = .2388115   WTS=none Number of observs. 
= 52   Model size Parameters = 7   Degrees of freedom = 45   Residuals Sum of squares = .1014368   Standard error of e = .4747790E01   Fit Rsquared = .9651249   Adjusted Rsquared = .9604749   Model test F[ 6, 45] (prob) = 207.55 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 13.1396625 2.09171186 6.282 .0000 LOGPG  .05373342 .04251099 1.264 .2127 3.72930296 LOGI  1.64909204 .20265477 8.137 .0000 9.67214751 LOGPNC  .03199098 .20574296 .155 .8771 4.38036654 LOGPUC  .07393002 .10548982 .701 .4870 4.10544881 LOGPPT  .06153395 .12343734 .499 .6206 4.14194132 T  .01287615 .00525340 2.451 .0182 26.5000000 Calc;r0=rsqrd$ Calc;list;fstat=((r1r0)/2)/((1r1)/(n10))$ ++ FSTAT = 34.868735 Calc;list;ftb(.95,3,42)$ ++ Result = 2.827049 REGR;Lhs=logg;rhs=x$ Calc ; ds=b(10);dd=b(8);gpt=b(6);gnc=b(4)$ Matr;gm=[0,0,0,1,1,0,0,0,0,0 / 0,0,0,ds,0,dd,0,gpt,0,gnc]$ Calc;f1=b(4)b(6) ; f2=b(4)*b(10)b(6)*b(8)$ Matrix;list;f=[f1/f2]$
Matrix F has 2 rows and 1 columns.
        1
 1   .17399
 2   .10091
Matrix;list;vf=gm*varb*gm'$
Matrix VF has 2 rows and 2 columns.
        1        2
 1   .08263   .08059
 2   .08059   .12129
Matrix;list;Wald=f'<vf>f$
Matrix WALD has 1 rows and 1 columns.
        1
 1   .47716
Calc;list;z1=f(1)/sqr(vf(1,1))$
Z1 = .605278
Calc;list;z2=f(2)/sqr(vf(2,2))$
Z2 = .289760
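The Wald statistic and the two z ratios can be checked by hand from the printed F and VF matrices. The sketch below is only an illustration in Python (the manual itself uses NLOGIT); the numbers are typed in exactly as they appear in the listing, and the helper name `wald_2x2` is made up for this example.

```python
import math

def wald_2x2(f, vf):
    """Quadratic form f' inv(VF) f for a 2-element f and a 2x2 matrix VF."""
    (a, b), (c, d) = vf
    det = a * d - b * c
    # Inverse of a 2x2 matrix written out explicitly.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(f[i] * inv[i][j] * f[j] for i in range(2) for j in range(2))

# Values copied from the listing above (as printed).
f = [0.17399, 0.10091]
vf = [[0.08263, 0.08059], [0.08059, 0.12129]]

wald = wald_2x2(f, vf)
z = [f[i] / math.sqrt(vf[i][i]) for i in range(2)]
print(round(wald, 4), [round(v, 4) for v in z])
```

Because the inputs are rounded to five digits, the result agrees with the listed WALD, Z1 and Z2 only to about three or four decimal places.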
Chapter 6 Functional Form and Structural Change
Exercises
1. The F statistic could be computed as
F = {[1425 − (104 + 88 + ... + 211)] / (70 − 16)} / [(104 + 88 + ... + 211) / (570 − 70)] = 1.343.
The 95% critical value for the F distribution with 54 and 500 degrees of freedom is 1.363.
2. a. Using the hint, we seek the c* which is the slope on d in the regression of q = y − cd − e on y and d. The regression coefficients are
\[ \begin{bmatrix} y'y & y'd \\ d'y & d'd \end{bmatrix}^{-1} \begin{bmatrix} y'(y - cd - e) \\ d'(y - cd - e) \end{bmatrix} = \begin{bmatrix} y'y & y'd \\ d'y & d'd \end{bmatrix}^{-1} \begin{bmatrix} y'y - c\,y'd - y'e \\ d'y - c\,d'd - d'e \end{bmatrix}. \]
In the preceding, note that (y′y, d′y)′ is the first column of the matrix being inverted while c(y′d, d′d)′ is c times the second. An inverse matrix times the first column of the original matrix is the first column of an identity matrix, and likewise for the second. Also, since d was one of the original regressors in (1), d′e = 0, and, of course, y′e = e′e. If we combine all of these, the coefficient vector is
\[ \begin{pmatrix} 1 \\ 0 \end{pmatrix} - c\begin{pmatrix} 0 \\ 1 \end{pmatrix} - \begin{bmatrix} y'y & y'd \\ d'y & d'd \end{bmatrix}^{-1}\begin{pmatrix} e'e \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} - c\begin{pmatrix} 0 \\ 1 \end{pmatrix} - \begin{bmatrix} y'y & y'd \\ d'y & d'd \end{bmatrix}^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} e'e. \]
We are interested in the second (lower) of the two coefficients. The matrix product at the end is e′e times the first column of the inverse matrix, and we wish to find its second (bottom) element. Therefore, collecting what we have thus far, the desired coefficient is c* = −c − e′e times the off-diagonal element in the inverse matrix. The off-diagonal element is
\[ \frac{-d'y}{(y'y)(d'd) - (y'd)^2} = \frac{-d'y}{(y'y)(d'd)\left[1 - (y'd)^2/[(y'y)(d'd)]\right]} = \frac{-d'y}{(y'y)(d'd)(1 - r_{yd}^2)}. \]
Therefore,
\[ c^* = \frac{(e'e)(d'y)}{(y'y)(d'd)(1 - r_{yd}^2)} - c. \]
(The two negative signs cancel.) This can be further reduced. Since all variables are in deviation form, e′e/y′y is (1 − R²) in the full regression. By multiplying it out, you can show that \( \bar d = P \), so that
\[ d'd = \sum_i (d_i - P)^2 = nP(1-P) \quad\text{and}\quad d'y = \sum_i (d_i - P)(y_i - \bar y) = \sum_i (d_i - P)y_i = n_1(\bar y_1 - \bar y), \]
where n1 is the number of observations which have di = 1. Combining terms once again, we have
\[ c^* = \frac{n_1(\bar y_1 - \bar y)(1 - R^2)}{nP(1-P)(1 - r_{yd}^2)} - c. \]
Finally, since P = n1/n, this further simplifies to the result claimed in the problem,
\[ c^* = \frac{(\bar y_1 - \bar y)(1 - R^2)}{(1-P)(1 - r_{yd}^2)} - c. \]
The problem this creates for the theory is that in the present setting, if, indeed, c is negative, \( (\bar y_1 - \bar y) \) will almost surely be also. Therefore, the sign of c* is ambiguous.
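3. We first find the joint distribution of the observed variables.
\[ \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \alpha \\ 0 \end{pmatrix} + \begin{bmatrix} \beta & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{pmatrix} x^* \\ \varepsilon \\ u \end{pmatrix}, \]
so [y,x] have a joint normal distribution with mean vector
\[ E\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \alpha \\ 0 \end{pmatrix} + \begin{bmatrix} \beta & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{pmatrix} \mu^* \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha + \beta\mu^* \\ \mu^* \end{pmatrix} \]
and covariance matrix
\[ \mathrm{Var}\begin{pmatrix} y \\ x \end{pmatrix} = \begin{bmatrix} \beta & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \sigma_*^2 & 0 & 0 \\ 0 & \sigma_\varepsilon^2 & 0 \\ 0 & 0 & \sigma_u^2 \end{bmatrix} \begin{bmatrix} \beta & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \beta^2\sigma_*^2 + \sigma_\varepsilon^2 & \beta\sigma_*^2 \\ \beta\sigma_*^2 & \sigma_*^2 + \sigma_u^2 \end{bmatrix}. \]
The probability limit of the slope in the linear regression of y on x is, as usual,
\[ \mathrm{plim}\ b = \mathrm{Cov}[y,x]/\mathrm{Var}[x] = \beta/(1 + \sigma_u^2/\sigma_*^2) < \beta. \]
The probability limit of the intercept is
\[ \mathrm{plim}\ a = E[y] - (\mathrm{plim}\ b)E[x] = \alpha + \beta\mu^* - \beta\mu^*/(1 + \sigma_u^2/\sigma_*^2) = \alpha + \beta[\mu^*\sigma_u^2/(\sigma_*^2 + \sigma_u^2)] > \alpha \]
(assuming β > 0). If x is regressed on y instead, the slope will estimate
\[ \mathrm{plim}\ b' = \mathrm{Cov}[y,x]/\mathrm{Var}[y] = \beta\sigma_*^2/(\beta^2\sigma_*^2 + \sigma_\varepsilon^2). \]
Then, \( \mathrm{plim}[1/b'] = \beta + \sigma_\varepsilon^2/(\beta\sigma_*^2) > \beta \). Therefore, b and 1/b′ will bracket the true parameter (at least in their probability limits). Unfortunately, without more information about σu², we have no idea how wide this bracket is. Of course, if the sample is large and the estimated bracket is narrow, the results will be strongly suggestive.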
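The bracketing result can be illustrated numerically from the two probability limits, plim b = β/(1 + σu²/σ*²) and 1/plim b′ = (β²σ*² + σε²)/(βσ*²). The sketch below is a Python illustration only; the parameter values are hypothetical and chosen purely for the example.

```python
def plim_direct(beta, var_star, var_u):
    """plim of the slope in the regression of y on x (attenuated toward zero)."""
    return beta / (1.0 + var_u / var_star)

def plim_reverse_inverse(beta, var_star, var_eps):
    """Reciprocal of the plim of the slope in the regression of x on y."""
    slope = beta * var_star / (beta ** 2 * var_star + var_eps)
    return 1.0 / slope

# Hypothetical parameter values, not taken from the text.
beta, var_star, var_u, var_eps = 0.8, 2.0, 0.5, 1.0

lower = plim_direct(beta, var_star, var_u)
upper = plim_reverse_inverse(beta, var_star, var_eps)
print(lower, beta, upper)
```

With these values the direct regression understates β and the inverted reverse regression overstates it, so the two limits enclose the true slope.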
4. In the regression of y on x and d, if d and x are independent, we can invoke the familiar result for least squares regression. The results are the same as those obtained by two simple regressions. It is instructive to verify this.
\[ \mathrm{plim} \begin{bmatrix} x'x/n & x'd/n \\ d'x/n & d'd/n \end{bmatrix}^{-1} \begin{pmatrix} x'y/n \\ d'y/n \end{pmatrix} = \begin{bmatrix} \sigma_*^2 + \sigma_u^2 & 0 \\ 0 & \pi \end{bmatrix}^{-1} \begin{pmatrix} \beta\sigma_*^2 \\ \gamma\pi \end{pmatrix} = \begin{pmatrix} \beta/(1 + \sigma_u^2/\sigma_*^2) \\ \gamma \end{pmatrix}. \]
Therefore, although the coefficient on x is distorted, the effect of interest, namely, γ, is correctly measured. Now consider what happens if x* and d are not independent. With the second assumption, we must replace the off-diagonal zero above with plim(x′d/n). Since u and d are still uncorrelated, this equals Cov[x*,d]. This is
\[ \mathrm{Cov}[x^*,d] = E[x^*d] = \pi E[x^*d \mid d=1] + (1-\pi)E[x^*d \mid d=0] = \pi\mu^1. \]
Also, plim[y′d/n] is now βCov[x*,d] + γ plim(d′d/n) = βπμ¹ + γπ, and plim[y′x*/n] equals β plim[x*′x*/n] + γ plim[x*′d/n] = βσ*² + γπμ¹. Then, the probability limits of the least squares coefficient estimators are
\[ \mathrm{plim} \begin{bmatrix} x'x/n & x'd/n \\ d'x/n & d'd/n \end{bmatrix}^{-1} \begin{pmatrix} x'y/n \\ d'y/n \end{pmatrix} = \begin{bmatrix} \sigma_*^2 + \sigma_u^2 & \pi\mu^1 \\ \pi\mu^1 & \pi \end{bmatrix}^{-1} \begin{pmatrix} \beta\sigma_*^2 + \gamma\pi\mu^1 \\ \beta\pi\mu^1 + \gamma\pi \end{pmatrix} \]
\[ = \frac{1}{\pi(\sigma_*^2 + \sigma_u^2) - \pi^2(\mu^1)^2} \begin{bmatrix} \pi & -\pi\mu^1 \\ -\pi\mu^1 & \sigma_*^2 + \sigma_u^2 \end{bmatrix} \begin{pmatrix} \beta\sigma_*^2 + \gamma\pi\mu^1 \\ \beta\pi\mu^1 + \gamma\pi \end{pmatrix} \]
\[ = \frac{1}{\pi(\sigma_*^2 + \sigma_u^2) - \pi^2(\mu^1)^2} \begin{pmatrix} \beta(\pi\sigma_*^2 - \pi^2(\mu^1)^2) \\ \gamma(\pi(\sigma_*^2 + \sigma_u^2) - \pi^2(\mu^1)^2) + \beta\pi\mu^1\sigma_u^2 \end{pmatrix}. \]
The second expression does reduce to plim c = γ + βπμ¹σu²/[π(σ*² + σu²) − π²(μ¹)²], but the upshot is that in the presence of measurement error, the two estimators become an unredeemable hash of the underlying parameters. Note that both expressions reduce to the true parameters if σu² equals zero. Finally, the two means are estimators of E[y|d=1] = βE[x*|d=1] + γ = βμ¹ + γ and E[y|d=0] = βE[x*|d=0] = βμ⁰, so the difference is β(μ¹ − μ⁰) + γ, which is a mixture of two effects. Which one will be larger is entirely indeterminate, so it is reasonable to conclude that this is not a good way to analyze the problem. If γ equals zero, this difference will merely reflect the differences in the values of x*, which may be entirely unrelated to the issue under examination here. (This is, unfortunately, what is usually reported in the popular press.)
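The algebra in Exercise 4 can be checked by solving the 2×2 system of probability limits directly and comparing the coefficient on d with the closed form γ + βπμ¹σu²/[π(σ*² + σu²) − π²(μ¹)²]. The following Python sketch uses hypothetical parameter values; the function name and the numbers are assumptions made for illustration.

```python
def plim_coefficients(beta, gamma, pi, mu1, var_star, var_u):
    """Solve the 2x2 system of probability limits for the slopes on x and d."""
    # Moment matrix of (x, d): [[a, b], [b, d]].
    a, b, d = var_star + var_u, pi * mu1, pi
    # Moments of (x, d) with y.
    m1 = beta * var_star + gamma * pi * mu1
    m2 = beta * pi * mu1 + gamma * pi
    det = a * d - b * b
    slope_x = (d * m1 - b * m2) / det
    slope_d = (-b * m1 + a * m2) / det
    return slope_x, slope_d, det

# Hypothetical parameter values, not taken from the text.
beta, gamma, pi, mu1, var_star, var_u = 1.0, 0.5, 0.4, 0.3, 1.0, 0.25

slope_x, slope_d, det = plim_coefficients(beta, gamma, pi, mu1, var_star, var_u)
closed_form_d = gamma + beta * pi * mu1 * var_u / det

# With no measurement error, both limits return the true parameters.
clean_x, clean_d, _ = plim_coefficients(beta, gamma, pi, mu1, var_star, 0.0)
print(slope_x, slope_d, closed_form_d, clean_x, clean_d)
```

Setting the measurement error variance to zero in the last call recovers β and γ, which matches the remark that both expressions reduce to the true parameters in that case.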
Applications ?======================================================================= ? Application 6.1 ?======================================================================= a. Wage equation Namelist ; X = one,educ,ability,pexp,med,fed,bh,sibs$ Regress ; Lhs = lwage ; Rhs = x $ Calc ; xb = b(1)+b(2)*12+b(3)*xbr(ability)+b(4)*xbr(med) +b(5)*xbr(fed)+b(6)*0+b(7)*xbr(sibs) $ Calc ; list ; mv = exp(xb) * b(2) $ ++  Ordinary least squares regression   LHS=LWAGE Mean = 2.296821   Standard deviation = .5282364   WTS=none Number of observs. = 17919   Model size Parameters = 7   Degrees of freedom = 17912   Residuals Sum of squares = 4126.175   Standard error of e = .4799564   Fit Rsquared = .1747197   Adjusted Rsquared = .1744433   Model test F[ 6, 17912] (prob) = 632.02 (.0000)  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ Constant .96950956 .03370543 28.764 .0000 EDUC  .07220350 .00225076 32.080 .0000 12.6760422 ABILITY  .07746781 .00493727 15.690 .0000 .05237402 PEXP  .03950928 .00089926 43.936 .0000 8.36268765 MED  .00011702 .00169634 .069 .9450 11.4719013 FED  .00545695 .00133870 4.076 .0000 11.7092472 SIBS  .00476557 .00179240 2.659 .0078 3.15620291 ++  Listed Calculator Results  ++ MV = .725176b. Step function ?======================================================================= ? b. ?======================================================================= Histogram ; Rhs = Educ $
Create ; HS = Educ <= 12 $ Create ; Col = (Educ>12) * (educ <=16) $ Create ; Grad = Educ > 16 $ Regress ; Lhs=lwage ; Rhs = one,Col,Grad,ability,pexp,med,fed,bh,sibs $ ++  Ordinary least squares regression   LHS=LWAGE Mean = 2.296821   Standard deviation = .5282364   WTS=none Number of observs. = 17919   Model size Parameters = 9   Degrees of freedom = 17910   Residuals Sum of squares = 4215.033   Standard error of e = .4851239   Fit Rsquared = .1569472   Adjusted Rsquared = .1565706   Model test F[ 8, 17910] (prob) = 416.78 (.0000)  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ Constant 1.81124933 .02069456 87.523 .0000 COL  .17467913 .00872506 20.020 .0000 .32183716 GRAD  .36244740 .02086328 17.373 .0000 .03493499 ABILITY  .10097636 .00486713 20.747 .0000 .05237402 PEXP  .03814088 .00090643 42.078 .0000 8.36268765 MED  .00081934 .00171488 .478 .6328 11.4719013 FED  .00700641 .00135096 5.186 .0000 11.7092472 BH  .06962521 .01007870 6.908 .0000 .15385903 SIBS  .00371191 .00181156 2.049 .0405 3.15620291 c. Education squared Create ; educsq = educ*educ $ Regress ; Lhs = lwage;rhs=one,educ,educsq,ability,pexp,med,fed,bh,sibs$ ++  Ordinary least squares regression   LHS=LWAGE Mean = 2.296821   Standard deviation = .5282364   WTS=none Number of observs. 
= 17919   Model size Parameters = 9   Degrees of freedom = 17910   Residuals Sum of squares = 4114.269   Standard error of e = .4792902   Fit Rsquared = .1771010   Adjusted Rsquared = .1767334   Model test F[ 8, 17910] (prob) = 481.81 (.0000)  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ Constant .42778242 .12008093 3.562 .0004 EDUC  .15590624 .01751608 8.901 .0000 12.6760422 EDUCSQ  .00313261 .00064230 4.877 .0000 164.377588 ABILITY  .07433494 .00496954 14.958 .0000 .05237402 PEXP  .03962214 .00089830 44.108 .0000 8.36268765 MED  .00030520 .00169504 .180 .8571 11.4719013 FED  .00519423 .00133734 3.884 .0001 11.7092472 BH  .04957434 .01000691 4.954 .0000 .15385903 SIBS  .00499325 .00179020 2.789 .0053 3.15620291 Namelist ; x1 = one,educ,educsq,ability,pexp,med,fed,bh,sibs $ Matrix ; means = mean(x1)$ Matrix ; means(2)=0 $ Matrix ; means(3)=0$ Calc ; a=means'b $ Calc ; b2=b(2) ; b3=b(3) $ Sample ; 1 $
Fplot ; fcn = a + b2*schoolng + b3*schoolng^2 ; pts=100 ; start = 12 ; limits = 1,20 ; labels=schoolng ; plot(schoolng) $
d. Interaction. Sample ; All $ Create ; EA = Educ*ability $ Regress ; Lhs = lwage;rhs=one,educ,ability,ea,pexp,med,fed,bh,sibs$ Calc ; abar =xbr(ability) $ Calc ; list ; me = b(2)+b(4)*abar $ Calc ; sdme = sqr(varb(2,2)+abar^2*varb(4,4) + 2*abar*varb(2,4))$ Calc ; list ; lower = me  1.96*sdme ; upper = me + 1.96*sdme $ ++  Ordinary least squares regression   LHS=LWAGE Mean = 2.296821   Standard deviation = .5282364   WTS=none Number of observs. = 17919   Model size Parameters = 9   Degrees of freedom = 17910   Residuals Sum of squares = 4119.377   Standard error of e = .4795877   Fit Rsquared = .1760794   Adjusted Rsquared = .1757113   Model test F[ 8, 17910] (prob) = 478.44 (.0000)  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ Constant 1.00190276 .03529335 28.388 .0000 EDUC  .07006221 .00243183 28.811 .0000 12.6760422 ABILITY  .04693108 .02494471 1.881 .0599 .05237402 EA  .00253975 .00204029 1.245 .2132 1.60372621 PEXP  .03947437 .00089903 43.908 .0000 8.36268765 MED  .542277D04 .00169546 .032 .9745 11.4719013 FED  .00534599 .00133813 3.995 .0001 11.7092472 BH  .05314420 .00999271 5.318 .0000 .15385903 SIBS  .00479076 .00179231 2.673 .0075 3.15620291 ++  Listed Calculator Results  ++ ME = .070195 LOWER = .065503 UPPER = .074888
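The marginal effect of education in the interaction model is b(2) + b(4)·abar, and its standard error follows from the variance of a linear combination of two coefficients, as in the Calc commands above. The Python sketch below reproduces the point estimate from the printed coefficients. The covariance between the two coefficient estimates is not shown in the output, so the value used here is a placeholder (zero), and the interval it gives is illustrative only and will not match LOWER and UPPER exactly.

```python
import math

def marginal_effect(b_educ, b_ea, abar, var_educ, var_ea, cov_educ_ea, crit=1.96):
    """Marginal effect b_educ + b_ea*abar with a normal-based confidence interval."""
    me = b_educ + b_ea * abar
    # Variance of a linear combination of two estimated coefficients.
    se = math.sqrt(var_educ + abar ** 2 * var_ea + 2.0 * abar * cov_educ_ea)
    return me, me - crit * se, me + crit * se

me, lower, upper = marginal_effect(
    b_educ=0.07006221,          # coefficient on EDUC, as printed
    b_ea=0.00253975,            # coefficient on EA, as printed
    abar=0.05237402,            # mean of ABILITY, as printed
    var_educ=0.00243183 ** 2,   # squared standard error of EDUC
    var_ea=0.00204029 ** 2,     # squared standard error of EA
    cov_educ_ea=0.0,            # placeholder: not reported in the output
)
print(round(me, 6), round(lower, 6), round(upper, 6))
```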
e. Regress ; Lhs = lwage;rhs=one,educ,educsq,ability,ea,pexp,med,fed,bh,sibs$ ++  Ordinary least squares regression   LHS=LWAGE Mean = 2.296821   Standard deviation = .5282364   WTS=none Number of observs. = 17919   Model size Parameters = 10   Degrees of freedom = 17909   Residuals Sum of squares = 4106.031   Standard error of e = .4788235   Fit Rsquared = .1787487   Adjusted Rsquared = .1783360   Model test F[ 9, 17909] (prob) = 433.11 (.0000)  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ Constant .10514525 .14931731 .704 .4813 EDUC  .24088793 .02252126 10.696 .0000 12.6760422 EDUCSQ  .00654261 .00085754 7.630 .0000 164.377588 ABILITY  .12453442 .03354596 3.712 .0002 .05237402 EA  .01631824 .00272231 5.994 .0000 1.60372621 PEXP  .03951247 .00089761 44.020 .0000 8.36268765 MED  .00045246 .00169356 .267 .7893 11.4719013 FED  .00524829 .00133606 3.928 .0001 11.7092472 BH  .04775208 .01000179 4.774 .0000 .15385903 SIBS  .00460796 .00178961 2.575 .0100 3.15620291 ++  Listed Calculator Results  ++ AVGLOW = .798563 AVGHIGH = .717891 Create ; lowa = ability < xbr(ability) ; higha = 1  lowa $ Calc ; list ; avglow= lowa'ability / lowa'lowa ; avghigh=higha'ability/higha'higha $ Calc ; a = b(1) + b(6)*xbr(pexp)+b(7)*xbr(med)+ b(8)*xbr(fed)+b(9)*xbr(bh)+b(10)*xbr(sibs)$ Calc ; al=a+b(4)*avglow ; ah = a+b(4)*avghigh$ Samp;1120$ Create ; school = trn(9,.1)$ Create ; lwlow = al + b(2)*school+b(3)*school^2 + b(5)*avglow*school $ Create ; lwhigh = ah + b(2)*school+b(3)*school^2 + b(5)*avghigh*school $ Plot ; lhs = school ; rhs =lwhigh,lwlow ;fill ;grid ;Title=Comparison of logWage Profiles for Low and High Ability$
?=======================================================================
? Application 6.2
?=======================================================================
Sample ; All $
Namelist ; X = one,educ,ability,pexp,med,fed,sibs$
Regress ; For [bh=0] ; Lhs = lwage ; Rhs = x $
Calc ; ee0=sumsqdev $
Matrix ; b0=b ; v0=varb $
Regress ; For [bh=1] ; Lhs = lwage ; Rhs = x $
Calc ; ee1=sumsqdev $
Matrix ; b1=b ; v1=varb $
Regress ; Lhs = lwage ; Rhs = x $
Calc ; ee=sumsqdev $
Calc ; list ; chow = ((ee-ee0-ee1)/col(x))/ ((ee0+ee1)/(n-2*col(x))) $
Listed Calculator Results
CHOW = 7.348379
Matrix ; db=b0-b1 ; vdb=v0+v1 $
Matrix ; list ; Wald = db'<vdb>db $
Matrix WALD has 1 rows and 1 columns.
        1
 1   50.57114
?======================================================================= ? Application 6.3 ?=======================================================================
a. The least squares estimates of the four models are
q/A = .45237 + .23815lnk
q/A = .91967 − .61863/k
ln(q/A) = .72274 + .35160lnk
ln(q/A) = .032194 − .91496/k
At these parameter values, the four functions are nearly identical. A plot of the four sets of predictions from the regressions and the actual values appears below.
b. The scatter diagram is shown below. The last seven years of the data set show clearly the effect observed by Solow.
c. The regression results for the various models are listed below. (d is the dummy variable equal to 1 for the last seven years of the data set. Standard errors for parameter estimates are given in parentheses.)

        α           β           γ           δ           R2        e′e
Model 1: q/A = α + βlnk + γd + δ(dlnk) + ε
      .4524       .2381                                .94355    .00213
     (.00903)    (.00932)
      .4477       .2396       .01900                   .99914    .000032
     (.00113)    (.00117)    (.000384)
      .4476       .2397       .02746      .08883       .99915    .000032
     (.00115)    (.00118)    (.0119)     (.0126)
Model 2: q/A = α − β(1/k) + γd + δ(d/k) + ε
      .9168       .6186                                .94915    .001915
     (.00891)    (.0229)
      .9167       .6185       .01961                   .99321    .000256
     (.00331)    (.00849)    (.00108)
      .9168       .6187       .008651     .02140       .99322    .000255
     (.00336)    (.00863)    (.0354)     (.0917)
Model 3: ln(q/A) = α + βlnk + γd + δ(dlnk) + ε
      .7227       .3516                                .94069    .004882
     (.0137)     (.0141)
      .7298       .3538       .002881                  .99918    .000068
     (.00164)    (.00169)    (.000554)
      .7300       .3540       .04961      .02182       .99921    .000065
     (.00164)    (.00148)    (.0171)     (.0179)
Model 4: ln(q/A) = α − β(1/k) + γd + δ(d/k) + ε
      .03219      .9150                                .94964    .004146
     (.0131)     (.0337)
      .03665      .9148       .02572                   .99629    .000305
     (.00361)    (.00928)    (.00118)
      .03646      .9153       .004290     .05556       .99632    .000303
     (.00366)    (.00941)    (.0386)     (.0999)
d. For the four models, the F test of the third specification against the first is equivalent to the Chow test. The statistics are:
Model 1: F = (.002126 − .000032)/2 / (.000032/37) = 1210.6
Model 2: F = (.001915 − .000255)/2 / (.000255/37) = 120.43
Model 3: F = (.004882 − .000065)/2 / (.000065/37) = 1371.0
Model 4: F = (.004146 − .000303)/2 / (.000303/37) = 234.64
The critical value from the F table for 2 and 37 degrees of freedom is 3.26, so all of these are statistically significant. The hypothesis that the same model applies in both subperiods must be rejected.
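The four F statistics in part d can be recomputed from the sums of squared residuals reported in part c. The sketch below is a small Python illustration of the usual restricted-versus-unrestricted F ratio; the function name is made up for this example, and the inputs are the rounded sums of squares from the table, so agreement is only to the printed precision.

```python
def f_stat(ssr_restricted, ssr_unrestricted, num_restrictions, df_denominator):
    """F ratio based on restricted and unrestricted sums of squared residuals."""
    numerator = (ssr_restricted - ssr_unrestricted) / num_restrictions
    denominator = ssr_unrestricted / df_denominator
    return numerator / denominator

# (restricted e'e, unrestricted e'e) for each functional form, as reported.
sums_of_squares = {
    "Model 1": (0.002126, 0.000032),
    "Model 2": (0.001915, 0.000255),
    "Model 3": (0.004882, 0.000065),
    "Model 4": (0.004146, 0.000303),
}

# Two restrictions (dummy and interaction), 37 degrees of freedom.
f_values = {name: f_stat(r, u, 2, 37) for name, (r, u) in sums_of_squares.items()}
for name, value in f_values.items():
    print(name, round(value, 2))
```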
?======================================================================= ? Application 6.4 ?=======================================================================
According to the full model, the expected number of incidents for a ship of the base type A built in the base period 1960 to 1964 is 3.4. The other 19 predicted values follow from the previous results and are left as an exercise. The relevant test statistics for differences across ship type and year are as follows:
type: F[4,12] = [(3925.2 − 660.9)/4] / [660.9/12] = 14.82,
year: F[3,12] = [(1090.3 − 660.9)/3] / [660.9/12] = 2.60.
The 5 percent critical values from the F table with these degrees of freedom are 3.26 and 3.49, respectively, so we would conclude that the average number of incidents varies significantly across ship types but not across years.

Regression Coefficients
            Full Model   Time Effects   Type Effects   No Effects
Constant       3.4           6.0            8.25         10.85
B             27.75          0             27.75          0
C             -7.0           0             -7.0           0
D             -4.5           0             -4.5           0
E             -3.25          0             -3.25          0
65-69          7.0           7.0            0             0
70-74         11.4          11.4            0             0
75-79          1.0           1.0            0             0
R2             0.84823       0.0986         0.74963       0
e′e          660.9        3925.2         1090.2        4354.5
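The R2 row of the table and the two F statistics are all functions of the four sums of squared residuals, with the "No Effects" model supplying the total sum of squares. The Python sketch below recomputes them as a consistency check; it uses the e′e values from the table (1090.2 for the type-effects model), so small differences from the printed figures reflect rounding.

```python
# Sums of squared residuals from the table of regression coefficients.
ssr = {"full": 660.9, "time": 3925.2, "type": 1090.2, "none": 4354.5}

# The model with no effects contains only a constant, so its e'e is the
# total sum of squares; R-squared for each model follows directly.
r_squared = {name: 1.0 - value / ssr["none"] for name, value in ssr.items()}

# Ship type effects: full model against time effects only (4 restrictions).
f_type = ((ssr["time"] - ssr["full"]) / 4) / (ssr["full"] / 12)
# Year effects: full model against type effects only (3 restrictions).
f_year = ((ssr["type"] - ssr["full"]) / 3) / (ssr["full"] / 12)

print({k: round(v, 5) for k, v in r_squared.items()})
print(round(f_type, 2), round(f_year, 2))
```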
Chapter 7 Specification Analysis and Model Selection
Exercises
1. The result cited is \( E[b_1] = \beta_1 + P_{1.2}\beta_2 \) where \( P_{1.2} = (X_1'X_1)^{-1}X_1'X_2 \), so the coefficient estimator is biased. If the conditional mean function \( E[X_2|X_1] \) is a linear function of \( X_1 \), then the sample estimator \( P_{1.2} \) actually is an unbiased estimator of the slopes of that function. (That result is Theorem B.3, equation (B-68), in another form.) Now, write the model in the form
\[ y = X_1\beta_1 + E[X_2|X_1]\beta_2 + \varepsilon + (X_2 - E[X_2|X_1])\beta_2. \]
So, when we regress y on \( X_1 \) alone and compute the predictions, we are computing an estimator of \( X_1(\beta_1 + P_{1.2}\beta_2) = X_1\beta_1 + E[X_2|X_1]\beta_2 \). Both parts of the compound disturbance in this regression, \( \varepsilon \) and \( (X_2 - E[X_2|X_1])\beta_2 \), have mean zero and are uncorrelated with \( X_1 \) and \( E[X_2|X_1] \), so the prediction error has mean zero. The implication is that the forecast is unbiased. Note that this is not true if \( E[X_2|X_1] \) is nonlinear, since \( P_{1.2} \) does not estimate the slopes of the conditional mean in that instance. The generality is that leaving out variables will bias the coefficients, but need not bias the forecasts. It depends on the relationship between the conditional mean function \( E[X_2|X_1] \) and \( X_1P_{1.2} \).
2. The "long" estimator, \( b_{1.2} \), is unbiased, so its mean squared error equals its variance, \( \sigma^2(X_1'M_2X_1)^{-1} \). The short estimator, \( b_1 \), is biased; \( E[b_1] = \beta_1 + P_{1.2}\beta_2 \). Its variance is \( \sigma^2(X_1'X_1)^{-1} \). It is easy to show that this latter variance is smaller. You can do that by comparing the inverses of the two matrices. The inverse of the first matrix equals the inverse of the second one minus a positive definite matrix, which makes the inverse smaller, hence the original matrix is larger: \( \mathrm{Var}[b_{1.2}] > \mathrm{Var}[b_1] \). But, since \( b_1 \) is biased, the variance is not its mean squared error. The mean squared error of \( b_1 \) is \( \mathrm{Var}[b_1] \) + bias×bias′. The second term is \( P_{1.2}\beta_2\beta_2'P_{1.2}' \). When this is added to the variance, the sum may be larger or smaller than \( \mathrm{Var}[b_{1.2}] \); it depends on the data and on the parameters, \( \beta_2 \). The important point is that the mean squared error of the biased estimator may be smaller than that of the unbiased estimator.
3. The log likelihood function at the maximum is
\[ \ln L = -\frac{n}{2}\left[1 + \ln 2\pi + \ln(e'e/n)\right] = -\frac{n}{2}\left\{1 + \ln 2\pi + \ln[S_{yy}(1 - R^2)/n]\right\} = -\frac{n}{2}\left\{1 + \ln 2\pi + \ln(S_{yy}/n) + \ln(1 - R^2)\right\}, \]
where \( S_{yy} = \sum_{i=1}^n (y_i - \bar y)^2 \), since \( R^2 = 1 - e'e/S_{yy} \). The derivative of this expression is
\[ \partial \ln L/\partial R^2 = -\frac{n}{2}\cdot\frac{1}{1 - R^2}\cdot(-1) = \frac{n}{2(1 - R^2)}, \]
which is always positive. Therefore, the log likelihood increases when R² increases.
4. An inconvenient way to obtain the result is by repeated substitution of \( C_{t-1} \), then \( C_{t-2} \), and so on. It is much easier and faster to introduce the lag operator used in Chapter 20. Thus, the alternative model is
\[ C_t = \gamma_1 + \gamma_2Y_t + \gamma_3LC_t + \varepsilon_{1t}, \quad\text{where } LC_t = C_{t-1}. \]
Then, \( (1 - \gamma_3L)C_t = \gamma_1 + \gamma_2Y_t + \varepsilon_{1t} \). Now, multiply both sides of the equation by \( 1/(1 - \gamma_3L) = 1 + \gamma_3L + \gamma_3^2L^2 + \dots \) to obtain
\[ C_t = \gamma_1/(1 - \gamma_3) + \gamma_2Y_t + \gamma_2\gamma_3Y_{t-1} + \sum_{s=2}^{\infty}\gamma_2\gamma_3^sY_{t-s} + \sum_{s=0}^{\infty}\gamma_3^s\varepsilon_{1,t-s}. \]
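The lag operator expansion in Exercise 4 can be checked against the recursion it replaces. The Python sketch below is an illustration under stated assumptions: the parameter values, the income series and the disturbances are all hypothetical, and the recursion is started from a zero initial value, so the sums run over a finite history rather than the infinite past in the text.

```python
def simulate_recursion(g1, g2, g3, income, shocks):
    """C_t = g1 + g2*Y_t + g3*C_{t-1} + e_t, started from C_{-1} = 0."""
    path, previous = [], 0.0
    for y, e in zip(income, shocks):
        previous = g1 + g2 * y + g3 * previous + e
        path.append(previous)
    return path

def distributed_lag(g1, g2, g3, income, shocks, t):
    """Finite-history version of the expansion: sum over s of g3^s * (g1 + g2*Y_{t-s} + e_{t-s})."""
    return sum(
        g3 ** s * (g1 + g2 * income[t - s] + shocks[t - s]) for s in range(t + 1)
    )

# Hypothetical parameters and data, chosen only for illustration.
g1, g2, g3 = 5.0, 0.6, 0.3
income = [100.0 + 2.0 * t for t in range(40)]
shocks = [0.5 * (-1) ** t for t in range(40)]

path = simulate_recursion(g1, g2, g3, income, shocks)
max_gap = max(
    abs(path[t] - distributed_lag(g1, g2, g3, income, shocks, t)) for t in range(40)
)
# The geometric sum of the constant approaches g1/(1 - g3) as the history grows.
constant_term = sum(g1 * g3 ** s for s in range(40))
print(max_gap, constant_term, g1 / (1.0 - g3))
```

The two representations agree up to floating point error, and the accumulated constant is already indistinguishable from γ1/(1 − γ3) after a few dozen periods when |γ3| is well below one.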
Application
The J test in Example 7.2 is carried out using over 50 years of data. It is optimistic to hope that the underlying structure of the economy did not change in 50 years. Does the result of the test carried out in Example 7.2 persist if it is based on data only from 1980 to 2000? Repeat the computation with this subset of the data.
?====================================
? Example 7.2 and Application 7.1
?====================================
Dates ; 1950.1 $
Period ; 1950.1 - 2000.4 $
Create ; Ct = Realcons ; Yt = RealDPI $
Create ; Ct1 = Ct[-1] ; Yt1 = Yt[-1] $
? Example 7.2
Period ; 1950.2 - 2000.4 $
Regress; Lhs = Ct ; Rhs = one,Yt,Yt1 ; Keep = CY $
Regress; Lhs = Ct ; Rhs = one,Yt,Ct1 ; Keep = CC $
Regress; Lhs = Ct ; Rhs = one,Yt,Yt1,CC $
Ordinary least squares regression
Model was estimated May 12, 2007 at 08:56:19AM
LHS=CT Mean = 3008.995
Standard deviation = 1456.900
WTS=none Number of observs. = 203
Model size Parameters = 4
Degrees of freedom = 199
Residuals Sum of squares = 73550.21
Standard error of e = 19.22496
Fit Rsquared = .9998285
Adjusted Rsquared = .9998259
Model test F[ 3, 199] (prob) =******* (.0000)
Diagnostic Log likelihood = 886.1351
Restricted(b=0) = 1766.209
Chisq [ 3] (prob) =1760.15 (.0000)
Info criter. LogAmemiya Prd. Crt. = 5.931932
Akaike Info. Criter. = 5.931926
Autocorrel DurbinWatson Stat. = 2.0256102
Rho = cor[e,e(1)] = .0128051
Variable   Coefficient   Standard Error   tratio   P[T>t]   Mean of X
Constant    .60444607     3.43245774        .176    .8604
YT          .31456542      .04619552       6.809    .0000   3352.09360
YT1         .33004915      .04591940       7.188    .0000   3325.25222
CC         1.01450597      .01613899      62.861    .0000   3008.99507
Regress; Lhs = Ct ; Rhs = one,Yt,Ct1,CY $
Ordinary least squares regression
Model was estimated May 12, 2007 at 08:56:19AM
LHS=CT Mean = 3008.995
Standard deviation = 1456.900
WTS=none Number of observs.
= 203   Model size Parameters = 4   Degrees of freedom = 199   Residuals Sum of squares = 73550.21   Standard error of e = 19.22496   Fit Rsquared = .9998285   Adjusted Rsquared = .9998259   Model test F[ 3, 199] (prob) =******* (.0000)   Diagnostic Log likelihood = 886.1351   Restricted(b=0) = 1766.209   Chisq [ 3] (prob) =1760.15 (.0000)   Info criter. LogAmemiya Prd. Crt. = 5.931932   Akaike Info. Criter. = 5.931926 
 Autocorrel DurbinWatson Stat. = 2.0256102   Rho = cor[e,e(1)] = .0128051  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 865.712368 120.569071 7.180 .0000 YT  9.82505250 1.36759557 7.184 .0000 3352.09360 CT1  1.02780685 .01635059 62.861 .0000 2982.97438 CY  10.6765577 1.48541853 7.188 .0000 3008.99507 ? Application 7.1. We use only the 1980 data, so we ? start in quarter 2 of 1980 even though data are ? available for the last quarter of 1979. Period ; 1980.2  2000.4 $ Regress; Lhs = Ct ; Rhs = one,Yt,Yt1 ; Keep = CY $ Regress; Lhs = Ct ; Rhs = one,Yt,Ct1 ; Keep = CC $ Regress; Lhs = Ct ; Rhs = one,Yt,Yt1,CC $ ++  Ordinary least squares regression   Model was estimated May 12, 2007 at 08:58:19AM   LHS=CT Mean = 4503.230   Standard deviation = 879.3593   WTS=none Number of observs. = 83   Model size Parameters = 4   Degrees of freedom = 79   Residuals Sum of squares = 43603.43   Standard error of e = 23.49345   Fit Rsquared = .9993123   Adjusted Rsquared = .9992862   Model test F[ 3, 79] (prob) =******* (.0000)   Diagnostic Log likelihood = 377.7300   Restricted(b=0) = 679.9419   Chisq [ 3] (prob) = 604.42 (.0000)   Info criter. LogAmemiya Prd. Crt. = 6.360511   Akaike Info. Criter. = 6.360436   Autocorrel DurbinWatson Stat. = 1.8153241   Rho = cor[e,e(1)] = .0923379  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 39.6958824 37.1402619 1.069 .2884 YT  .20222923 .07364203 2.746 .0075 4987.32410 YT1  .25661196 .07221392 3.553 .0006 4951.70482 CC  1.04938412 .04670690 22.467 .0000 4503.23012 Regress; Lhs = Ct ; Rhs = one,Yt,Ct1,CY $ ++  Ordinary least squares regression   Model was estimated May 12, 2007 at 08:58:19AM   LHS=CT Mean = 4503.230   Standard deviation = 879.3593   WTS=none Number of observs. 
= 83   Model size Parameters = 4   Degrees of freedom = 79   Residuals Sum of squares = 43603.43   Standard error of e = 23.49345   Fit Rsquared = .9993123   Adjusted Rsquared = .9992862   Model test F[ 3, 79] (prob) =******* (.0000)   Diagnostic Log likelihood = 377.7300   Restricted(b=0) = 679.9419   Chisq [ 3] (prob) = 604.42 (.0000)   Info criter. LogAmemiya Prd. Crt. = 6.360511   Akaike Info. Criter. = 6.360436   Autocorrel DurbinWatson Stat. = 1.8153241   Rho = cor[e,e(1)] = .0923379  ++
+++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 856.107861 221.141722 3.871 .0002 YT  1.21490273 .32340906 3.757 .0003 4987.32410 CT1  .98759074 .04395654 22.467 .0000 4465.65542 CY  1.13474451 .31933175 3.553 .0006 4503.23012 ? ? The results are essentially the same. This suggests ? that neither model is right.
The regressions are based on real consumption and real disposable income. Results for 1950 to 2000 are given in the text. Repeating the exercise for 1980 to 2000 produces the following: for the first regression, the estimate of α is 1.03 with a t ratio of 23.27, and for the second, the estimate is 1.24 with a t ratio of 3.062. Thus, as before, both models are rejected. These are qualitatively the same results as those obtained with the full 51-year data set.
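Chapter 8 The Generalized Regression Model and Heteroscedasticity
Exercises
1. Write the two estimators as \( \hat\beta = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon \) and \( b = \beta + (X'X)^{-1}X'\varepsilon \). Then,
\[ \hat\beta - b = [(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} - (X'X)^{-1}X']\varepsilon \]
has \( E[\hat\beta - b] = 0 \) since both estimators are unbiased. Therefore, \( \mathrm{Cov}[\hat\beta, \hat\beta - b] = E[(\hat\beta - \beta)(\hat\beta - b)'] \). Then,
\[ E\{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon\varepsilon'[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} - (X'X)^{-1}X']'\} \]
\[ = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}(\sigma^2\Omega)[\Omega^{-1}X(X'\Omega^{-1}X)^{-1} - X(X'X)^{-1}] \]
\[ = \sigma^2(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Omega\Omega^{-1}X(X'\Omega^{-1}X)^{-1} - \sigma^2(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Omega X(X'X)^{-1} \]
\[ = \sigma^2(X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}X)(X'\Omega^{-1}X)^{-1} - \sigma^2(X'\Omega^{-1}X)^{-1}(X'X)(X'X)^{-1} = 0 \]
once the inverse matrices are multiplied.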
2 First,
(R βˆ  q) = R[β + (X′Ω1X)1X′Ω1ε)]  q = R(X′Ω1X)1X′Ω1ε if Rβ  q = 0.
Now, use the inverse square root matrix of Ω, P = Ω1/2 to obtain the transformed data, X* = PX = Ω1/2X, y* = Py = Ω1/2y, and ε* = Pε = Ω1/2ε. E[ε*ε*′] = E[Ω1/2εε′Ω2] = Ω1/2(σ2Ω)Ω1/2 = σ2I, Then, and, βˆ = (X′Ω1X)1X′Ω1y = (X*′X*)1X*′y* = the OLS estimator in the regression of y* on X*. Then, R βˆ  q = R(X*′X*)1X*′ε* and the numerator is ε*′X*(X*′X*)1R′[R(X*′X*)1R′]1R(X*′X*)1X*′ε* / J. By multiplying it out, we find that the matrix of the quadratic form above is idempotent. Therefore, this is an idempotent quadratic form in a normally distributed random vector. Thus, its distribution is that of σ2 times a chisquared variable with degrees of freedom equal to the rank of the matrix. To find the rank of the matrix of the quadratic form, we can find its trace. That is tr{X*(X*′X*)1R′[R(X*′X*)1R′]1R(X*′X*)1X*} = tr{(X*′X*)1R′[R(X*′X*)1R′]1R(X*′X*)1X*′X*} = tr{(X*′X*)1R′[R(X*′X*)1R′]1R} = tr{[R(X*′X*)1R′][R(X*′X*)1R′]1} = tr{IJ} = J, which might have been expected. Before proceeding, we should note, we could have deduced this outcome from the form of the matrix. The matrix of the quadratic form is of the form Q = X*ABA′X*′ where B is the nonsingular matrix in the square brackets and A = (X*′X*)1R′, which is a K×J matrix which cannot have rank higher than J. Therefore, the entire product cannot have rank higher than J. Continuing, we now find that the numerator (apart from the scale factor, σ2) is the ratio of a chisquared[J] variable to its degrees of freedom. We now turn to the denominator. By multiplying it out, we find that the denominator is * *ˆ (y  X β )′(y*  X* βˆ )/(n  K). This is exactly the sum of squared residuals in the least squares regression of y* on X*. Since y* = X*β + ε* and βˆ = (X*′X*)1X*′y* the denominator is ε*′M*ε*/(n  K), the familiar form of the sum of squares. Once again, this is an idempotent quadratic form in a normal vector (and, again, apart
from the scale factor, $\sigma^2$, which now cancels). The rank of the $M_*$ matrix is $n - K$, as always, so the denominator is also a chi-squared variable divided by its degrees of freedom. It remains only to show that the two chi-squared variables are independent. We know they are if the two matrices are orthogonal. They are since $M_*X_* = 0$. This completes the proof, since all of the requirements for the F distribution have been shown.

3. First, we know that the denominator of the F statistic converges to $\sigma^2$. Therefore, the limiting distribution of the F statistic is the same as the limiting distribution of the statistic which results when the denominator is replaced by $\sigma^2$. It is useful to write this modified statistic as
$$W^* = (1/\sigma^2)(R\hat\beta - q)'[R(X_*'X_*)^{-1}R']^{-1}(R\hat\beta - q)/J.$$
Now, incorporate the results from the previous problem to write this as
$$W^* = \varepsilon_*'X_*(X_*'X_*)^{-1}R'[R\sigma^2(X_*'X_*)^{-1}R']^{-1}R(X_*'X_*)^{-1}X_*'\varepsilon_*/J.$$
Let $\varepsilon_0 = R(X_*'X_*)^{-1}X_*'\varepsilon_*$. Note that this is a $J \times 1$ vector. By multiplying it out, we find that $E[\varepsilon_0\varepsilon_0'] = \mathrm{Var}[\varepsilon_0] = R\{\sigma^2(X_*'X_*)^{-1}\}R'$. Therefore, the modified statistic can be written as $W^* = \varepsilon_0'\mathrm{Var}[\varepsilon_0]^{-1}\varepsilon_0/J$. This is the 'full rank quadratic form' discussed in Appendix B. For convenience, let $C = \mathrm{Var}[\varepsilon_0]$, $T = C^{-1/2}$, and $v = T\varepsilon_0$. Then, $W^* = v'v/J$. By construction, $v = \mathrm{Var}[\varepsilon_0]^{-1/2}\varepsilon_0$, so $E[v] = 0$ and $\mathrm{Var}[v] = I$. The limiting distribution of $v'v$ is chi-squared with $J$ degrees of freedom if the limiting distribution of $v$ is standard normal. All of the conditions for the central limit theorem apply to $v$, so we do have the result we need. This implies that as long as the data are well behaved, the numerator of the F statistic will converge to the ratio of a chi-squared variable to its degrees of freedom.

4. The development is unchanged. As long as the limiting behavior of $(1/n)\hat X'\hat X = (1/n)X'\hat\Omega^{-1}X$ is the same as that of $(1/n)X_*'X_*$, the limiting distribution of the test statistic will be the same as if the true $\Omega$ were used instead of the estimate $\hat\Omega$.
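The trace argument of Exercise 2 can be verified in a small case. The sketch below assumes a hypothetical transformed data matrix with n = 4 and K = 2 and a single restriction (J = 1); none of these numbers come from the text. It builds the matrix of the quadratic form with exact fractions and confirms that it is idempotent with trace J.

```python
from fractions import Fraction

# Hypothetical transformed data X* (n = 4, K = 2) and one restriction (J = 1).
X = [[Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)],
     [Fraction(1), Fraction(4)],
     [Fraction(1), Fraction(7)]]
R = [Fraction(0), Fraction(1)]     # the restriction involves the slope only

n = len(X)
# X*'X* and its inverse (2 x 2, closed form)
a = sum(r[0] * r[0] for r in X)
b = sum(r[0] * r[1] for r in X)
d = sum(r[1] * r[1] for r in X)
det = a * d - b * b
XtX_inv = [[d / det, -b / det], [-b / det, a / det]]

# A = (X*'X*)^{-1} R'  (K x 1) and c = R (X*'X*)^{-1} R'  (a scalar since J = 1)
A = [XtX_inv[0][0] * R[0] + XtX_inv[0][1] * R[1],
     XtX_inv[1][0] * R[0] + XtX_inv[1][1] * R[1]]
c = R[0] * A[0] + R[1] * A[1]

# Q = X* A c^{-1} A' X*'
v = [row[0] * A[0] + row[1] * A[1] for row in X]
Q = [[v[i] * v[j] / c for j in range(n)] for i in range(n)]

QQ = [[sum(Q[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
trace_Q = sum(Q[i][i] for i in range(n))
is_idempotent = (QQ == Q)
print(trace_Q, is_idempotent)
```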
5. First, in order to simplify the algebra somewhat without losing any generality, we will scale the columns of $X$ so that for each $x_k$, $x_k'x_k = 1$. We do this by beginning with our original data matrix, say, $X^0$, and obtaining $X$ as $X = X^0D^{-1/2}$, where $D$ is a diagonal matrix with diagonal elements $D_{kk} = x_k^{0\prime}x_k^0$. By multiplying it out, we find that the GLS slopes based on $X$ instead of $X^0$ are
$$\hat\beta = [(X^0D^{-1/2})'\Omega^{-1}(X^0D^{-1/2})]^{-1}[(X^0D^{-1/2})'\Omega^{-1}y] = D^{1/2}[X^{0\prime}\Omega^{-1}X^0]^{-1}(D')^{1/2}(D')^{-1/2}X^{0\prime}\Omega^{-1}y = D^{1/2}\hat\beta^0$$
with variance $\mathrm{Var}[\hat\beta] = D^{1/2}\sigma^2[X^{0\prime}\Omega^{-1}X^0]^{-1}(D')^{1/2} = D^{1/2}\mathrm{Var}[\hat\beta^0](D')^{1/2}$. Likewise, the OLS estimator based on $X$ instead of $X^0$ is $b = D^{1/2}b^0$ and has variance $\mathrm{Var}[b] = D^{1/2}\mathrm{Var}[b^0](D')^{1/2}$. Since the scaling affects both estimators identically, we may ignore it and simply assume that $X'X = I$.
If each column of $X$ is a characteristic vector of $\Omega$, then, for the kth column, $x_k$, $\Omega x_k = \lambda_k x_k$. Further, $x_k'\Omega x_k = \lambda_k$ and $x_k'\Omega x_j = 0$ for any two different columns of $X$. (We neglect the scaling of $X$, so that $X'X = I$, which we would usually assume for a set of characteristic vectors. The implicit scaling of $X$ is absorbed in the characteristic roots.) Recall that the characteristic vectors of $\Omega^{-1}$ are the same as those of $\Omega$ while the characteristic roots are the reciprocals. Therefore, $X'\Omega X = \Lambda_K$, the diagonal matrix of the $K$ characteristic roots which correspond to the columns of $X$. In addition, $X'\Omega^{-1}X = \Lambda_K^{-1}$, so $(X'\Omega^{-1}X)^{-1} = \Lambda_K$, and $X'\Omega^{-1}y = \Lambda_K^{-1}X'y$. Therefore, the GLS estimator is simply $\hat\beta = X'y$ with variance $\mathrm{Var}[\hat\beta] = \sigma^2\Lambda_K$. The OLS estimator is $b = (X'X)^{-1}X'y = X'y$. Its variance is $\mathrm{Var}[b] = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1} = \sigma^2\Lambda_K$, which means that OLS and GLS are identical in this case.

6. Write $b = \beta + (X'X)^{-1}X'\varepsilon$ and $\hat\beta = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon$. The covariance matrix is
$$E[(b - \beta)(\hat\beta - \beta)'] = E[(X'X)^{-1}X'\varepsilon\varepsilon'\Omega^{-1}X(X'\Omega^{-1}X)^{-1}] = (X'X)^{-1}X'(\sigma^2\Omega)\Omega^{-1}X(X'\Omega^{-1}X)^{-1} = \sigma^2(X'\Omega^{-1}X)^{-1}.$$
For part (b), $e = M\varepsilon$ as always, so $E[ee'] = \sigma^2M\Omega M$. No further simplification is possible for the general case.
For part (c),
$$\hat\varepsilon = y - X\hat\beta = y - X[\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon] = X\beta + \varepsilon - X[\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon] = [I - X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}]\varepsilon.$$
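Thus,
$$\begin{aligned}
E[\hat\varepsilon\hat\varepsilon'] &= [I - X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}]E[\varepsilon\varepsilon'][I - X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}]' \\
&= [I - X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}](\sigma^2\Omega)[I - X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}]' \\
&= [\sigma^2\Omega - \sigma^2X(X'\Omega^{-1}X)^{-1}X'][I - \Omega^{-1}X(X'\Omega^{-1}X)^{-1}X'] \\
&= \sigma^2\Omega - \sigma^2X(X'\Omega^{-1}X)^{-1}X' - \sigma^2X(X'\Omega^{-1}X)^{-1}X' + \sigma^2X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}X(X'\Omega^{-1}X)^{-1}X' \\
&= \sigma^2[\Omega - X(X'\Omega^{-1}X)^{-1}X'].
\end{aligned}$$
The GLS residual vector appears in the preceding part. As always, the OLS residual vector is $e = M\varepsilon = [I - X(X'X)^{-1}X']\varepsilon$. The covariance matrix is
$$\begin{aligned}
E[e\hat\varepsilon'] &= E[(I - X(X'X)^{-1}X')\varepsilon\varepsilon'(I - X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1})'] \\
&= (I - X(X'X)^{-1}X')(\sigma^2\Omega)(I - \Omega^{-1}X(X'\Omega^{-1}X)^{-1}X') \\
&= \sigma^2\Omega - \sigma^2X(X'X)^{-1}X'\Omega - \sigma^2\Omega\Omega^{-1}X(X'\Omega^{-1}X)^{-1}X' + \sigma^2X(X'X)^{-1}X'\Omega\Omega^{-1}X(X'\Omega^{-1}X)^{-1}X' \\
&= \sigma^2\Omega - \sigma^2X(X'X)^{-1}X'\Omega = \sigma^2M\Omega.
\end{aligned}$$

7. The GLS estimator is $\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = [\Sigma_i x_ix_i'/(\beta'x_i)^2]^{-1}[\Sigma_i x_iy_i/(\beta'x_i)^2]$. The log-likelihood for this model is
$$\ln L = -\Sigma_i \ln(\beta'x_i) - \Sigma_i y_i/(\beta'x_i).$$
The likelihood equations are
$$\partial \ln L/\partial\beta = -\Sigma_i (1/\beta'x_i)x_i + \Sigma_i [y_i/(\beta'x_i)^2]x_i = 0$$
or
$$\Sigma_i x_iy_i/(\beta'x_i)^2 = \Sigma_i x_i/(\beta'x_i).$$
Now, write $\Sigma_i x_i/(\beta'x_i) = \Sigma_i x_ix_i'\beta/(\beta'x_i)^2$, so the likelihood equations are equivalent to $\Sigma_i x_iy_i/(\beta'x_i)^2 = \Sigma_i x_ix_i'\beta/(\beta'x_i)^2$, or $X'\Omega^{-1}y = (X'\Omega^{-1}X)\beta$. These are the normal equations for the GLS estimator, so the two estimators are the same. We should note that the solution is only implicit, since $\Omega$ is a function of $\beta$. For another more common application, see the discussion of the FIML estimator for simultaneous equations models in Chapter 13.

8.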
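The covariance matrix is
$$\sigma^2\Omega = \sigma^2\begin{bmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \rho & \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \rho & \cdots & 1 \end{bmatrix}.$$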
The matrix $X$ is a column of 1s, so the least squares estimator of $\mu$ is $\bar y$. Inserting this $\Omega$ into (10-5), we obtain
$$\mathrm{Var}[\bar y] = \frac{\sigma^2}{n}(1 - \rho + n\rho).$$
The limit of this expression is $\rho\sigma^2$, not zero. Although ordinary least squares is unbiased, it is not consistent. For this model, $X'\Omega X/n = 1 + \rho(n - 1)$, which does not converge. Using Theorem 8.2 instead, $X$ is a column of 1s, so $X'X = n$, a scalar, which satisfies condition 1. To find the characteristic roots, multiply out the equation $\Omega x = (1 - \rho)Ix + \rho ii'x = \lambda x$. Since $i'x = \Sigma_i x_i$, consider any vector $x$ whose elements sum to zero. If so, then the second term vanishes and $\lambda = 1 - \rho$. There are $n - 1$ such roots. Finally, suppose that $x = i$. Plugging this into the equation produces $\lambda = 1 - \rho + n\rho$. The characteristic roots of $\Omega$ are $(1 - \rho)$ with multiplicity $n - 1$ and $(1 - \rho + n\rho)$, which violates condition 2.

9. This is a heteroscedastic regression model in which the matrix $X$ is a column of ones. The efficient estimator is the GLS estimator,
$$\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = [\Sigma_i 1 \cdot y_i/x_i^2] / [\Sigma_i 1^2/x_i^2] = [\Sigma_i (y_i/x_i^2)] / [\Sigma_i (1/x_i^2)].$$
As always, the variance of the estimator is $\mathrm{Var}[\hat\beta] = \sigma^2(X'\Omega^{-1}X)^{-1} = \sigma^2/[\Sigma_i (1/x_i^2)]$. The ordinary least squares estimator is $(X'X)^{-1}X'y = \bar y$. The variance of $\bar y$ is $\sigma^2(X'X)^{-1}(X'\Omega X)(X'X)^{-1} = (\sigma^2/n^2)\Sigma_i x_i^2$. To show that the variance of the OLS estimator is greater than or equal to that of the GLS estimator, we must show that $(\sigma^2/n^2)\Sigma_i x_i^2 \ge \sigma^2/\Sigma_i (1/x_i^2)$, or $(1/n^2)(\Sigma_i x_i^2)(\Sigma_i (1/x_i^2)) \ge 1$, or $\Sigma_i\Sigma_j (x_i^2/x_j^2) \ge n^2$. The double sum contains $n$ terms equal to one. There remain $n(n-1)/2$ pairs of the form $(x_i^2/x_j^2 + x_j^2/x_i^2)$. If it can be shown that each of these
sums is greater than or equal to 2, the result is proved. Just let $z_i = x_i^2$. Then, we require $z_i/z_j + z_j/z_i - 2 \ge 0$. But, this is equivalent to $(z_i^2 + z_j^2 - 2z_iz_j)/(z_iz_j) \ge 0$ or $(z_i - z_j)^2/(z_iz_j) \ge 0$, which is certainly true if $z_i$ and $z_j$ are positive. They are since $z_i$ equals $x_i^2$. This completes the proof.

10. Consider, first, $\bar y$. We saw earlier that $\mathrm{Var}[\bar y] = (\sigma^2/n^2)\Sigma_i x_i^2 = (\sigma^2/n)(1/n)\Sigma_i x_i^2$. The expected value is $E[\bar y] = E[(1/n)\Sigma_i y_i] = \alpha$. If the mean square of $x$ converges to something finite, then $\bar y$ is consistent for $\alpha$. That is, if $\mathrm{plim}(1/n)\Sigma_i x_i^2 = \bar q$ where $\bar q$ is some finite number, then $\mathrm{plim}\ \bar y = \alpha$. As such, it follows that $s^2$ and $s_*^2 = (1/(n-1))\Sigma_i (y_i - \alpha)^2$ have the same probability limit. We consider, therefore, $\mathrm{plim}\ s_*^2 = \mathrm{plim}(1/(n-1))\Sigma_i \varepsilon_i^2$. The expected value of $s_*^2$ is $E[(1/(n-1))\Sigma_i \varepsilon_i^2] = \sigma^2(1/(n-1))\Sigma_i x_i^2$. Once again, nothing more can be said without some assumption about $x_i$. Thus, we assume again that the average square of $x_i$ converges to a finite, positive constant, $\bar q$. Of course, the result is unchanged by division by $(n-1)$ instead of $n$, so $\lim_{n\to\infty} E[s_*^2] = \sigma^2\bar q$. The variance of $s_*^2$ is $\mathrm{Var}[s_*^2] = \Sigma_i \mathrm{Var}[\varepsilon_i^2]/(n - 1)^2$. To characterize this, we will require the variances of the squared disturbances, which involve their fourth moments. But, if we assume that every fourth moment is finite, then the preceding is $(n/(n-1)^2)$ times the average of these variances. If every fourth moment is finite, then the term is dominated by the leading $(n/(n-1)^2)$, which converges to zero. It follows that $\mathrm{plim}\ s_*^2 = \sigma^2\bar q$. Therefore, the conventional estimator estimates $\mathrm{Asy.Var}[\bar y] = \sigma^2\bar q/n$. The appropriate variance of the least squares estimator is $\mathrm{Var}[\bar y] = (\sigma^2/n^2)\Sigma_i x_i^2$, which is, of course, precisely what we have been analyzing above. It follows that the conventional estimator of the variance of the OLS estimator in this model is an appropriate estimator of the true variance of the least squares estimator.
This follows from the fact that the regressor in the model, $i$, is unrelated to the source of heteroscedasticity, as discussed in the text.

11. The sample moments are obtained using, for example, $S_{xx} = x'x - n\bar x^2$ and so on. For the two samples, we obtain

              ybar   xbar   Sxx    Syy    Sxy
Sample 1       6      6     300    300    200
Sample 2       6      6     300    1000   400

The parameter estimates are computed directly using the results of Chapter 6.

              Intercept   Slope   R2      s2
Sample 1        2         2/3     4/9     (1500/9)/48 = 3.472
Sample 2       -2         4/3     16/30   (4200/9)/48 = 9.722

The pooled moments based on 100 observations are
$$X'X = \begin{bmatrix} 100 & 600 \\ 600 & 4200 \end{bmatrix}, \quad X'y = \begin{bmatrix} 600 \\ 4200 \end{bmatrix}, \quad y'y = 4900.$$
The coefficient vector based on these data is $[a, b] = [0, 1]$. This might have been predicted since the two $X'X$ matrices are identical. OLS which ignores the heteroscedasticity would simply average the estimates. The sum of squared residuals would be $e'e = y'y - b'X'y = 4900 - 4200 = 700$, so the estimate of $\sigma^2$ is $s^2 = 700/98 = 7.142$. Note that the earlier values obtained were 3.472 and 9.722, so the pooled estimate is between the two, once again, as might be expected. The asymptotic covariance matrix of these estimates is
$$s^2(X'X)^{-1} = 7.142\begin{bmatrix} .07 & -.01 \\ -.01 & .00167 \end{bmatrix}.$$
To test the equality of the variances, we can use the Goldfeld and Quandt test. Under the null hypothesis of equal variances, the ratio $F = [e_1'e_1/(n_1 - 2)]/[e_2'e_2/(n_2 - 2)]$ (or vice versa for the subscripts) is the ratio of two independent chi-squared variables each divided by their respective degrees of freedom. Although it might seem so from the discussion in the text (and the literature), there is nothing in the test which requires that the coefficient vectors be assumed equal across groups. Since for our data, the second sample has the larger residual variance, we refer $F[48,48] = s_2^2/s_1^2 = 9.722/3.472 = 2.8$ to the F table. The critical value for 95% significance is 1.61, so the hypothesis of equal variances is rejected.
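The separate and pooled least squares results and the variance ratio above can be reproduced from the reported sample moments alone. The following is a minimal sketch using exact fractions; the function and variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction

n = 50  # observations in each sample
# Reported moments for each sample: (ybar, xbar, Sxx, Syy, Sxy)
samples = [(6, 6, 300, 300, 200), (6, 6, 300, 1000, 400)]

def separate_fit(ybar, xbar, sxx, syy, sxy):
    """Intercept, slope and s^2 for one sample from its moments."""
    slope = Fraction(sxy, sxx)
    intercept = ybar - slope * xbar
    s2 = (syy - slope * sxy) / (n - 2)
    return intercept, slope, s2

fits = [separate_fit(*m) for m in samples]

# Pooled raw moments: convert sums of deviations back to raw sums and add.
N = 2 * n
sum_x = sum(n * m[1] for m in samples)
sum_y = sum(n * m[0] for m in samples)
sum_xx = sum(m[2] + n * m[1] ** 2 for m in samples)
sum_yy = sum(m[3] + n * m[0] ** 2 for m in samples)
sum_xy = sum(m[4] + n * m[0] * m[1] for m in samples)

det = N * sum_xx - sum_x ** 2
b_pooled = Fraction(N * sum_xy - sum_x * sum_y, det)
a_pooled = Fraction(sum_y, N) - b_pooled * Fraction(sum_x, N)
ee = sum_yy - a_pooled * sum_y - b_pooled * sum_xy   # e'e = y'y - b'X'y
s2_pooled = ee / (N - 2)

# Goldfeld-Quandt ratio with the larger variance in the numerator
gq = fits[1][2] / fits[0][2]
print(fits, a_pooled, b_pooled, float(s2_pooled), float(gq))
```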
The method of Example 8.5 can be applied to this groupwise heteroscedastic model. The two step estimator is
$$\hat\beta = [(1/s_1^2)X_1'X_1 + (1/s_2^2)X_2'X_2]^{-1}[(1/s_1^2)X_1'y_1 + (1/s_2^2)X_2'y_2].$$
The $X'X$ matrices are the same in this problem, so this simplifies to
$$\hat\beta = [(1/s_1^2 + 1/s_2^2)X'X]^{-1}[(1/s_1^2)X_1'y_1 + (1/s_2^2)X_2'y_2].$$
The estimator is, therefore,
$$\left[\left(\frac{1}{3.472} + \frac{1}{9.722}\right)\begin{pmatrix} 50 & 300 \\ 300 & 2100 \end{pmatrix}\right]^{-1}\left[\frac{1}{3.472}\begin{pmatrix} 300 \\ 2000 \end{pmatrix} + \frac{1}{9.722}\begin{pmatrix} 300 \\ 2200 \end{pmatrix}\right] = \begin{pmatrix} .9469 \\ .8422 \end{pmatrix}.$$
?=======================================================
? Application 8.1
?=======================================================
a. The ordinary least squares regression of Y on a constant, X1, and X2 produces the following results:
Sum of squared residuals        1911.9275
R2                              .03790
Standard error of regression    6.3780
Variable   Coefficient   Standard Error   t-ratio
One        .190394       .9144            .208
X1         1.13113       .9826            1.151
X2         .376825       .4399            .857
b. Covariance Matrix                    White's Corrected Matrix
.836212                                 .524589
.115451   .96551                        .076578   .282366
.047133   .051081   .193532             .399218   .091608   1.14447
c. To apply White's test, we first obtain the residuals from the regression of Y on a constant, X1, and X2. Then, we regress the squares of these residuals on a constant, X1, X2, X1^2, X2^2, and X1X2. The R2 in this regression is .78296, so the chi-squared statistic is 50×0.78296 = 39.148. The critical value from the table of chi-squared with 5 degrees of freedom is 11.07, so we would conclude that there is evidence of heteroscedasticity.
d. Lagrange multiplier test.
Regress;Lhs=y;rhs=one,x1,x2 ; Res=e ; het $
create ; lmi=e*e/(sumsqdev/n) - 1 $
Name ; x=one,x1,x2 $
Calc ; list ; .5*xss(x,lmi)$
The result was reported with the regression,
| Br./Pagan LM Chi-sq [ 2] (prob) = 72.78 (.0000) |
e. Two step estimator
read;nobs=50;nvar=1;names=y;byva $
1.42 2.75 .26 4.87 .62 7.01 1.26 .15 5.51 15.22 .35 .48
2.10 5.94 26.14 3.41 1.47 1.24
5.08 2.21 7.39 5.45 1.48 .69
1.49 6.87 .79 1.31 6.66 1.91
1.00 .90 1.93 1.52 1.78
.16 1.11 1.61 2.11 1.97 23.17 2.04 3.00 2.62 5.16
1.66 3.82 2.52 6.31 4.71
.68 .77 .12 .60 .17 1.02
.23 1.04 .66 .79 .33
.40 .28 1.06 .86 .48
1.13 .58 .66 2.04 1.90
.15 .41 1.18 .51 .18
.67 .70 .32 2.88 .19 1.28 2.72 .70 .74 1.87 1.56 .37 2.07 1.20 .26 1.34 .61 2.32 4.38 2.16 1.51 .30 .17 7.82 1.77 2.92 1.94 2.09 1.50 .46 .19 .39 1.87 3.45 .88 1.53 1.42 2.70 1.77 1.89 2.01 1.26 2.02 1.91 2.23 Regress;Lhs=y;rhs=one,x1,x2 ; Res=e $ ++  Ordinary least squares regression   Model was estimated May 12, 2007 at 08:33:20PM   LHS=Y Mean = .3938000   Standard deviation = 6.368374   WTS=none Number of observs. = 50   Model size Parameters = 3 
1.55 2.10 1.15 1.54 1.85
read;nobs=50;nvar=1;names=x1;byva $ 1.65 .63 1.78 .80 .02 .18
1.48 .34 1.25 1.32 .33 1.62
.77 .35 .22 .16 1.99 .39
.67 .79 1.25 1.06 .70 .17
read;nobs=50;nvar=1;names=x2;byva $
 Degrees of freedom = 47   Residuals Sum of squares = 1911.928   Standard error of e = 6.378033   Fit Rsquared = .3790450E01   Adjusted Rsquared = .3035736E02   Model test F[ 2, 47] (prob) = .93 (.4033)   Diagnostic Log likelihood = 162.0430   Restricted(b=0) = 163.0091   Chisq [ 2] (prob) = 1.93 (.3806)   Info criter. LogAmemiya Prd. Crt. = 3.763988   Akaike Info. Criter. = 3.763844   Autocorrel DurbinWatson Stat. = 1.8560359   Rho = cor[e,e(1)] = .0719820  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant .19039401 .91444640 .208 .8360 X1  1.13113339 .98260352 1.151 .2555 .10820000 X2  .37682493 .43992218 .857 .3960 .21500000 Create ; e2 = e*e $ Create ; loge2 = log(e2) $ Regress ; lhs = loge2 ; Rhs = one,x1,x2 ; keep=vi $ Create ; vi = 1/exp(vi) $ Regress ; Lhs = y ; rhs = one,x1,x2 ; wts = vi $ ++  Ordinary least squares regression   Model was estimated May 12, 2007 at 08:33:20PM   LHS=Y Mean = .5316339   Standard deviation = 4.535703   WTS=VI Number of observs. = 50   Model size Parameters = 3   Degrees of freedom = 47   Residuals Sum of squares = 890.9017   Standard error of e = 4.353775   Fit Rsquared = .1162193   Adjusted Rsquared = .7861157E01   Model test F[ 2, 47] (prob) = 3.09 (.0548)   Diagnostic Log likelihood = 150.0732   Restricted(b=0) = 153.1619   Chisq [ 2] (prob) = 6.18 (.0456)   Info criter. LogAmemiya Prd. Crt. = 3.000355   Akaike Info. Criter. = 3.285051   Autocorrel DurbinWatson Stat. = 1.9978648   Rho = cor[e,e(1)] = .0010676  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant .16662621 .71981411 .231 .8179 X1  .77648745 .63883379 1.215 .2303 .51884171 X2  .84717700 .36328984 2.332 .0240 .34867101
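The two step procedure in part e (OLS residuals, a regression of the log squared residuals on the same regressors, then weighted least squares with weights equal to the reciprocal of the exponentiated fitted values) can be sketched in Python. This is a minimal illustration under stated assumptions: the data are synthetic and hypothetical, not the Application 8.1 sample, and the helper names `solve`, `wls` and `two_step` are introduced here only for the example.

```python
import math
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * k
    for r in reversed(range(k)):
        z[r] = (M[r][k] - sum(M[r][c] * z[c] for c in range(r + 1, k))) / M[r][r]
    return z

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy for diagonal W."""
    k = len(X[0])
    XtWX = [[sum(wi * r[i] * r[j] for r, wi in zip(X, w)) for j in range(k)]
            for i in range(k)]
    XtWy = [sum(wi * r[i] * yi for r, yi, wi in zip(X, y, w)) for i in range(k)]
    return solve(XtWX, XtWy)

def two_step(X, y):
    """OLS, then log(e^2) regression, then WLS with weights 1/exp(fitted)."""
    ones = [1.0] * len(y)
    b_ols = wls(X, y, ones)
    e = [yi - sum(bj * xj for bj, xj in zip(b_ols, r)) for r, yi in zip(X, y)]
    log_e2 = [math.log(ei * ei) for ei in e]
    g = wls(X, log_e2, ones)                       # variance regression
    w = [1.0 / math.exp(sum(gj * xj for gj, xj in zip(g, r))) for r in X]
    return b_ols, wls(X, y, w), w

# Synthetic data (hypothetical): disturbance variance rises with x1.
rng = random.Random(0)
X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
y = [1.0 + 0.5 * r[1] - 0.5 * r[2] + math.exp(0.5 * r[1]) * rng.gauss(0, 1)
     for r in X]
b_ols, b_fgls, w = two_step(X, y)
print(b_ols, b_fgls)
```

The variance regression uses the same regressors as the mean equation, mirroring the commands above; a different set of variance regressors would only change the second call to `wls`.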
Applications ?======================================================= ? Application 8.2 Gasoline Consumption ?======================================================= ? Rename variable for convenience Create ; y=lgaspcar $ ? RHS of new regression Namelist ; x = one,lincomep,lrpmg,lcarpcap $ ? Base regression. Is cars per capita significant? Regress ; Lhs = y ; Rhs = x $ ++  Ordinary least squares regression   LHS=Y Mean = 4.296242   Standard deviation = .5489071   WTS=none Number of observs. = 342   Model size Parameters = 4   Degrees of freedom = 338   Residuals Sum of squares = 14.90436   Standard error of e = .2099898   Fit Rsquared = .8549355   Adjusted Rsquared = .8536479   Model test F[ 3, 338] (prob) = 664.00 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 2.39132562 .11693429 20.450 .0000 LINCOMEP .88996166 .03580581 24.855 .0000 6.13942544 LRPMG  .89179791 .03031474 29.418 .0000 .52310321 LCARPCAP .76337275 .01860830 41.023 .0000 9.04180473Calc ; r0 = rsqrd $ Namelist ; Cntry=c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18$ Regress;lhs=y;rhs=x,cntry ; Res = e $ ++  Ordinary least squares regression   LHS=Y Mean = 4.296242   Standard deviation = .5489071   WTS=none Number of observs. 
= 342   Model size Parameters = 21   Degrees of freedom = 321   Residuals Sum of squares = 2.736491   Standard error of e = .9233035E01   Fit Rsquared = .9733657   Adjusted Rsquared = .9717062   Model test F[ 20, 321] (prob) = 586.56 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 2.28585577 .22832349 10.011 .0000 LINCOMEP .66224966 .07338604 9.024 .0000 6.13942544 LRPMG  .32170246 .04409925 7.295 .0000 .52310321 LCARPCAP .64048288 .02967885 21.580 .0000 9.04180473 C2  .12030455 .03414942 3.523 .0005 .05555556 C3  .75598453 .04074554 18.554 .0000 .05555556 C4  .10360026 .03660467 2.830 .0049 .05555556 C5  .08108439 .03356343 2.416 .0163 .05555556 C6  .13598740 .03187957 4.266 .0000 .05555556 C7  .05125389 .04152961 1.234 .2180 .05555556 C8  .30646950 .03529373 8.683 .0000 .05555556 C9  .05330785 .03711258 1.436 .1519 .05555556 C10  .09007170 .03860659 2.333 .0203 .05555556 C11  .05106438 .03357607 1.521 .1293 .05555556 C12  .06915517 .04040779 1.711 .0880 .05555556
C13 | .60407878 .09122015 6.622 .0000 .05555556
C14 | .74048679 .18008419 4.112 .0000 .05555556
C15 | .11664698 .03471246 3.360 .0009 .05555556
C16 | .22413229 .04764432 4.704 .0000 .05555556
C17 | .05959184 .03018816 1.974 .0492 .05555556
C18 | .76939510 .04457642 17.260 .0000 .05555556
Calc ; r1 = rsqrd $
Calc ; list ; Fstat = ((r1 - r0)/17) / ((1-r1)/(n-4-17)) $
Calc ; list ; Fc =ftb(.95,17,(n-4-17)) $
| Listed Calculator Results |
FSTAT = 83.960798
FC = 1.654675
Plot ; lhs = country ; rhs = e ; Bars = 0 ;Title=Plot of OLS Residuals by Country $
Regress;lhs=y;rhs=x,cntry ; Het $ ++  Ordinary least squares regression   LHS=Y Mean = 4.296242   Standard deviation = .5489071   WTS=none Number of observs. = 342   Model size Parameters = 21   Degrees of freedom = 321   Residuals Sum of squares = 2.736491   Standard error of e = .9233035E01   Fit Rsquared = .9733657   Adjusted Rsquared = .9717062   Model test F[ 20, 321] (prob) = 586.56 (.0000)   White heteroscedasticity robust covariance matrix   Br./Pagan LM Chisq [ 20] (prob) = 338.94 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 2.28585577 .22608070 10.111 .0000 LINCOMEP .66224966 .07277408 9.100 .0000 6.13942544 LRPMG  .32170246 .05381258 5.978 .0000 .52310321 LCARPCAP .64048288 .03876145 16.524 .0000 9.04180473 C2  .12030455 .03160815 3.806 .0002 .05555556 C3  .75598453 .03692877 20.471 .0000 .05555556 C4  .10360026 .03642008 2.845 .0047 .05555556 C5  .08108439 .03252022 2.493 .0132 .05555556 C6  .13598740 .03504274 3.881 .0001 .05555556 C7  .05125389 .05768530 .889 .3749 .05555556 C8  .30646950 .03516370 8.716 .0000 .05555556
C9  | .05330785 .04078467 1.307 .1921 .05555556
C10 | .09007170 .05606508 1.607 .1091 .05555556
C11 | .05106438 .03228064 1.582 .1147 .05555556
C12 | .06915517 .03857838 1.793 .0740 .05555556
C13 | .60407878 .09798870 6.165 .0000 .05555556
C14 | .74048679 .18836593 3.931 .0001 .05555556
C15 | .11664698 .03500336 3.332 .0010 .05555556
C16 | .22413229 .08147015 2.751 .0063 .05555556
C17 | .05959184 .03166823 1.882 .0608 .05555556
C18 | .76939510 .04121364 18.668 .0000 .05555556
Create ; e2 = e*e $ Regress ; Lhs = e2 ; Rhs = one,cntry $ Calc ; List ; White = n*rsqrd ; ctb(.95,17) $ ++  Listed Calculator Results  ++ WHITE = 131.209847 Result = 27.587112 Calc ; s2 = e'e/n $ Matrix ; s2g = {1/19} * cntry'e2 ; s2g = 1/s2 * s2g ; g = s2g  1 ; List ; lmstat = {19/2}*g'g $ Matrix LMSTAT has 1 rows and 1 columns. +1 277.00947 Name ; All = c1,cntry $ Matrix ; vg = 1/19*all'e2 $ Create ; wt = 1/vg(country) $ Regress ; Lhs = y ; rhs = x,cntry;wts=wt $ ++  Ordinary least squares regression   LHS=Y Mean = 4.460122   Standard deviation = .4535009   WTS=WT Number of observs. = 342   Model size Parameters = 21   Degrees of freedom = 321   Residuals Sum of squares = .5901434   Standard error of e = .4287719E01   Fit Rsquared = .9915851   Adjusted Rsquared = .9910608   Model test F[ 20, 321] (prob) =1891.29 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 2.43706653 .11308370 21.551 .0000 LINCOMEP .57506962 .02926687 19.649 .0000 5.84790214 LRPMG  .27967108 .03518536 7.949 .0000 .87736963 LCARPCAP .56540465 .01613491 35.042 .0000 8.34742189 C2  .12007208 .02789011 4.305 .0000 .08866789 C3  .76945446 .03011060 25.554 .0000 .34252221 C4  .11000512 .03169158 3.471 .0006 .01995470 C5  .09845013 .02921659 3.370 .0008 .05724878 C6  .13641007 .03387520 4.027 .0001 .01079455 C7  .13502296 .04413211 3.060 .0024 .00604952 C8  .28669153 .03200056 8.959 .0000 .01577251 C9  .08901681 .03324265 2.678 .0078 .01701683 C10  .15281210 .05659004 2.700 .0073 .00228044 C11  .04087890 .02882321 1.418 .1571 .03809105 C12  .05220341 .02952832 1.768 .0780 .09438377 C13  .53400193 .06166458 8.660 .0000 .01328985 C14  .64117855 .10737812 5.971 .0000 .06594614 C15  .12783552 .03189740 4.008 .0001 .02454617 C16  .38638811 .05013313 7.707 .0000 .00712693 C17  .04507072 .03121765 1.444 .1498 .01629698
C18 | .77812476 .03277077 23.744 .0000 .17152029
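The groupwise Lagrange multiplier statistic computed in the Matrix command above has the form LM = (T/2) Σ_g (s_g²/s² − 1)², where s_g² is the mean squared residual in group g and s² is the pooled mean squared residual. The following is a minimal Python sketch of that formula for balanced groups; the residual values are hypothetical and the function name is introduced here only for illustration.

```python
def groupwise_lm(residuals_by_group):
    """LM = (T/2) * sum_g (s_g^2 / s^2 - 1)^2 for balanced groups of size T."""
    T = len(residuals_by_group[0])
    if any(len(e) != T for e in residuals_by_group):
        raise ValueError("balanced groups assumed")
    n = T * len(residuals_by_group)
    s2 = sum(ei * ei for e in residuals_by_group for ei in e) / n
    s2_g = [sum(ei * ei for ei in e) / T for e in residuals_by_group]
    return (T / 2) * sum((v / s2 - 1) ** 2 for v in s2_g)

# Hypothetical residuals for two groups of four observations each
same = groupwise_lm([[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]])
diff = groupwise_lm([[1.0, -1.0, 1.0, -1.0], [3.0, -3.0, 3.0, -3.0]])
print(same, diff)
```

When every group has the same mean squared residual the statistic is zero; it grows as the group variances spread out around the pooled variance.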
?======================================================= ? Application 8.3 Iterative estimator ?======================================================= create ; logc = log(c) ; logq=log(q) ; logq2=logq^2 ; logp=log(pf) $ Name ; x = one,logq,logq2,logp $ Regress ; lhs = logc ; rhs = x ; Res = e $ Matrix ; b0=b $ Procedure$ Create ; e2 = e*e ; le = e2/(sumsqdev/n)1 $ (MLE) ?le = log(e2) $ (Iterative two step) Regress ; quiet ; lhs=le ; rhs=one,lf ; keep = s2i $ Create ; wi = 1/exp(s2i) $ Regress ; lhs = logc ; rhs = x ; wts=wi ; res=e $ Matrix ; db = bb0 ; b0 = b $ Calc ; list ; db2 = db'db $ Endproc $ Exec ; n = 10 $ These are the two step estimators from Example 8.4 ++  Ordinary least squares regression   LHS=LOGC Mean = 12.92005   Standard deviation = 1.192244   WTS=WI Number of observs. = 90   Model size Parameters = 4   Degrees of freedom = 86   Residuals Sum of squares = 1.212889   Standard error of e = .1187576   Fit Rsquared = .9904126   Adjusted Rsquared = .9900782   Model test F[ 3, 86] (prob) =2961.37 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 9.27731457 .20978736 44.222 .0000 LOGQ  .91610564 .03299348 27.766 .0000 1.56779393 LOGQ2  .02164855 .01101812 1.965 .0527 3.87530677 LOGP  .40174171 .01633292 24.597 .0000 12.4336185 These are the maximum likelihood estimates ++  Ordinary least squares regression   Residuals Sum of squares = 1.347926   Standard error of e = .1251941   Fit Rsquared = .9892110   Adjusted Rsquared = .9888346   Model test F[ 3, 86] (prob) =2628.35 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 9.24395222 .21962091 42.090 .0000 LOGQ  .92163069 .03302261 27.909 .0000 1.43646434 LOGQ2  .02461767 .01143734 2.152 .0342 3.46800689 LOGP  .40366011 .01701993 23.717 .0000 12.5455161
Chapter 9
Models for Panel Data

1. The pooled least squares estimator is
$$\hat y = -.747476 + 1.058959x, \quad e'e = 120.6687,$$
with standard errors of .95595 and .058656. The fixed effects regression can be computed just by including the three dummy variables since the sample sizes are quite small. The results are
$$\hat y = -1.4684 i_1 - 2.8362 i_2 + .12166 i_3 + 1.102192x, \quad e'e = 79.183,$$
with a standard error of .050719 on the slope. The F statistic for testing the hypothesis that the constant terms are all the same is
$$F[2,26] = [(120.6687 - 79.183)/2]/[79.183/26] = 6.811.$$
The statistic has 2 numerator and 26 denominator degrees of freedom. The 5 percent critical value from the F table for these degrees of freedom is 3.37, so the hypothesis is rejected.
In order to estimate the random effects model, we need some additional parameter estimates. The group means are

            ybar     xbar
Group 1     15.502   14.962
Group 2     15.415   16.559
Group 3     14.373   12.930

In the group means regression using these three observations, we obtain $\bar y_{i.} = 10.665 + .29909\bar x_{i.}$ with $e_{**}'e_{**} = .19747$. There is only one degree of freedom, so this is the candidate for estimation of $\sigma_\varepsilon^2/T + \sigma_u^2$. In the least squares dummy variable (fixed effects) regression, we have an estimate of $\sigma_\varepsilon^2$ of 79.183/26 = 3.045. Therefore, our estimate of $\sigma_u^2$ is $\hat\sigma_u^2 = .19747/1 - 3.045/10 = -.1070$. Obviously, this won't do. Before abandoning the random effects model, we consider an alternative consistent estimator of the constant and slope, the pooled ordinary least squares estimator. Using the group means above, we find $\Sigma_{i=1}^3 [\bar y_{i.} - (-.747476) - 1.058959\bar x_{i.}]^2 = 3.9273$. One ought to proceed with some caution at this point, but it is difficult to place much faith in the group means regression with but a single degree of freedom, so this is probably a preferable estimator in any event. (The true model underlying these data, generated with a random number generator, has a slope, $\beta$, of 1.000 and a true constant of zero. Of course, this would not be known to the analyst in a real world situation.) Continuing, we now use $\hat\sigma_u^2 = 3.9273 - 3.045/10 = 3.6227$ as the estimator. (The true value of $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_\varepsilon^2)$ is .5.) This leads to $\theta = 1 - [3.0455^{1/2}/(10(3.6227) + 3.045)^{1/2}] = .721524$. Finally, the FGLS estimator computed according to (16-48) is $\hat y = -1.3415 + 1.0987x$, with standard errors of .786 and .028998.
For the LM test, we return to the pooled ordinary least squares regression. The necessary quantities are $e'e = 120.6687$, $\Sigma_t e_{1t} = -.55314$, $\Sigma_t e_{2t} = -13.72824$, $\Sigma_t e_{3t} = 14.28138$. Therefore,
$$LM = \{[3(10)]/[2(9)]\}\{[(.55314)^2 + (13.72824)^2 + (14.28138)^2]/120.6687 - 1\}^2 = 8.472.$$
The statistic has one degree of freedom. The critical value from the chi-squared distribution is 3.84, so the hypothesis of no random effect is rejected. Finally, for the Hausman test, we compare the FGLS and least squares dummy variable estimators. The statistic is $\chi^2 = [(1.0987 - 1.058959)^2]/[(.058656)^2 - (.05060)^2] = 1.794373$. This is relatively small and argues (once again) in favor of the random effects model.
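The test statistics in this exercise can be recomputed from the reported sums of squares and residual sums. The following is a minimal Python sketch; the variable names are illustrative, the inputs are the rounded figures shown in the solution, and small differences in the last digits relative to the printed values reflect that rounding.

```python
import math

n, T = 3, 10                       # groups and periods
ee_pooled, ee_lsdv = 120.6687, 79.183

# F test of equal group constants: n - 1 = 2 restrictions and
# nT - n - 1 = 26 denominator degrees of freedom
f_stat = ((ee_pooled - ee_lsdv) / (n - 1)) / (ee_lsdv / (n * T - n - 1))

# Variance components and the FGLS weight theta
s2_e = 3.0455                      # LSDV estimate of sigma_e^2 (79.183/26)
s2_u = 3.6227                      # estimate of sigma_u^2 reported above
theta = 1 - math.sqrt(s2_e) / math.sqrt(T * s2_u + s2_e)

# Breusch-Pagan LM statistic from the pooled OLS residual sums by group
# (only the squares enter, so the signs of the sums do not matter)
group_sums = [0.55314, 13.72824, 14.28138]
lm = (n * T / (2 * (T - 1))) * (sum(s * s for s in group_sums) / ee_pooled - 1) ** 2

# Hausman statistic, using the same inputs as the expression in the text
hausman = (1.0987 - 1.058959) ** 2 / (0.058656 ** 2 - 0.05060 ** 2)
print(round(f_stat, 3), round(theta, 6), round(lm, 3), round(hausman, 3))
```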
2. There is no effect on the coefficients of the other variables. For the dummy variable coefficients, with the full set of n dummy variables, each coefficient is $\bar y_i^*$ = mean residual for the ith group in the regression of y on the xs omitting the dummy variables. (We use the partitioned regression results of Chapter 6.) If an overall constant term and n-1 dummy variables (say the last n-1) are used, instead, the coefficient on the ith dummy variable is simply $\bar y_i^* - \bar y_1^*$ while the constant term is still $\bar y_1^*$. For a full proof of these results, see the solution to Exercise 5 of Chapter 8 earlier in this book.

3.
(a) The pooled OLS estimator will be
$$b = \left[\Sigma_{i=1}^n X_i'X_i\right]^{-1}\left[\Sigma_{i=1}^n X_i'y_i\right]$$
where $X_i$ and $y_i$ have $T_i$ observations. It remains true that $y_i = X_i\beta + \varepsilon_i + u_ii$, where $\mathrm{Var}[\varepsilon_i + u_ii \mid X_i] = \mathrm{Var}[w_i \mid X_i] = \sigma_\varepsilon^2I + \sigma_u^2ii'$ and, maintaining the assumptions, both $\varepsilon_i$ and $u_i$ are uncorrelated with $X_i$. Substituting the expression for $y_i$ into that of $b$ and collecting terms, we have
$$b = \beta + \left[\Sigma_{i=1}^n X_i'X_i\right]^{-1}\left[\Sigma_{i=1}^n X_i'w_i\right].$$
Unbiasedness follows immediately as long as $E[w_i \mid X_i]$ equals zero, which it does by assumption. Consistency, as mentioned in Section 9.3.2, is covered in the discussion of Chapter 4. We would need for the matrix $Q = \left[\frac{1}{n}\Sigma_{i=1}^n \frac{1}{T_i}X_i'X_i\right]$ to converge to a matrix of constants, and not to degenerate to a matrix of zeros. The requirements for the large sample behavior of the vector in the second set of brackets are quite the same as in our earlier discussions of consistency. The vector $(1/\sqrt n)\Sigma_{i=1}^n X_i'w_i = (1/\sqrt n)\Sigma_{i=1}^n v_i$ has mean zero. We would require the conditions of the Lindeberg-Feller version of the central limit theorem to apply, which could be expected.
(b) We seek to establish consistency, not unbiasedness. As such, we will ignore the degrees of freedom correction, K, in (9-37). Use n(T-1) as the denominator. Thus, the question is whether
$$\mathrm{plim}\ \frac{\Sigma_{i=1}^n\Sigma_{t=1}^T (e_{it} - \bar e_{i.})^2}{n(T-1)} = \sigma_\varepsilon^2.$$
If so, then the estimator in (9-37) will be consistent. Using (9-33) and $a_i = \bar y_{i.} - \bar x_{i.}'b$, it follows that $e_{it} - \bar e_{i.} = \varepsilon_{it} - \bar\varepsilon_{i.} - (x_{it} - \bar x_{i.})'(b - \beta)$. Summing the squares in (9-37), we find that
$$\begin{aligned}
\frac{\Sigma_{i=1}^n\Sigma_{t=1}^T (e_{it} - \bar e_{i.})^2}{n(T-1)} &= \frac{1}{n}\Sigma_{i=1}^n \hat\sigma^2(i) + (b - \beta)'\left[\frac{1}{n}\Sigma_{i=1}^n \frac{1}{T-1}\Sigma_{t=1}^T (x_{it} - \bar x_{i.})(x_{it} - \bar x_{i.})'\right](b - \beta) \\
&\quad - 2(b - \beta)'\left[\frac{1}{n}\Sigma_{i=1}^n \frac{1}{T-1}\Sigma_{t=1}^T (x_{it} - \bar x_{i.})(\varepsilon_{it} - \bar\varepsilon_{i.})\right].
\end{aligned}$$
The second term will converge to zero as the center matrix converges to a constant Q and the vectors converge to zero as b converges to β. (We use the Slutsky theorem.) The third term will converge to zero as both the leading vector converges to zero and the covariance vector between the regressors and the disturbances converges to zero. That leaves the first term, which is the average of the estimators in (9-34). The terms in the average are independent. Each has expected value exactly equal to $\sigma_\varepsilon^2$. So, if each estimator has finite variance, then the average will converge to its expectation. Appendix D discusses various conditions under which a sample average will converge to its expectation. For example, a finite fourth moment of $\varepsilon_{it}$ would be sufficient here (though weaker conditions would also suffice). Note that this derivation follows through for any consistent estimator of β, not just for b.
4. To find
$$\mathrm{plim}(1/n)LM = \mathrm{plim}\ [T/(2(T-1))]\{[\Sigma_i(\Sigma_t e_{it})^2]/[\Sigma_i\Sigma_t e_{it}^2] - 1\}^2$$
we can concentrate on the sums inside the curled brackets. First, $\Sigma_i(\Sigma_t e_{it})^2 = nT^2\{(1/n)\Sigma_i[(1/T)\Sigma_t e_{it}]^2\}$ and $\Sigma_i\Sigma_t e_{it}^2 = nT(1/(nT))\Sigma_i\Sigma_t e_{it}^2$. The ratio equals
$$[\Sigma_i(\Sigma_t e_{it})^2]/[\Sigma_i\Sigma_t e_{it}^2] = T\{(1/n)\Sigma_i[(1/T)\Sigma_t e_{it}]^2\}/\{(1/(nT))\Sigma_i\Sigma_t e_{it}^2\}.$$
Using the argument used in Exercise 8 to establish consistency of the variance estimator, the limiting behavior of this statistic is the same as that which is computed using the true disturbances since the OLS coefficient estimator is consistent. Using the true disturbances, the numerator may be written $(1/n)\Sigma_i[(1/T)\Sigma_t \varepsilon_{it}]^2 = (1/n)\Sigma_i \bar\varepsilon_{i.}^2$. Since $E[\bar\varepsilon_{i.}] = 0$,
$$\mathrm{plim}(1/n)\Sigma_i \bar\varepsilon_{i.}^2 = \mathrm{Var}[\bar\varepsilon_{i.}] = \sigma_\varepsilon^2/T + \sigma_u^2.$$
The denominator is simply the usual variance estimator, so
$$\mathrm{plim}(1/(nT))\Sigma_i\Sigma_t \varepsilon_{it}^2 = \mathrm{Var}[\varepsilon_{it}] = \sigma_\varepsilon^2 + \sigma_u^2.$$
Therefore, inserting these results in the expression for LM, we find that
$$\mathrm{plim}(1/n)LM = [T/(2(T-1))]\{[T(\sigma_\varepsilon^2/T + \sigma_u^2)]/[\sigma_\varepsilon^2 + \sigma_u^2] - 1\}^2.$$
Under the null hypothesis that $\sigma_u^2 = 0$, this equals 0. By expanding the inner term then collecting terms, we find that under the alternative hypothesis that $\sigma_u^2$ is not equal to 0, $\mathrm{plim}(1/n)LM = [T(T-1)/2][\sigma_u^2/(\sigma_\varepsilon^2 + \sigma_u^2)]^2$. Within group i, $\mathrm{Corr}^2[\varepsilon_{it}, \varepsilon_{is}] = \rho^2 = \sigma_u^2/(\sigma_u^2 + \sigma_\varepsilon^2)$, so $\mathrm{plim}(1/n)LM = [T(T-1)/2](\rho^2)^2$. It is worth noting what is obtained if we do not divide the LM statistic by n at the outset. Under the null hypothesis, the limiting distribution of LM is chi-squared with one degree of freedom. This is a random variable with mean 1 and variance 2, so the statistic, itself, does not converge to a constant; it converges to a random variable. Under the alternative, the LM statistic has mean and variance of order n (as we see above) and hence, explodes. It is this latter attribute which makes the test a consistent one. As the sample size increases, the power of the LM test must go to 1.

5. The ordinary least squares regression results are R2 = .92803, e′e = 146.761, 40 observations.

Variable    Coefficient   Standard Error
X1          .446845       .07887
X2          1.83915       .1534
Constant    3.60568       2.555
Period 1    3.57906       1.723
Period 2    1.49784       1.716
Period 3    2.00677       1.760
Period 4    3.03206       1.731
Period 5    5.58937       1.768
Period 6    1.49474       1.714
Period 7    1.52021       1.714
Period 8    2.25414       1.737
Period 9    3.29360       1.722
Group 1     .339998       1.135
Group 2     4.39271       1.183
Group 3     5.00207       1.125

Estimated covariance matrix for the slopes:
        β1           β2
β1      .0062209
β2      .00030947    .023523

For testing the hypotheses that the sets of dummy variable coefficients are zero, we will require the sums of squared residuals from the restrictions.
These are

Regression                            Sum of squares
All variables included                146.761
Period variables omitted              318.503
Group variables omitted               369.356
Period and group variables omitted    585.622

The F statistics are therefore,
(1) F[9,25] = [(318.503 - 146.761)/9]/[146.761/25] = 3.251
(2) F[3,25] = [(369.356 - 146.761)/3]/[146.761/25] = 12.639
(3) F[12,25] = [(585.622 - 146.761)/12]/[146.761/25] = 6.23
The critical values for the three distributions are 2.283, 2.992, and 2.165, respectively. All sample statistics are larger than the table value, so all of the hypotheses are rejected.

6. The covariance matrix would be
$$\begin{array}{c|cccc}
 & i=1,\,t=1 & i=1,\,t=2 & i=2,\,t=1 & i=2,\,t=2\\ \hline
i=1,\,t=1 & \sigma_\varepsilon^2+\sigma_u^2+\sigma_v^2 & \sigma_u^2 & \sigma_v^2 & 0\\
i=1,\,t=2 & \sigma_u^2 & \sigma_\varepsilon^2+\sigma_u^2+\sigma_v^2 & 0 & \sigma_v^2\\
i=2,\,t=1 & \sigma_v^2 & 0 & \sigma_\varepsilon^2+\sigma_u^2+\sigma_v^2 & \sigma_u^2\\
i=2,\,t=2 & 0 & \sigma_v^2 & \sigma_u^2 & \sigma_\varepsilon^2+\sigma_u^2+\sigma_v^2
\end{array}$$

7. The two separate regressions are as follows:

                         Sample 1                    Sample 2
b = x′y/x′x              4/5 = .8                    6/10 = .6
e′e = y′y - bx′y         20 - 4(4/5) = 84/5          10 - 6(6/10) = 64/10
R2 = 1 - e′e/y′y         1 - (84/5)/20 = .16         1 - (64/10)/10 = .36
s2 = e′e/(n-1)           (84/5)/19 = .88421          (64/10)/19 = .33684
Est.Var[b] = s2/x′x      .88421/5 = .17684           .33684/10 = .033684

To carry out a Lagrange multiplier test of the hypothesis of equal variances, we require the separate and common variance estimators based on the restricted slope estimator. This, in turn, is the pooled least squares estimator. For the combined sample, we obtain
b = [x1′y1 + x2′y2]/[x1′x1 + x2′x2] = (4 + 6)/(5 + 10) = 2/3.
Then, the variance estimators are based on this estimate. For the hypothesized common variance,
e′e = (y1′y1 + y2′y2) - b(x1′y1 + x2′y2) = (20 + 10) - (2/3)(4 + 6) = 70/3,
so the estimate of the common variance is e′e/40 = (70/3)/40 = .58333. Note that the divisor is 40, not 39, because we are computing maximum likelihood estimators. The individual estimators are
e1′e1/20 = (y1′y1 - 2b(x1′y1) + b²(x1′x1))/20 = (20 - 2(2/3)4 + (2/3)²5)/20 = .84444
and
e2′e2/20 = (y2′y2 - 2b(x2′y2) + b²(x2′x2))/20 = (10 - 2(2/3)6 + (2/3)²10)/20 = .32222.
The LM statistic is given in Example 16.3,
LM = (T/2)[(s1²/s² - 1)² + (s2²/s² - 1)²] = 10[(.84444/.58333 - 1)² + (.32222/.58333 - 1)²] = 4.007.
This has one degree of freedom for the single restriction. The critical value from the chi-squared table is 3.84, so we would reject the hypothesis.

In order to compute a two step GLS estimate, we can use either the original variance estimates based on the separate least squares estimates or those obtained above in doing the LM test. Since both pairs are consistent, both FGLS estimators will have all of the desirable asymptotic properties. For our estimator, we used $\hat\sigma_j^2 = e_j'e_j/T$ from the original regressions. Thus, $\hat\sigma_1^2 = .84$ and $\hat\sigma_2^2 = .32$. The GLS estimator is
$$\hat\beta = \frac{(1/\hat\sigma_1^2)x_1'y_1 + (1/\hat\sigma_2^2)x_2'y_2}{(1/\hat\sigma_1^2)x_1'x_1 + (1/\hat\sigma_2^2)x_2'x_2} = \frac{4/.84 + 6/.32}{5/.84 + 10/.32} = .632.$$
The estimated sampling variance is $1/[(1/\hat\sigma_1^2)x_1'x_1 + (1/\hat\sigma_2^2)x_2'x_2] = .02688$. This implies an asymptotic standard error of $(.02688)^{1/2} = .16395$. To test the hypothesis that β = 1, we would refer z = (.632 - 1)/.16395 = -2.245 to a standard normal table. This is reasonably large in absolute value, and at the usual significance levels, would lead to rejection of the hypothesis.

The Wald test is based on the unrestricted variance estimates. Using b = .632, the variance estimators are
$\hat\sigma_1^2 = [y_1'y_1 - 2b(x_1'y_1) + b^2(x_1'x_1)]/20 = .847056$
and
$\hat\sigma_2^2 = [y_2'y_2 - 2b(x_2'y_2) + b^2(x_2'x_2)]/20 = .320512$
while the pooled estimator would be $\hat\sigma^2 = [y'y - 2b(x'y) + b^2(x'x)]/40 = .583784$. The statistic is given at the end of Example 16.3,
$$W = (T/2)[(\hat\sigma^2/\hat\sigma_1^2 - 1)^2 + (\hat\sigma^2/\hat\sigma_2^2 - 1)^2] = 10[(.583784/.847056 - 1)^2 + (.583784/.320512 - 1)^2] = 7.713.$$
We reach the same conclusion as before.

To compute the maximum likelihood estimators, we begin our iterations from the two separate ordinary least squares estimates of b, which produce estimates $\hat\sigma_1^2 = .84$ and $\hat\sigma_2^2 = .32$. The iterations are

Iteration    $\hat\sigma_1^2$    $\hat\sigma_2^2$    $\hat\beta$
0            .840000             .320000             .632000
1            .847056             .320512             .631819
2            .847071             .320506             .631818
3            .847071             .320506             converged

Now, to compute the likelihood ratio statistic for a likelihood ratio test of the hypothesis of equal variances, we refer χ2 = 40ln.58333 - 20ln.847071 - 20ln.320506 to the chi-squared table. (Under the null hypothesis, the pooled least squares estimator is maximum likelihood.) Thus, χ2 = 4.5164, which is roughly equal to the LM statistic and leads once again to rejection of the null hypothesis.

Finally, we allow for cross sectional correlation of the disturbances. Our initial estimate of b is the pooled least squares estimator, 2/3. The estimates of the two variances are .84444 and .32222 as before, while the cross sectional covariance estimate is
e1′e2/20 = [y1′y2 - b(x1′y2 + x2′y1) + b²(x1′x2)]/20 = .14444.
Before proceeding, we note that the estimated correlation of the two disturbances is r = .14444/[(.84444)(.32222)]^{1/2} = .277, which is not particularly large. The LM test statistic given in (16-14) is 1.533, which is well under the critical value of 3.84. Thus, we would not reject the hypothesis of zero cross section correlation. Nonetheless, we proceed. The estimator is shown in (16-6). The two step FGLS and iterated maximum likelihood estimates appear below.

Iteration    $\hat\sigma_1^2$    $\hat\sigma_2^2$    $\hat\sigma_{12}$    $\hat\beta$
0            .84444              .32222              .14444               .5791338
1            .8521955            .3202177            .1597994             .5731058
2            .8528702            .3203616            .1609133             .5727069
3            .8529155            .3203725            .1609873             .5726805
4            .8529185            .3203732            .1609921             .5726788
5            .8529187            .3203732            .1609925             converged

Because the correlation is relatively low, the effect on the previous estimate is relatively minor.
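The equal-variance tests in Exercise 7 can be retraced from the sample moments alone. The following Python sketch is only an illustration under stated assumptions: the variable and function names are hypothetical, and the moments (x1′x1 = 5, x1′y1 = 4, y1′y1 = 20, x2′x2 = 10, x2′y2 = 6, y2′y2 = 10, T = 20) are read off the arithmetic shown in the solution. It computes the LM statistic at the pooled estimate, the Wald statistic at the two step FGLS estimate, and the likelihood ratio statistic after iterating to the maximum likelihood estimates.

```python
import math

# Sample moments read off the worked solution (T = 20 observations per sample).
T = 20
x1x1, x1y1, y1y1 = 5.0, 4.0, 20.0
x2x2, x2y2, y2y2 = 10.0, 6.0, 10.0

def group_vars(b):
    """ML variance estimates for each sample at a common slope b."""
    s1 = (y1y1 - 2 * b * x1y1 + b * b * x1x1) / T
    s2 = (y2y2 - 2 * b * x2y2 + b * b * x2x2) / T
    return s1, s2

def pooled_var(b):
    """ML estimate of a common variance at slope b (divisor 2T = 40)."""
    return ((y1y1 + y2y2) - 2 * b * (x1y1 + x2y2) + b * b * (x1x1 + x2x2)) / (2 * T)

def gls(s1, s2):
    """Weighted least squares slope for given group variances."""
    return (x1y1 / s1 + x2y2 / s2) / (x1x1 / s1 + x2x2 / s2)

# LM test: everything is evaluated at the restricted (pooled OLS) slope.
b_pooled = (x1y1 + x2y2) / (x1x1 + x2x2)
s1_r, s2_r = group_vars(b_pooled)
s_r = pooled_var(b_pooled)
lm = (T / 2) * ((s1_r / s_r - 1) ** 2 + (s2_r / s_r - 1) ** 2)

# Two step FGLS slope from the separate OLS residual variances (.84 and .32).
s1_ols = (y1y1 - (x1y1 / x1x1) * x1y1) / T
s2_ols = (y2y2 - (x2y2 / x2x2) * x2y2) / T
b_fgls = gls(s1_ols, s2_ols)

# Wald test: variances are evaluated at the FGLS slope.
s1_u, s2_u = group_vars(b_fgls)
s_u = pooled_var(b_fgls)
wald = (T / 2) * ((s_u / s1_u - 1) ** 2 + (s_u / s2_u - 1) ** 2)

# Iterate between variances and slope until the ML estimates settle.
b_ml = b_fgls
for _ in range(50):
    b_ml = gls(*group_vars(b_ml))
s1_ml, s2_ml = group_vars(b_ml)

# Likelihood ratio statistic for equal variances.
lr = 2 * T * math.log(s_r) - T * math.log(s1_ml) - T * math.log(s2_ml)

print(lm, wald, lr, b_ml)
```

Under these assumptions the three statistics should come out close to the 4.007, 7.713, and 4.5164 reported in the solution.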
8. If all of the regressor matrices are the same, the estimator in (8-35) reduces to
$$\hat\beta = (X'X)^{-1}\sum_{i=1}^{n}\left\{\frac{1/\sigma_i^2}{\sum_{j=1}^{n}(1/\sigma_j^2)}\right\}X'y_i = \sum_{i=1}^{n} w_i b_i,$$
a weighted average of the ordinary least squares estimators, $b_i = (X'X)^{-1}X'y_i$, with weights $w_i = (1/\sigma_i^2)/[\sum_{j=1}^{n}(1/\sigma_j^2)]$. If it were necessary to estimate the weights, a simple two step estimator could be based on individual variance estimators. Either $s_i^2 = e_i'e_i/T$ based on separate least squares regressions (with different estimators of β) or the same quantity based on residuals computed from a common pooled ordinary least squares slope estimator could be used.

9. The various least squares estimators of the parameters are

         Sample 1      Sample 2      Sample 3      Pooled
a        11.6644       5.42213       1.41116       8.06392
         (9.658)       (10.46)       (7.328)
b        .926881       1.06410       1.46885       1.05413
         (.4328)       (.4756)       (.3590)
e′e      452.206       673.409       125.281
         (464.288)     (732.560)     (171.240)     (1368.088)

(Values of e′e in parentheses above are based on the pooled slope estimator.) The FGLS estimator and its estimated asymptotic covariance matrix are
$$b = \begin{pmatrix} 7.17889 \\ 1.13792 \end{pmatrix}, \qquad \text{Est.Asy.Var}[b] = \begin{bmatrix} 22.8049 & -1.0629 \\ -1.0629 & 0.05197 \end{bmatrix}.$$
Note that the FGLS estimator of the slope is closer to the 1.46885 of sample 3 (the highest of the three OLS estimates). This is to be expected since the third group has the smallest residual variance. The LM test statistic is based on the pooled regression,
LM = (10/2){[(464.288/10)/(1368.088/30) - 1]² + ...} = 3.7901.
To compute the Wald statistic, we require the unrestricted regression. The parameter estimates are given above. The sums of squares are 465.708, 785.399, and 145.055 for i = 1, 2, and 3, respectively. For the common estimate of σ2, we use the total sum of squared GLS residuals, 1396.162. Then,
W = (10/2){[(1396.162/30)/(465.708/10) - 1]² + ...} = 25.21.
The Wald statistic is far larger than the LM statistic. Since there are two restrictions, the critical values are 5.99 at the 5% significance level and 9.21 at the 1% level. The LM statistic is below both, while the Wald statistic is above both, so the two tests lead to different conclusions. The likelihood ratio statistic based on the FGLS estimates is
χ2 = 30ln(1396.162/30) - 10ln(465.708/10) - ... = 6.42,
which is between the previous two and between the 5% and 1% critical values.
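The three groupwise heteroscedasticity statistics in Exercise 9 depend only on the sums of squared residuals quoted in the solution, so they are easy to recompute. The short Python sketch below is offered as a check rather than as the method used to produce the original results; the function and variable names are hypothetical.

```python
import math

T = 10          # observations per group
n_groups = 3

# Sums of squared residuals by group, as quoted in the solution.
ss_pooled_slope = [464.288, 732.560, 171.240]   # residuals at the pooled OLS estimates
ss_fgls = [465.708, 785.399, 145.055]           # residuals at the FGLS estimates

def lm_stat(ss):
    """LM statistic: group variance over common variance, restricted estimates."""
    common = sum(ss) / (n_groups * T)
    return (T / 2) * sum(((s / T) / common - 1) ** 2 for s in ss)

def wald_stat(ss):
    """Wald statistic: common variance over group variance, unrestricted estimates."""
    common = sum(ss) / (n_groups * T)
    return (T / 2) * sum((common / (s / T) - 1) ** 2 for s in ss)

def lr_stat(ss):
    """LR statistic: nT ln(common variance) minus T times the sum of ln(group variances)."""
    common = sum(ss) / (n_groups * T)
    return n_groups * T * math.log(common) - T * sum(math.log(s / T) for s in ss)

lm = lm_stat(ss_pooled_slope)
wald = wald_stat(ss_fgls)
lr = lr_stat(ss_fgls)
print(round(lm, 4), round(wald, 2), round(lr, 2))
```

The ordering LM < LR < Wald seen here matches the ordering reported in the solution.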
Applications
As usual, the applications below require econometric software. The computations can be done with any modern software package, so no specific program is recommended.

1.
> read $
Last observation read from data file was 200
End of data listing in edit window was reached
> REGRESS ; Lhs = I ; Rhs = F,C,one $

Ordinary least squares regression
LHS=I        Mean                 = 145.9582
             Standard deviation   = 216.8753
WTS=none     Number of observs.   = 200
Model size   Parameters           = 3
             Degrees of freedom   = 197
Residuals    Sum of squares       = 1755850.
             Standard error of e  = 94.40840
Fit          R-squared            = .8124080
             Adjusted R-squared   = .8105035
Model test   F[ 2, 197] (prob)    = 426.58 (.0000)

Variable    Coefficient     Standard Error    t-ratio    P[|T|>t]    Mean of X
F           .11556216       .00583571         19.803     .0000       1081.68110
C           .23067849       .02547580          9.055     .0000       276.017150
Constant    -42.7143694     9.51167603        -4.491     .0000

> CALC ; R0=Rsqrd $
> REGRESS ; Lhs = I ; Rhs = F,C,one ; Cluster = 20 $

Ordinary least squares regression
LHS=I        Mean                 = 145.9582
             Standard deviation   = 216.8753
WTS=none     Number of observs.   = 200
Model size   Parameters           = 3
             Degrees of freedom   = 197
Residuals    Sum of squares       = 1755850.
             Standard error of e  = 94.40840
Fit          R-squared            = .8124080
             Adjusted R-squared   = .8105035
Model test   F[ 2, 197] (prob)    = 426.58 (.0000)

Covariance matrix for the model is adjusted for data clustering.
Sample of 200 observations contained 10 clusters defined by
20 observations (fixed number) in each cluster.
Sample of 200 observations contained 1 strata defined by
200 observations (fixed number) in each stratum.

Variable    Coefficient     Standard Error    t-ratio    P[|T|>t]    Mean of X
F           .11556216       .01589434          7.271     .0000       1081.68110
C           .23067849       .08496711          2.715     .0072       276.017150
Constant    -42.7143694     20.4252029        -2.091     .0378

The standard errors increase substantially. This is at least suggestive that there is correlation across observations within the groups. A formal test would be based on one of the panel models below. When the random effects model is fit by maximum likelihood, for example, the log likelihood function is -1095.257. The log likelihood function for the pooled model is -1191.802. Twice the difference is 193.09, so the correlation is highly significant. The Lagrange multiplier statistic reported below is 798.16, which is far larger than the critical value of 3.84. Once again, these results do suggest within groups correlation.

> REGRESS ; Lhs = I ; Rhs = F,C,one ; Panel ; Pds=20 ; Fixed $

Least Squares with Group Dummy Variables
Ordinary least squares regression
LHS=I        Mean                 = 145.9583
             Standard deviation   = 216.8753
WTS=none     Number of observs.   = 200
Model size   Parameters           = 12
             Degrees of freedom   = 188
Residuals    Sum of squares       = 523478.1
             Standard error of e  = 52.76797
Fit          R-squared            = .9440725
             Adjusted R-squared   = .9408002
Model test   F[ 11, 188] (prob)   = 288.50 (.0000)

Panel: Groups   Empty 0, Valid data 10
                Smallest 20, Largest 20
                Average group size 20.00

Variable    Coefficient     Standard Error    t-ratio    P[|T|>t]    Mean of X
F           .11012380       .01185669          9.288     .0000       1081.68110
C           .31006534       .01735450         17.867     .0000       276.017150

Test Statistics for the Classical Model
Model                       Log-Likelihood    Sum of Squares      R-squared
(1) Constant term only      -1359.15096       .9359943929D+07     .0000000
(2) Group effects only      -1216.34872       .2244352274D+07     .7602173
(3) X - variables only      -1191.80236       .1755850484D+07     .8124080
(4) X and group effects     -1070.78103       .5234781474D+06     .9440725

Hypothesis Tests
               Likelihood Ratio Test              F Tests
               Chi-squared   d.f.   Prob.     F          num.   denom.   P value
(2) vs (1)     285.604       9      .00000    66.932     9      190      .00000
(3) vs (1)     334.697       2      .00000    426.576    2      197      .00000
(4) vs (1)     576.740       11     .00000    288.500    11     188      .00000
(4) vs (2)     291.135       2      .00000    309.014    2      188      .00000
(4) vs (3)     242.043       9      .00000    49.177     9      188      .00000

> CALC ; R1 = Rsqrd $
> MATRIX ; bf = b(1:2) ; vf = varb(1:2,1:2) $
> CALC ; List ; Fstat=((R1-R0)/9)/((1-R1)/(n-2-10)) ; FC=Ftb(.95,9,(n-2-10)) $

Listed Calculator Results
FSTAT  = 49.176625
FC     = 1.929957
The F statistic of 49.18 is far larger than the critical value, so the hypothesis of equal constant terms is rejected.

> REGRESS ; Lhs = I ; Rhs = F,C,one ; Panel ; Pds=20 ; Random $

Random Effects Model: v(i,t) = e(i,t) + u(i)
Estimates:  Var[e]               = .278446D+04
            Var[u]               = .612849D+04
            Corr[v(i,t),v(i,s)]  = .687594
Lagrange Multiplier Test vs. Model (3) = 798.16
( 1 df, prob value = .000000)
(High values of LM favor FEM/REM over CR model.)
            Sum of Squares         .184029D+07
            R-squared              .803387D+00

Variable    Coefficient     Standard Error    b/St.Er.   P[|Z|>z]    Mean of X
F           .10974919       .01031952         10.635     .0000       1081.68110
C           .30780890       .01715154         17.946     .0000       276.017150
Constant    -57.7159079     27.1118671        -2.129     .0333

The LM statistic, as noted earlier, is very large, so the hypothesis of no effects is rejected.

> MATRIX ; br = b(1:2) ; vr = varb(1:2,1:2) $
> MATRIX ; db = bf-br ; vdb = vf-vr ; List ; Hausman=db'<vdb>db $

             1
    1|   2.45500

> CALC ; List ; Ctb(.95,2) $

Listed Calculator Results
Result = 5.991465

The Hausman statistic of 2.455 is well below the critical value of 5.99, which suggests that the random effects approach is consistent with the data.
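The F statistic for the hypothesis of equal constant terms in this application can be recomputed from the two R-squared values in the output. The Python sketch below mirrors the CALC command used above; the variable names are hypothetical, and the inputs are the R-squared values from the pooled and fixed effects regressions with 200 observations, 2 slopes, and 10 groups.

```python
# R-squared values taken from the regression output in this application.
r2_pooled = 0.8124080   # X variables only
r2_fixed = 0.9440725    # X and group effects

n_obs, n_slopes, n_groups = 200, 2, 10

# Numerator df: the 9 extra group constants; denominator df: n - K - groups = 188.
num_df = n_groups - 1
den_df = n_obs - n_slopes - n_groups

f_stat = ((r2_fixed - r2_pooled) / num_df) / ((1 - r2_fixed) / den_df)
print(round(f_stat, 3))
```

This should agree with the listed FSTAT to within rounding of the R-squared inputs.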
2. create ; logc=log(cost/pfuel) ; logp1=log(pmtl/pfuel) ; logp2=log(peqpt/pfuel) ; logp3=log(plabor/pfuel) ; logp4=log(pprop/pfuel) ; logp5=log(kprice/pfuel) ; logq=log(output) ; logq2=.5*logq^2 $ Namelist ; cd = logp1,logp2,logp3,logp4,logp5 $ create ; p11=.5* logp1^2 ; p22=.5* logp2^2 ; p33=.5* logp3^2 ; p44=.5* logp4^2 ; p55=.5* logp5^2 ; p12=logp1*logp2 ; p13=logp1*logp3 ; p14=logp1*logp4 ; p15=logp1*logp5 ; p23=logp2*logp3 ; p24=logp2*logp4 ; p25=logp2*logp5 ; p34=logp3*logp4 ; p35=logp3*logp5 ; p45=logp4*logp5 $ Namelist ; tl = p11,p12,p13,p14,p15,p22,p23,p24,p25,p33,p34,p35,p44,p45,p55$ Namelist ; z = loadfctr,stage,points $ regress;lhs=logc;rhs=one,logq,logq2,cd,z $ ++  Ordinary least squares regression   LHS=LOGC Mean = .7723984   Standard deviation = 1.074424   WTS=none Number of observs. = 256   Model size Parameters = 11   Degrees of freedom = 245   Residuals Sum of squares = 2.965806   Standard error of e = .1100242   Fit Rsquared = .9899249   Adjusted Rsquared = .9895136   Model test F[ 10, 245] (prob) =2407.23 (.0000)  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ Constant 20.3856176 22.8643711 .892 .3735 LOGQ  .95227889 .01832119 51.977 .0000 1.11237037 LOGQ2  .06568531 .01060839 6.192 .0000 1.45687077 LOGP1  .32662031 1.17956412 .277 .7821 .37999226 LOGP2  .28619766 .56614750 .506 .6136 .25308254 LOGP3  .16012937 .08634095 1.855 .0649 .66688211 LOGP4  .00519153 .07328859 .071 .9436 2.14504306 LOGP5  1.43718160 1.78896723 .803 .4225 12.6860637 LOADFCTR .94688632 .18441822 5.134 .0000 .54786115 STAGE  .00021794 .402227D04 5.418 .0000 507.879666 POINTS  .00199712 .00031682 6.304 .0000 72.9843750 ? ? Turns out the translog model cannot be computed with the firm ? dummy variables. I'll use the Cobb Douglas form. ? 
regress;lhs=logc;rhs= one,logq,logq2,cd ; panel ; pds=ti $ ++  OLS Without Group Dummy Variables   Ordinary least squares regression   LHS=LOGC Mean = .7723984   Standard deviation = 1.074424   WTS=none Number of observs. = 256 
 Model size Parameters = 8   Degrees of freedom = 248   Residuals Sum of squares = 4.190133   Standard error of e = .1299834   Fit Rsquared = .9857657   Adjusted Rsquared = .9853639   Model test F[ 7, 248] (prob) =2453.53 (.0000)  ++ ++  Panel Data Analysis of LOGC [ONE way]   Unconditional ANOVA (No regressors)   Source Variation Deg. Free. Mean Square   Between 272.013 24. 11.3339   Residual 22.3551 231. .967752E01   Total 294.368 255. 1.15439  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ LOGQ  .93708702 .01772733 52.861 .0000 1.11237037 LOGQ2  .07754607 .01211431 6.401 .0000 1.45687077 LOGP1  .94586281 1.38855410 .681 .4964 .37999226 LOGP2  .79081045 .66530892 1.189 .2357 .25308254 LOGP3  .01998606 .09963618 .201 .8412 .66688211 LOGP4  .08893118 .08543313 1.041 .2989 2.14504306 LOGP5  2.63118115 2.10504302 1.250 .2125 12.6860637 Constant 35.4178566 26.9017806 1.317 .1892 ++  Least Squares with Group Dummy Variables   Ordinary least squares regression   LHS=LOGC Mean = .7723984   Standard deviation = 1.074424   WTS=none Number of observs. 
= 256   Model size Parameters = 32   Degrees of freedom = 224   Residuals Sum of squares = .9373686   Standard error of e = .6468911E01   Fit Rsquared = .9968157   Adjusted Rsquared = .9963750   Model test F[ 31, 224] (prob) =2261.94 (.0000)  ++ ++  Panel:Groups Empty 0, Valid data 25   Smallest 2, Largest 15   Average group size 10.24  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ LOGQ  .66448665 .03580894 18.556 .0000 1.11237037 LOGQ2  .00955723 .01280811 .746 .4563 1.45687077 LOGP1  1.84750938 .76113884 2.427 .0159 .37999226 LOGP2  .73986763 .37612716 1.967 .0503 .25308254 LOGP3  .05323942 .06396335 .832 .4060 .66688211 LOGP4  .22763995 .04625120 4.922 .0000 2.14504306 LOGP5  1.83738098 1.16995945 1.570 .1176 12.6860637 ++  Test Statistics for the Classical Model  ++  Model LogLikelihood Sum of Squares Rsquared  (1) Constant term only 381.12407 .2943684435D+03 .0000000  (2) Group effects only 51.16832 .2235506489D+02 .9240575  (3) X  variables only 163.14470 .4190132631D+01 .9857657  (4) X and group effects 354.81332 .9373685874D+00 .9968157  ++  Hypothesis Tests   Likelihood Ratio Test F Tests 
 Chisquared d.f. Prob. F num. denom. P value  (2) vs (1) 659.911 24 .00000 117.116 24 231 .00000  (3) vs (1) 1088.538 7 .00000 2453.527 7 248 .00000  (4) vs (1) 1471.875 31 .00000 2261.945 31 224 .00000  (4) vs (2) 811.963 7 .00000 731.160 7 224 .00000  (4) vs (3) 383.337 24 .00000 32.388 24 224 .00000  ++ ++  Random Effects Model: v(i,t) = e(i,t) + u(i)   Estimates: Var[e] = .418468D02   Var[u] = .127110D01   Corr[v(i,t),v(i,s)] = .752323   Lagrange Multiplier Test vs. Model (3) = 479.37   ( 1 df, prob value = .000000)   (High values of LM favor FEM/REM over CR model.)   BaltagiLi form of LM Statistic = 174.85   Fixed vs. Random Effects (Hausman) = 40.99   ( 7 df, prob value = .000001)   (High (low) values of H favor FEM (REM).)   Sum of Squares .648771D+01   Rsquared .978056D+00  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ LOGQ  .79769706 .02494671 31.976 .0000 1.11237037 LOGQ2  .02011534 .01130089 1.780 .0751 1.45687077 LOGP1  1.11671466 .74579390 1.497 .1343 .37999226 LOGP2  .27128619 .36294718 .747 .4548 .25308254 LOGP3  .10761385 .06138583 1.753 .0796 .66688211 LOGP4  .18385724 .04550246 4.041 .0001 2.14504306 LOGP5  .49374865 1.13625272 .435 .6639 12.6860637 Constant 4.53328730 14.5229534 .312 .7549 regress;lhs=logc;rhs=z,one,logq,logq2,cd ; panel ; pds=ti $ ++  OLS Without Group Dummy Variables   Ordinary least squares regression   LHS=LOGC Mean = .7723984   Standard deviation = 1.074424   WTS=none Number of observs. = 256   Model size Parameters = 11   Degrees of freedom = 245   Residuals Sum of squares = 2.965806   Standard error of e = .1100242   Fit Rsquared = .9899249   Adjusted Rsquared = .9895136   Model test F[ 10, 245] (prob) =2407.23 (.0000)  ++ ++  Panel Data Analysis of LOGC [ONE way]   Unconditional ANOVA (No regressors)   Source Variation Deg. Free. Mean Square   Between 272.013 24. 11.3339   Residual 22.3551 231. .967752E01   Total 294.368 255. 
1.15439  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ LOADFCTR .94688632 .18441823 5.134 .0000 .54786115 STAGE  .00021794 .402227D04 5.418 .0000 507.879666 POINTS  .00199712 .00031682 6.304 .0000 72.9843750 LOGQ  .95227889 .01832119 51.977 .0000 1.11237037 LOGQ2  .06568531 .01060839 6.192 .0000 1.45687077 LOGP1  .32662033 1.17956418 .277 .7821 .37999226 LOGP2  .28619767 .56614753 .506 .6136 .25308254 LOGP3  .16012937 .08634095 1.855 .0649 .66688211
LOGP4  .00519153 .07328859 .071 .9436 2.14504306 LOGP5  1.43718164 1.78896732 .803 .4225 12.6860637 Constant 20.3856181 22.8643723 .892 .3735 ++  Least Squares with Group Dummy Variables   Ordinary least squares regression   LHS=LOGC Mean = .7723984   Standard deviation = 1.074424   WTS=none Number of observs. = 256   Model size Parameters = 35   Degrees of freedom = 221   Residuals Sum of squares = .7726037   Standard error of e = .5912651E01   Fit Rsquared = .9973754   Adjusted Rsquared = .9969716   Model test F[ 34, 221] (prob) =2470.05 (.0000)  ++ ++  Panel:Groups Empty 0, Valid data 25   Smallest 2, Largest 15   Average group size 10.24  ++ +++++++ Variable Coefficient  Standard Error tratio P[T>t] Mean of X +++++++ LOADFCTR .89457348 .14242570 6.281 .0000 .54786115 STAGE  .00022827 .894260D04 2.553 .0113 507.879666 POINTS  .00010341 .00041551 .249 .8037 72.9843750 LOGQ  .75278467 .03923479 19.187 .0000 1.11237037 LOGQ2  .00324835 .01306645 .249 .8039 1.45687077 LOGP1  1.38217070 .72421015 1.909 .0575 .37999226 LOGP2  .61609241 .35323609 1.744 .0824 .25308254 LOGP3  .00706546 .05918620 .119 .9051 .66688211 LOGP4  .14433953 .04404683 3.277 .0012 2.14504306 LOGP5  1.25331458 1.10477945 1.134 .2577 12.6860637 ++  Test Statistics for the Classical Model  ++  Model LogLikelihood Sum of Squares Rsquared  (1) Constant term only 381.12407 .2943684435D+03 .0000000  (2) Group effects only 51.16832 .2235506489D+02 .9240575  (3) X  variables only 207.37940 .2965806000D+01 .9899249  (4) X and group effects 379.55705 .7726036853D+00 .9973754  ++  Hypothesis Tests   Likelihood Ratio Test F Tests   Chisquared d.f. Prob. F num. denom. 
P value  (2) vs (1) 659.911 24 .00000 117.116 24 231 .00000  (3) vs (1) 1177.007 10 .00000 2407.226 10 245 .00000  (4) vs (1) 1521.362 34 .00000 2470.054 34 221 .00000  (4) vs (2) 861.451 10 .00000 617.357 10 221 .00000  (4) vs (3) 344.355 24 .00000 26.140 24 221 .00000  ++ ++  Random Effects Model: v(i,t) = e(i,t) + u(i)   Estimates: Var[e] = .349594D02   Var[u] = .860939D02   Corr[v(i,t),v(i,s)] = .711206   Lagrange Multiplier Test vs. Model (3) = 466.36   ( 1 df, prob value = .000000)   (High values of LM favor FEM/REM over CR model.)   BaltagiLi form of LM Statistic = 170.10   Fixed vs. Random Effects (Hausman) = 44.65   (10 df, prob value = .000003)   (High (low) values of H favor FEM (REM).)   Sum of Squares .451094D+01 
 Rsquared .984812D+00  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ LOADFCTR 1.07921018 .13264921 8.136 .0000 .54786115 STAGE  .00016415 .672354D04 2.441 .0146 507.879666 POINTS  .00044792 .00035950 1.246 .2128 72.9843750 LOGQ  .86611837 .02783747 31.113 .0000 1.11237037 LOGQ2  .02222380 .01102947 2.015 .0439 1.45687077 LOGP1  .92719911 .70150544 1.322 .1863 .37999226 LOGP2  .30782803 .33937387 .907 .3644 .25308254 LOGP3  .02581955 .05671735 .455 .6489 .66688211 LOGP4  .09284095 .04277517 2.170 .0300 2.14504306 LOGP5  .36595849 1.06514141 .344 .7312 12.6860637 Constant 2.36774378 13.6315073 .174 .8621 matrix ; List ; bz=b(1:3);vz=varb(1:3,1:3) ; wald = bz'bz $ Matrix WALD has 1 rows and 1 columns. 1 +1 74.33957
Chapter 10 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
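Systems of Regression Equations

1. The model can be written as
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} i \\ i \end{bmatrix}\mu + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}.$$
Therefore, the OLS estimator is
$$m = (i'i + i'i)^{-1}(i'y_1 + i'y_2) = (n\bar y_1 + n\bar y_2)/(n + n) = (\bar y_1 + \bar y_2)/2 = 1.5.$$
The sampling variance would be $\operatorname{Var}[m] = (1/2)^2\{\operatorname{Var}[\bar y_1] + \operatorname{Var}[\bar y_2] + 2\operatorname{Cov}[\bar y_1, \bar y_2]\}$. We would estimate the parts with
Est.Var[$\bar y_1$] = s11/n = ((150 - 100(1)²)/99)/100 = .0051
Est.Var[$\bar y_2$] = s22/n = ((550 - 100(2)²)/99)/100 = .0152
Est.Cov[$\bar y_1, \bar y_2$] = s12/n = ((260 - 100(1)(2))/99)/100 = .0061
Combining terms, Est.Var[m] = .0079.
The GLS estimator would be
$$\frac{(\sigma^{11} + \sigma^{12})i'y_1 + (\sigma^{22} + \sigma^{12})i'y_2}{(\sigma^{11} + \sigma^{12})i'i + (\sigma^{22} + \sigma^{12})i'i} = w\bar y_1 + (1-w)\bar y_2$$
where $w = (\sigma^{11} + \sigma^{12})/(\sigma^{11} + \sigma^{22} + 2\sigma^{12})$. Denoting
$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}, \qquad \Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}\begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}.$$
The weight simplifies a bit as the determinant appears in both the denominator and the numerator. Thus, $w = (\sigma_{22} - \sigma_{12})/(\sigma_{11} + \sigma_{22} - 2\sigma_{12})$. For our sample data, the two step estimator would be based on the variances computed above and s11 = .5051, s22 = 1.5152, s12 = .6061. Then, w = 1.1250. The FGLS estimate is 1.125(1) + (1 - 1.125)(2) = .875. The sampling variance of this estimator is $w^2\operatorname{Var}[\bar y_1] + (1-w)^2\operatorname{Var}[\bar y_2] + 2w(1-w)\operatorname{Cov}[\bar y_1, \bar y_2] = .0050$ as compared to .0079 for the OLS estimator.

2. The model is
$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = X\beta + \varepsilon = \begin{bmatrix} i & 0 \\ 0 & x \end{bmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}, \qquad \sigma^2\Omega = \begin{bmatrix} \sigma_{11}I & \sigma_{12}I \\ \sigma_{12}I & \sigma_{22}I \end{bmatrix}.$$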
The generalized least squares estimator is
$$\hat\beta = [X'\Omega^{-1}X]^{-1}X'\Omega^{-1}y = \begin{bmatrix} \sigma^{11}i'i & \sigma^{12}i'x \\ \sigma^{12}i'x & \sigma^{22}x'x \end{bmatrix}^{-1}\begin{pmatrix} \sigma^{11}i'y_1 + \sigma^{12}i'y_2 \\ \sigma^{12}x'y_1 + \sigma^{22}x'y_2 \end{pmatrix} = \left[n\begin{pmatrix} \sigma^{11} & \sigma^{12}\bar x \\ \sigma^{12}\bar x & \sigma^{22}s_{xx} \end{pmatrix}\right]^{-1}\left[n\begin{pmatrix} \sigma^{11}\bar y_1 + \sigma^{12}\bar y_2 \\ \sigma^{12}s_{x1} + \sigma^{22}s_{x2} \end{pmatrix}\right]$$
where $s_{xx} = x'x/n$, $s_{x1} = x'y_1/n$, $s_{x2} = x'y_2/n$ and $\sigma^{ij}$ = the ijth element of the 2×2 $\Sigma^{-1}$. To obtain the explicit form, note, first, that all terms $\sigma^{ij}$ are of the form $\pm\sigma_{ji}/(\sigma_{11}\sigma_{22} - \sigma_{12}^2)$. But, the denominator in these ratios will be cancelled as it appears in both the inverse matrix and in the vector. Therefore, in terms of the original parameters, (after cancelling n), we obtain
$$\hat\beta = \begin{bmatrix} \sigma_{22} & -\sigma_{12}\bar x \\ -\sigma_{12}\bar x & \sigma_{11}s_{xx} \end{bmatrix}^{-1}\begin{pmatrix} \sigma_{22}\bar y_1 - \sigma_{12}\bar y_2 \\ -\sigma_{12}s_{x1} + \sigma_{11}s_{x2} \end{pmatrix} = \frac{1}{\sigma_{11}\sigma_{22}s_{xx} - (\sigma_{12}\bar x)^2}\begin{bmatrix} \sigma_{11}s_{xx} & \sigma_{12}\bar x \\ \sigma_{12}\bar x & \sigma_{22} \end{bmatrix}\begin{pmatrix} \sigma_{22}\bar y_1 - \sigma_{12}\bar y_2 \\ -\sigma_{12}s_{x1} + \sigma_{11}s_{x2} \end{pmatrix}.$$
The two elements are
$$\hat\beta_1 = [\sigma_{11}s_{xx}(\sigma_{22}\bar y_1 - \sigma_{12}\bar y_2) - \sigma_{12}\bar x(\sigma_{12}s_{x1} - \sigma_{11}s_{x2})]/[\sigma_{11}\sigma_{22}s_{xx} - (\sigma_{12}\bar x)^2]$$
$$\hat\beta_2 = [\sigma_{12}\bar x(\sigma_{22}\bar y_1 - \sigma_{12}\bar y_2) - \sigma_{22}(\sigma_{12}s_{x1} - \sigma_{11}s_{x2})]/[\sigma_{11}\sigma_{22}s_{xx} - (\sigma_{12}\bar x)^2]$$
The asymptotic covariance matrix is
$$[X'\Omega^{-1}X]^{-1} = \left[n\begin{pmatrix} \sigma^{11} & \sigma^{12}\bar x \\ \sigma^{12}\bar x & \sigma^{22}s_{xx} \end{pmatrix}\right]^{-1} = \left[\frac{n}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}\begin{pmatrix} \sigma_{22} & -\sigma_{12}\bar x \\ -\sigma_{12}\bar x & \sigma_{11}s_{xx} \end{pmatrix}\right]^{-1}.$$
The OLS estimator is
$$b = (X'X)^{-1}X'y = \begin{pmatrix} \bar y_1 \\ x'y_2/x'x \end{pmatrix}.$$
The sampling variance is
$$(X'X)^{-1}X'\Omega X(X'X)^{-1} = \begin{bmatrix} n & 0 \\ 0 & ns_{xx} \end{bmatrix}^{-1}\begin{bmatrix} \sigma_{11}n & \sigma_{12}n\bar x \\ \sigma_{12}n\bar x & \sigma_{22}ns_{xx} \end{bmatrix}\begin{bmatrix} n & 0 \\ 0 & ns_{xx} \end{bmatrix}^{-1}.$$
The ns are carried outside the product and reduce to (1/n). This leaves
$$\operatorname{Var}[b] = \begin{bmatrix} \sigma_{11}/n & \sigma_{12}\bar x/(ns_{xx}) \\ \sigma_{12}\bar x/(ns_{xx}) & \sigma_{22}/(ns_{xx}) \end{bmatrix}.$$
Using the results above, the OLS coefficients are $b_1 = \bar y_1$ = 150/50 = 3 and $b_2$ = x′y2/x′x = 50/100 = 1/2. The estimators of the disturbance (co)variances are
$s_{11} = \sum_i (y_{i1} - \bar y_1)^2/n$ = (500 - 50(3)²)/50 = 1
$s_{22} = \sum_i (y_{i2} - b_2x_i)^2/n$ = (90 - (1/2)50)/50 = 1.3
$s_{12} = \sum_i (y_{i1} - \bar y_1)(y_{i2} - b_2x_i)/n = [y_1'y_2 - n\bar y_1\bar y_2 - b_2x'y_1 + nb_2\bar y_1\bar x]/n$ = (40 - 50(3)(1) - (1/2)60 + 50(1/2)(3)(2))/50 = .2
Therefore, we estimate the asymptotic covariance matrix of the OLS estimates as
$$\text{Est.Var}[b] = \begin{bmatrix} 1/50 & .2(2)/[50(90)] \\ .2(2)/[50(90)] & 1.3/90 \end{bmatrix} = \begin{bmatrix} .02 & .0000888 \\ .0000888 & .01444 \end{bmatrix}.$$
To compute the FGLS estimates, we use our results from part a. The necessary statistics for the computation are
s11 = 1, s22 = 1.3, s12 = .2, sxx = 100/50 = 2, $\bar x$ = 100/50 = 2,
$\bar y_1$ = 150/50 = 3, $\bar y_2$ = 50/50 = 1, sx1 = 60/50 = 1.2, sx2 = 50/50 = 1.
Then,
$\hat\beta_1$ = {1(2)[1.3(3) - .2(1)] - .2(2)[.2(1.2) - 1(1)]}/{1(1.3)(2) - [.2(2)]²} = 3.157
$\hat\beta_2$ = {.2(2)[1.3(3) - .2(1)] - 1.3[.2(1.2) - 1(1)]}/{1(1.3)(2) - [.2(2)]²} = 1.011
The estimate of the asymptotic covariance matrix is
$$(1/50)\,\frac{1(1.3) - (.2)^2}{1(1.3)2 - [.2(2)]^2}\begin{bmatrix} 1(2) & .2(2) \\ .2(2) & 1.3 \end{bmatrix} = \begin{bmatrix} .020656 & .004131 \\ .004131 & .007945 \end{bmatrix}.$$
Notice that the estimated variance of the FGLS estimator of the parameter of the first equation is larger. The result for the true GLS estimator based on known values of the disturbance variances and covariance does not guarantee that the estimated variances will be smaller in a finite sample. However, the estimated variance of the second parameter is considerably smaller than that for the OLS estimate. Finally, to test the hypothesis that β2 = 1 we use the z-statistic (asymptotically distributed as standard normal), z = (1.011 - 1)/(.007945)^{1/2} = .123. The hypothesis cannot be rejected.
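The two FGLS coefficients in Exercise 2 follow directly from the closed-form expressions for the two elements of the estimator. A small Python sketch of that arithmetic is given here as an illustration only; the variable names are hypothetical and the inputs are the sample statistics listed in the solution.

```python
# Sample statistics listed in the solution (n = 50).
s11, s22, s12 = 1.0, 1.3, 0.2     # disturbance variances and covariance
sxx, xbar = 2.0, 2.0              # x'x/n and the mean of x
ybar1, ybar2 = 3.0, 1.0           # means of y1 and y2
sx1, sx2 = 1.2, 1.0               # x'y1/n and x'y2/n

# Common denominator: sigma11*sigma22*sxx - (sigma12*xbar)^2.
den = s11 * s22 * sxx - (s12 * xbar) ** 2

# Elements of the vector that the inverse matrix multiplies.
u = s22 * ybar1 - s12 * ybar2     # sigma22*ybar1 - sigma12*ybar2
v = -s12 * sx1 + s11 * sx2        # -sigma12*sx1 + sigma11*sx2

beta1 = (s11 * sxx * u + s12 * xbar * v) / den
beta2 = (s12 * xbar * u + s22 * v) / den
print(round(beta1, 3), round(beta2, 3))
```

Writing the numerators in terms of u and v makes it clear that both coefficients share the denominator 2.44.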
3. The ordinary least squares estimates of the parameters are
b1 = x1′y1/x1′x1 = 4/5 = .8 and b2 = x2′y2/x2′x2 = 6/10 = .6.
Then, the variances and covariance of the disturbances are
s11 = (y1′y1 - b1x1′y1)/n = (20 - .8(4))/20 = .84
s22 = (y2′y2 - b2x2′y2)/n = (10 - .6(6))/20 = .32
s12 = (y1′y2 - b2x2′y1 - b1x1′y2 + b1b2x1′x2)/n = (6 - .6(3) - .8(3) + .8(.6)(2))/20 = .246
We will require
$$S^{-1} = \begin{bmatrix} .84 & .246 \\ .246 & .32 \end{bmatrix}^{-1} = \begin{bmatrix} s^{11} & s^{12} \\ s^{12} & s^{22} \end{bmatrix}.$$
Then, the FGLS estimator is
$$\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{bmatrix} s^{11}x_1'x_1 & s^{12}x_1'x_2 \\ s^{12}x_1'x_2 & s^{22}x_2'x_2 \end{bmatrix}^{-1}\begin{bmatrix} s^{11}x_1'y_1 + s^{12}x_1'y_2 \\ s^{12}x_2'y_1 + s^{22}x_2'y_2 \end{bmatrix}.$$
Inserting the values given in the problem produces the FGLS estimates, $\hat\beta_1 = .505335$, $\hat\beta_2 = .541741$, with estimated asymptotic covariance matrix equal to the inverse matrix shown above,
$$\text{Est.Var}[\hat\beta] = \begin{bmatrix} .132565 & .0077645 \\ .0077645 & .0252505 \end{bmatrix}.$$
To test the hypothesis, we use the t statistic, t = (.505335 - .541741)/[.132565 + .0252505 - 2(.0077645)]^{1/2} = -.0965, which is quite small. We would not reject the hypothesis.

To compute the maximum likelihood estimates, we would begin with the OLS estimates of σ11, σ22, and σ12. Then, we iterate between the following calculations:
(1) Compute the 2×2 matrix, $S^{-1}$.
(2) Compute the 2×2 matrix
$$[X'(S^{-1}\otimes I)X] = \begin{bmatrix} s^{11}x_1'x_1 & s^{12}x_1'x_2 \\ s^{12}x_1'x_2 & s^{22}x_2'x_2 \end{bmatrix} \quad\text{and the vector}\quad [X'(S^{-1}\otimes I)y] = \begin{bmatrix} s^{11}x_1'y_1 + s^{12}x_1'y_2 \\ s^{12}x_2'y_1 + s^{22}x_2'y_2 \end{bmatrix}.$$
(3) Compute the coefficient vector $\hat\beta = [X'(S^{-1}\otimes I)X]^{-1}[X'(S^{-1}\otimes I)y]$. Compare this estimate to the previous one. If they are similar enough, exit the iterations.
(4) Recompute S using $s_{ij} = y_i'y_j - \hat\beta_i x_i'y_j - \hat\beta_j x_j'y_i + \hat\beta_i\hat\beta_j x_i'x_j$, i,j = 1,2.
(5) Go back to step (1) and continue.
Our iterations produce the two slope estimates

Iteration    $\hat\beta_1$    $\hat\beta_2$
1            .505335          .541741
2            .601889          .564998
3            .614884          .566875
4            .616559          .567186
5            .616775          .567227
6            .616803          .567232
7            .616807          .567232    converged

At convergence, we find the estimate of the asymptotic covariance matrix of the estimates as
$$[X'(S^{-1}\otimes I)X]^{-1} = \begin{bmatrix} .155355 & .00576887 \\ .00576887 & .029348 \end{bmatrix} \quad\text{and}\quad S = \begin{bmatrix} .8483899 & .1573814 \\ .1573814 & .3205369 \end{bmatrix}.$$
To use the likelihood ratio method to test the hypothesis, we will require the restricted maximum likelihood estimate. Under the hypothesis, the model is the one in Section 15.2.2. The restricted estimate is given in (15-12) and the equations which follow. To obtain them, we make a small modification in our algorithm above. We replace step (3) with
(3') $\hat\beta = [s^{11}x_1'y_1 + s^{22}x_2'y_2 + s^{12}(x_1'y_2 + x_2'y_1)]/[s^{11}x_1'x_1 + s^{22}x_2'x_2 + 2s^{12}x_1'x_2]$.
Step 4 is then computed using this common estimate for both $\hat\beta_1$ and $\hat\beta_2$. The iterations produce

Iteration    $\hat\beta$
1            .5372671
2            .5703837
3            .5725274
4            .5726687
5            .5726780
6            .5726786    converged

At this estimate, the estimate of Σ is
$$\begin{bmatrix} .8529188 & .1609926 \\ .1609926 & .3203732 \end{bmatrix}.$$
The likelihood ratio statistic is given in (15-56). Using our unconstrained and constrained estimates, we find Wu = .2471714 and Wr = .2473338. The statistic is λ = 20(ln.2473338 - ln.2471714) = .0131. This is far below the critical value of 3.84, so once again, we do not reject the hypothesis.
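The iteration in Exercise 3, in both its unrestricted form (step 3) and its restricted form (step 3'), can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the program used for the original results: the function names are hypothetical, the cross moments are read off the arithmetic in the solution, the starting matrix S is the one printed in the solution, and the recomputed S in step (4) is divided by n = 20 so that it matches the scale of the reported estimates.

```python
import math

T = 20
# Moments read off the solution: xx[i][j] = xi'xj, xy[i][j] = xi'yj, yy[i][j] = yi'yj.
xx = [[5.0, 2.0], [2.0, 10.0]]
xy = [[4.0, 3.0], [3.0, 6.0]]
yy = [[20.0, 6.0], [6.0, 10.0]]

def inverse2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def residual_cov(b):
    """Step (4): residual covariance matrix at slopes b = (b1, b2), divided by T."""
    return [[(yy[i][j] - b[i] * xy[i][j] - b[j] * xy[j][i] + b[i] * b[j] * xx[i][j]) / T
             for j in range(2)] for i in range(2)]

def step_unrestricted(S):
    """Steps (1)-(3): GLS slopes with separate coefficients in each equation."""
    W = inverse2(S)
    A = [[W[i][j] * xx[i][j] for j in range(2)] for i in range(2)]
    r = [W[i][0] * xy[i][0] + W[i][1] * xy[i][1] for i in range(2)]
    Ai = inverse2(A)
    return (Ai[0][0] * r[0] + Ai[0][1] * r[1], Ai[1][0] * r[0] + Ai[1][1] * r[1])

def step_restricted(S):
    """Step (3'): GLS estimate of a common slope."""
    W = inverse2(S)
    num = W[0][0] * xy[0][0] + W[1][1] * xy[1][1] + W[0][1] * (xy[0][1] + xy[1][0])
    den = W[0][0] * xx[0][0] + W[1][1] * xx[1][1] + 2 * W[0][1] * xx[0][1]
    return (num / den, num / den)

def iterate(step, S, n_iter=50):
    """Alternate between the slope update and the covariance update."""
    for _ in range(n_iter):
        b = step(S)
        S = residual_cov(b)
    return b, S

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

S0 = [[0.84, 0.246], [0.246, 0.32]]   # starting values as printed in the solution
first_step = step_unrestricted(S0)
b_u, S_u = iterate(step_unrestricted, S0)
b_r, S_r = iterate(step_restricted, S0)

# Likelihood ratio statistic from the two residual covariance determinants.
lam = T * (math.log(det2(S_r)) - math.log(det2(S_u)))
print(b_u, b_r[0], lam)
```

A fixed number of passes is used instead of a convergence test to keep the sketch short; the changes reported in the iteration tables shrink quickly, so 50 passes is far more than needed.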
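4. The GLS estimator is
$$\hat\beta = \begin{bmatrix} \sigma^{11}X'X & \sigma^{12}X'X \\ \sigma^{12}X'X & \sigma^{22}X'X \end{bmatrix}^{-1}\begin{bmatrix} \sigma^{11}X'y_1 + \sigma^{12}X'y_2 \\ \sigma^{12}X'y_1 + \sigma^{22}X'y_2 \end{bmatrix}.$$
The inverse matrix equals $[\Sigma^{-1}\otimes X'X]^{-1}$. But, $[\Sigma^{-1}\otimes X'X]^{-1} = \Sigma\otimes(X'X)^{-1}$. (See (2-76).) Therefore,
$$\hat\beta = \begin{bmatrix} \sigma_{11}(X'X)^{-1} & \sigma_{12}(X'X)^{-1} \\ \sigma_{12}(X'X)^{-1} & \sigma_{22}(X'X)^{-1} \end{bmatrix}\begin{bmatrix} \sigma^{11}X'y_1 + \sigma^{12}X'y_2 \\ \sigma^{12}X'y_1 + \sigma^{22}X'y_2 \end{bmatrix}.$$
We now make the replacements $X'y_1 = (X'X)b_1$ and $X'y_2 = (X'X)b_2$. After multiplying out the product, we find that
$$\hat\beta = \begin{bmatrix} \sigma_{11}\sigma^{11}b_1 + \sigma_{11}\sigma^{12}b_2 + \sigma_{12}\sigma^{12}b_1 + \sigma_{12}\sigma^{22}b_2 \\ \sigma_{12}\sigma^{11}b_1 + \sigma_{12}\sigma^{12}b_2 + \sigma_{22}\sigma^{12}b_1 + \sigma_{22}\sigma^{22}b_2 \end{bmatrix} = \begin{bmatrix} (\sigma_{11}\sigma^{11} + \sigma_{12}\sigma^{12})b_1 + (\sigma_{11}\sigma^{12} + \sigma_{12}\sigma^{22})b_2 \\ (\sigma_{12}\sigma^{11} + \sigma_{22}\sigma^{12})b_1 + (\sigma_{12}\sigma^{12} + \sigma_{22}\sigma^{22})b_2 \end{bmatrix}.$$
The four scalar terms in the matrix product are the corresponding elements of $\Sigma\Sigma^{-1} = I$. Therefore,
$$\hat\beta = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.$$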
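The same conclusion in Exercise 4 can be reached in one line with Kronecker products. The display below is a compact restatement offered as a sketch; it assumes the stacked notation $y = (y_1', y_2')'$ with regressor matrix $I\otimes X$, which is not written out in the solution itself.

```latex
\begin{aligned}
\hat\beta
 &= \left[(I\otimes X)'(\Sigma^{-1}\otimes I)(I\otimes X)\right]^{-1}
    (I\otimes X)'(\Sigma^{-1}\otimes I)\,y \\
 &= \left[\Sigma^{-1}\otimes X'X\right]^{-1}\left(\Sigma^{-1}\otimes X'\right)y \\
 &= \left[\Sigma\otimes (X'X)^{-1}\right]\left(\Sigma^{-1}\otimes X'\right)y \\
 &= \left[I\otimes (X'X)^{-1}X'\right]y
  = \begin{pmatrix} (X'X)^{-1}X'y_1 \\ (X'X)^{-1}X'y_2 \end{pmatrix}
  = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.
\end{aligned}
```

Each step uses only the product rule $(A\otimes B)(C\otimes D) = AC\otimes BD$ and the inverse rule $(A\otimes B)^{-1} = A^{-1}\otimes B^{-1}$.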
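5. The algebraic result is a little tedious, but straightforward. The GLS estimator which is computed is
\[ \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{bmatrix} \sigma^{11}x_1'x_1 & \sigma^{12}x_1'x_2 \\ \sigma^{12}x_2'x_1 & \sigma^{22}x_2'x_2 \end{bmatrix}^{-1} \begin{bmatrix} \sigma^{11}x_1'y_1 + \sigma^{12}x_1'y_2 \\ \sigma^{12}x_2'y_1 + \sigma^{22}x_2'y_2 \end{bmatrix}. \]
It helps at this point to make some simplifying substitutions. The elements in the inverse matrix, \(\sigma^{ij}\), are all equal to elements of the original matrix divided by the determinant. But the determinant appears in the leading matrix, which is inverted, and in the trailing vector, which is not. Therefore, the determinant will cancel out. Making the substitutions,
\[ \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{bmatrix} \sigma_{22}x_1'x_1 & -\sigma_{12}x_1'x_2 \\ -\sigma_{12}x_2'x_1 & \sigma_{11}x_2'x_2 \end{bmatrix}^{-1} \begin{bmatrix} \sigma_{22}x_1'y_1 - \sigma_{12}x_1'y_2 \\ -\sigma_{12}x_2'y_1 + \sigma_{11}x_2'y_2 \end{bmatrix}. \]
Now, we are concerned with probability limits. We divide every element of the matrix to be inverted by n, then because of the inversion, divide the vector on the right by n as well. Suppose, for simplicity, that \(\lim_{n\to\infty} x_i'x_j/n = q_{ij}\), i,j = 1,2,3. Then,
\[ \text{plim}\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{bmatrix} \sigma_{22}q_{11} & -\sigma_{12}q_{12} \\ -\sigma_{12}q_{12} & \sigma_{11}q_{22} \end{bmatrix}^{-1} \text{plim}\begin{bmatrix} \sigma_{22}x_1'y_1/n - \sigma_{12}x_1'y_2/n \\ -\sigma_{12}x_2'y_1/n + \sigma_{11}x_2'y_2/n \end{bmatrix}. \]
Then, we will use
plim (1/n)\(x_1'y_1\) = \(\beta_1 q_{11}\) + plim (1/n)\(x_1'\varepsilon_1\) = \(\beta_1 q_{11}\)
plim (1/n)\(x_1'y_2\) = \(\beta_2 q_{12} + \beta_3 q_{13}\)
plim (1/n)\(x_2'y_1\) = \(\beta_1 q_{12}\)
plim (1/n)\(x_2'y_2\) = \(\beta_2 q_{22} + \beta_3 q_{23}\).
Therefore, after multiplying out all the terms,
\[ \text{plim}\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{bmatrix} \sigma_{22}q_{11} & -\sigma_{12}q_{12} \\ -\sigma_{12}q_{12} & \sigma_{11}q_{22} \end{bmatrix}^{-1} \begin{bmatrix} \beta_1\sigma_{22}q_{11} - \beta_2\sigma_{12}q_{12} - \beta_3\sigma_{12}q_{13} \\ -\beta_1\sigma_{12}q_{12} + \beta_2\sigma_{11}q_{22} + \beta_3\sigma_{11}q_{23} \end{bmatrix}. \]
The inverse matrix is
\[ \frac{1}{\sigma_{11}\sigma_{22}q_{11}q_{22} - (\sigma_{12}q_{12})^2} \begin{bmatrix} \sigma_{11}q_{22} & \sigma_{12}q_{12} \\ \sigma_{12}q_{12} & \sigma_{22}q_{11} \end{bmatrix}, \]
so with \(\Delta = \sigma_{11}\sigma_{22}q_{11}q_{22} - (\sigma_{12}q_{12})^2\),
\[ \text{plim}\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \frac{1}{\Delta}\begin{bmatrix} \sigma_{11}q_{22} & \sigma_{12}q_{12} \\ \sigma_{12}q_{12} & \sigma_{22}q_{11} \end{bmatrix} \begin{bmatrix} \beta_1\sigma_{22}q_{11} - \beta_2\sigma_{12}q_{12} - \beta_3\sigma_{12}q_{13} \\ -\beta_1\sigma_{12}q_{12} + \beta_2\sigma_{11}q_{22} + \beta_3\sigma_{11}q_{23} \end{bmatrix}. \]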
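Taking the first coefficient separately and collecting terms,
\[ \text{plim}\,\hat\beta_1 = \beta_1[\sigma_{11}\sigma_{22}q_{11}q_{22} - (\sigma_{12}q_{12})^2]/\Delta + \beta_2[-\sigma_{11}q_{22}\sigma_{12}q_{12} + \sigma_{12}q_{12}\sigma_{11}q_{22}]/\Delta + \beta_3[-\sigma_{11}q_{22}\sigma_{12}q_{13} + \sigma_{12}q_{12}\sigma_{11}q_{23}]/\Delta. \]
The first term in brackets equals Δ while the second equals 0. That leaves
\[ \text{plim}\,\hat\beta_1 = \beta_1 - \beta_3[\sigma_{11}\sigma_{12}(q_{22}q_{13} - q_{12}q_{23})]/\Delta, \]
which is not equal to \(\beta_1\). There are two special cases worthy of note, though. The right hand side does equal \(\beta_1\) if either (1) \(\sigma_{12} = 0\); the regressions are actually unrelated, or (2) \(q_{12} = q_{13} = 0\); the regressors in the two equations are uncorrelated. The second of these is similar to our finding for omitted variables in the classical regression model.
6. The model is
\[ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} i & x & 0 \\ 0 & 0 & i \end{bmatrix} \begin{pmatrix} \alpha_1 \\ \beta \\ \alpha_2 \end{pmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}. \]
The GLS estimator of the full coefficient vector, θ, is
\[ \hat\theta = \begin{bmatrix} \sigma^{11}\begin{pmatrix} n & n\bar x \\ n\bar x & x'x \end{pmatrix} & \sigma^{12}\begin{pmatrix} n \\ n\bar x \end{pmatrix} \\ \sigma^{12}\begin{pmatrix} n & n\bar x \end{pmatrix} & \sigma^{22}n \end{bmatrix}^{-1} \begin{bmatrix} \sigma^{11}\begin{pmatrix} n\bar y_1 \\ x'y_1 \end{pmatrix} + \sigma^{12}\begin{pmatrix} n\bar y_2 \\ x'y_2 \end{pmatrix} \\ \sigma^{12}n\bar y_1 + \sigma^{22}n\bar y_2 \end{bmatrix}. \]
Let \(q_{xx}\) equal \(x'x/n\), \(q_{x1} = x'y_1/n\), and \(q_{x2} = x'y_2/n\). The ns in the inverse and in the vector cancel. Also, as suggested, we assume that \(\bar x = 0\). As in the previous exercise, we replace elements of the inverse with elements from the original matrix and cancel the determinant which multiplies the matrix (after inversion) and divides the vector. Thus,
\[ \hat\theta = \begin{bmatrix} \sigma_{22} & 0 & -\sigma_{12} \\ 0 & \sigma_{22}q_{xx} & 0 \\ -\sigma_{12} & 0 & \sigma_{11} \end{bmatrix}^{-1} \begin{bmatrix} \sigma_{22}\bar y_1 - \sigma_{12}\bar y_2 \\ \sigma_{22}q_{x1} - \sigma_{12}q_{x2} \\ -\sigma_{12}\bar y_1 + \sigma_{11}\bar y_2 \end{bmatrix}. \]
The inverse of the matrix is straightforward. Proceeding directly, we obtain
\[ \hat\theta = \frac{1}{\sigma_{22}q_{xx}(\sigma_{11}\sigma_{22} - \sigma_{12}^2)} \begin{bmatrix} \sigma_{11}\sigma_{22}q_{xx} & 0 & \sigma_{12}\sigma_{22}q_{xx} \\ 0 & \sigma_{11}\sigma_{22} - \sigma_{12}^2 & 0 \\ \sigma_{12}\sigma_{22}q_{xx} & 0 & \sigma_{22}^2 q_{xx} \end{bmatrix} \begin{bmatrix} \sigma_{22}\bar y_1 - \sigma_{12}\bar y_2 \\ \sigma_{22}q_{x1} - \sigma_{12}q_{x2} \\ -\sigma_{12}\bar y_1 + \sigma_{11}\bar y_2 \end{bmatrix}. \]
It remains only to multiply the matrices and collect terms. The result is
\[ \hat\alpha_1 = \bar y_1, \quad \hat\alpha_2 = \bar y_2, \quad \hat\beta = (q_{x1}/q_{xx}) - (\sigma_{12}/\sigma_{22})(q_{x2}/q_{xx}) = b_1 - \gamma b_2, \]
where \(b_1 = q_{x1}/q_{xx}\), \(b_2 = q_{x2}/q_{xx}\), and \(\gamma = \sigma_{12}/\sigma_{22}\).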
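The closed form for Exercise 6 can be checked numerically. The following is a minimal sketch, assuming made-up data with a mean-zero regressor and an arbitrary Σ (none of these numbers appear in the text). It builds the GLS normal equations in the parameter order (α1, β, α2), solves them with a small hand-written Gaussian elimination routine, and compares with the closed-form result.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol

# Hypothetical data; x has mean zero, as assumed in the solution.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y1 = [1.0, 2.5, 2.0, 4.0, 4.5]
y2 = [0.5, 1.5, 1.0, 2.5, 2.0]
n = len(x)
s11, s12, s22 = 2.0, 0.8, 1.0                      # hypothetical Sigma
det = s11 * s22 - s12 ** 2
i11, i12, i22 = s22 / det, -s12 / det, s11 / det   # elements of Sigma^{-1}

sx, sxx = sum(x), sum(v * v for v in x)
sy1, sy2 = sum(y1), sum(y2)
sxy1 = sum(u * v for u, v in zip(x, y1))
sxy2 = sum(u * v for u, v in zip(x, y2))

# GLS normal equations, parameter order (alpha1, beta, alpha2).
A = [[i11 * n, i11 * sx, i12 * n],
     [i11 * sx, i11 * sxx, i12 * sx],
     [i12 * n, i12 * sx, i22 * n]]
c = [i11 * sy1 + i12 * sy2,
     i11 * sxy1 + i12 * sxy2,
     i12 * sy1 + i22 * sy2]
alpha1, beta, alpha2 = solve(A, c)

# Closed form: sample means and b1 - gamma * b2 with gamma = s12 / s22.
b1, b2 = sxy1 / sxx, sxy2 / sxx
closed_form = (sy1 / n, b1 - (s12 / s22) * b2, sy2 / n)
print((alpha1, beta, alpha2), closed_form)
```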
7. Once again, nothing is lost by assuming that \(\bar x = 0\). Now, the OLS estimators are \(a_1 = \bar y_1\), \(a_2 = \bar y_2\), \(a_3 = \bar y_3\), \(b = x'y_1/x'x\). The vector of residuals is
\(e_{i1} = y_{i1} - \bar y_1 - b x_i\)
\(e_{i2} = y_{i2} - \bar y_2\)
\(e_{i3} = y_{i3} - \bar y_3\)
Now, if \(y_{i2} + y_{i3} = 1\) at every observation, then \((1/n)\Sigma_i(y_{i2} + y_{i3}) = \bar y_2 + \bar y_3 = 1\) as well. Therefore, by just adding the two equations, we see that \(e_{i2} + e_{i3} = 0\) for every observation. Let \(e_i\) be the 3×1 vector of residuals. Then, \(e_i'c = 0\), where c = [0,1,1]′. The sample covariance matrix of the residuals is \(S = [(1/n)\Sigma_i e_i e_i']\). Then, \(Sc = [(1/n)\Sigma_i e_i e_i']c = [(1/n)\Sigma_i e_i e_i'c] = [(1/n)\Sigma_i e_i \times 0] = 0\), which means, by definition, that S is singular. We can proceed simply by dropping the third equation. The adding up condition implies that \(\alpha_3 = 1 - \alpha_2\). So, we can treat the first two equations as a seemingly unrelated regression model and estimate \(\alpha_3\) using the estimate of \(\alpha_2\).
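The singularity argument can be illustrated with a short sketch. The data below are hypothetical; the only feature that matters is that the second and third dependent variables sum to one at every observation.

```python
# Hypothetical data: y2 + y3 = 1 at every observation.
y2 = [0.30, 0.45, 0.25, 0.60, 0.40]
y3 = [1.0 - v for v in y2]
y1 = [2.1, 1.7, 2.9, 3.3, 2.5]      # first equation, with regressor x
x = [-2.0, -1.0, 0.0, 1.0, 2.0]     # mean zero, as assumed in the solution

def mean(v):
    return sum(v) / len(v)

n = len(x)
b = sum(xi * yi for xi, yi in zip(x, y1)) / sum(xi * xi for xi in x)

# OLS residuals for the three equations.
e1 = [yi - mean(y1) - b * xi for yi, xi in zip(y1, x)]
e2 = [yi - mean(y2) for yi in y2]
e3 = [yi - mean(y3) for yi in y3]

# Residual covariance matrix S and the product S c with c = [0, 1, 1]'.
E = list(zip(e1, e2, e3))
S = [[sum(e[i] * e[j] for e in E) / n for j in range(3)] for i in range(3)]
c = [0.0, 1.0, 1.0]
Sc = [sum(S[i][j] * c[j] for j in range(3)) for i in range(3)]
print(Sc)   # all three elements are zero up to rounding error
```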
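Applications
1. By adding the share equations vertically, we find the restrictions
\(\beta_1 + \beta_2 + \beta_3 = 1\)
\(\delta_{11} + \delta_{12} + \delta_{13} = 0\)
\(\delta_{12} + \delta_{22} + \delta_{23} = 0\)
\(\delta_{13} + \delta_{23} + \delta_{33} = 0\)
\(\gamma_{y1} + \gamma_{y2} + \gamma_{y3} = 0\).
Note that the adding up condition also implies \(\varepsilon_1 + \varepsilon_2 + \varepsilon_3 = 0\). We will eliminate the third share equation. The restrictions imply
\(\beta_3 = 1 - \beta_1 - \beta_2\)
\(\delta_{13} = -\delta_{11} - \delta_{12}\)
\(\delta_{23} = -\delta_{12} - \delta_{22}\)
\(\delta_{33} = -\delta_{13} - \delta_{23} = \delta_{11} + \delta_{22} + 2\delta_{12}\)
\(\gamma_{y3} = -\gamma_{y1} - \gamma_{y2}\).
By inserting these in the three share equations, we find
\[ S_1 = \beta_1 + \delta_{11}\ln p_1 + \delta_{12}\ln p_2 - \delta_{11}\ln p_3 - \delta_{12}\ln p_3 + \gamma_{y1}\ln Y + \varepsilon_1 = \beta_1 + \delta_{11}\ln(p_1/p_3) + \delta_{12}\ln(p_2/p_3) + \gamma_{y1}\ln Y + \varepsilon_1 \]
\[ S_2 = \beta_2 + \delta_{12}\ln p_1 + \delta_{22}\ln p_2 - \delta_{12}\ln p_3 - \delta_{22}\ln p_3 + \gamma_{y2}\ln Y + \varepsilon_2 = \beta_2 + \delta_{12}\ln(p_1/p_3) + \delta_{22}\ln(p_2/p_3) + \gamma_{y2}\ln Y + \varepsilon_2 \]
\[ S_3 = 1 - \beta_1 - \beta_2 - \delta_{11}\ln p_1 - \delta_{12}\ln p_1 - \delta_{12}\ln p_2 - \delta_{22}\ln p_2 + \delta_{11}\ln p_3 + \delta_{12}\ln p_3 + \delta_{12}\ln p_3 + \delta_{22}\ln p_3 - \gamma_{y1}\ln Y - \gamma_{y2}\ln Y - \varepsilon_1 - \varepsilon_2 = 1 - S_1 - S_2. \]
For the cost function, making the substitutions for \(\beta_3\), \(\delta_{13}\), \(\delta_{23}\), \(\delta_{33}\), and \(\gamma_{y3}\) produces
\[ \ln C = \alpha + \beta_1(\ln p_1 - \ln p_3) + \beta_2(\ln p_2 - \ln p_3) + \delta_{11}((\ln^2 p_1)/2 - \ln p_1\ln p_3 + (\ln^2 p_3)/2) + \delta_{22}((\ln^2 p_2)/2 - \ln p_2\ln p_3 + (\ln^2 p_3)/2) + \delta_{12}(\ln p_1\ln p_2 - \ln p_1\ln p_3 - \ln p_2\ln p_3 + \ln^2 p_3) + \gamma_{y1}\ln Y(\ln p_1 - \ln p_3) + \gamma_{y2}\ln Y(\ln p_2 - \ln p_3) + \beta_y\ln Y + \beta_{yy}(\ln^2 Y)/2 + \varepsilon_c \]
\[ = \alpha + \beta_1\ln(p_1/p_3) + \beta_2\ln(p_2/p_3) + \delta_{11}(\ln^2(p_1/p_3))/2 + \delta_{22}(\ln^2(p_2/p_3))/2 + \delta_{12}\ln(p_1/p_3)\ln(p_2/p_3) + \gamma_{y1}\ln Y\ln(p_1/p_3) + \gamma_{y2}\ln Y\ln(p_2/p_3) + \beta_y\ln Y + \beta_{yy}(\ln^2 Y)/2 + \varepsilon_c. \]
The system of three equations (cost and two shares) can be estimated as discussed in the text. Invariance is achieved by using a maximum likelihood estimator. The five parameters eliminated by the restrictions can be estimated after the others are obtained just by using the restrictions. The restrictions are linear, so the standard errors are also straightforward to obtain. The least squares estimates are shown below. Estimated standard errors appear in parentheses.

Cost Function
Variable               Estimate (Std. Error)
One                    51.32 (45.91)
ln(pk/pf)              21.74 (20.14)
ln(pl/pf)              32.39 (21.81)
ln2(pk/pf)/2           4.596 (4.604)
ln2(pl/pf)/2           8.216 (5.159)
ln(pk/pf)ln(pl/pf)     6.238 (4.684)
lnY                    1.674 (.9297)
ln2Y/2                 .006997 (.0313)
lnYln(pk/pf)           .3223 (.2652)
lnYln(pl/pf)           .08631 (.1981)

Share Equations
Variable               Capital Share        Labor Share
One                    .0174 (.4697)        .2172 (.2408)
ln(pk/pf)              .2380 (.1045)        .0033 (.0534)
ln(pl/pf)              .0065 (.1059)        .0168 (.0542)
lnY                    .0007 (.0098)        .0117 (.0050)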
The estimates do not even come close to satisfying the cross equation restrictions. The parameters in the cost function are extremely large, owing primarily to rather severe multicollinearity among the price terms. The results of estimation of the system by direct maximum likelihood are shown. The convergence criterion is the value of Belsley (discussed near the end of Section 5.5). The value α shown below is \(g'H^{-1}g\) where g is the gradient and H is the Hessian of the log-likelihood.
Iteration 0, F=46.76391, ln|S|= -7.514268, α= 2.054399
Iteration 1, F=136.7448, ln|S|= -16.51236, α= .5796486
Iteration 2, F=146.9803, ln|S|= -17.53591, α= .02179947
Iteration 3, F=147.2268, ln|S|= -17.56055, α= .0004222

Residual covariance matrix
           Cost          Capital       Labor
Cost       .0145572
Capital    .000304768    .00303853
Labor      .000317554    .000887258    .000798128

Coefficient   Estimate      Std. Error
α             6.41878       .6637
βk            .0546555      .2422
βl            .250976       .2138
δkk           .245259       .06904
δll           .0245770      .04788
δkl           .00403448     .04779
βy            .572452       .1340
βyy           .0456587      .01908
γyk           .00124236     .008409
γyl           .0116921      .004442
βf            .8036795
δkf           .2412245
δlf           .0205425
δff           .261767
γyf           .0129345
The means of the variables are: \(\bar Y\) = 3531.8, \(\bar p_k\) = 169.35, \(\bar p_l\) = 2.039, \(\bar p_f\) = 26.41. The three factor shares computed at these means are Sk = .4182, Sl = .0865, Sf = .4953. (The sample means are .411, .0954, and .4936.) The matrix of elasticities computed according to (15-72) is

          k         l         f
k         .01115
l         .8885     7.2756
f         .1646     .5206     .04819

(Two of the three diagonals have the `wrong' sign. This may be due to the very small sample size. The cross elasticities however do conform to what one might expect, the primary one being the evident substitution between capital and fuel.) To test the hypothesis that \(\gamma_{yi} = 0\), we reestimate the model without the interaction terms between lnY and the prices in the cost function and without lnY in the factor share equations. The iterations for this restricted model are shown below.
Iter.= 0, F=46.76391, log|S|= -7.514268, α= 1.912223
Iter.= 1, F=123.7521, log|S|= -15.21308, α= .5888180
Iter.= 2, F=136.3410, log|S|= -16.47198, α= .2771995
Iter.= 3, F=141.3491, log|S|= -16.97279, α= .08024513
Iter.= 4, F=142.5591, log|S|= -17.09379, α= .01636212
Convergence achieved
Since we are interested only in the test statistic, we have not listed the parameter estimates. The test statistic given in (17-26) is \(\lambda = T(\ln|S_r| - \ln|S_u|)\) = 20(-17.09379 - (-17.56055)) = 9.3352. There are two restrictions since only two of the three parameters are free. The critical value from the chi-squared table is 5.99, so we would reject the hypothesis.
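The five parameters eliminated by the adding-up restrictions, and the likelihood ratio statistic, can be recomputed from the reported figures. This is a sketch under one loud assumption: the printed listing does not show minus signs, so the signs assigned to the estimates below are a guess chosen so that the restrictions reproduce the magnitudes reported for βf, δkf, δlf, δff and γyf. For the δ terms a second sign pattern (all δ signs reversed) fits equally well.

```python
# Magnitudes from the ML listing; the SIGNS are an assumption (see the lead-in).
beta_k, beta_l = -0.0546555, 0.250976
d_kk, d_ll, d_kl = 0.245259, 0.0245770, -0.00403448
g_yk, g_yl = -0.00124236, -0.0116921

# Parameters recovered from the adding-up restrictions.
beta_f = 1.0 - beta_k - beta_l
d_kf = -d_kk - d_kl
d_lf = -d_kl - d_ll
d_ff = d_kk + d_ll + 2.0 * d_kl
g_yf = -g_yk - g_yl

# Likelihood ratio statistic for the hypothesis gamma_yi = 0.
T = 20
ln_det_restricted = -17.09379
ln_det_unrestricted = -17.56055
lr_stat = T * (ln_det_restricted - ln_det_unrestricted)

print(beta_f, d_kf, d_lf, d_ff, g_yf)
print(lr_stat, lr_stat > 5.99)   # 5.99 is the critical value quoted in the text
```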
?===========================================
? Application 10.2
?===========================================
? a. Separate regressions and aggregation test.
? This saves the residuals to be used later.
CALC ; SS1=0 $
MATRIX ; EOLS = Init(20,10,0) $
PROCEDURE $
Include ; new ; Firm = company $
REGRESS ; Lhs = I ; Rhs = F,C,one ; Res = e$
CALC ; SS1=SS1 + Sumsqdev $
MATRIX ; EOLS(*,company) = e $
ENDPROC $
EXECUTE ; Company=1,10 $
SAMPLE ; 1 - 200 $

Output of the ten regressions. Columns in each block: Variable, Coefficient, Standard Error, t-ratio, P[|T|>t], Mean of X.

Residuals: Sum of squares = 143205.9, Standard error of e = 91.78167, Fit: R-squared = .9213540
F          .11928083    .02583417    4.617    .0002    4333.84500
C          .37144481    .03707282    10.019   .0000    648.435000
Constant   149.782453   105.842125   1.415    .1751

Residuals: Sum of squares = 158093.3, Standard error of e = 96.43445, Fit: R-squared = .4708624
F          .17485602    .07419805    2.357    .0307    1971.82500
C          .38964189    .14236688    2.737    .0140    294.855000
Constant   49.1983219   148.075365   .332     .7438

Residuals: Sum of squares = 13216.59, Standard error of e = 27.88272, Fit: R-squared = .7053067, Adjusted R-squared = .6706369
F          .02655119    .01556610    1.706    .1063    1941.32500
C          .15169387    .02570408    5.902    .0000    400.160000
Constant   9.95630645   31.3742491   .317     .7548

Residuals: Sum of squares = 2997.444, Standard error of e = 13.27856, Fit: R-squared = .9135784
F          .07794782    .01997330    3.903    .0011    693.210000
C          .31571819    .02881317    10.957   .0000    121.245000
Constant   6.18996051   13.5064781   .458     .6525

Residuals: Sum of squares = 1396.836, Standard error of e = 9.064592, Fit: R-squared = .6804076
F          .16237770    .05703645    2.847    .0111    231.470000
C          .00310174    .02196531    .141     .8894    486.765000
Constant   22.7071160   6.87207605   3.304    .0042

Residuals: Sum of squares = 1110.533, Standard error of e = 8.082418, Fit: R-squared = .9521422
F          .13145484    .03117234    4.217    .0006    419.865000
C          .08537427    .10030597    .851     .4065    104.285000
Constant   8.68554338   4.54516804   1.911    .0730

Residuals: Sum of squares = 1507.403, Standard error of e = 9.416516, Fit: R-squared = .7635009
F          .08752720    .06562593    1.334    .1999    149.790000
C          .12378141    .01706483    7.254    .0000    314.945000
Constant   4.49953436   11.2893942   .399     .6952

Residuals: Sum of squares = 1773.234, Standard error of e = 10.21312, Fit: R-squared = .7444461
F          .05289413    .01570650    3.368    .0037    670.910000
C          .09240649    .05609897    1.647    .1179    85.6400000
Constant   .50939018    8.01528894   .064     .9501

Residuals: Sum of squares = 1407.360, Standard error of e = 9.098674, Fit: R-squared = .6655145
F          .07538794    .03395227    2.220    .0403    333.650000
C          .08210356    .02799168    2.933    .0093    297.900000
Constant   7.72283708   9.35933952   .825     .4207

Residuals: Sum of squares = 20.02673, Standard error of e = 1.085377, Fit: R-squared = .6431578
F          .00457343    .02716079    .168     .8683    70.9210000
C          .43736919    .07958891    5.495    .0000    5.94150000
Constant   .16151857    2.06556414   .078     .9386
Ordinary least squares regression
LHS=I        Mean = 145.9582, Standard deviation = 216.8753
WTS=none     Number of observs. = 200
Model size   Parameters = 3, Degrees of freedom = 197
Residuals    Sum of squares = 1755850., Standard error of e = 94.40840
Fit          R-squared = .8124080, Adjusted R-squared = .8105035
Model test   F[ 2, 197] (prob) = 426.58 (.0000)
Variable   Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
F          .11556216     .00583571        19.803    .0000      1081.68110
C          .23067849     .02547580        9.055     .0000      276.017150
Constant   42.7143694    9.51167603       4.491     .0000

? b. Aggregation test
REGRESS ; LHS = I ; RHS = F,C,one $
CALC ; SS0=Sumsqdev $
CALC ; List ; Fstat = ((SS0 - SS1)/(9*3)) / (SS0/(n-10*3)) ; FC = Ftb(.95,27,170) $
Listed Calculator Results
FSTAT = 5.131854
FC = 1.551534

? c. SUR model
NAMELIST ; X1=F1,C1,one $
NAMELIST ; X2=F2,C2,one $
NAMELIST ; X3=F3,C3,one $
NAMELIST ; X4=F4,C4,one $
NAMELIST ; X5=F5,C5,one $
NAMELIST ; X6=F6,C6,one $
NAMELIST ; X7=F7,C7,one $
NAMELIST ; X8=F8,C8,one $
NAMELIST ; X9=F9,C9,one $
NAMELIST ; X10=F10,C10,one $
NAMELIST ; Y=I1,I2,I3,I4,I5,I6,I7,I8,I9,I10 $
SAMPLE ; 1 - 20 $
SURE ; Lhs = Y ; Eq1=X1;Eq2=X2;Eq3=X3;Eq4=X4;Eq5=X6;Eq6=X6 ; Eq7=X7;Eq8=X8;Eq9=X9;Eq10=X10 ; Maxit=0 ; OLS $
Criterion function for GLS is log-likelihood.
Iteration 0, GLS = 737.6463
Iteration 1, GLS = 730.1070

Columns in each block: Variable, Coefficient, Standard Error, b/St.Er., P[|Z|>z], Mean of X.

Estimates for equation: I1
F1         .12472490    .01490044    8.371    .0000    4333.84500
C1         .37951869    .02912686    13.030   .0000    648.435000
Constant   178.611571   65.7890483   2.715    .0066

Estimates for equation: I2
F2         .16828512    .04057787    4.147    .0000    1971.82500
C2         .33587688    .10299836    3.261    .0011    294.855000
Constant   20.3887867   83.2537952   .245     .8065

Estimates for equation: I3
F3         .03425481    .00925706    3.700    .0002    1941.32500
C3         .12538119    .02040101    6.146    .0000    400.160000
Constant   14.3822597   20.6146424   .698     .4854

Estimates for equation: I4
F4         .06760969    .01597735    4.232    .0000    693.210000
C4         .30752805    .02536245    12.125   .0000    121.245000
Constant   1.96954637   11.0026359   .179     .8579

Estimates for equation: I5
F6         .00635232    .02903793    .219     .8268    419.865000
C6         .12737505    .09456013    1.347    .1780    104.285000
Constant   45.8520779   4.86959707   9.416    .0000

Estimates for equation: I6
F6         .12891587    .01798607    7.168    .0000    419.865000
C6         .06768693    .06029084    1.123    .2616    104.285000
Constant   5.77499083   3.44886478   1.674    .0940

Estimates for equation: I7
F7         .09106397    .04535783    2.008    .0447    149.790000
C7         .12913287    .01446995    8.924    .0000    314.945000
Constant   6.71472214   8.72476796   .770     .4415

Estimates for equation: I8
F8         .05179274    .00835658    6.198    .0000    670.910000
C8         .04729955    .03473521    1.362    .1733    85.6400000
Constant   4.09249729   5.09237714   .804     .4216

Estimates for equation: I9
F9         .07275469    .02111017    3.446    .0006    333.650000
C9         .06640816    .02194422    3.026    .0025    297.900000
Constant   2.16859331   7.30885683   .297     .7667

Estimates for equation: I10
F10        .01695668    .01550963    1.093    .2743    70.9210000
C10        .37466423    .05739586    6.528    .0000    5.94150000
Constant   2.06101718   1.16003699   1.777    .0756

? c. Aggregation test according to (10-15)
MATRIX ; Z=Init(3,3,0) ; J=Iden(3); L=-1*J $
MATRIX ; R=[j,z,z,z,z,z,z,z,z,l /
            z,j,z,z,z,z,z,z,z,l /
            z,z,j,z,z,z,z,z,z,l /
            z,z,z,j,z,z,z,z,z,l /
            z,z,z,z,j,z,z,z,z,l /
            z,z,z,z,z,j,z,z,z,l /
            z,z,z,z,z,z,j,z,z,l /
            z,z,z,z,z,z,z,j,z,l /
            z,z,z,z,z,z,z,z,j,l ] ; d = R*b ; Vd = R*Varb*R' ; list ; AggF = 1/27 * d'd $
Matrix AGGF has 1 rows and 1 columns.
               1
1|      98.53777
CALC ; List ; Ftb(.95,27,(200-10*3)) $
Listed Calculator Results
Result = 1.551534

? d. Using separate OLS regressions, compute LM statistic
? OLS residuals were saved in matrix EOLS earlier.
MATRIX ; VEOLS = 1/20*EOLS'EOLS ; VI = Diag(VEOLS) ; SDI = ISQR(VI) ; ROLS = SDI*VEOLS*SDI ; RR = ROLS' *ROLS $
CALC ; List ; LMStat = (20/2)*(Trc(RR)-10) ; Ctb(.95, (9*10/2))$
Listed Calculator Results
LMSTAT = 97.617948
Result = 61.656233

? Constrained Sur model with one coefficient vector.
? This is the unconstrained model in (10-19)-(10-21)
SAMPLE ; 1 - 200 $
REGRESS; Lhs = I ; Rhs = F,C,one $
Ordinary least squares regression
LHS=I        Mean = 145.9582, Standard deviation = 216.8753
WTS=none     Number of observs. = 200
Model size   Parameters = 3, Degrees of freedom = 197
Residuals    Sum of squares = 1755850., Standard error of e = 94.40840
Fit          R-squared = .8124080, Adjusted R-squared = .8105035
Model test   F[ 2, 197] (prob) = 426.58 (.0000)
Variable   Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
F          .11556216     .00583571        19.803    .0000      1081.68110
C          .23067849     .02547580        9.055     .0000      276.017150
Constant   42.7143694    9.51167603       4.491     .0000

TSCS ; Lhs = I ; Rhs = F,C,one ; Pds=20 ; Model=S2,R0 $
Groupwise Regression Models
Estimator = 2 Step GLS
Groupwise Het. and Correlated (S2)
Nonautocorrelated disturbances (R0)
Test statistics against the correlation
Deg.Fr. = 45   C*(.95) = 61.66   C*(.99) = 69.96
Test statistics against the correlation
Likelihood ratio statistic = 320.2052
Log-likelihood function = 853.084972
Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]
F          .10806238     .00241169        44.808     .0000
C          .15079551     .00386063        39.060     .0000
Constant   20.1588844    .79950153        25.214     .0000

CREATE ; WI = (SDI(firm,firm))^2 $
REGRESS; Lhs = I ; Rhs = F,C,one ; Wts = WI $
Ordinary least squares regression
LHS=I        Mean = 6.993136, Standard deviation = 18.01824
WTS=WI       Number of observs. = 200
Model size   Parameters = 3, Degrees of freedom = 197
Residuals    Sum of squares = 11690.82, Standard error of e = 7.703521
Fit          R-squared = .8190465, Adjusted R-squared = .8172094
Variable   Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
F          .07847124     .00459121        17.092    .0000      96.8424912
C          .09896094     .00761314        12.999    .0000      23.8374846
Constant   2.96519441    .66964256        4.428     .0000
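The aggregation test in part b can be recomputed from the residual sums of squares printed in part a. The sketch below is only a cross-check of the arithmetic; the variable names are illustrative. The CALC expression in the listing divides by SS0/(n-30), and that reproduces the printed FSTAT. The conventional form of the F ratio divides by the unrestricted sum of squares, SS1/(n-30), so both versions are computed for comparison. Either way the statistic exceeds the critical value of 1.551534.

```python
# Residual sums of squares from the ten firm-by-firm regressions (part a).
ssr_by_firm = [143205.9, 158093.3, 13216.59, 2997.444, 1396.836,
               1110.533, 1507.403, 1773.234, 1407.360, 20.02673]
ss1 = sum(ssr_by_firm)        # unrestricted: ten separate regressions
ss0 = 1755850.0               # restricted: pooled regression

n, firms, k = 200, 10, 3
j = (firms - 1) * k           # 27 restrictions
df = n - firms * k            # 170 degrees of freedom

# As written in the listing's CALC command (denominator uses SS0).
f_listing = ((ss0 - ss1) / j) / (ss0 / df)
# Conventional F ratio (denominator uses the unrestricted SS1).
f_conventional = ((ss0 - ss1) / j) / (ss1 / df)

F_CRITICAL = 1.551534         # Ftb(.95,27,170) from the listing
print(f_listing, f_conventional, f_listing > F_CRITICAL)
```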
Chapter 11
Nonlinear Regression Models
Exercises
1. We cannot simply take logs of both sides of the equation as the disturbance is additive rather than multiplicative. So, we must treat the model as a nonlinear regression. The linearized equation is
\[ y \approx \alpha^0 x^{\beta^0} + x^{\beta^0}(\alpha - \alpha^0) + \alpha^0(\log x)x^{\beta^0}(\beta - \beta^0), \]
where \(\alpha^0\) and \(\beta^0\) are the expansion point. For given values of \(\alpha^0\) and \(\beta^0\), the estimating equation would be
\[ y - \alpha^0 x^{\beta^0} + \alpha^0 x^{\beta^0} + \beta^0\alpha^0(\log x)x^{\beta^0} = \alpha\left(x^{\beta^0}\right) + \beta\left(\alpha^0(\log x)x^{\beta^0}\right) + \varepsilon^* \]
or
\[ y + \beta^0\alpha^0(\log x)x^{\beta^0} = \alpha\left(x^{\beta^0}\right) + \beta\left(\alpha^0(\log x)x^{\beta^0}\right) + \varepsilon^*. \]
Estimates of α and β are obtained by applying ordinary least squares to this equation. The process is repeated with the new estimates in the role of \(\alpha^0\) and \(\beta^0\). The iteration could be continued until convergence. Starting values are always a problem. If one has no particular values in mind, one candidate would be \(\alpha^0 = \bar y\) and \(\beta^0 = 0\), or \(\beta^0 = 1\) and \(\alpha^0\) either \(x'y/x'x\) or \(\bar y/\bar x\). Alternatively, one could search directly for the α and β to minimize the sum of squares, \(S(\alpha,\beta) = \Sigma_i (y_i - \alpha x_i^{\beta})^2 = \Sigma_i \varepsilon_i^2\). The first order conditions for minimization are
\[ \partial S(\alpha,\beta)/\partial\alpha = -2\Sigma_i (y_i - \alpha x_i^{\beta})x_i^{\beta} = 0 \quad\text{and}\quad \partial S(\alpha,\beta)/\partial\beta = -2\Sigma_i (y_i - \alpha x_i^{\beta})\alpha(\ln x_i)x_i^{\beta} = 0. \]
Methods for solving nonlinear equations such as these are discussed in Appendix E.
2. The proof can be done by mathematical induction. For convenience, denote the ith derivative by \(f_i\). The first derivative appears in Equation (10-34). Just by plugging in i = 1, it is clear that \(f_1\) satisfies the relationship. Now, use the chain rule to differentiate \(f_1\),
\[ f_2 = (-1/\lambda^2)[x^{\lambda}(\ln x) - x^{(\lambda)}] + (1/\lambda)[(\ln x)x^{\lambda}(\ln x) - f_1]. \]
Collect terms to yield
\[ f_2 = (-1/\lambda)f_1 + (1/\lambda)[x^{\lambda}(\ln x)^2 - f_1] = (1/\lambda)[x^{\lambda}(\ln x)^2 - 2f_1]. \]
So, the relationship holds for i = 0, 1, and 2. We now assume that it holds for i = K-1, and show that if so, it also holds for i = K. This will complete the proof. Thus, assume
\[ f_{K-1} = (1/\lambda)[x^{\lambda}(\ln x)^{K-1} - (K-1)f_{K-2}]. \]
Differentiate this to give
\[ f_K = (-1/\lambda)f_{K-1} + (1/\lambda)[(\ln x)x^{\lambda}(\ln x)^{K-1} - (K-1)f_{K-1}]. \]
Collect terms to give \(f_K = (1/\lambda)[x^{\lambda}(\ln x)^K - Kf_{K-1}]\), which completes the proof for the general case.
Now, we take the limiting value
\[ \lim_{\lambda\to 0} f_i = \lim_{\lambda\to 0} [x^{\lambda}(\ln x)^i - if_{i-1}]/\lambda. \]
Use L'Hospital's rule once again.
\[ \lim_{\lambda\to 0} f_i = \lim_{\lambda\to 0} d\{x^{\lambda}(\ln x)^i - if_{i-1}\}/d\lambda \,\Big/ \lim_{\lambda\to 0} d\lambda/d\lambda. \]
Then, \(\lim_{\lambda\to 0} f_i = \lim_{\lambda\to 0}\{x^{\lambda}(\ln x)^{i+1} - if_i\}\). Just collect terms, \((i+1)\lim_{\lambda\to 0} f_i = \lim_{\lambda\to 0}[x^{\lambda}(\ln x)^{i+1}]\), or
\[ \lim_{\lambda\to 0} f_i = \lim_{\lambda\to 0}[x^{\lambda}(\ln x)^{i+1}]/(i+1) = (\ln x)^{i+1}/(i+1). \]
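The recursion for the derivatives of the Box-Cox transformation, and its limit as λ goes to zero, can be checked numerically. The sketch below is illustrative only: the evaluation point x = 2.5, the step size for the finite differences, and the small value of λ used to approximate the limit are arbitrary choices, not values from the text.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation x^(lambda) = (x**lambda - 1) / lambda."""
    return (x ** lam - 1.0) / lam

def f(i, x, lam):
    """i-th derivative with respect to lambda, using the recursion
    f_i = (1/lambda) * [x**lambda * (ln x)**i - i * f_{i-1}], f_0 = x^(lambda)."""
    if i == 0:
        return box_cox(x, lam)
    return (x ** lam * math.log(x) ** i - i * f(i - 1, x, lam)) / lam

x, lam, h = 2.5, 0.5, 1e-5

# Central finite differences as an independent check of f_1 and f_2.
numeric_f1 = (box_cox(x, lam + h) - box_cox(x, lam - h)) / (2 * h)
numeric_f2 = (f(1, x, lam + h) - f(1, x, lam - h)) / (2 * h)

# Limiting values as lambda -> 0: (ln x)**(i+1) / (i+1).
small = 1e-3
limit_f1 = math.log(x) ** 2 / 2
limit_f2 = math.log(x) ** 3 / 3

print(f(1, x, lam), numeric_f1)
print(f(2, x, lam), numeric_f2)
print(f(1, x, small), limit_f1)
print(f(2, x, small), limit_f2)
```

Because the recursion divides by λ repeatedly, it loses accuracy for very small λ, which is one practical reason to use the limiting form at λ = 0.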
Applications
1. First, the two simple regressions produce (standard errors in parentheses)

                   Linear              Loglinear
Constant           114.338 (173.4)     1.17064 (.3268)
Labor              2.33814 (1.039)     .602999 (.1260)
Capital            .471043 (.1124)     .37571 (.08535)
R2                 .9598               .9435
Standard Error     469.86              .1884

In the regression of Y on 1, K, L, and the predicted values from the loglinear equation minus the predictions from the linear equation, the estimate of α, the coefficient on the added variable, is 587.349 with an estimated standard error of 3135. Since this is not significantly different from zero, this evidence favors the linear model. In the regression of lnY on 1, lnK, lnL and the predictions from the linear model minus the exponent of the predictions from the loglinear model, the estimate of α is .000355 with a standard error of .000275. Therefore, this contradicts the preceding result and favors the loglinear model. An alternative approach is to fit the Box-Cox model in the fashion of Exercise 4. The maximum likelihood estimate of λ is about .12, which is much closer to the loglinear model than the linear one. The log-likelihoods are 192.5107 at the MLE, 192.6266 at λ = 0 and 202.837 at λ = 1. Thus, the hypothesis that λ = 0 (the loglinear model) would not be rejected but the hypothesis that λ = 1 (the linear model) would be rejected using the Box-Cox model as a framework.
2. The search for the minimum sum of squares produced the following results:

λ         e′e
-.500     .78477
-.400     .67033
-.300     .60587
-.250     .59479
-.245     .59451
-.244     .59447
-.243     .59444
-.242     .59441
-.241     .59439
-.240     .59438
-.239     .59437
-.238     .59436
-.237     .59437
-.235     .59440
-.225     .59492
-.200     .59897
-.100     .65598
0.000     .78143
.100      .97742
.200      1.24354

The sum of squared residuals is minimized at λ = -.238. At this value, the regression results are as follows:

Parameter   Estimate    OLS Std.Error   Correct Std.Error
α           2.06092     .07718          .09723
βk          .178232     .04638          .04378
βl          .737988     .06996          .12560
λ           -.238                       .07710

Estimated Asymptotic Covariance Matrix
       α         βk        βl        λ
α      .00945
βk     .00262    .00192
βl     .00511    .00199    .01578
λ      .00500    .00037    .00825    .00594

The output elasticities for this function evaluated at the sample means are
\(\partial\ln Y/\partial\ln K = \beta_k K^{\lambda} = (.178232)(.175905)^{-.238} = .2695\)
\(\partial\ln Y/\partial\ln L = \beta_l L^{\lambda} = (.443954)(.737988)^{-.238} = .7740\).
The estimates found for Zellner and Revankar's model were .254 and .882, respectively, so these are quite similar. For the simple loglinear model, the corresponding values are .2790 and .927.
3. The Wald test is based on the unrestricted model. The statistic is the square of the usual t-ratio, W = (-.232 / .0771)^2 = 9.0546. The critical value from the chi-squared distribution is 3.84, so the hypothesis that λ = 0 can be rejected. The likelihood ratio statistic is based on both models. The sum of squared residuals for both unrestricted and restricted models is given above. The log-likelihood is lnL = -(n/2)[1 + ln(2π) + ln(e′e/n)], so the likelihood ratio statistic is
\[ LR = n[\ln(e'e/n)_{\lambda=0} - \ln(e'e/n)_{\lambda=-.238}] = n\ln[(e'e_{\lambda=0})/(e'e_{\lambda=-.238})] = 25\ln(.78143/.59436) = 6.8406. \]
Finally, to compute the Lagrange Multiplier statistic, we regress the residuals from the loglinear regression on a constant, lnK, lnL, and (1/2)(\(b_k\ln^2 K + b_l\ln^2 L\)) where the coefficients are those from the loglinear model (.27898 and .92731). The R2 in this regression is .23001, so the Lagrange multiplier statistic is LM = nR2 = 25(.23001) = 5.7503. All three statistics suggest the same conclusion: the hypothesis should be rejected.
4. Instead of minimizing the sum of squared deviations, we now maximize the concentrated log-likelihood function,
\[ \ln L = -(n/2)[1 + \ln(2\pi)] + (\lambda - 1)\Sigma_i \ln Y_i - (n/2)\ln(\varepsilon'\varepsilon/n). \]
The search for the maximum of lnL produced the results tabulated below. The log-likelihood is maximized at λ = .124. At this value, the regression results are as follows:

Parameter   Estimate    OLS Std.Error   Correct Std.Error
α           2.59465     .1283           .7151
βk          .378094     .1070           .3228
βl          1.13653     .1117           .4121
λ           .124                        .2482
σ2          .036922                     .0179

Estimated Asymptotic Covariance Matrix
       α         βk        βl        λ         σ2
α      .5114
βk     .2203     .1042
βl     .2612     .0951     .1698
λ      .1747     .0730     .0953     .0617
σ2     .0104     .0044     .0059     .0038     .00032
λ         lnL
-.200     -13.6284
-.150     -12.8568
-.100     -12.2423
-.050     -11.7764
0.000     -11.4476
.050      -11.2427
.100      -11.1480
.110      -11.1410
.120      -11.1378
.121      -11.1377
.122      -11.1376
.123      -11.1376
.124      -11.1375
.125      -11.1376
.130      -11.1383
.140      -11.1423
.200      -11.2344
.300      -11.6064
.400      -12.8371

The output elasticities for this function evaluated at the sample means, \(\bar K\) = .175905, \(\bar L\) = .737988, \(\bar Y\) = 2.870777, are
\(\partial\ln Y/\partial\ln K = b_k(K/Y)^{\lambda} = .2674\)
\(\partial\ln Y/\partial\ln L = b_l(L/Y)^{\lambda} = .9017\).
These are quite similar to the estimates given above. The sum of the two output elasticities for the states given in the example in the text are given below for the model estimated with and without transforming the dependent variable. Note that the first of these makes the model look much more similar to the Cobb Douglas model for which this sum is constant.

State        Full Box-Cox Model   lnQ on left hand side
Florida      1.2840               1.6598
Louisiana    1.2019               1.4239
California   1.1574               1.1176
Maryland     1.1657               1.0261
Ohio         1.1899               .9080
Michigan     1.1604               .8506

Once again, we are interested in testing the hypothesis that λ = 0. The Wald test statistic is W = (.123 / .2482)^2 = .2455. We would now not reject the hypothesis that λ = 0. This is a surprising outcome. The likelihood ratio statistic is based on both models. The sum of squared residuals for the restricted model is given above. The sum of the logs of the outputs is 19.29336, so the restricted log-likelihood is lnL0 = (0-1)(19.29336) - (25/2)[1 + ln(2π) + ln(.781403/25)] = -11.44757. The likelihood ratio statistic is 2[-11.13758 - (-11.44757)] = .61998. Once again, the statistic is small. Finally, to compute the Lagrange multiplier statistic, we now use the method described in Example 11.8. The result is LM = 1.5621. All of these suggest that the loglinear model is not a significant restriction on the Box-Cox model. This rather peculiar outcome would appear to arise because of the rather substantial reduction in the log-likelihood function which occurs when the dependent variable is transformed along with the right hand side. This is not a contradiction because the model with only the right hand side transformed is not a parametric restriction on the model with both sides transformed. Some further evidence is given in the next exercise.
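The test statistics in Applications 3 and 4 follow from a handful of reported numbers, so they are easy to recompute. The sketch below uses only values quoted in the solutions (n = 25); the minimized sum of squares at the estimated λ is taken from the search table in Application 2 (.59436), and the variable names are illustrative.

```python
import math

n = 25

# Application 3: only the right hand side is transformed.
wald_3 = (0.232 / 0.0771) ** 2                 # squared t-ratio on lambda
lr_3 = n * math.log(0.78143 / 0.59436)         # n * ln(e'e restricted / e'e unrestricted)
lm_3 = n * 0.23001                             # n * R-squared of the auxiliary regression

# Application 4: both sides are transformed.
sum_log_y = 19.29336
lnL_restricted = (0 - 1) * sum_log_y - (n / 2) * (
    1 + math.log(2 * math.pi) + math.log(0.781403 / n))
lnL_unrestricted = -11.13758
lr_4 = 2 * (lnL_unrestricted - lnL_restricted)
wald_4 = (0.123 / 0.2482) ** 2

CRITICAL_95_1DF = 3.84
print(wald_3, lr_3, lm_3)          # all above 3.84
print(lnL_restricted, lr_4, wald_4)  # both statistics below 3.84
```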
5.
> nlsq ; lhs = y ; labels = b1,b2 ; fcn=b1*(1 - 1/sqr(1+2*b2*x)) ; start = 500,.0001 ;output=2$
Begin NLSQ iterations. Linearized regression.
Iteration= 1; Sum of squares= 11603.0164 ; Gradient= 11602.9326
Iteration= 2; Sum of squares= 19821.5463 ; Gradient= 19821.4534
Iteration= 3; Sum of squares= 331169.005 ; Gradient= 331144.576
Iteration= 4; Sum of squares= 356630.271 ; Gradient= 356504.582
Iteration= 5; Sum of squares= 14997.8506 ; Gradient= 14938.8590
Iteration= 6; Sum of squares= 449.855530 ; Gradient= 442.701921
Iteration= 7; Sum of squares= 102026.884 ; Gradient= 102026.775
Iteration= 8; Sum of squares= 12887.7536 ; Gradient= 12886.6539
Iteration= 9; Sum of squares= 14263101.5 ; Gradient= 14263101.0
Iteration= 10; Sum of squares= 10203.1920 ; Gradient= 10202.6789
Iteration= 11; Sum of squares= 144.393444 ; Gradient= 144.338425
Iteration= 12; Sum of squares= 258.186688 ; Gradient= 258.145522
Iteration= 13; Sum of squares= .154284512 ; Gradient= .113316151
Iteration= 14; Sum of squares= .409681292E-01; Gradient= .129216769E-05
Iteration= 15; Sum of squares= .409668370E-01; Gradient= .439070450E-13
Iteration= 16; Sum of squares= .409668370E-01; Gradient= .211594637E-18
Iteration= 17; Sum of squares= .409668370E-01; Gradient= .107898463E-24
Convergence achieved
Nonlinear least squares regression
LHS=Y        Mean = 43.34071, Standard deviation = 22.80652
WTS=none     Number of observs. = 14
Model size   Parameters = 2, Degrees of freedom = 12
Residuals    Sum of squares = .4096684E-01, Standard error of e = .5409439E-01
Fit          R-squared = .9999939
Not using OLS or no constant. Rsqd & F may be < 0.
Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]
B1         636.427250    4.31789336       147.393    .0000
B2         .00020814     .164134D-05      126.809    .0000

> nlsq ; lhs = y ; labels = b1,b2 ; fcn=b1*(1 - 1/sqr(1+2*b2*x)) ; start = 600,.0002 ;output=2$
Begin NLSQ iterations. Linearized regression.
Iteration= 1; Sum of squares= 262.456583 ; Gradient= 262.415454
Iteration= 2; Sum of squares= .155984704 ; Gradient= .115016579
Iteration= 3; Sum of squares= .409675977E-01; Gradient= .760690867E-06
Iteration= 4; Sum of squares= .409668370E-01; Gradient= .379981726E-13
Iteration= 5; Sum of squares= .409668370E-01; Gradient= .186919870E-18
Iteration= 6; Sum of squares= .409668370E-01; Gradient= .150578559E-23
Convergence achieved
Nonlinear least squares regression
LHS=Y        Mean = 43.34071, Standard deviation = 22.80652
Residuals    Sum of squares = .4096684E-01, Standard error of e = .5409439E-01
Fit          R-squared = .9999939, Adjusted R-squared = .9999944
Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]
B1         636.427250    4.31789336       147.393    .0000
B2         .00020814     .164134D-05      126.809    .0000
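The listings above report iterations of a linearized regression. The same idea, applied to the model of Exercise 1, y = αx^β + ε, can be sketched as a Gauss-Newton loop. Several things here are assumptions rather than part of the original solution: the data are synthetic and noise-free (generated with α = 2, β = .5), the update is computed by regressing the current residuals on the two pseudo-regressors (which is algebraically equivalent to the regression described in Exercise 1), and a step-halving safeguard is added so the sum of squares never rises. The erratic early iterations in the first listing suggest why some safeguard of this kind can be useful.

```python
import math

def sse(alpha, beta, x, y):
    """Sum of squared residuals for y = alpha * x**beta."""
    return sum((yi - alpha * xi ** beta) ** 2 for xi, yi in zip(x, y))

def gauss_newton(x, y, alpha, beta, tol=1e-10, max_iter=200):
    for _ in range(max_iter):
        # Pseudo-regressors: derivatives of alpha * x**beta w.r.t. alpha and beta.
        z1 = [xi ** beta for xi in x]
        z2 = [alpha * math.log(xi) * xi ** beta for xi in x]
        r = [yi - alpha * zi for yi, zi in zip(y, z1)]
        # Least squares regression of the residuals on (z1, z2).
        a11 = sum(v * v for v in z1)
        a12 = sum(u * v for u, v in zip(z1, z2))
        a22 = sum(v * v for v in z2)
        c1 = sum(u * v for u, v in zip(z1, r))
        c2 = sum(u * v for u, v in zip(z2, r))
        det = a11 * a22 - a12 * a12
        d_alpha = (a22 * c1 - a12 * c2) / det
        d_beta = (a11 * c2 - a12 * c1) / det
        if max(abs(d_alpha), abs(d_beta)) < tol:
            break
        # Step halving (an addition, not in the text): shrink the step
        # until the sum of squares no longer increases.
        step, current = 1.0, sse(alpha, beta, x, y)
        while step > 1e-8 and sse(alpha + step * d_alpha,
                                  beta + step * d_beta, x, y) >= current:
            step /= 2.0
        alpha += step * d_alpha
        beta += step * d_beta
    return alpha, beta

# Synthetic, noise-free data (hypothetical): y = 2 * x**0.5.
x = [float(v) for v in range(1, 11)]
y = [2.0 * xi ** 0.5 for xi in x]

# Starting values suggested in Exercise 1: beta0 = 1, alpha0 = x'y / x'x.
alpha0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
alpha_hat, beta_hat = gauss_newton(x, y, alpha0, 1.0)
print(alpha_hat, beta_hat)
```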
Chapter 12
Instrumental Variables Estimation
Exercises
1. There is no need for a separate proof different from the usual for OLS. Formally, however, it follows from the results at (12-4) that
\[ b = \beta + \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\varepsilon}{n}\right). \]
Then,
\[ b - \text{plim}\,b = \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\varepsilon}{n}\right) - Q_{XX}^{-1}\gamma \]
and
\[ \sqrt{n}\,(b - \text{plim}\,b) = \sqrt{n}\left[\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\varepsilon}{n}\right) - Q_{XX}^{-1}\gamma\right]. \]
The large sample distribution of this statistic will be the same as the large sample distribution of the statistic with X′X/n replaced with its probability limit, which is \(Q_{XX}\). Thus,
\[ \sqrt{n}\,(b - \text{plim}\,b) \rightarrow Q_{XX}^{-1}\sqrt{n}\left[\left(\frac{X'\varepsilon}{n}\right) - \gamma\right]. \]
To deduce the large sample behavior of this statistic, we can invoke the results from Chapter 4. The only change here is the nonzero mean (probability limit) of the vector in brackets. [See (12-3).] Thus, the same proof applies. Convergence to plim b, asymptotic normality, and the asymptotic covariance matrix, \(\text{Asy.Var}[b] = \sigma_{\varepsilon}^2(X'X)^{-1}\), follow as before.
2. A logical solution to this one is simple. For y and x*,
\[ \text{Cov}^2(y,x^*)/[\text{Var}(y)\text{Var}(x^*)] = \beta^2(\sigma_*^2)^2/[(\beta^2\sigma_*^2 + \sigma_{\varepsilon}^2)(\sigma_*^2)] \]
\[ \text{Cov}^2(y,x)/[\text{Var}(y)\text{Var}(x)] = \text{Cov}^2[\beta x^* + \varepsilon,\, x^* + u]/[\text{Var}(y)\text{Var}(x)] = \{\text{Cov}[y,x^*] + \text{Cov}[y,u]\}^2/[\text{Var}(y)\text{Var}(x)]. \]
The second term is zero, since y = βx* + ε, which is uncorrelated with u. Thus,
\[ \text{Cov}^2(y,x)/[\text{Var}(y)\text{Var}(x)] = \text{Cov}^2[y,x^*]/[\text{Var}(y)\text{Var}(x)]. \]
The numerator is the same. The denominator is larger, since Var(y)Var(x) = Var[y](Var[x*] + Var[u]), so the squared correlation must be smaller. If both variables are measured with errors, then we are comparing Cov2(y*,x*)/{Var[y*]Var[x*]} to Cov2(y,x)/{Var[y]Var[x]}. The numerator is the squared covariance of (βx* + ε + v) with (x* + u), so the numerator of the fraction is still \(\beta^2(\sigma_*^2)^2\). The denominator is still obviously larger, so the same result holds when both variables are measured with error.
3. We work off (12-16), using repeatedly the result $\Sigma_{uu} = (\sigma_u j)(\sigma_u j)'$, where j has a 1 in the first position and 0 in the remaining K−1. From (12-16), plim b = β − $[Q^* + \Sigma_{uu}]^{-1}\Sigma_{uu}\beta$. The vector $\Sigma_{uu}\beta$ equals $[\sigma_u^2\beta_1, 0, \ldots, 0]'$. The inverse matrix is
$$ [Q^* + \Sigma_{uu}]^{-1} = (Q^*)^{-1} - \frac{1}{1 + (\sigma_u j)'(Q^*)^{-1}(\sigma_u j)}\,(Q^*)^{-1}(\sigma_u j)(\sigma_u j)'(Q^*)^{-1}. $$
This can be simplified since the quadratic form in the denominator just picks off the 1,1 diagonal element of $(Q^*)^{-1}$, denoted $q^{*11}$. Thus,
$$ [Q^* + \Sigma_{uu}]^{-1} = (Q^*)^{-1} - \frac{1}{1 + \sigma_u^2 q^{*11}}\,(Q^*)^{-1}(\sigma_u j)(\sigma_u j)'(Q^*)^{-1}. $$
Then
$$ \begin{aligned}
[Q^* + \Sigma_{uu}]^{-1}\Sigma_{uu}\beta
&= (Q^*)^{-1}(\sigma_u j)(\sigma_u j)'\beta - \frac{1}{1 + \sigma_u^2 q^{*11}}\,(Q^*)^{-1}(\sigma_u j)(\sigma_u j)'(Q^*)^{-1}(\sigma_u j)(\sigma_u j)'\beta \\
&= (Q^*)^{-1} j\,\sigma_u^2\beta_1 - \frac{\sigma_u^2 q^{*11}}{1 + \sigma_u^2 q^{*11}}\,(Q^*)^{-1} j\,\sigma_u^2\beta_1 \\
&= (Q^*)^{-1} j\left[1 - \frac{\sigma_u^2 q^{*11}}{1 + \sigma_u^2 q^{*11}}\right]\sigma_u^2\beta_1 \\
&= (Q^*)^{-1} j\left[\frac{1}{1 + \sigma_u^2 q^{*11}}\right]\sigma_u^2\beta_1 \\
&= (Q^*)^{-1} j\left[\frac{\sigma_u^2\beta_1}{1 + \sigma_u^2 q^{*11}}\right].
\end{aligned} $$
Finally, $(Q^*)^{-1} j$ equals the first column of $(Q^*)^{-1}$, which is $[q^{*11}, q^{*21}, \ldots, q^{*K1}]'$. Therefore, the first element, given by (12-17a), is
$$ \operatorname{plim} b_1 = \beta_1 - \left[\frac{\sigma_u^2\beta_1}{1 + \sigma_u^2 q^{*11}}\right] q^{*11} = \beta_1\left[1 - \frac{\sigma_u^2 q^{*11}}{1 + \sigma_u^2 q^{*11}}\right]. $$
For (12-17b), the kth element is
$$ \operatorname{plim} b_k = \beta_k - \left[\frac{\sigma_u^2\beta_1}{1 + \sigma_u^2 q^{*11}}\right] q^{*k1}. $$
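The closed forms in (12-17a) and (12-17b) can be checked against the direct computation β − [Q* + Σuu]⁻¹Σuuβ. The sketch below does this for a two regressor case. The matrix Q*, the value of σu², and the vector β are hypothetical, and the helper functions are illustrative rather than taken from the text.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix times vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def plim_b_direct(q_star, sigma_u2, beta):
    """beta - [Q* + Sigma_uu]^{-1} Sigma_uu beta, error in the first regressor only."""
    q_plus = [[q_star[0][0] + sigma_u2, q_star[0][1]],
              [q_star[1][0], q_star[1][1]]]
    bias = matvec(inv2(q_plus), [sigma_u2 * beta[0], 0.0])
    return [beta[0] - bias[0], beta[1] - bias[1]]


def plim_b_formula(q_star, sigma_u2, beta):
    """Closed forms using the first column of (Q*)^{-1}."""
    q_inv = inv2(q_star)
    q11, q21 = q_inv[0][0], q_inv[1][0]
    scale = sigma_u2 * beta[0] / (1.0 + sigma_u2 * q11)
    return [beta[0] - scale * q11, beta[1] - scale * q21]


# Hypothetical values, for illustration only.
Q_STAR = [[2.0, 1.0], [1.0, 3.0]]
direct = plim_b_direct(Q_STAR, 0.5, [1.0, 2.0])
closed = plim_b_formula(Q_STAR, 0.5, [1.0, 2.0])
print(direct, closed)
```

Both routes give the same probability limits, with the first coefficient attenuated toward zero.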
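4. To obtain the result, note first:
plim b = β + $Q_{XX}^{-1}\gamma$
Asy.Var[b] = $(\sigma^2/n)Q_{XX}^{-1}$
Asy.Var[b2sls] = $(\sigma^2/n)Q_{ZX}^{-1}Q_{ZZ}Q_{XZ}^{-1}$.
The mean squared error of the OLS estimator is the variance plus the squared bias,
$$ M(b|\beta) = (\sigma^2/n)Q_{XX}^{-1} + Q_{XX}^{-1}\gamma\gamma'Q_{XX}^{-1}, $$
while the mean squared error of the 2SLS estimator equals its variance. For OLS to be more precise than 2SLS, we would have to have
$$ (\sigma^2/n)Q_{XX}^{-1} + Q_{XX}^{-1}\gamma\gamma'Q_{XX}^{-1} \ll (\sigma^2/n)Q_{ZX}^{-1}Q_{ZZ}Q_{XZ}^{-1}, $$
where the inequality is in the matrix sense. For convenience, let δ = $Q_{XX}^{-1}\gamma$, so $M(b|\beta) = (\sigma^2/n)Q_{XX}^{-1} + \delta\delta'$. If the mean squared error matrix of the OLS estimator is smaller than that of the 2SLS estimator, then its inverse is larger. The result would be
$$ [(\sigma^2/n)Q_{XX}^{-1} + \delta\delta']^{-1} \gg [(\sigma^2/n)Q_{ZX}^{-1}Q_{ZZ}Q_{XZ}^{-1}]^{-1}. $$
Now, use (A-66) to do the inversion:
$$ [(\sigma^2/n)Q_{XX}^{-1} + \delta\delta']^{-1} = (n/\sigma^2)Q_{XX} - \frac{1}{1 + \delta'(n/\sigma^2)Q_{XX}\delta}\,(n/\sigma^2)Q_{XX}\,\delta\delta'\,(n/\sigma^2)Q_{XX}. $$
Reinsert δ = $Q_{XX}^{-1}\gamma$ and the right hand side above reduces to
$$ (n/\sigma^2)Q_{XX} - \frac{1}{1 + (n/\sigma^2)\gamma'Q_{XX}^{-1}\gamma}\,(n/\sigma^2)^2\,\gamma\gamma'. $$
Therefore, if the mean squared error matrix of OLS is smaller, then
$$ (n/\sigma^2)Q_{XX} - \frac{1}{1 + (n/\sigma^2)\gamma'Q_{XX}^{-1}\gamma}\,(n/\sigma^2)^2\,\gamma\gamma' \gg (n/\sigma^2)Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}. $$
Collect the terms, and this implies
$$ (n/\sigma^2)\left[Q_{XX} - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}\right] \gg \frac{1}{1 + (n/\sigma^2)\gamma'Q_{XX}^{-1}\gamma}\,(n/\sigma^2)^2\,\gamma\gamma'. $$
Divide both sides by $(n/\sigma^2)$,
$$ Q_{XX} - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX} \gg \frac{(n/\sigma^2)\,\gamma\gamma'}{1 + (n/\sigma^2)\gamma'Q_{XX}^{-1}\gamma}, $$
and divide the numerator and denominator of the fraction by $n/\sigma^2$,
$$ Q_{XX} - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX} \gg \frac{1}{(\sigma^2/n) + \gamma'Q_{XX}^{-1}\gamma}\,\gamma\gamma', $$
which is the desired result. Is it possible? It is possible, since
$$ Q_{XX} - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX} = \operatorname{plim}\,(1/n)\left[X'X - X'Z(Z'Z)^{-1}Z'X\right] = \operatorname{plim}\,(1/n)\,X'M_ZX, $$
which is a positive definite matrix. Since γ varies independently of Z and X, certainly there is some configuration of the data and parameters for which this is the case. The result is that it is, indeed, possible for OLS to be more precise, in the mean squared error sense, than 2SLS.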
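For a single regressor and a single instrument the matrices become scalars, and the final condition of Exercise 4 can be compared directly with the mean squared errors. The sketch below is only an illustration under that scalar assumption. The function names and the parameter values are hypothetical, and `s` stands for σ²/n.

```python
def ols_mse(q_xx, gamma, s):
    """Scalar OLS mean squared error: variance plus squared bias (s = sigma^2/n)."""
    return s / q_xx + (gamma / q_xx) ** 2


def tsls_var(q_xz, q_zz, s):
    """Scalar 2SLS asymptotic variance, (sigma^2/n) * Q_ZZ / Q_XZ^2."""
    return s * q_zz / q_xz ** 2


def condition_holds(q_xx, q_xz, q_zz, gamma, s):
    """Final inequality of the exercise, written for scalars."""
    return q_xx - q_xz ** 2 / q_zz > gamma ** 2 / (s + gamma ** 2 / q_xx)


# Two hypothetical configurations: a weak instrument with mild endogeneity,
# and a strong instrument with severe endogeneity.
cases = [
    dict(q_xx=1.0, q_xz=0.3, q_zz=1.0, gamma=0.1, s=0.01),
    dict(q_xx=1.0, q_xz=0.9, q_zz=1.0, gamma=0.5, s=0.01),
]
results = []
for c in cases:
    ols_better = ols_mse(c["q_xx"], c["gamma"], c["s"]) < tsls_var(c["q_xz"], c["q_zz"], c["s"])
    results.append((ols_better, condition_holds(**c)))
print(results)
```

In both configurations the direct comparison and the derived condition agree: OLS wins in the first case and loses in the second.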
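5. The matrices are X = [i,x] and Z = [i,z]. For the OLS estimators, we know from Chapter 2 that $a = \bar{y} - b\bar{x}$ and b = Cov[x,y]/Var[x]. For the IV estimator, $(Z'X)^{-1}Z'y$, we obtain the result in detail. Given the forms,
$$ Z'X = \begin{bmatrix} n & \Sigma_i x_i \\ n_1 & \Sigma_{z=1} x_i \end{bmatrix} = \begin{bmatrix} n & n\bar{x} \\ n_1 & n_1\bar{x}_1 \end{bmatrix}, \quad (Z'X)^{-1} = \frac{1}{n n_1(\bar{x}_1 - \bar{x})}\begin{bmatrix} n_1\bar{x}_1 & -n\bar{x} \\ -n_1 & n \end{bmatrix}, \quad Z'y = \begin{bmatrix} n\bar{y} \\ n_1\bar{y}_1 \end{bmatrix}, $$
where subscript 1 indicates the mean of the observations for which z equals 1, and $n_1$ is the number of such observations. Multiplying the matrix times the vector and cancelling terms produces the solutions
$$ a_{IV} = \frac{\bar{x}_1\bar{y} - \bar{x}\,\bar{y}_1}{\bar{x}_1 - \bar{x}} \quad \text{and} \quad b_{IV} = \frac{\bar{y}_1 - \bar{y}}{\bar{x}_1 - \bar{x}}. $$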
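The closed forms in Exercise 5 can be compared with a direct solution of (Z′X)d = Z′y. The sketch below does so on a small made up sample. The data and the function names are hypothetical, and the 2×2 system is solved by Cramer's rule.

```python
def iv_by_solving(x, y, z):
    """Solve (Z'X) d = Z'y for Z = [i, z], X = [i, x] by Cramer's rule."""
    n, n1 = len(x), sum(z)
    sx, sy = sum(x), sum(y)
    sx1 = sum(xi for xi, zi in zip(x, z) if zi == 1)
    sy1 = sum(yi for yi, zi in zip(y, z) if zi == 1)
    det = n * sx1 - sx * n1              # determinant of Z'X
    a = (sx1 * sy - sx * sy1) / det
    b = (n * sy1 - n1 * sy) / det
    return a, b


def iv_closed_form(x, y, z):
    """Closed forms in terms of overall means and means over z = 1."""
    n, n1 = len(x), sum(z)
    xbar, ybar = sum(x) / n, sum(y) / n
    x1bar = sum(xi for xi, zi in zip(x, z) if zi == 1) / n1
    y1bar = sum(yi for yi, zi in zip(y, z) if zi == 1) / n1
    a = (x1bar * ybar - xbar * y1bar) / (x1bar - xbar)
    b = (y1bar - ybar) / (x1bar - xbar)
    return a, b


# Hypothetical data; z is a dummy variable instrument.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
z = [0, 0, 1, 0, 1, 1]
solved = iv_by_solving(x, y, z)
closed = iv_closed_form(x, y, z)
print(solved, closed)
```

The two computations agree up to rounding, which is all the algebra in the exercise claims.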
Application
a. The statement of the problem is actually a bit optimistic. Given the way it is stated, it would imply that the exogenous variables in the “demand” equation would be, in principle, (Ed, Union, Fem), which are also in the supply equation, plus the remainder, (Exp, Exp2, Occ, Ind, South, SMSA, Blk). The problem is that the model as stated would not be identified: the supply equation would be, but the demand equation would not. The way out would be to assume that at least one of (Ed, Union, Fem) does not appear in the demand equation. Since education surely would, that leaves one or both of Union and Fem. We will assume both of them are omitted. So, our equation is
$$ \ln Wage_{it} = \alpha_1 + \alpha_2 Ed_{it} + \alpha_3 Exp_{it} + \alpha_4 Exp_{it}^2 + \alpha_5 Occ_{it} + \alpha_6 Ind_{it} + \alpha_7 South_{it} + \alpha_8 SMSA_{it} + \alpha_9 Blk_{it} + \gamma\, Wks_{it} + u_{it}. $$
NAMELIST ; X = one,Ed,Exp,Expsq,Occ,Ind,South,SMSA,Blk,Wks $
NAMELIST ; Z = one,Ed,Exp,expsq,Occ,Ind,south,SMSA,Blk,Union,Fem $
Regress ; Lhs = lwage ; Rhs = X $
2SLS ; Lhs = lwage ; Rhs = X ; Inst = Z $
REGRESS ; Lhs = Wks ; Rhs = Z ; cls:b(10)=0,b(11)=0$

Ordinary least squares regression
LHS=LWAGE    Mean = 6.676346, Standard deviation = .4615122
WTS=none     Number of observs. = 4165
Model size   Parameters = 10, Degrees of freedom = 4155
Residuals    Sum of squares = 581.2717, Standard error of e = .3740280
Fit          R-squared = .3446066, Adjusted R-squared = .3431870
Model test   F[ 9, 4155] (prob) = 242.74 (.0000)

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   5.13171052    .07238152        70.898     .0000
ED         .06112766     .00277226        22.050     .0000      12.8453782
EXP        .04291665     .00229783        18.677     .0000      19.8537815
EXPSQ      .00070803     .506204D-04      13.987     .0000      514.405042
OCC        .07814434     .01502100        5.202      .0000      .51116447
IND        .09066812     .01247863        7.266      .0000      .39543818
SOUTH      .07629062     .01318346        5.787      .0000      .29027611
SMSA       .13789225     .01278553        10.785     .0000      .65378151
BLK        .26269494     .02304380        11.400     .0000      .07226891
WKS        .00484184     .00113470        4.267      .0000      46.8115246

Two stage least squares regression
LHS=LWAGE    Mean = 6.676346, Standard deviation = .4615122
WTS=none     Number of observs. = 4165
Model size   Parameters = 10, Degrees of freedom = 4155
Residuals    Sum of squares = 602.3138, Standard error of e = .3807377
Fit          R-squared = .3192467, Adjusted R-squared = .3177722
Model test   F[ 9, 4155] (prob) = 216.50 (.0000)
Instrumental Variables: ONE ED EXP EXPSQ OCC IND SOUTH SMSA BLK UNION FEM

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   4.46105888    .27680953        16.116     .0000
ED         .06167266     .00283031        21.790     .0000      12.8453782
EXP        .04207640     .00236282        17.808     .0000      19.8537815
EXPSQ      .00068241     .525268D-04      12.992     .0000      514.405042
OCC        .07605669     .01531301        4.967      .0000      .51116447
IND        .08348143     .01302032        6.412      .0000      .39543818
SOUTH      .08242895     .01364036        6.043      .0000      .29027611
SMSA       .13244624     .01319402        10.038     .0000      .65378151
BLK        .25212290     .02383132        10.579     .0000      .07226891
WKS        .01922950     .00583960        3.293      .0010      46.8115246
This is the test of relevance of the instrumental variables. In the regression of WKS on the full set of exogenous variables, we test the hypothesis that the coefficients on the instruments, UNION and FEM, are jointly zero. The results show that the hypothesis is rejected. We conclude that the instruments are relevant.

Linearly restricted regression
Ordinary least squares regression
LHS=WKS      Mean = 46.81152, Standard deviation = 5.129098
WTS=none     Number of observs. = 4165
Model size   Parameters = 9, Degrees of freedom = 4156
Residuals    Sum of squares = 108653.5, Standard error of e = 5.113097
Fit          R-squared = .8138966E-02, Adjusted R-squared = .6229705E-02
Model test   F[ 8, 4156] (prob) = 4.26 (.0000)
Restrictns.  F[ 2, 4154] (prob) = 84.57 (.0000)
Not using OLS or no constant. Rsqd & F may be < 0.
Note, with restrictions imposed, Rsqd may be < 0.

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   46.6129896    .67547781        69.007     .0000
ED         .03787988     .03789322        1.000      .3175      12.8453782
EXP        .05840099     .03139904        1.860      .0629      19.8537815
EXPSQ      .00178055     .00069145        2.575      .0100      514.405042
OCC        .14509978     .20533021        .707       .4798      .51116447
IND        .49950389     .17041135        2.931      .0034      .39543818
SOUTH      .42663864     .18010107        2.369      .0178      .29027611
SMSA       .37851979     .17468415        2.167      .0302      .65378151
BLK        .73479892     .31481083        2.334      .0196      .07226891
UNION      .444089D-15   .182255D-08      .000       1.0000     .36398559
FEM        .000000       ......(Fixed Parameter).......
Chapter 13 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Simultaneous Equations Models
1. (a) Since nothing is excluded from either equation and there are no other restrictions, neither equation passes the order condition for identification.
(1) We use (13-12) and the equations which follow it. For the first equation, [A3′,A5′] = β22, a scalar which has rank M−1 = 1 unless β22 = 0. For the second, [A3′,A5′] = β31. Thus, both equations are identified.
(2) This restriction does not restrict the first equation, so it remains unidentified. The second equation is now identified, as [A3′,A5′] = [β11,β21] has rank 1 if either of the two coefficients is nonzero.
(3) If γ1 equals 0, the model becomes partially recursive. The first equation becomes a regression which can be estimated by ordinary least squares. However, the second equation continues to fail the order condition. To see the problem, consider that even with the restriction, any linear combination of the two equations has the same variables as the original second equation.
(4) We know from above that if β32 = 0, the second equation is identifiable. If it is, then γ2 is identified. We may treat it as known. As such, γ1 is known. By regressing y1 − γ1y2 on the xs, we would obtain estimates of the remaining parameters, so these restrictions identify the model. It is instructive to analyze this from the standpoint of false structures as done in the text. A false structure which incorporates the known restrictions would be
$$ \begin{bmatrix} 1 & -\gamma \\ -\gamma & 1 \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & 0 \end{bmatrix} \times \begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix}. $$
If the false structure is to obey the restrictions, then f11 − γf21 = 1, f22 − γf12 = 1, f21 − γf11 = f12 − γf22, and β31f12 = 0. It follows then that f12 = 0 so f22 = 1. Then, f21 − γf11 = −γ or f21 = (f11 − 1)γ so that f11 − γ²(f11 − 1) = 1. This can only hold for all values of γ if f11 = 1 and, then, f21 = 0. Therefore, F = I, which establishes identification.
(5) If β31 = 0, the first equation is identified by the usual rank and order conditions. Consider, then, the off-diagonal element of Σ = Γ′ΩΓ. Ω is identified since it is the reduced form covariance matrix. The off-diagonal element is σ12 = ω11 + ω22 − (γ1 + γ2)ω12 = 0. Since γ1 is zero, γ2 = ω12/(ω11 + ω22). With γ2 known, the remaining parameters are estimable by least squares regression of (y2 − γ2y1) on the xs. Therefore, the restrictions identify the model.
(6) Since this is only a single restriction, it will not likely identify the entire model. Consider again the false structure. The restrictions implied by the theory are f11 − γ2f21 = 1, f22 − γ1f12 = 1, and β21f11 + β22f21 = β21f12 + β22f22. The three restrictions on four unknown elements of F do not serve to pin down any of them. This restriction does not even partially identify the model.
(7) The last four restrictions remove x2 and x3 from the model. The remaining model is not identified by the usual rank and order conditions. From part (5), we see that the first restriction implies σ12 = ω11 + ω22 − (γ1 + γ2)ω12 = 0. But, with neither γ1 nor γ2 specified, this does not identify either parameter.
(8) The first equation is identified by the conventional rank and order conditions. The second equation fails the order condition. But, the restriction σ12 = 0 provides the additional information needed to identify the model. For simplicity, write the model with the restrictions imposed as y1 = γ1y2 + ε1 and y2 = γ2y1 + βx + ε2. The reduced form is y1 = π1x + v1 and y2 = π2x + v2, where π1 = γ1β/Δ and π2 = β/Δ with Δ = (1 − γ1γ2), and v1 = (ε1 + γ1ε2)/Δ and v2 = (ε2 + γ2ε1)/Δ. The reduced form variances and covariances are ω11 = (γ1²σ22 + σ11)/Δ², ω22 = (γ2²σ11 + σ22)/Δ², and ω12 = (γ1σ22 + γ2σ11)/Δ². All reduced form parameters are estimable directly by using least squares, so the reduced form is identified in all cases. Now, γ1 = π1/π2. σ11 is the residual variance in the equation (y1 − γ1y2) = ε1, so σ11 must be estimable (identified) if γ1 is. Now, with a bit of manipulation, we find that ω11 − γ1ω12 = σ11/Δ. Therefore, with σ11 and
γ1 "known" (identified), the only remaining unknown is γ2, which is therefore identified. With γ1 and γ2 in hand, β may be deduced from π2. With γ2 and β in hand, σ22 is the residual variance in the equation (y2  βx γ2y1) = ε2, which is directly estimable, therefore, identified. 2. Following the ⎡ (1) (2) ⎢− 1 α 3 ⎢ ⎢ 0 −1 matrix ⎢ 0 ⎢0 ⎢ 0 −1 ⎢ 0 ⎢⎣ 0
method in (3) (4) 0 0 γ1 0 −1 0 1 0 0 1
Example 13.6, for identification of the investment equation, we require that the (5) (6) (7) (8) (9) ⎤ α3 0 0 0 0 ⎥⎥ 0 0 0 γ3 γ2 ⎥ ⎥ have rank 5. Columns (1), (4), (6), (7), and (8) each 0 1 0 0 0⎥ 0 0 −1 0 0⎥ ⎥ 0 0 0 0 0 ⎥⎦
have one element in a different row, so they are linearly independent. Therefore, the matrix has rank five. For ⎡ (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) ⎤ ⎢− 1 0 α 0 α3 0 0 0 α2 0 ⎥⎥ 1 ⎢ ⎢ 0 − 1 β1 0 0 0 0 0 β2 β3 ⎥ the third equation, the required matrix is ⎢ ⎥ . Columns 1 0 0 0 01 0 0 0 0 ⎥ ⎢1 ⎢0 0 −1 0 0 0 −1 0 0 0 ⎥ ⎢ ⎥ 1 0 −1 0 0 0 0 0 1 ⎥⎦ ⎢⎣ 0 (4), (6), (7), (9), and (10) are linearly independent. 3. We find [A3′,A5′]′ for each equation. (1) (2) (3) (4) ⎡ γ 32 1 γ 34 ⎤ ⎡ 1 γ 12 0 ⎤ ⎡ 1 γ 12 0 ⎤ ⎢β ⎥ ⎢γ 1 ⎥⎥ ⎢ 41 γ 42 ⎢ 12 β13 β14 ⎥ , [ 0 β ⎢ , β 31 β 32 β 33 ⎥⎥ 43 β 44 ] , ⎢ 0 β 43 β 4 ⎥ ⎢β 21 1 0⎥ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢⎣ 0 β52 0 ⎥⎦ 0 ⎦ ⎣β 32 0 ⎣ 0 β 52 00⎦ Identification requires that the rank of each matrix be M1 = 3. The second is obviously not identified. In (1), none of the three columns can be written as a linear combination of the other two, so it has rank 3. (Although the second and last columns have nonzero elements in the same positions, for the matrix to have short rank, we would require that the third column be a multiple of the second, since the first cannot appear in the linear combination which is to replicate the second column.) By the same logic, (3) and (4) are identified. 4. Obtain the reduced form for the model in Exercise 1 under each of the assumptions made in parts (a) and (b1), (b6), and (b9). (1). The model is y1 = γ1y2 + β11x1 + β21x2 + β31x3 + ε1 y2 = γ2y1 + β12x1 + β22x2 + β32x3 + ε2. ⎡ − β11 − β12 ⎤ − γ2⎤ ⎡ 1 ⎢ Therefore, Γ = ⎢ − β 22 ⎥⎥ and Σ is unrestricted. The reduced form is ⎥ and B = ⎢ 0 − γ 1 ⎣ 1 ⎦ ⎢⎣− β 31 0 ⎥⎦ ⎡β11 + γ 1β 21 1 ⎢ γ β Π= 1 22 1 − γ 1γ 2 ⎢ ⎢⎣ β 31
γ 2β11 + β12 ⎤ ⎥ and β 22 ⎥ γ 2β 31 ⎥⎦
⎡ σ11 + γ 12 σ 22 ⎢ ⎢ + 2γ 1σ12 1 ⎢ Ω = (Γ1)′Σ(Γ1) = (1 − γ 1γ 2 ) 2 ⎢ ⎢γ 2 σ11 + γ 1σ 22 ⎢+ ( γ 1 + γ 2 ) σ12 ⎣
γ 2 σ11 + γ 1σ 22 ⎤ ⎥ + ( γ 1 + γ 2 )σ12 ⎥ ⎥ ⎥ 2 γ 2 σ11 + σ 22 ⎥ ⎥ + 2 γ 1σ12 ⎦
(6) The model is y1 = β11x1 + β21x2 + β31x3 + ε1 y2 = γ2y1 + β12x1 + β22x2 + β32x3 + ε2 The first equation is already a reduced form. Substituting it into the second provides the second reduced form. ⎡β11 β12 + γ 2β11 ⎤ γ 2 σ11 ⎤ ⎡1 γ 2 ⎤ ⎡ σ11 1 1 The coefficient matrix is P= ⎢⎢β 21 β 22 + γ 2β 21 ⎥⎥ , Γ1 = ⎢ ⎥ so Ω = (Γ )′Σ(Γ ) = ⎢γ σ ⎥ 2 0 1 γ σ 2 11 + σ 22 ⎦ ⎣ ⎦ ⎣ 2 11 ⎢⎣β 31 β 32 + γ 2β 31 ⎥⎦ (9) The model is y1 = γ1y2 + ε1 y2 = γ2y1 + β12x1 + ε2
⎡ σ + γ 12 σ 22 Then, Π = BΓ1 = [β12γ1/(1γ1γ2) β12/(1γ1γ2)] and Ω = ⎢ 11 ⎢⎣γ 2 σ11 + γ 1σ 22 ⎡5 2 3 ⎤ 5. The relevant submatrices are X′X = ⎢⎢2 10 8 ⎥⎥ , X′y1 = ⎢⎣ 3 8 15⎥⎦ ⎡ 3 5⎤ ⎡4 2 3 ⎤ ⎡10 ⎢ ⎥ y1′y2 = 6, X′Z1 = ⎢ 6 2⎥ , X′Z2 = ⎢⎢ 3 10 8 ⎥⎥ Z1′Z1 = ⎢ ⎣3 ⎢⎣7 3⎥⎦ ⎢⎣5 8 15⎥⎦
γ 2 σ11 + γ 1σ 22 ⎤ ⎥. γ 22 σ11 + σ 22 ⎥⎦
⎡ 4⎤ ⎡3⎤ ⎢ ⎥ , X′y = ⎢ ⎥ , y ′y = 20, y ′y = 10, 2 2 2 ⎢ 3⎥ ⎢6⎥ 1 1 ⎢⎣5⎥⎦ ⎢⎣7⎥⎦ ⎡10 3 5 ⎤ 3⎤ , Z2′Z2 = ⎢⎢ 3 10 8 ⎥⎥ , 5⎥⎦ ⎢⎣ 5 8 15⎥⎦
⎡20⎤ ⎡ 6 6 7⎤ ⎡6⎤ ⎡10⎤ ⎢ ⎥ Z1′Z2 = ⎢ , Z ′ y = , Z ′ y = , Z ′ y = ⎥ 1 1 ⎢4⎥ 1 2 ⎢ 3 ⎥ 2 1 ⎢ 3 ⎥ , Z2′y2 = 4 2 3 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎢⎣ 5 ⎥⎦ The two OLS coefficient vectors are d1 = (X′X)1X′y1 = [.439024,.536585] ′ d2 = (X′X)1X′y2 = [.193016,.384127,.19746] ′. The two stage least squares estimators are
⎡6⎤ ⎢ ⎥. ⎢6⎥ ⎢⎣7⎥⎦
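$$ \hat{\delta}_1 = [Z_1'X(X'X)^{-1}X'Z_1]^{-1}[Z_1'X(X'X)^{-1}X'y_1] = [.368816,\ .578711]', $$
$$ \hat{\delta}_2 = [Z_2'X(X'X)^{-1}X'Z_2]^{-1}[Z_2'X(X'X)^{-1}X'y_2] = [.484375,\ .367188,\ .109375]', $$
$$ \hat{\sigma}_{11} = (y_1'y_1 - 2y_1'Z_1\hat{\delta}_1 + \hat{\delta}_1'Z_1'Z_1\hat{\delta}_1)/25 = .610397, \quad \hat{\sigma}_{22} = .268384. $$
The estimated asymptotic covariance matrices are
$$ \text{Est.Var}[\hat{\delta}_1] = \hat{\sigma}_{11}[Z_1'X(X'X)^{-1}X'Z_1]^{-1} = \begin{bmatrix} .215858 & .129035 \\ .129036 & .1995 \end{bmatrix}, $$
$$ \text{Est.Var}[\hat{\delta}_2] = \begin{bmatrix} .132423 & -.007699 & -.040035 \\ -.007688 & .047259 & -.022538 \\ -.040035 & -.022638 & .043311 \end{bmatrix}. $$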
The three stage least squares estimate is
$$ \begin{bmatrix} \hat{\sigma}^{11}[Z_1'X(X'X)^{-1}X'Z_1] & \hat{\sigma}^{12}[Z_1'X(X'X)^{-1}X'Z_2] \\ \hat{\sigma}^{12}[Z_2'X(X'X)^{-1}X'Z_1] & \hat{\sigma}^{22}[Z_2'X(X'X)^{-1}X'Z_2] \end{bmatrix}^{-1} \begin{bmatrix} \hat{\sigma}^{11}[Z_1'X(X'X)^{-1}X'y_1] + \hat{\sigma}^{12}[Z_1'X(X'X)^{-1}X'y_2] \\ \hat{\sigma}^{12}[Z_2'X(X'X)^{-1}X'y_1] + \hat{\sigma}^{22}[Z_2'X(X'X)^{-1}X'y_2] \end{bmatrix} $$
= [.368817, .578708, .4706, .306363, .168294]′.
The estimated standard errors are the square roots of the diagonal elements of the inverse matrix, [.4637, .4466, .3626, .1716, .1628], compared to the 2SLS values, [.4637, .4466, .3639, .2174, .2081].
To compute the limited information maximum likelihood estimator, we require the matrix of sums of squares and cross products of residuals of the regressions of y1 and y2 on x1 and on x1, x2, and x3. These are
$$ W^0 = Y'Y - Y'x_1(x_1'x_1)^{-1}x_1'Y = \begin{bmatrix} 16.8 & 3.60 \\ 3.60 & 8.20 \end{bmatrix}, \quad W^1 = Y'Y - Y'X(X'X)^{-1}X'Y = \begin{bmatrix} 16.2872 & 2.55312 \\ 2.55312 & 5.3617 \end{bmatrix}. $$
The two characteristic roots of $(W^1)^{-1}W^0$ are 1.53157 and 1.00837. We carry the smaller one into the k-class computation [see, for example, Theil (1971) or Judge, et al. (1985)];
$$ \hat{\delta}_{1k} = \begin{bmatrix} 10 - 1.00837(5.3617) & 3 \\ 3 & 5 \end{bmatrix}^{-1} \begin{bmatrix} 6 - 1.00837(2.55312) \\ 4 \end{bmatrix} = \begin{bmatrix} .367116 \\ .57973 \end{bmatrix}. $$
Finally, the two estimates of the reduced form are
$$ \text{(OLS)} \quad P = \begin{bmatrix} .680851 & .329787 \\ .010638 & .37243 \\ .191489 & .202128 \end{bmatrix} $$
and
$$ \text{(2SLS)} \quad \hat{\Pi} = -\begin{bmatrix} -.578711 & 0 \\ 0 & -.367188 \\ 0 & -.109375 \end{bmatrix} \begin{bmatrix} 1 & -.484375 \\ -.368816 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} .704581 & .341281 \\ .164880 & .447051 \\ .049113 & .133164 \end{bmatrix}. $$
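The k-class step of the LIML computation in Exercise 5 is a 2×2 linear system, so it is easy to reproduce. The sketch below takes the smaller characteristic root (1.00837) and the elements of W1 as given in the solution. The helper `solve_2x2` is a hypothetical name introduced only for this check.

```python
def solve_2x2(a, b):
    """Solve a 2x2 linear system a d = b by Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return [(a22 * b[0] - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


k = 1.00837                       # smaller characteristic root, as reported
w1_22, w1_12 = 5.3617, 2.55312    # elements of W1 involving y2

# Z1'Z1 and Z1'y1, with the terms involving y2 adjusted by k times W1.
lhs = [[10.0 - k * w1_22, 3.0],
       [3.0, 5.0]]
rhs = [6.0 - k * w1_12, 4.0]

delta_liml = solve_2x2(lhs, rhs)
print(delta_liml)
```

The result agrees with the reported [.367116, .57973]′ to the digits shown.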
6. For the model
y1 = γ1y2 + β11x1 + β21x2 + ε1
y2 = γ2y1 + β32x3 + β42x4 + ε2
show that there are two restrictions on the reduced form coefficients. Describe a procedure for estimating the model while incorporating the restrictions.
The structure is
$$ [y_1\ \ y_2]\begin{bmatrix} 1 & -\gamma_2 \\ -\gamma_1 & 1 \end{bmatrix} + [x_1\ \ x_2\ \ x_3\ \ x_4]\begin{bmatrix} -\beta_{11} & 0 \\ -\beta_{21} & 0 \\ 0 & -\beta_{32} \\ 0 & -\beta_{42} \end{bmatrix} = [\varepsilon_1\ \ \varepsilon_2] $$
or y′Γ + x′B = ε′. The reduced form coefficient matrix is
$$ \Pi = -B\Gamma^{-1} = \frac{1}{1 - \gamma_1\gamma_2}\begin{bmatrix} \beta_{11} & \gamma_2\beta_{11} \\ \beta_{21} & \gamma_2\beta_{21} \\ \gamma_1\beta_{32} & \beta_{32} \\ \gamma_1\beta_{42} & \beta_{42} \end{bmatrix} = \begin{bmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \\ \pi_{31} & \pi_{32} \\ \pi_{41} & \pi_{42} \end{bmatrix}. $$
The two restrictions are π12/π11 = π22/π21 and π31/π32 = π41/π42. Write the reduced form as
y1 = π11x1 + π21x2 + π31x3 + π41x4 + v1
y2 = π12x1 + π22x2 + π32x3 + π42x4 + v2.
We could treat the system as a nonlinear seemingly unrelated regressions model. One possible way to handle the restrictions is to eliminate two parameters directly by making the substitutions π12 = π11π22/π21 and π31 = π32π41/π42. The pair of equations would be
y1 = π11x1 + π21x2 + (π32π41/π42)x3 + π41x4 + v1
y2 = (π11π22/π21)x1 + π22x2 + π32x3 + π42x4 + v2.
This nonlinear system could now be estimated by nonlinear GLS. The function to be minimized would be
$$ \sum_{i=1}^{n}\left( v_{i1}^2\sigma^{11} + v_{i2}^2\sigma^{22} + 2v_{i1}v_{i2}\sigma^{12} \right) = n\,\mathrm{tr}(\Sigma^{-1}W). $$
Needless to say, this would be quite involved.
7. We would require that all three characteristic roots have modulus less than one. An intuitive guess that the diagonal element greater than one would preclude this would be correct. The roots are the solutions to
$$ \det\begin{bmatrix} -.1899 - \lambda & -.9471 & -.8991 \\ 0 & 1.0287 - \lambda & 0 \\ -.0656 & -.0791 & .0952 - \lambda \end{bmatrix} = 0. $$
Expanding this produces −(.1899 + λ)(1.0287 − λ)(.0952 − λ) − .0656(1.0287 − λ)(.8991) = 0. There is no need to go any further. Every term contains the factor (1.0287 − λ), so λ = 1.0287 is a solution, and there is at least one characteristic root larger than 1. The system is unstable.
8. Prove plim Yj′εj/T = ωj − Ωjjγj. Consistent with the partitioning y′ = [yj Yj′ Yj*′], partition Ω into
$$ \Omega = \begin{bmatrix} \omega_{jj} & \omega_j' & \omega_{*j}' \\ \omega_j & \Omega_{jj} & \Omega_{j*} \\ \omega_{*j} & \Omega_{*j} & \Omega_{**} \end{bmatrix} $$
and, as in the equation preceding (13-8), partition the jth column of Γ as
$$ \Gamma_j = \begin{bmatrix} 1 \\ -\gamma_j \\ 0 \end{bmatrix}. $$
Since the full set of reduced form disturbances is V = EΓ⁻¹, it follows that E = VΓ. In particular, the jth column of E is εj = VΓj. In the reduced form, now referring to (15-8), Yj = XΠj + Vj, where Πj is the Mj columns of Π corresponding to the included endogenous variables and Vj is the T×Mj matrix of their reduced form disturbances. Since X is uncorrelated with all columns of E, we have
$$ \operatorname{plim} Y_j'\varepsilon_j/T = \operatorname{plim} V_j'V\Gamma_j/T = [\omega_j\ \ \Omega_{jj}\ \ \Omega_{j*}]\begin{bmatrix} 1 \\ -\gamma_j \\ 0 \end{bmatrix} = \omega_j - \Omega_{jj}\gamma_j $$
as required.
9. Prove that an underidentified equation cannot be estimated by two stage least squares. If the equation fails the order condition, then the number of excluded exogenous variables is less than the number of included endogenous variables. The matrix of instrumental variables to be used for two stage least squares is of the form $\hat{Z} = [XA, X_j]$, where XA is Mj linear combinations of all K columns in X and Xj is Kj columns of X. In total, K = Kj* + Kj. If the equation fails the order condition, then Kj* < Mj, so $\hat{Z}$ is Mj + Kj columns which are linear combinations of K = Kj* + Kj < Mj + Kj columns. Therefore, $\hat{Z}$ cannot have full column rank. In order to compute the two stage least squares estimator, we require $(\hat{Z}'\hat{Z})^{-1}$, which cannot be computed.
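For Exercise 7, the claim that λ = 1.0287 is a characteristic root can be confirmed numerically, and the other two roots follow from the 2×2 block that remains after expanding along the second row. The sketch below assumes the signs of the matrix elements are as reconstructed from the determinant in the solution. The function names are illustrative.

```python
import math

# Coefficient matrix from Exercise 7 (signs as read from the determinant).
A = [[-0.1899, -0.9471, -0.8991],
     [0.0,      1.0287,  0.0],
     [-0.0656, -0.0791,  0.0952]]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def char_det(lam):
    """det(A - lam * I)."""
    shifted = [[A[i][j] - (lam if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


# The second row has a single nonzero element, so 1.0287 is a root and the
# other two roots come from the 2x2 block formed by rows and columns 1 and 3.
tr = A[0][0] + A[2][2]
dt = A[0][0] * A[2][2] - A[0][2] * A[2][0]
disc = tr * tr - 4.0 * dt
other_roots = [(tr + math.sqrt(disc)) / 2.0, (tr - math.sqrt(disc)) / 2.0]
roots = [A[1][1]] + other_roots
print(roots, char_det(1.0287))
```

Only the root 1.0287 lies outside the unit circle, but one such root is enough for instability.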
Application
?=========================================================
? Application 13.1 - Simultaneous Equations
?=========================================================
? Read the data
? For convenience, rename the variables so they correspond
? to the example in the text.
sample ; 1 - 204 $
create ; ct=realcons $
create ; it=realinvs $
create ; gt=realgovt $
create ; rt=tbilrate $
? Impose (artificially) the adding up condition on total demand.
create ; yt=ct+it+gt $
create ; ct1=ct[-1] $
create ; yt1 = yt[-1] $
create ; dyt = yt - yt1 $
sample ; 2 - 204 $
names ; xt = one,gt,rt,ct1,yt1 $
? Estimate equations by 2sls and save coefficients with
? the names used in the example.
2sls ; lhs = ct ; rhs=one,yt,ct1 ; inst = xt $

Two stage least squares regression
LHS=CT       Mean = 3008.995, Standard deviation = 1456.900
WTS=none     Number of observs. = 203
Model size   Parameters = 3, Degrees of freedom = 200
Residuals    Sum of squares = 75713.32, Standard error of e = 19.45679
Fit          R-squared = .9998208, Adjusted R-squared = .9998190
Model test   F[ 2, 200] (prob) =******* (.0000)
Instrumental Variables: ONE GT RT CT1 YT1

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   13.8657181    5.31536302       2.609      .0091
YT         .05843862     .01790473        3.264      .0011      4663.67389
CT1        .92200662     .02657199        34.698     .0000      2982.97438

calc ; a0=b(1) ; a1=b(2) ; a2=b(3) $
2sls ; lhs = it ; rhs=one,rt,dyt ; inst = xt $

Two stage least squares regression
LHS=IT       Mean = 654.5296, Standard deviation = 391.3705
WTS=none     Number of observs. = 203
Model size   Parameters = 3, Degrees of freedom = 200
Residuals    Sum of squares = .7744227E+08, Standard error of e = 622.2631
Fit          R-squared = -1.540485, Adjusted R-squared = -1.565889
Instrumental Variables: ONE GT RT CT1 YT1

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   300.699429    125.980850       2.387      .0170
RT         56.5192542    15.4643912       3.655      .0003      5.24965517
DYT        16.5359646    2.02509785       8.166      .0000      39.8236453

calc ; b0=b(1) ; b1=b(2) ; b2=b(3) $
?
? Create the coefficients of the reduced form. We only need the parts
? for the dynamics. These are in the second half of the example.
calc ; a=1-a1-b2 $
?
? Construct the matrix that governs the dynamics of the system. Note that
? the I equation is static. It is a function of y(t-1) and c(t-1) but not
? of I(t-1). This is the DELTA(1) submatrix in (13-42). The dominant
? root is the largest root of DELTA(1).
calc ; list ; C11=(1-b2)/a ; C12=-a1*b2/a ; C21=a2/a ; C22=-b2/a $
matrix ; C = [c11,c12 / c21,c22] $

Listed Calculator Results
C11 = .996253
C12 = .061967
C21 = -.059124
C22 = 1.060378

Matrix ; list ; roots = cxrt(c)$
Calc ; list ; domroot = sqr(roots(1,1)^2 + roots(1,2)^2)$
> Matrix ; list ; roots = cxrt(c)$
Matrix ROOTS has 2 rows and 2 columns.
           1          2
1    1.02832     .05134
2    1.02832    -.05134
> Calc ; list ; domroot = sqr(roots(1,1)^2 + roots(1,2)^2)$

Listed Calculator Results
DOMROOT = 1.029596

? The largest root is larger than one in absolute value. The system is unstable.

3sls ; lhs = ct,it ; eq1=one,yt,ct1 ; eq2=one,rt,dyt ; inst=xt ; maxit=0 $

Estimates for equation: CT
InstVar/GLS least squares regression
LHS=CT       Mean = 3008.995
Residuals    Sum of squares = 73370.06, Standard error of e = 19.15334

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   17.4780776    4.55837624       3.834      .0001
YT         .07312129     .01415744        5.165      .0000      4663.67389
CT1        .90026227     .02103720        42.794     .0000      2982.97438

Estimates for equation: IT
InstVar/GLS least squares regression
LHS=IT       Mean = 654.5296
Residuals    Sum of squares = .9735005E+08, Standard error of e = 697.6749

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   236.744328    122.661644       1.930      .0536
RT         30.5417941    12.9861014       2.352      .0187      5.24965517
DYT        18.3544221    1.93633720       9.479      .0000      39.8236453
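The dominant root reported in the application can be reproduced from the listed elements of C. The sketch below is an independent check in Python rather than NLOGIT. It assumes that C12 and C21 have opposite signs (here C21 is taken as negative), which is what a complex conjugate pair of roots with the reported real and imaginary parts requires. `eigenvalues_2x2` is a hypothetical helper.

```python
import cmath


def eigenvalues_2x2(c):
    """Characteristic roots of a 2x2 matrix from its trace and determinant."""
    tr = c[0][0] + c[1][1]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    root = cmath.sqrt(tr * tr / 4.0 - det)
    return tr / 2.0 + root, tr / 2.0 - root


# Elements as listed in the calculator results; the sign of C21 is an assumption.
C = [[0.996253, 0.061967],
     [-0.059124, 1.060378]]

lam1, lam2 = eigenvalues_2x2(C)
dominant_modulus = max(abs(lam1), abs(lam2))
print(lam1, lam2, dominant_modulus)
```

The modulus exceeds one, which is the basis for the conclusion that the estimated system is unstable.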
Chapter 14 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Estimation Frameworks in Econometrics
Exercise
1. A fully parametric model/estimator provides consistent, efficient, and comparatively precise results. The semiparametric model/estimator, by comparison, is relatively less precise in general terms. But, the payoff to this imprecision is that the semiparametric formulation is more likely to be robust to failures of the assumptions of the parametric model. Consider, for example, the binary probit model of Chapter 21, which makes a strong assumption of normality and homoscedasticity. If the assumptions are correct, the probit estimator is the most efficient use of the data. However, if the normality assumption or the homoscedasticity assumption is incorrect, then the probit estimator becomes inconsistent in an unknown fashion. Lewbel’s semiparametric estimator for the binary choice model, in contrast, is not very precise in comparison to the probit model. But, it will remain consistent if the normality assumption is violated, and it is even robust to certain kinds of heteroscedasticity.
Applications
1. Using the gasoline market data in Appendix Table F2.2, use the partially linear regression method in Section 16.3.3 to fit an equation of the form
ln(G/Pop) = β1ln(Income) + β2lnPnew cars + β3lnPused cars + g(lnPgasoline) + ε

crea;gp=lg;ip=ly;ncp=lpnc;upp=lpuc;pgp=lpg$
sort;lhs=pgp;rhs=gp,ip,ncp,upp$
crea;dgp=.809*gp - .5*gp[-1] - .309*gp[-2]$
crea;dip=.809*ip - .5*ip[-1] - .309*ip[-2]$
crea;dnc=.809*ncp - .5*ncp[-1] - .309*ncp[-2]$
crea;duc=.809*upp - .5*upp[-1] - .309*upp[-2]$
samp;3-36$
regr;lhs=dgp;rhs=dip,dnc,duc;res=e$

Ordinary least squares regression    Weighting variable = none
Dep. var. = DGP   Mean= .9708646870E-02, S.D.= .4738748109E-01
Model size:  Observations = 34, Parameters = 3, Deg.Fr.= 31
Residuals:   Sum of squares= .1485994289E-01, Std.Dev.= .02189
Fit:         R-squared= .799472, Adjusted R-squared = .78653
Model test:  F[ 2, 31] = 61.80, Prob value = .00000
Diagnostic:  Log-L = 83.2587, Restricted(b=0) Log-L = 55.9431
             LogAmemiyaPrCrt.= -7.559, Akaike Info. Crt.= -4.721
Model does not contain ONE. R-squared and F can be negative!
Autocorrel:  Durbin-Watson Statistic = 1.34659, Rho = .32671

Variable   Coefficient       Standard Error   t-ratio   P[|T|>t]   Mean of X
DIP        .9629902959       .11631885        8.279     .0000      .14504254E-01
DNC        .1010972781       .87755182E-01    1.152     .2581      .20153536E-01
DUC        .3197058148E-01   .51875022E-01    .616      .5422      .35656776E-01

> matr;varpl={1+1/(2*2)}*varb$
> matr;stat(b,varpl)$

Number of observations in current sample = 34
Number of parameters computed here = 3
Number of degrees of freedom = 31

Variable   Coefficient       Standard Error   b/St.Er.   P[|Z|>z]
B_1        .9629902959       .13004843        7.405      .0000
B_2        .1010972781       .98113277E-01    1.030      .3028
B_3        .3197058148E-01   .57998037E-01    .551       .5815
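The differencing weights and the variance adjustment used in the commands above can be checked with a few lines of arithmetic. The sketch below assumes the weights are (.809, −.5, −.309), so that the order of differencing is m = 2, and that the adjustment multiplies the OLS covariance matrix by 1 + 1/(2m), as in the `varpl` command. The variable names are illustrative.

```python
import math

# Differencing weights from the CREATE commands (order m = 2).
weights = [0.809, -0.5, -0.309]
m = len(weights) - 1

weight_sum = sum(weights)                    # should be (close to) zero
weight_sq_sum = sum(w * w for w in weights)  # should be (close to) one

# Variance inflation factor applied in the 'varpl' command: 1 + 1/(2m).
inflation = 1.0 + 1.0 / (2 * m)

# OLS standard errors from the regression of the differenced data.
ols_se = [0.11631885, 0.087755182, 0.051875022]
adjusted_se = [se * math.sqrt(inflation) for se in ols_se]
print(weight_sum, weight_sq_sum, inflation, adjusted_se)
```

The adjusted standard errors match the second table (.13004843, .098113277, .057998037), which confirms that the second set of results differs from the first only by the factor √1.25.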
2.
Nonparametric Regression for G
Observations = 36
Points plotted = 36
Bandwidth = .468092
Statistics for abscissa values
Mean = 2.316611
Standard Deviation = 1.251735
Minimum = .914000
Maximum = 4.109000
Kernel Function = Logistic
Cross val. M.S.E. = 121.084982
Results matrix = KERNEL

[Figure: Nonparametric Regression for G. E[y|xi] and G plotted against PG.]
3. A. Using the probit model and the Klein and Spady semiparametric models, the two sets of coefficient estimates are somewhat similar.

Binomial Probit Model
Maximum Likelihood Estimates
Model estimated: Jul 31, 2002 at 05:16:40PM.
Dependent variable               P
Weighting variable               None
Number of observations           601
Iterations completed             5
Log likelihood function          -307.2955
Restricted log likelihood        -337.6885
Chi squared                      60.78608
Degrees of freedom               5
Prob[ChiSqd > value] =           .0000000
Hosmer-Lemeshow chi-squared =    5.74742
P-value= .67550 with deg.fr. =   8

Variable   Coefficient       Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Index function for probability
Z2         .2202376072E-01   .10177371E-01    2.164      .0305      32.487521
Z3         .5990084920E-01   .17086004E-01    3.506      .0005      8.1776955
Z5         .1836462412       .51493239E-01    3.566      .0004      3.1164725
Z7         .3751312008E-01   .32844576E-01    1.142      .2534      4.1946755
Z8         .2729824396       .52473295E-01    5.202      .0000      3.9317804
Constant   .9766647244       .36104809        2.705      .0068

Seimparametric Binary Choice Model
Maximum Likelihood Estimates
Model estimated: Jul 31, 2002 at 11:01:24PM.
Dependent variable               P
Weighting variable               None
Number of observations           601
Iterations completed             13
Log likelihood function          -334.7367
Restricted log likelihood        -337.6885
Chi squared                      5.903551
Degrees of freedom               4
Prob[ChiSqd > value] =           .2064679
Hosmer-Lemeshow chi-squared =    118.69649
P-value= .00000 with deg.fr. =   8
Logistic kernel fn. Bandwidth =  .34423

Variable   Coefficient       Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Characteristics in numerator of Prob[Y = 1]
Z2         .3284308221E-01   .52254249E-01    .629       .5297      32.487521
Z3         .1089817386       .86483083E-01    1.260      .2076      8.1776955
Z5         .2384951835       .23320058        1.023      .3064      3.1164725
Z7         .1026067037       .17130225        .599       .5492      4.1946755
Z8         .1892263132       .21598982        .876       .3810      3.9317804
Constant   .0000000000       ........(Fixed Parameter)........
The probit model produces a set of marginal effects, as discussed in the text. These cannot be computed for the Klein and Spady estimator.

Partial derivatives of E[y] = F[*] with respect to the vector of characteristics.
They are computed at the means of the Xs.
Observations used for means are All Obs.

Variable   Coefficient       Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Index function for probability
Z2         .6695300413E-02   .30909282E-02    2.166      .0303      32.487521
Z3         .1821006800E-01   .51704684E-02    3.522      .0004      8.1776955
Z5         .5582910069E-01   .15568275E-01    3.586      .0003      3.1164725
Z7         .1140411992E-01   .99845393E-02    1.142      .2534      4.1946755
Z8         .8298761795E-01   .15933104E-01    5.209      .0000      3.9317804
Constant   .2969094977       .11108860        2.673      .0075
These are the various fit measures for the probit model.

Fit Measures for Binomial Choice Model
Probit model for variable P
Proportions   P0= .750416   P1= .249584
N = 601       N0= 451       N1= 150
LogL = -307.29545   LogL0 = -337.6885
Estrella = 1-(L/L0)^(-2L0/n) = .10056

Efron        .10905     McFadden      .09000     Ben./Lerman   .66451
Cramer       .10486     Veall/Zim.    .17359     Rsqrd_ML      .09619

Information Criteria   Akaike I.C. 1.04258   Schwarz I.C. 652.98248

Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 = .5000

             Predicted
Actual       0      1     Total
0          437     14       451
1          130     20       150
Total      567     34       601
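Several of the reported fit measures are simple functions of the two log likelihoods and the prediction table, so they can be recomputed as a check. The sketch below takes the log likelihoods as the negative values −307.29545 and −337.6885 (a binary choice log likelihood cannot be positive) and uses the standard definitions of the McFadden, Estrella, and maximum likelihood R-squared measures. The variable names are illustrative.

```python
import math

logl, logl0, n = -307.29545, -337.6885, 601

# McFadden's likelihood ratio index.
mcfadden = 1.0 - logl / logl0

# Estrella's measure, 1 - (L/L0)^(-2*L0/n).
estrella = 1.0 - (logl / logl0) ** (-2.0 * logl0 / n)

# "Rsqrd_ML": 1 - exp(-chi-squared/n), with chi-squared = 2(logL - logL0).
rsq_ml = 1.0 - math.exp(-2.0 * (logl - logl0) / n)

# Share of correct predictions from the 2x2 table (diagonal cells over N).
hit_rate = (437 + 20) / n
print(mcfadden, estrella, rsq_ml, hit_rate)
```

The first three values reproduce the reported .09000, .10056, and .09619. The share of correct predictions is about 76 percent, only slightly above the 75 percent obtained by always predicting zero.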
These are the fit measures for the probabilities computed for the Klein and Spady model. The probit model fits better by all measures computed.

Fit Measures for Binomial Choice Model
Observed = P   Fitted = KSPROBS
Proportions   P0= .750416   P1= .249584
N = 601       N0= 451       N1= 150
LogL = -320.37513   LogL0 = -337.6885
Estrella = 1-(L/L0)^(-2L0/n) = .05743

Efron        .05686     McFadden      .05127     Ben./Lerman   .64117
Cramer       .03897     Veall/Zim.    .10295     Rsqrd_ML      .05599
The first figure below plots the probit probabilities against the Klein and Spady probabilities. The models are obviously similar, though there is substantial difference in the fitted values.
[Figure: scatter plot of the probit probabilities (PROBITS, vertical axis, .00 to .80) against the Klein and Spady probabilities (KSPROBS, horizontal axis, .100 to .450).]
Finally, these two figures plot the predicted probabilities from the two models against the respective index functions, b’x. Note that the two plots are based on different coefficient vectors, so it is not possible to merge the two figures.
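Chapter 15
Minimum Distance Estimation and The Generalized Method of Moments

Exercises

1. The elements of J are

$$\frac{\partial\sqrt{b_1}}{\partial m_2} = m_3(-3/2)m_2^{-5/2}, \qquad \frac{\partial\sqrt{b_1}}{\partial m_3} = m_2^{-3/2}, \qquad \frac{\partial\sqrt{b_1}}{\partial m_4} = 0,$$

$$\frac{\partial b_2}{\partial m_2} = m_4(-2)m_2^{-3}, \qquad \frac{\partial b_2}{\partial m_3} = 0, \qquad \frac{\partial b_2}{\partial m_4} = m_2^{-2}.$$

Using the formula given for the moments, we obtain $\mu_2 = \sigma^2$, $\mu_3 = 0$, $\mu_4 = 3\sigma^4$. Insert these in the derivatives above to obtain

$$J = \begin{bmatrix} 0 & \sigma^{-3} & 0 \\ -6\sigma^{-2} & 0 & \sigma^{-4} \end{bmatrix}.$$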
Since the rows of J are orthogonal, we know that the off diagonal term in JVJ′ will be zero, which simplifies things a bit. Taking the parts directly, we can see that the asymptotic variance of $\sqrt{b_1}$ will be $\sigma^{-6}$ Asy.Var[$m_3$], which will be

$$\text{Asy.Var}[\sqrt{b_1}] = \sigma^{-6}(\mu_6 - \mu_3^2 + 9\mu_2^3 - 3\mu_2\mu_4 - 3\mu_2\mu_4).$$

The parts needed, using the general result given earlier, are $\mu_6 = 15\sigma^6$, $\mu_3 = 0$, $\mu_2 = \sigma^2$, $\mu_4 = 3\sigma^4$. Inserting these in the parentheses, multiplying it out and collecting terms produces the upper left element of JVJ′ equal to 6, which is the desired result. The lower right element will be

$$\text{Asy.Var}[b_2] = 36\sigma^{-4}\,\text{Asy.Var}[m_2] + \sigma^{-8}\,\text{Asy.Var}[m_4] - 2(6)\sigma^{-6}\,\text{Asy.Cov}[m_2, m_4].$$

The needed parts are

Asy.Var[$m_2$] = $2\sigma^4$
Asy.Var[$m_4$] = $\mu_8 - \mu_4^2 = 105\sigma^8 - (3\sigma^4)^2$
Asy.Cov[$m_2, m_4$] = $\mu_6 - \mu_2\mu_4 = 15\sigma^6 - \sigma^2(3\sigma^4)$.

Inserting these parts in the expansion, multiplying it out and collecting terms produces the lower right element equal to 24, as expected.

2. The necessary data are given in Example 15.5. The two moments are $m_1' = 31.278$ and $m_2' = 1453.96$. Based on the theoretical results $\mu_1' = P/\lambda$ and $\mu_2' = P(P+1)/\lambda^2$, the solutions are $P = \mu_1'^2/(\mu_2' - \mu_1'^2)$ and $\lambda = \mu_1'/(\mu_2' - \mu_1'^2)$. Using the sample moments produces estimates P = 2.05682 and λ = 0.065759. The matrix of derivatives is

$$G = \begin{bmatrix} \partial\mu_1'/\partial P & \partial\mu_1'/\partial\lambda \\ \partial\mu_2'/\partial P & \partial\mu_2'/\partial\lambda \end{bmatrix} = \begin{bmatrix} 1/\lambda & -P/\lambda^2 \\ (2P+1)/\lambda^2 & -2P(P+1)/\lambda^3 \end{bmatrix} = \begin{bmatrix} 15.207 & -475.648 \\ 1{,}182.54 & -44{,}220.08 \end{bmatrix}.$$

The covariance matrix for the moments is given in Example 18.7;

$$\Phi = \begin{bmatrix} 24.7051 & 2307.126 \\ 2307.126 & 229{,}609.5 \end{bmatrix}.$$
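The method of moments solution in Exercise 2 can be checked numerically. The following is a minimal Python sketch, not the NLOGIT code used for the manual; the function names are illustrative, and the inputs are the two rounded sample moments quoted above, so the results agree with the reported values only to about three or four digits.

```python
def gamma_method_of_moments(m1, m2):
    """Solve mu1' = P/lam and mu2' = P(P+1)/lam^2 for (P, lam)."""
    spread = m2 - m1 ** 2            # sample counterpart of P/lam^2
    P = m1 ** 2 / spread
    lam = m1 / spread
    return P, lam


def moment_jacobian(P, lam):
    """Derivatives of (mu1', mu2') with respect to (P, lam)."""
    return [
        [1.0 / lam, -P / lam ** 2],
        [(2.0 * P + 1.0) / lam ** 2, -2.0 * P * (P + 1.0) / lam ** 3],
    ]


# Rounded sample moments quoted in the solution to Exercise 2.
P_hat, lam_hat = gamma_method_of_moments(31.278, 1453.96)
G = moment_jacobian(P_hat, lam_hat)

print("P =", P_hat, "lambda =", lam_hat)
for row in G:
    print(row)
```

Because only the rounded moments are available here, small differences in the last digits relative to the printed G are to be expected.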
3. a. The log likelihood for sampling from the normal distribution is

$$\log L = -(1/2)\left[n\log 2\pi + n\log\sigma^2 + (1/\sigma^2)\Sigma_i (x_i - \mu)^2\right].$$

Write the summation in the last term as $\Sigma_i x_i^2 + n\mu^2 - 2\mu\Sigma_i x_i$. Thus, it is clear that the log likelihood is of the form for an exponential family, and the sufficient statistics are the sum and sum of squares of the observations.

b. The log of the density for the Weibull distribution is

$$\log f(x_i) = \log\alpha + \log\beta + (\beta - 1)\log x_i - \alpha x_i^{\beta}.$$

The log likelihood is found by summing these functions. The third term does not factor in the fashion needed to produce an exponential family. There are no sufficient statistics for this distribution.

c. The log of the density for the mixture distribution is

$$\log f(x_i, y_i) = \log\theta - (\beta + \theta)y_i + x_i\log\beta + x_i\log y_i - \log(x_i!).$$

This is an exponential family; the sufficient statistics are $\Sigma_i y_i$ and $\Sigma_i x_i$.

4. The question is (deliberately) misleading. We showed in Chapter 8 and in this chapter that in the classical regression model with heteroscedasticity, the OLS estimator is the GMM estimator. The asymptotic covariance matrix of the OLS estimator is given in Section 8.2. The estimators of the asymptotic covariance matrices are $s^2(X'X)^{-1}$ for OLS and the White estimator for GMM.

5. The GMM estimator would be chosen to minimize the criterion q = n m′Wm where W is the weighting matrix and m is the empirical moment,

$$m = (1/n)\Sigma_i (y_i - \Phi(x_i'\beta))x_i.$$

For the first pass, we'll use W = I and just minimize the sum of squares. This provides an initial set of estimates that can be used to compute the optimal weighting matrix. With this first round estimate, we compute

$$W = \left[(1/n^2)\Sigma_i (y_i - \Phi(x_i'\beta))^2 x_i x_i'\right]^{-1}$$

then return to the optimization problem to find the optimal estimator. The asymptotic covariance matrix is computed from the first order conditions for the optimization. The matrix of derivatives is

$$G = \partial m/\partial\beta' = -(1/n)\Sigma_i \phi(x_i'\beta)x_i x_i'.$$

The estimator of the asymptotic covariance matrix will be $V = (1/n)[G'WG]^{-1}$.

6. This is the comparison between (15-12) and (15-11).
The proof can be done by comparing the inverses of the two covariance matrices. Thus, if the claim is correct, the matrix in (15-11) is larger than that in (15-12), or its inverse is smaller. We can ignore the (1/n) as well. We require, then, that

$$G'\Phi^{-1}G > G'WG[G'W\Phi WG]^{-1}G'WG.$$

7. Suppose in a sample of 500 observations from a normal distribution with mean μ and standard deviation σ, you are told that 35% of the observations are less than 2.1 and 55% of the observations are less than 3.6. Estimate μ and σ.

If 35% of the observations are less than 2.1, we would infer that Φ[(2.1 − μ)/σ] = .35, or (2.1 − μ)/σ = −.385 ⇒ 2.1 − μ = −.385σ. Likewise, Φ[(3.6 − μ)/σ] = .55, or (3.6 − μ)/σ = .126 ⇒ 3.6 − μ = .126σ. The joint solution is $\hat\mu$ = 3.2301 and $\hat\sigma$ = 2.9354.

It might not seem obvious, but we can also derive asymptotic standard errors for these estimates by constructing them as method of moments estimators. Observe, first, that the two estimates are based on moment estimators of the probabilities. Let $x_i$ denote one of the 500 observations drawn from the normal distribution. Then, the two proportions are obtained as follows: Let $z_i(2.1) = 1[x_i < 2.1]$ and $z_i(3.6) = 1[x_i < 3.6]$ be indicator functions. Then, the proportion of 35% has been obtained as $\bar z(2.1)$ and .55 is $\bar z(3.6)$. So, the two proportions are simply the means of functions of the sample observations. Each $z_i$ is a draw from a Bernoulli distribution with success probability π(2.1) = Φ((2.1 − μ)/σ) for $z_i(2.1)$ and π(3.6) = Φ((3.6 − μ)/σ) for $z_i(3.6)$. Therefore, E[$\bar z(2.1)$] = π(2.1), and E[$\bar z(3.6)$] = π(3.6). The variances in each case are Var[$\bar z(.)$] = (1/n)[π(.)(1 − π(.))]. The covariance of the two sample means is a bit trickier, but we can deduce it from the results of random sampling. Cov[$\bar z(2.1)$, $\bar z(3.6)$] = (1/n)Cov[$z_i(2.1)$, $z_i(3.6)$], and, since in random sampling sample moments will converge to their population counterparts,

$$\text{Cov}[z_i(2.1), z_i(3.6)] = \text{plim}\left[\left\{(1/n)\sum_{i=1}^{n} z_i(2.1)z_i(3.6)\right\} - \pi(2.1)\pi(3.6)\right].$$

But, $z_i(2.1)z_i(3.6)$ must equal $[z_i(2.1)]^2$ which, in turn, equals $z_i(2.1)$. It follows, then, that Cov[$z_i(2.1)$, $z_i(3.6)$] = π(2.1)[1 − π(3.6)]. Therefore, the asymptotic covariance matrix for the two sample proportions is

$$\text{Asy.Var}[p(2.1), p(3.6)] = \Sigma = \frac{1}{n}\begin{bmatrix} \pi(2.1)(1 - \pi(2.1)) & \pi(2.1)(1 - \pi(3.6)) \\ \pi(2.1)(1 - \pi(3.6)) & \pi(3.6)(1 - \pi(3.6)) \end{bmatrix}.$$

If we insert our sample estimates, we obtain

$$\text{Est.Asy.Var}[p(2.1), p(3.6)] = S = \begin{bmatrix} 0.000455 & 0.000315 \\ 0.000315 & 0.000495 \end{bmatrix}.$$

Now, ultimately, our estimates of μ and σ are found as functions of p(2.1) and p(3.6), using the method of moments. The moment equations are

$$m_{2.1} = \left[\frac{1}{n}\sum_{i=1}^{n} z_i(2.1)\right] - \Phi\left[\frac{2.1 - \mu}{\sigma}\right] = 0,$$

$$m_{3.6} = \left[\frac{1}{n}\sum_{i=1}^{n} z_i(3.6)\right] - \Phi\left[\frac{3.6 - \mu}{\sigma}\right] = 0.$$

Now, let

$$\Gamma = \begin{bmatrix} \partial m_{2.1}/\partial\mu & \partial m_{2.1}/\partial\sigma \\ \partial m_{3.6}/\partial\mu & \partial m_{3.6}/\partial\sigma \end{bmatrix}$$

and let G be the sample estimate of Γ. Then, the estimator of the asymptotic covariance matrix of ($\hat\mu$, $\hat\sigma$) is $[GS^{-1}G']^{-1}$. The remaining detail is the derivatives, which are just $\partial m_{2.1}/\partial\mu = (1/\sigma)\phi((2.1 - \mu)/\sigma)$ and $\partial m_{2.1}/\partial\sigma = [(2.1 - \mu)/\sigma][\partial m_{2.1}/\partial\mu]$, and likewise for $m_{3.6}$. Inserting our sample estimates produces

$$G = \begin{bmatrix} 0.37046 & -0.14259 \\ 0.39579 & 0.04987 \end{bmatrix}.$$

Finally, multiplying the matrices and computing the necessary inverses produces

$$[GS^{-1}G']^{-1} = \begin{bmatrix} 0.10178 & -0.12492 \\ -0.12492 & 0.16973 \end{bmatrix}.$$

The asymptotic distribution would be normal, as usual. Based on these results, a 95% confidence interval for μ would be $3.2301 \pm 1.96(.10178)^{1/2}$ = 2.6048 to 3.8554.
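The point estimates in Exercise 7 and the matrix S can be reproduced with a few lines of code. This is a minimal Python sketch using the standard library's `statistics.NormalDist`; the helper names are illustrative. It uses unrounded normal quantiles, whereas the solution above rounds them to −.385 and .126, so the estimates differ slightly in the third decimal place.

```python
from statistics import NormalDist


def normal_from_two_quantiles(x1, p1, x2, p2):
    """Solve Phi((x1-mu)/sigma) = p1 and Phi((x2-mu)/sigma) = p2."""
    z1 = NormalDist().inv_cdf(p1)    # standard normal quantile for p1
    z2 = NormalDist().inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma


def proportion_covariance(p1, p2, n):
    """Covariance matrix of two sample proportions 1[x < c1], 1[x < c2], c1 < c2."""
    off = p1 * (1.0 - p2) / n
    return [[p1 * (1.0 - p1) / n, off],
            [off, p2 * (1.0 - p2) / n]]


mu_hat, sigma_hat = normal_from_two_quantiles(2.1, 0.35, 3.6, 0.55)
S = proportion_covariance(0.35, 0.55, 500)

print("mu =", mu_hat, "sigma =", sigma_hat)
print("S =", S)
```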
Chapter 16
Maximum Likelihood Estimation

Exercises

1. The density of the maximum is $n[z/\theta]^{n-1}(1/\theta)$, 0 < z < θ. Therefore, the expected value is

$$E[z] = \frac{n}{\theta^n}\int_0^{\theta} z^n\,dz = [\theta^{n+1}/(n+1)][n/\theta^n] = n\theta/(n+1).$$

The variance is found likewise,

$$E[z^2] = \int_0^{\theta} z^2\,n(z/\theta)^{n-1}(1/\theta)\,dz = n\theta^2/(n+2),$$

so Var[z] = $E[z^2] - (E[z])^2 = n\theta^2/[(n+1)^2(n+2)]$. Using mean squared convergence we see that $\lim_{n\to\infty} E[z] = \theta$ and $\lim_{n\to\infty}$ Var[z] = 0, so that plim z = θ.
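The two moments of the sample maximum can be checked by simulation. The following is a minimal Python sketch; the values θ = 2 and n = 10, the number of replications, and the seed are arbitrary choices made for illustration and are not part of the exercise.

```python
import random


def simulate_sample_maximum(theta, n, reps, seed=12345):
    """Monte Carlo mean and variance of max(x_1..x_n), x_i ~ U(0, theta)."""
    rng = random.Random(seed)
    maxima = [max(rng.uniform(0.0, theta) for _ in range(n))
              for _ in range(reps)]
    mean = sum(maxima) / reps
    var = sum((z - mean) ** 2 for z in maxima) / (reps - 1)
    return mean, var


theta, n = 2.0, 10                      # illustrative values only
mean_z, var_z = simulate_sample_maximum(theta, n, reps=20000)

# Analytic results from the exercise.
exact_mean = n * theta / (n + 1)
exact_var = n * theta ** 2 / ((n + 1) ** 2 * (n + 2))

print("simulated:", mean_z, var_z)
print("exact:    ", exact_mean, exact_var)
```

Raising n in the sketch shows the simulated mean approaching θ and the variance approaching zero, which is the mean squared convergence used in the argument.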
2. The loglikelihood is $\ln L = -n\ln\theta - (1/\theta)\sum_{i=1}^{n} x_i$. The maximum likelihood estimator is obtained as the solution to $\partial\ln L/\partial\theta = -n/\theta + (1/\theta^2)\sum_{i=1}^{n} x_i = 0$, or $\hat\theta_{ML} = (1/n)\sum_{i=1}^{n} x_i = \bar x$. The asymptotic variance of the MLE is $\{-E[\partial^2\ln L/\partial\theta^2]\}^{-1} = \{-E[n/\theta^2 - (2/\theta^3)\sum_{i=1}^{n} x_i]\}^{-1}$. To find the expected value of this random variable, we need $E[x_i] = \theta$. Therefore, the asymptotic variance is $\theta^2/n$. The asymptotic distribution is normal with mean θ and this variance.

3. The loglikelihood is

$$\ln L = n\ln\theta - (\beta+\theta)\sum_{i=1}^{n} y_i + \ln\beta\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i\ln y_i - \sum_{i=1}^{n}\ln(x_i!).$$

The first and second derivatives are

$$\partial\ln L/\partial\theta = n/\theta - \sum_{i=1}^{n} y_i$$
$$\partial\ln L/\partial\beta = -\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} x_i/\beta$$
$$\partial^2\ln L/\partial\theta^2 = -n/\theta^2$$
$$\partial^2\ln L/\partial\beta^2 = -\sum_{i=1}^{n} x_i/\beta^2$$
$$\partial^2\ln L/\partial\beta\partial\theta = 0.$$

Therefore, the maximum likelihood estimators are $\hat\theta_{ML} = 1/\bar y$ and $\hat\beta = \bar x/\bar y$ and the asymptotic covariance matrix is the inverse of

$$E\begin{bmatrix} n/\theta^2 & 0 \\ 0 & \sum_{i=1}^{n} x_i/\beta^2 \end{bmatrix}.$$

In order to complete the derivation, we will require the expected value of $\sum_{i=1}^{n} x_i = nE[x_i]$. In order to obtain $E[x_i]$, it is necessary to obtain the marginal distribution of $x_i$, which is

$$f(x) = \int_0^{\infty} \theta e^{-(\beta+\theta)y}(\beta y)^x/x!\,dy = \beta^x(\theta/x!)\int_0^{\infty} e^{-(\beta+\theta)y}y^x\,dy.$$

This is $\beta^x(\theta/x!)$ times a gamma integral. This is $f(x) = \beta^x(\theta/x!)[\Gamma(x+1)]/(\beta+\theta)^{x+1}$. But, Γ(x+1) = x!, so the expression reduces to

$$f(x) = [\theta/(\beta+\theta)][\beta/(\beta+\theta)]^x.$$

Thus, x has a geometric distribution with parameter π = θ/(β+θ). (This is the distribution of the number of tries until the first success of independent trials each with success probability 1 − π.) Finally, we require the expected value of $x_i$, which is

$$E[x] = [\theta/(\beta+\theta)]\sum_{x=0}^{\infty} x[\beta/(\beta+\theta)]^x = \beta/\theta.$$

Then, the required asymptotic covariance matrix is

$$\begin{bmatrix} n/\theta^2 & 0 \\ 0 & n(\beta/\theta)/\beta^2 \end{bmatrix}^{-1} = \begin{bmatrix} \theta^2/n & 0 \\ 0 & \beta\theta/n \end{bmatrix}.$$
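The marginal distribution and its mean in Exercise 3 can be verified by direct summation. This is a minimal Python sketch; the parameter values β = 1.5 and θ = 0.5 and the truncation point of the infinite sums are arbitrary illustrative choices.

```python
def geometric_pmf(x, beta, theta):
    """f(x) = [theta/(beta+theta)] * [beta/(beta+theta)]**x, x = 0, 1, 2, ..."""
    pi = theta / (beta + theta)
    return pi * (1.0 - pi) ** x


beta, theta = 1.5, 0.5                  # illustrative values only
support = range(2000)                   # truncation of the infinite sum

total_prob = sum(geometric_pmf(x, beta, theta) for x in support)
mean_x = sum(x * geometric_pmf(x, beta, theta) for x in support)

print("sum of probabilities:", total_prob)   # should be close to 1
print("E[x]:", mean_x, "beta/theta:", beta / theta)
```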
The maximum likelihood estimator of θ/(β+θ) is

$$\widehat{\theta/(\beta+\theta)} = (1/\bar y)/[\bar x/\bar y + 1/\bar y] = 1/(1 + \bar x).$$

Its asymptotic variance is obtained using the variance of a nonlinear function,

$$V = [\beta/(\beta+\theta)^2]^2(\theta^2/n) + [\theta/(\beta+\theta)^2]^2(\beta\theta/n) = \beta\theta^2/[n(\beta+\theta)^3].$$

(The asymptotic variance could also be obtained as $[-1/(1 + E[x])^2]^2$ Asy.Var[$\bar x$].)

For part (c), we just note that γ = θ/(β+θ). For a sample of observations on x, the loglikelihood would be

$$\ln L = n\ln\gamma + \ln(1-\gamma)\sum_{i=1}^{n} x_i$$
$$\partial\ln L/\partial\gamma = n/\gamma - \sum_{i=1}^{n} x_i/(1-\gamma).$$

A solution is obtained by first noting that at the solution, $(1-\gamma)/\gamma = \bar x = 1/\gamma - 1$. The solution for γ is, thus, $\hat\gamma = 1/(1 + \bar x)$. Of course, this is what we found in part b., which makes sense.

For part (d),

$$f(y|x) = \frac{f(x,y)}{f(x)} = \frac{\theta e^{-(\beta+\theta)y}(\beta y)^x(\beta+\theta)^x(\beta+\theta)}{x!\,\theta\,\beta^x}.$$

Cancelling terms and gathering the remaining like terms leaves

$$f(y|x) = (\beta+\theta)[(\beta+\theta)y]^x e^{-(\beta+\theta)y}/x!$$

so the density has the required form with λ = (β+θ). The integral is $\{[\lambda^{x+1}]/x!\}\int_0^{\infty} e^{-\lambda y}y^x\,dy$. This integral is a Gamma integral which equals $\Gamma(x+1)/\lambda^{x+1}$, which is the reciprocal of the leading scalar, so the product is 1. The loglikelihood function is

$$\ln L = n\ln\lambda - \lambda\sum_{i=1}^{n} y_i + \ln\lambda\sum_{i=1}^{n} x_i - \sum_{i=1}^{n}\ln x_i!$$
$$\partial\ln L/\partial\lambda = \left(\sum_{i=1}^{n} x_i + n\right)/\lambda - \sum_{i=1}^{n} y_i$$
$$\partial^2\ln L/\partial\lambda^2 = -\left(\sum_{i=1}^{n} x_i + n\right)/\lambda^2.$$

Therefore, the maximum likelihood estimator of λ is $(1 + \bar x)/\bar y$ and the asymptotic variance, conditional on the xs, is Asy.Var[$\hat\lambda$] = $(\lambda^2/n)/(1 + \bar x)$.

Part (e.) We can obtain f(y) by summing over x in the joint density. First, we write the joint density as $f(x,y) = \theta e^{-\theta y}e^{-\beta y}(\beta y)^x/x!$. The sum is, therefore,

$$f(y) = \theta e^{-\theta y}\sum_{x=0}^{\infty} e^{-\beta y}(\beta y)^x/x!.$$

The sum is that of the probabilities for a Poisson distribution, so it equals 1. This produces the required result. The maximum likelihood estimator of θ and its asymptotic variance are derived from

$$\ln L = n\ln\theta - \theta\sum_{i=1}^{n} y_i$$
$$\partial\ln L/\partial\theta = n/\theta - \sum_{i=1}^{n} y_i$$
$$\partial^2\ln L/\partial\theta^2 = -n/\theta^2.$$

Therefore, the maximum likelihood estimator is $1/\bar y$ and its asymptotic variance is $\theta^2/n$. Since we found f(y) by factoring f(x,y) into f(y)f(x|y) (apparently, given our result), the answer follows immediately. Just divide the expression used in part e. by f(y). This is a Poisson distribution with parameter βy. The loglikelihood function and its first derivative are

$$\ln L = -\beta\sum_{i=1}^{n} y_i + \ln\beta\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i\ln y_i - \sum_{i=1}^{n}\ln x_i!$$
$$\partial\ln L/\partial\beta = -\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} x_i/\beta,$$

from which it follows that $\hat\beta = \bar x/\bar y$.

4. The loglikelihood and its two first derivatives are

$$\log L = n\log\alpha + n\log\beta + (\beta-1)\sum_{i=1}^{n}\log x_i - \alpha\sum_{i=1}^{n} x_i^{\beta}$$
$$\partial\log L/\partial\alpha = n/\alpha - \sum_{i=1}^{n} x_i^{\beta}$$
$$\partial\log L/\partial\beta = n/\beta + \sum_{i=1}^{n}\log x_i - \alpha\sum_{i=1}^{n}(\log x_i)x_i^{\beta}.$$

Since the first likelihood equation implies that at the maximum, $\hat\alpha = n/\sum_{i=1}^{n} x_i^{\beta}$, one approach would be to scan over the range of β and compute the implied value of α. Two practical complications are the allowable range of β and the starting values to use for the search. The second derivatives are

$$\partial^2\ln L/\partial\alpha^2 = -n/\alpha^2$$
$$\partial^2\ln L/\partial\beta^2 = -n/\beta^2 - \alpha\sum_{i=1}^{n}(\log x_i)^2 x_i^{\beta}$$
$$\partial^2\ln L/\partial\alpha\partial\beta = -\sum_{i=1}^{n}(\log x_i)x_i^{\beta}.$$

If we had estimates in hand, the simplest way to estimate the expected values of the Hessian would be to evaluate the expressions above at the maximum likelihood estimates, then compute the negative inverse. First, since the expected value of ∂lnL/∂α is zero, it follows that $E[x_i^{\beta}] = 1/\alpha$. Now,

$$E[\partial\ln L/\partial\beta] = n/\beta + E\left[\sum_{i=1}^{n}\log x_i\right] - \alpha E\left[\sum_{i=1}^{n}(\log x_i)x_i^{\beta}\right] = 0$$

as well. Divide by n, and use the fact that every term in a sum has the same expectation to obtain

$$1/\beta + E[\ln x_i] - E[(\ln x_i)x_i^{\beta}]/E[x_i^{\beta}] = 0.$$

Now, multiply through by $E[x_i^{\beta}]$ to obtain $E[x_i^{\beta}]/\beta = E[(\ln x_i)x_i^{\beta}] - E[\ln x_i]E[x_i^{\beta}]$, or $1/(\alpha\beta) = \text{Cov}[\ln x_i, x_i^{\beta}]$.

5. As suggested in the previous problem, we can concentrate the loglikelihood over α. From ∂logL/∂α = 0, we find that at the maximum, $\alpha = 1/[(1/n)\sum_{i=1}^{n} x_i^{\beta}]$. Thus, we scan over different values of β to seek the value which maximizes logL as given above, where we substitute this expression for each occurrence of α. Values of β and the loglikelihood for a range of values of β are listed and shown in the figure below.

β       logL
0.1     -62.386
0.2     -49.175
0.3     -41.381
0.4     -36.051
0.5     -32.122
0.6     -29.127
0.7     -26.829
0.8     -25.098
0.9     -23.866
1.0     -23.101
1.05    -22.891
1.06    -22.863
1.07    -22.841
1.08    -22.823
1.09    -22.809
1.10    -22.800
1.11    -22.796
1.12    -22.797
1.2     -22.984
1.3     -23.693

The maximum occurs at β = 1.11. The implied value of α is 1.179. The negative of the second derivatives matrix at these values and its inverse are

$$I(\hat\alpha, \hat\beta) = \begin{bmatrix} 25.55 & 9.6506 \\ 9.6506 & 27.7552 \end{bmatrix} \quad\text{and}\quad I^{-1}(\hat\alpha, \hat\beta) = \begin{bmatrix} .04506 & -.2673 \\ -.2673 & .04148 \end{bmatrix}.$$

The Wald statistic for the hypothesis that β = 1 is $W = (1.11 - 1)^2/.041477 = .276$. The critical value for a test of size .05 is 3.84, so we would not reject the hypothesis.
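The scan over β described in Exercise 5 can be sketched as follows. This is a minimal Python illustration of concentrating the Weibull loglikelihood over α; the data vector below is hypothetical (the data set used for the exercise is not reproduced in this manual), so the numbers it produces are not those in the table above.

```python
import math


def concentrated_loglik(beta, x):
    """Weibull logL with alpha replaced by its implied value n / sum(x_i**beta)."""
    n = len(x)
    sum_xb = sum(xi ** beta for xi in x)
    alpha = n / sum_xb
    logl = (n * math.log(alpha) + n * math.log(beta)
            + (beta - 1.0) * sum(math.log(xi) for xi in x)
            - alpha * sum_xb)
    return logl, alpha


def scan_beta(x, grid):
    """Return the grid value of beta that maximizes the concentrated logL."""
    best = max(grid, key=lambda b: concentrated_loglik(b, x)[0])
    logl, alpha = concentrated_loglik(best, x)
    return best, alpha, logl


# Hypothetical positive observations, for illustration only.
x = [0.3, 0.7, 1.1, 0.5, 2.4, 1.8, 0.9, 1.3, 0.2, 1.6]
grid = [0.1 + 0.01 * k for k in range(300)]      # beta from 0.10 to 3.09

beta_hat, alpha_hat, logl_hat = scan_beta(x, grid)
print("beta =", beta_hat, "alpha =", alpha_hat, "logL =", logl_hat)
```

A grid search is used here only because it mirrors the table; with estimates in hand, the second derivatives given in Exercise 4 could be evaluated at (alpha_hat, beta_hat) to form the Wald statistic.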
If β = 1, then $\hat\alpha = n/\sum_{i=1}^{n} x_i = 0.88496$. The distribution specializes to the exponential distribution if β = 1, so the restricted loglikelihood would be $\log L_r = n\log\alpha - \alpha\sum_{i=1}^{n} x_i = n(\log\alpha - 1)$ at the MLE. $\log L_r$ at α = .88496 is −22.44435. The likelihood ratio statistic is −2logλ = 2(23.10068 − 22.44435) = 1.3126. Once again, this is a small value. To obtain the Lagrange multiplier statistic, we would compute

$$\begin{bmatrix} \partial\log L/\partial\alpha & \partial\log L/\partial\beta \end{bmatrix} \begin{bmatrix} -\partial^2\log L/\partial\alpha^2 & -\partial^2\log L/\partial\alpha\partial\beta \\ -\partial^2\log L/\partial\alpha\partial\beta & -\partial^2\log L/\partial\beta^2 \end{bmatrix}^{-1} \begin{bmatrix} \partial\log L/\partial\alpha \\ \partial\log L/\partial\beta \end{bmatrix}$$

at the restricted estimates of α = .88496 and β = 1. Making the substitutions from above, at these values, we would have

$$\partial\log L/\partial\alpha = 0$$
$$\partial\log L/\partial\beta = n + \sum_{i=1}^{n}\log x_i - \frac{1}{\bar x}\sum_{i=1}^{n} x_i\log x_i = 9.400342$$
$$\partial^2\log L/\partial\alpha^2 = -n\bar x^2 = -25.54955$$
$$\partial^2\log L/\partial\beta^2 = -n - \frac{1}{\bar x}\sum_{i=1}^{n} x_i(\log x_i)^2 = -30.79486$$
$$\partial^2\log L/\partial\alpha\partial\beta = -\sum_{i=1}^{n} x_i\log x_i = -8.265.$$

The lower right element in the inverse matrix is .041477. The LM statistic is, therefore, $(9.40032)^2(.041477) = 2.9095$. This is also well under the critical value for the chi-squared distribution, so the hypothesis is not rejected on the basis of any of the three tests.

6. a. The full log likelihood is $\log L = \Sigma \log f_{yx}(y,x|\alpha,\beta)$.

b. By factoring the density, we obtain the equivalent $\log L = \Sigma[\log f_{y|x}(y|x,\alpha,\beta) + \log f_x(x|\alpha)]$.

c. We can solve the first order conditions in each case. From the marginal distribution for x, $\Sigma\,\partial\log f_x(x|\alpha)/\partial\alpha = 0$ provides a solution for α. From the joint distribution, factored into the conditional plus the marginal, we have

$$\Sigma[\partial\log f_{y|x}(y|x,\alpha,\beta)/\partial\alpha + \partial\log f_x(x|\alpha)/\partial\alpha] = 0$$
$$\Sigma[\partial\log f_{y|x}(y|x,\alpha,\beta)/\partial\beta] = 0.$$

d. The asymptotic variance obtained from the first estimator would be the negative inverse of the expected second derivative, Asy.Var[a] = $\{-E[\Sigma\,\partial^2\log f_x(x|\alpha)/\partial\alpha^2]\}^{-1}$. Denote this $A_{\alpha\alpha}^{-1}$. Now, consider the second estimator for α and β jointly. The negative of the expected Hessian is shown below. Note that the $A_{\alpha\alpha}$ from the marginal distribution appears there, as the marginal distribution appears in the factored joint distribution.

$$-E\left[\frac{\partial^2\ln L}{\partial\begin{pmatrix}\alpha\\ \beta\end{pmatrix}\partial\begin{pmatrix}\alpha\\ \beta\end{pmatrix}'}\right] = \begin{bmatrix} A_{\alpha\alpha} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} B_{\alpha\alpha} & B_{\alpha\beta} \\ B_{\beta\alpha} & B_{\beta\beta} \end{bmatrix} = \begin{bmatrix} A_{\alpha\alpha} + B_{\alpha\alpha} & B_{\alpha\beta} \\ B_{\beta\alpha} & B_{\beta\beta} \end{bmatrix}$$

The asymptotic covariance matrix for the joint estimator is the inverse of this matrix. To compare this to the asymptotic variance for the marginal estimator of α, we need the upper left element of this matrix. Using the formula for the partitioned inverse, we find that this upper left element in the inverse is

$$[(A_{\alpha\alpha} + B_{\alpha\alpha}) - (B_{\alpha\beta}B_{\beta\beta}^{-1}B_{\beta\alpha})]^{-1} = [A_{\alpha\alpha} + (B_{\alpha\alpha} - B_{\alpha\beta}B_{\beta\beta}^{-1}B_{\beta\alpha})]^{-1}$$

which is smaller than $A_{\alpha\alpha}^{-1}$ as long as the second term is positive.

e. (Unfortunately, this is an error in the text.) In the preceding expression, $B_{\alpha\beta}$ is the cross derivative. Even if it is zero, the asymptotic variance from the joint estimator is still smaller, being $[A_{\alpha\alpha} + B_{\alpha\alpha}]^{-1}$. This makes sense. If α appears in the conditional distribution, then there is additional information in the factored joint likelihood that is not in the marginal distribution, and this produces the smaller asymptotic variance.
7. The log likelihood for the Poisson model is

$$\log L = -n\lambda + \log\lambda\,\Sigma_i y_i - \Sigma_i\log y_i!$$

The expected value of 1/n times this function with respect to the true distribution is

$$E[(1/n)\log L] = -\lambda + \log\lambda\,E_0[\bar y] - E_0[(1/n)\Sigma_i\log y_i!]$$

The first expectation is $\lambda_0$. The second expectation can be left implicit since it will not affect the solution for λ; it is a function of the true $\lambda_0$. Maximizing this function with respect to λ produces the necessary condition

$$\partial E_0[(1/n)\log L]/\partial\lambda = -1 + \lambda_0/\lambda = 0$$

which has solution $\lambda = \lambda_0$, which was to be shown.

8. The log likelihood for a sample from the normal distribution is

$$\log L = -(n/2)\log 2\pi - (n/2)\log\sigma^2 - [1/(2\sigma^2)]\Sigma_i (y_i - \mu)^2.$$
$$E_0[(1/n)\log L] = -(1/2)\log 2\pi - (1/2)\log\sigma^2 - [1/(2\sigma^2)]E_0[(1/n)\Sigma_i (y_i - \mu)^2].$$

The expectation term equals $E_0[(y_i - \mu)^2] = E_0[(y_i - \mu_0)^2] + (\mu_0 - \mu)^2 = \sigma_0^2 + (\mu_0 - \mu)^2$. Collecting terms,

$$E_0[(1/n)\log L] = -(1/2)\log 2\pi - (1/2)\log\sigma^2 - [1/(2\sigma^2)][\sigma_0^2 + (\mu_0 - \mu)^2].$$

To see where this is maximized, note first that the term $(\mu_0 - \mu)^2$ enters negatively as a quadratic, so the maximizing value of μ is obviously $\mu_0$. Since this term is then zero, we can ignore it, and look for the $\sigma^2$ that maximizes $-(1/2)\log 2\pi - (1/2)\log\sigma^2 - \sigma_0^2/(2\sigma^2)$. The −1/2 is irrelevant as is the leading constant, so we wish to minimize (after changing sign) $\log\sigma^2 + \sigma_0^2/\sigma^2$ with respect to $\sigma^2$. Equating the first derivative to zero produces $1/\sigma^2 = \sigma_0^2/(\sigma^2)^2$ or $\sigma^2 = \sigma_0^2$, which gives us the result.

9. The log likelihood for the classical normal regression model is

$$\log L = \Sigma_i -(1/2)[\log 2\pi + \log\sigma^2 + (1/\sigma^2)(y_i - x_i'\beta)^2].$$

If we reparameterize this in terms of η = 1/σ and δ = β/σ, then after a bit of manipulation,

$$\log L = \Sigma_i -(1/2)[\log 2\pi - \log\eta^2 + (\eta y_i - x_i'\delta)^2].$$

The first order conditions for maximizing this with respect to η and δ are

$$\partial\log L/\partial\eta = n/\eta - \Sigma_i y_i(\eta y_i - x_i'\delta) = 0$$
$$\partial\log L/\partial\delta = \Sigma_i x_i(\eta y_i - x_i'\delta) = 0.$$

Solve the second equation for δ, which produces $\delta = \eta(X'X)^{-1}X'y = \eta b$. Insert this implicit solution into the first equation to produce $n/\eta = \Sigma_i y_i(\eta y_i - \eta x_i'b)$. By taking η outside the summation and multiplying the entire expression by η, we obtain $n = \eta^2\Sigma_i y_i(y_i - x_i'b)$ or $\eta^2 = n/[\Sigma_i y_i(y_i - x_i'b)]$. This is an analytic solution for η that is only in terms of the data (b is a sample statistic). Inserting the square root of this result into the solution for δ produces the second result we need. By pursuing this a bit further, you can show that the solution for $\eta^2$ is just n/e′e from the original least squares regression, and the solution for δ is just b times this solution for η. The second derivatives matrix is
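The claim in Exercise 9 that the solution for η² equals n/e′e follows because the least squares residuals are orthogonal to the fitted values, so $\Sigma_i y_i(y_i - x_i'b) = e'e$. A minimal Python sketch with a constant and one regressor illustrates this; the small data set is made up for the illustration.

```python
def ols_simple(x, y):
    """Least squares intercept and slope for y = b0 + b1*x + e."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
n = len(y)

b0, b1 = ols_simple(x, y)
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# eta^2 from the first order condition, and from the residual sum of squares.
eta2_from_foc = n / sum(yi * ei for yi, ei in zip(y, resid))
eta2_from_ee = n / sum(ei * ei for ei in resid)

eta = eta2_from_ee ** 0.5
delta = [eta * b0, eta * b1]            # delta = eta * b

print(eta2_from_foc, eta2_from_ee, delta)
```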
$$\partial^2\log L/\partial\eta^2 = -n/\eta^2 - \Sigma_i y_i^2$$
$$\partial^2\log L/\partial\delta\partial\delta' = -\Sigma_i x_i x_i'$$
$$\partial^2\log L/\partial\delta\partial\eta = \Sigma_i x_i y_i.$$

We'll obtain the expectations conditioned on X. $E[y_i|x_i]$ is $x_i'\beta$ from the original model, which equals $x_i'\delta/\eta$. $E[y_i^2|x_i] = (1/\eta^2)(\delta'x_i)^2 + 1/\eta^2$. (The cross term has expectation zero.) Summing over observations and collecting terms, we have, conditioned on X,

$$E[\partial^2\log L/\partial\eta^2|X] = -2n/\eta^2 - (1/\eta^2)\delta'X'X\delta$$
$$E[\partial^2\log L/\partial\delta\partial\delta'|X] = -X'X$$
$$E[\partial^2\log L/\partial\delta\partial\eta|X] = (1/\eta)X'X\delta.$$

The negative inverse of the matrix of expected second derivatives is

$$\text{Asy.Var}[d, h] = \begin{bmatrix} X'X & -(1/\eta)X'X\delta \\ -(1/\eta)\delta'X'X & (1/\eta^2)[2n + \delta'X'X\delta] \end{bmatrix}^{-1}.$$

(The off diagonal term does not vanish here as it does in the original parameterization.)

10. The first derivatives of the log likelihood function are $\partial\log L/\partial\mu = (1/2\sigma^2)\Sigma_i 2(y_i - \mu)$. Equating this to zero produces the vector of means for the estimator of μ. The first derivative with respect to $\sigma^2$ is $\partial\log L/\partial\sigma^2 = -nM/(2\sigma^2) + [1/(2\sigma^4)]\Sigma_i (y_i - \mu)'(y_i - \mu)$. Each term in the sum is $\Sigma_m (y_{im} - \mu_m)^2$. We already deduced that the estimators of $\mu_m$ are the sample means. Inserting these in the solution for $\sigma^2$ and solving the likelihood equation produces the solution given in the problem. The second derivatives of the log likelihood are

$$\partial^2\log L/\partial\mu\partial\mu' = -(1/\sigma^2)\Sigma_i I$$
$$\partial^2\log L/\partial\mu\partial\sigma^2 = -(1/2\sigma^4)\Sigma_i 2(y_i - \mu)$$
$$\partial^2\log L/\partial\sigma^2\partial\sigma^2 = nM/(2\sigma^4) - (1/\sigma^6)\Sigma_i (y_i - \mu)'(y_i - \mu).$$

The expected value of the first term is $-(n/\sigma^2)I$. The second term has expectation zero. Each term in the summation in the third term has expectation $M\sigma^2$, so the summation has expected value $nM\sigma^2$. Adding gives the expectation for the third term of $-nM/(2\sigma^4)$. Assembling these in a block diagonal matrix, then taking the negative inverse produces the result given earlier.

For the Wald test, the restriction is $H_0$: $\mu - \mu^0 i = 0$. The unrestricted estimator of μ is $\bar x$. The variance of $\bar x$ is given above, so the Wald statistic is simply $(\bar x - \mu^0 i)'\,\text{Var}[(\bar x - \mu^0 i)]^{-1}(\bar x - \mu^0 i)$. Inserting the covariance matrix given above produces the suggested statistic.
11. The asymptotic variance of the MLE is, in fact, equal to the Cramer-Rao Lower Bound for the variance of a consistent, asymptotically normally distributed estimator, so this completes the argument.

In Example 4.9, we proposed a regression with a gamma distributed disturbance,

$$y_i = \alpha + x_i'\beta + \varepsilon_i$$

where

$$f(\varepsilon_i) = [\lambda^P/\Gamma(P)]\,\varepsilon_i^{P-1}\exp(-\lambda\varepsilon_i), \quad \varepsilon_i \ge 0,\ \lambda > 0,\ P > 2.$$

(The fact that $\varepsilon_i$ is nonnegative will shift the constant term, as shown in Example 4.9. The need for the restriction on P will emerge shortly.) It will be convenient to assume the regressors are measured in deviations from their means, so $\Sigma_i x_i = 0$. The OLS estimator of β remains unbiased and consistent in this model, with variance

$$\text{Var}[b|X] = \sigma^2(X'X)^{-1}$$

where $\sigma^2 = \text{Var}[\varepsilon_i|X] = P/\lambda^2$. [You can show this by using gamma integrals to verify that $E[\varepsilon_i|X] = P/\lambda$ and $E[\varepsilon_i^2|X] = P(P+1)/\lambda^2$. See (B-39) and (E-1) in Section E2.3. A useful device for obtaining the variance is Γ(P) = (P−1)Γ(P−1).] We will now show that in this model, there is a more efficient consistent estimator of β. (As we saw in Example 4.9, the constant term in this regression will be biased because $E[\varepsilon_i|X] = P/\lambda$; a estimates α + P/λ. In what follows, we will focus on the slope estimators.) The log likelihood function is

$$\ln L = \sum_{i=1}^{n}\left[P\ln\lambda - \ln\Gamma(P) + (P-1)\ln\varepsilon_i - \lambda\varepsilon_i\right].$$

The likelihood equations are

$$\partial\ln L/\partial\alpha = \Sigma_i[-(P-1)/\varepsilon_i + \lambda] = 0,$$
$$\partial\ln L/\partial\beta = \Sigma_i[-(P-1)/\varepsilon_i + \lambda]x_i = 0,$$
$$\partial\ln L/\partial\lambda = \Sigma_i[P/\lambda - \varepsilon_i] = 0,$$
$$\partial\ln L/\partial P = \Sigma_i[\ln\lambda - \psi(P) + \ln\varepsilon_i] = 0.$$

(The function ψ(P) = dlnΓ(P)/dP is defined in Section E2.3.) To show that these expressions have expectation zero, we use the gamma integral once again to show that $E[1/\varepsilon_i] = \lambda/(P-1)$. We used the result $E[\ln\varepsilon_i] = \psi(P) - \ln\lambda$ in Example 15.5. To show that E[∂lnL/∂β] = 0, we only require $E[1/\varepsilon_i] = \lambda/(P-1)$ because $x_i$ and $\varepsilon_i$ are independent. The second derivatives and their expectations are found as follows: Using the gamma integral once again, we find $E[1/\varepsilon_i^2] = \lambda^2/[(P-1)(P-2)]$. And, recall that $\Sigma_i x_i = 0$. Thus, conditioned on X, we have

$$-E[\partial^2\ln L/\partial\alpha^2] = E[\Sigma_i (P-1)(1/\varepsilon_i^2)] = n\lambda^2/(P-2),$$
$$-E[\partial^2\ln L/\partial\alpha\partial\beta] = E[\Sigma_i (P-1)(1/\varepsilon_i^2)x_i] = 0,$$
$$-E[\partial^2\ln L/\partial\alpha\partial\lambda] = E[\Sigma_i (-1)] = -n,$$
$$-E[\partial^2\ln L/\partial\alpha\partial P] = E[\Sigma_i (1/\varepsilon_i)] = n\lambda/(P-1),$$
$$-E[\partial^2\ln L/\partial\beta\partial\beta'] = E[\Sigma_i (P-1)(1/\varepsilon_i^2)x_i x_i'] = \Sigma_i[\lambda^2/(P-2)]x_i x_i' = [\lambda^2/(P-2)](X'X),$$
$$-E[\partial^2\ln L/\partial\lambda\partial\beta] = E[\Sigma_i (-1)x_i] = 0,$$
$$-E[\partial^2\ln L/\partial P\partial\beta] = E[\Sigma_i (1/\varepsilon_i)x_i] = 0,$$
$$-E[\partial^2\ln L/\partial\lambda^2] = E[\Sigma_i (P/\lambda^2)] = nP/\lambda^2,$$
$$-E[\partial^2\ln L/\partial\lambda\partial P] = E[\Sigma_i (-1/\lambda)] = -n/\lambda,$$
$$-E[\partial^2\ln L/\partial P^2] = E[\Sigma_i \psi'(P)] = n\psi'(P).$$

Since the expectations of the cross partials with respect to β and the other parameters are all zero, it follows that the asymptotic covariance matrix for the MLE of β is simply

$$\text{Asy.Var}[\hat\beta_{MLE}] = \{-E[\partial^2\ln L/\partial\beta\partial\beta']\}^{-1} = [(P-2)/\lambda^2](X'X)^{-1}.$$

Recall, the asymptotic covariance matrix of the ordinary least squares estimator is

$$\text{Asy.Var}[b] = [P/\lambda^2](X'X)^{-1}.$$

(Note that the MLE is ill defined if P is less than 2.) Thus, the ratio of the variance of the MLE of any element of β to that of the corresponding element of b is (P−2)/P, which is the result claimed in Example 4.9.

Applications
1. a. For both probabilities, the symmetry implies that 1 − F(t) = F(−t). In either model, then, Prob(y=1) = F(t) and Prob(y=0) = 1 − F(t) = F(−t). These are combined in Prob(Y=y) = $F[(2y_i - 1)t_i]$ where $t_i = x_i'\beta$. Therefore,

$$\ln L = \Sigma_i \ln F[(2y_i - 1)x_i'\beta].$$

b.

$$\partial\ln L/\partial\beta = \sum_{i=1}^{n}\frac{f[(2y_i - 1)x_i'\beta]}{F[(2y_i - 1)x_i'\beta]}(2y_i - 1)x_i = 0$$

where $f[(2y_i - 1)x_i'\beta]$ is the density function. For the logit model, f = F(1−F). So, for the logit model,

$$\partial\ln L/\partial\beta = \sum_{i=1}^{n}\{1 - F[(2y_i - 1)x_i'\beta]\}(2y_i - 1)x_i = 0.$$

Evaluating this expression for $y_i = 0$, we get simply $-F(x_i'\beta)x_i$. When $y_i = 1$, the term is $[1 - F(x_i'\beta)]x_i$. It follows that both cases are $[y_i - F(x_i'\beta)]x_i$, so the likelihood equations for the logit model are

$$\partial\ln L/\partial\beta = \sum_{i=1}^{n}[y_i - \Lambda(x_i'\beta)]x_i = 0.$$

For the probit model, $F[(2y_i - 1)x_i'\beta] = \Phi[(2y_i - 1)x_i'\beta]$ and $f[(2y_i - 1)x_i'\beta] = \phi[(2y_i - 1)x_i'\beta]$, which does not simplify further, save for that the term $2y_i - 1$ inside φ may be dropped since φ(t) = φ(−t). Therefore,

$$\partial\ln L/\partial\beta = \sum_{i=1}^{n}\frac{\phi[(2y_i - 1)x_i'\beta]}{\Phi[(2y_i - 1)x_i'\beta]}(2y_i - 1)x_i = 0.$$

c. For the logit model, the result is very simple.

$$\partial^2\ln L/\partial\beta\partial\beta' = \sum_{i=1}^{n} -\Lambda(x_i'\beta)[1 - \Lambda(x_i'\beta)]x_i x_i'.$$

For the probit model, the result is more complicated. We will use the result that dφ(t)/dt = −tφ(t). It follows, then, that d[φ(t)/Φ(t)]/dt = −[φ(t)/Φ(t)][t + φ(t)/Φ(t)]. Using this result directly, it follows that

$$\partial^2\ln L/\partial\beta\partial\beta' = \sum_{i=1}^{n} -\left(\frac{\phi[(2y_i - 1)x_i'\beta]}{\Phi[(2y_i - 1)x_i'\beta]}\right)\left((2y_i - 1)x_i'\beta + \frac{\phi[(2y_i - 1)x_i'\beta]}{\Phi[(2y_i - 1)x_i'\beta]}\right)(2y_i - 1)^2 x_i x_i'.$$

This actually simplifies somewhat because $(2y_i - 1)^2 = 1$ for both values of $y_i$ and $\phi[(2y_i - 1)x_i'\beta] = \phi(x_i'\beta)$.
d. Denote by H the actual second derivatives matrix derived in the previous part. Then, Newton's method is

$$\hat\beta(j+1) = \hat\beta(j) - \left\{H[\hat\beta(j)]\right\}^{-1}\left[\frac{\partial\ln L[\hat\beta(j)]}{\partial\hat\beta(j)}\right]$$

where the terms on the right hand side indicate first and second derivatives evaluated at the "previous" estimate of β.

e. The method of scoring uses the expected Hessian instead of the actual Hessian in the iterations. The methods are the same for the logit model, since the Hessian does not involve $y_i$. The methods are different for the probit model, since the expected Hessian does not equal the actual one. For the logit model

$$[E(H)]^{-1} = \left\{-\sum_{i=1}^{n}\Lambda(x_i'\beta)[1 - \Lambda(x_i'\beta)]x_i x_i'\right\}^{-1}.$$

For the probit model, we need first to obtain the expected value. To obtain this, we take the expected value, with Prob(y=0) = 1 − Φ and Prob(y=1) = Φ. The expected value of the ith term in the negative hessian is the expected value of the term,

$$\left(\frac{\phi[(2y_i - 1)x_i'\beta]}{\Phi[(2y_i - 1)x_i'\beta]}\right)\left((2y_i - 1)x_i'\beta + \frac{\phi[(2y_i - 1)x_i'\beta]}{\Phi[(2y_i - 1)x_i'\beta]}\right)x_i x_i'.$$

This is

$$\Phi[-x_i'\beta]\left(\frac{\phi[x_i'\beta]}{\Phi[-x_i'\beta]}\right)\left(-x_i'\beta + \frac{\phi[x_i'\beta]}{\Phi[-x_i'\beta]}\right)x_i x_i' + \Phi[x_i'\beta]\left(\frac{\phi[x_i'\beta]}{\Phi[x_i'\beta]}\right)\left(x_i'\beta + \frac{\phi[x_i'\beta]}{\Phi[x_i'\beta]}\right)x_i x_i'$$

$$= \phi[x_i'\beta]\left(-x_i'\beta + \frac{\phi[x_i'\beta]}{\Phi[-x_i'\beta]} + x_i'\beta + \frac{\phi[x_i'\beta]}{\Phi[x_i'\beta]}\right)x_i x_i'$$

$$= \phi[x_i'\beta]\left(\frac{\phi[x_i'\beta]}{\Phi[-x_i'\beta]} + \frac{\phi[x_i'\beta]}{\Phi[x_i'\beta]}\right)x_i x_i'$$

$$= (\phi[x_i'\beta])^2\left(\frac{1}{\Phi[-x_i'\beta]} + \frac{1}{\Phi[x_i'\beta]}\right)x_i x_i'$$

$$= (\phi[x_i'\beta])^2\left(\frac{\Phi[x_i'\beta] + \Phi[-x_i'\beta]}{\Phi[-x_i'\beta]\Phi[x_i'\beta]}\right)x_i x_i'$$

$$= \left(\frac{(\phi[x_i'\beta])^2}{[1 - \Phi(x_i'\beta)]\Phi(x_i'\beta)}\right)x_i x_i'.$$
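The Newton iteration in part d can be illustrated for the logit model, where the likelihood equations are $\Sigma_i[y_i - \Lambda(x_i'\beta)]x_i = 0$ and the Hessian is $-\Sigma_i\Lambda_i(1-\Lambda_i)x_i x_i'$. The following is a minimal Python sketch for a model with a constant and a single regressor; the six observations are made up for the illustration and are unrelated to the data used in this application.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit_score(b0, b1, x, y):
    """First derivatives of lnL: sum of (y_i - Lambda_i) * (1, x_i)."""
    p = [logistic(b0 + b1 * xi) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    return g0, g1


def logit_newton(x, y, tol=1e-10, max_iter=50):
    """Newton's method: beta(j+1) = beta(j) - H^{-1} g, starting from zero."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0, g1 = logit_score(b0, b1, x, y)
        w = [logistic(b0 + b1 * xi) * (1.0 - logistic(b0 + b1 * xi)) for xi in x]
        # Hessian elements: -sum of Lambda(1-Lambda) * x x'
        h00 = -sum(w)
        h01 = -sum(wi * xi for wi, xi in zip(w, x))
        h11 = -sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step H^{-1} g via the 2x2 inverse.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 - s0, b1 - s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


# Made-up data (not perfectly separated), for illustration only.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 0, 1, 1]

b0_hat, b1_hat = logit_newton(x, y)
print("estimates:", b0_hat, b1_hat)
print("score at estimates:", logit_score(b0_hat, b1_hat, x, y))
```

Since the logit Hessian does not involve $y_i$, the same loop is also the method of scoring for this model; for probit the two would differ, as noted in part e.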
e.
?====================================================
? Application 16.1
?====================================================
Namelist ; x = one,age,educ,hsat,female,married $
LOGIT ; Lhs = Doctor ; Rhs = X $
Calc ; L1 = logl $

Binary Logit Model for Binary Choice
Dependent variable             DOCTOR
Number of observations         27326
Log likelihood function        -16405.94
Number of parameters           6
Info. Criterion: AIC =         1.20120
Info. Criterion: BIC =         1.20300
Restricted log likelihood      -18019.55

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Characteristics in numerator of Prob[Y = 1]
Constant   1.82207669    .10763712        16.928     .0000
AGE        .01235692     .00124643        9.914      .0000      43.5256898
EDUC       .00569371     .00578743        .984       .3252      11.3206310
HSAT       .29276744     .00686076        42.673     .0000      6.78542607
FEMALE     .58376753     .02717992        21.478     .0000      .47877479
MARRIED    .03550015     .03173886        1.119      .2634      .75861817

f.
Matr ; bw = b(5:6) ; vw = varb(5:6,5:6) $
Matrix ; list ; WaldStat = bw'<vw>bw $
Calc ; list ; ctb(.95,2) $
LOGIT ; Lhs = Doctor ; Rhs = One,age,educ,hsat $
Calc ; L0 = logl $
Calc ; List ; LRStat = 2*(l1-l0) $

Matrix WALDSTAT has 1 rows and 1 columns.
        1
1   461.43784

> Calc ; list ; ctb(.95,2) $
Listed Calculator Results
Result = 5.991465
> Calc ; L0 = logl $
> Calc ; List ; LRStat = 2*(l1-l0) $
Listed Calculator Results
LRSTAT = 467.336374

Logit ; Lhs = Doctor ; Rhs = X ; Start = b,0,0 ; Maxit = 0 $

Binary Logit Model for Binary Choice
Maximum Likelihood Estimates
Model estimated: May 17, 2007 at 11:49:42PM.
Dependent variable             DOCTOR
Weighting variable             None
Number of observations         27326
Iterations completed           1
LM Stat. at start values       466.0288
LM statistic kept as scalar    LMSTAT
Log likelihood function        -16639.61
Number of parameters           6
Info. Criterion: AIC =         1.21830
Finite Sample: AIC =           1.21830
Info. Criterion: BIC =         1.22010
Info. Criterion: HQIC =        1.21888
Restricted log likelihood      -18019.55
McFadden Pseudo R-squared      .0765802
Chi squared                    2759.883
Degrees of freedom             5
Prob[ChiSqd > value] =         .0000000
Hosmer-Lemeshow chi-squared =  23.44388
P-value = .00284 with deg.fr. = 8
g. The restricted log likelihood given with the initial results equals −18019.55. This is the log likelihood for a model that contains only a constant term. The log likelihood for the model is −16405.94. Twice the difference is about 3,200, which vastly exceeds the critical chi squared with 5 degrees of freedom. The hypothesis would be rejected.
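The likelihood ratio arithmetic in parts f and g is simple enough to check directly. The sketch below is a minimal Python illustration; it takes the three log likelihoods reported in the listings as negative values (−16405.94, −16639.61 and −18019.55), uses the 2-degrees-of-freedom critical value 5.991465 printed by the `ctb(.95,2)` command above, and hard-codes 11.0705 as the standard 95% critical value of the chi-squared distribution with 5 degrees of freedom.

```python
def likelihood_ratio_statistic(logl_unrestricted, logl_restricted):
    """LR = 2 * (lnL_unrestricted - lnL_restricted)."""
    return 2.0 * (logl_unrestricted - logl_restricted)


LOGL_FULL = -16405.94        # all six coefficients
LOGL_NO_FM = -16639.61       # FEMALE and MARRIED excluded
LOGL_CONSTANT = -18019.55    # constant term only

CHI2_95_DF2 = 5.991465       # from the ctb(.95,2) result in the listing
CHI2_95_DF5 = 11.0705        # standard table value, 5 degrees of freedom

lr_female_married = likelihood_ratio_statistic(LOGL_FULL, LOGL_NO_FM)
lr_all_slopes = likelihood_ratio_statistic(LOGL_FULL, LOGL_CONSTANT)

print("LR, FEMALE and MARRIED:", lr_female_married, lr_female_married > CHI2_95_DF2)
print("LR, all slopes:", lr_all_slopes, lr_all_slopes > CHI2_95_DF5)
```

The first statistic differs from the reported LRSTAT only in the third decimal place because the listed log likelihoods are rounded.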
2. We used LIMDEP to fit the cost frontier. The dependent variable is log(Cost/Pfuel). The regressors are a constant, log(Pcapital/Pfuel), log(Plabor/Pfuel), logQ and log²Q. The Jondrow measure was then computed and plotted against output. There does not appear to be any relationship, though the weak relationship, such as it is, is indeed negative.

Limited Dependent Variable Model - FRONTIER
Dependent variable             LCF
Number of observations         123
Log likelihood function        66.86502
Variances: Sigma-squared(v) =  .01185
           Sigma-squared(u) =  .02233
           Sigma(v)         =  .10884
           Sigma(u)         =  .14944
Sigma = Sqr[(s^2(u)+s^2(v)] =  .18488
Stochastic Cost Frontier, e=v+u.

Variable   Coefficient       Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Primary Index Equation for Model
Constant   7.494211759       .30737742        24.381     .0000
LPK        .5531289074E-01   .70211904E-01    .788       .4308      .88666047
LPL        .2605889758       .67708437E-01    3.849      .0001      5.5808828
LQ         .4109789313       .29495035E-01    13.934     .0000      8.1794715
LQ2        .6058235980E-01   .43732083E-02    13.853     .0000      35.112527
Variance parameters for compound error
Lambda     1.373117163       .33353523        4.117      .0000
Sigma      .1848750589       .28257115E-01    6.543      .0000
[Figure: Jondrow efficiency measure COSTEFF (vertical axis, .00 to .40) plotted against output Q (horizontal axis, 0 to 75000).]
Chapter 17 Simulation Based Estimation and Inference
Exercises
1. Exponential: The pdf is $f(x) = \theta\exp(-\theta x)$. The CDF is
\[ F(x) = \int_0^x \theta\exp(-\theta t)\,dt = \theta\left[-\frac{1}{\theta}\exp(-\theta x) - \left(-\frac{1}{\theta}\exp(-\theta\cdot 0)\right)\right] = 1 - \exp(-\theta x). \]
We would draw observations from the U(0,1) population, say $F_i$, and equate these to $F(x_i)$. Inverting the function, we find that $1 - F_i = \exp(-\theta x_i)$, or $x_i = -(1/\theta)\ln(1 - F_i)$. If $x_i$ has an exponential density, then the density of $y_i = x_i^P$ is Weibull. If the survival function is $S(x) = \lambda p\exp[-(\lambda x)^p]$, then we may equate random draws from the uniform distribution, $S_i$, to this function (a draw of $S_i$ is the same as a draw of $F_i = 1 - S_i$). Solving for $x_i$, we find $\ln S_i = \ln(\lambda p) - (\lambda x_i)^p$, so $x_i = (1/\lambda)[\ln(\lambda p) - \ln S_i]^{1/p}$.
2. We will need a bivariate sample on x and y to compute the random variable, then average the draws on it. The precise method of using a Gibbs sampler to draw this bivariate sample is shown in Example 18.5. Once the bivariate sample of (x,y) is drawn, a large number of observations on [x²exp(y) + y²exp(x)] is computed and averaged. As noted there, the Gibbs sampler is not much of a simplification for this particular problem. It is simple to draw a sample directly from a bivariate normal distribution. Here is a program that does the simulation and plots the estimate of the function.
Calc ; Ran(12345) $
Sample ; 1-1000 $
Create ; xf=rnn(0,1) ; yfb=rnn(0,1) $
Matrix ; corr=init(100,1,0) ; function=corr $
Calc ; i=0 $
Proc
Calc ; i=i+1 $
Matrix ; corr(i)=ro $
Matrix ; c=[1/ro,1] ; c=chol(c) $
Create ; yf = c(2,1)*xf + c(2,2)*yfb $
Create ; fr=xf^2*exp(yf)+yf^2*exp(xf) $
Calc ; ef = xbr(fr) ; ro=ro+.02 $
Matrix ; function(i)=ef $
Endproc $
Calc ; ro=-.99 $
Execute; n=100 $
Mplot ; Lhs = corr ; Rhs = Function ; Fill ; Grid ; Endpoints = -1,1 ; Title=E[x^2*exp(y)+y^2*exp(x) | rho] $
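The two exercises above can also be sketched outside LIMDEP. The Python code below is an illustration only, using just the standard library: `draw_exponential` applies the inverse CDF result from Exercise 1, and `simulate_mean` mirrors the program for Exercise 2 by building y from x with the Cholesky factor of the 2×2 correlation matrix. The function names, the seed and the number of draws are assumptions made for the example, not part of the original program.

```python
import math
import random

def draw_exponential(theta, u):
    """Inverse-CDF draw: solve u = 1 - exp(-theta * x) for x."""
    return -math.log(1.0 - u) / theta

def simulate_mean(rho, n=100_000, seed=12345):
    """Monte Carlo estimate of E[x^2 exp(y) + y^2 exp(x)] when (x, y) is
    standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    c22 = math.sqrt(1.0 - rho * rho)   # lower-right element of the Cholesky factor
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + c22 * rng.gauss(0.0, 1.0)   # c(2,1) = rho, c(2,2) = c22
        total += x * x * math.exp(y) + y * y * math.exp(x)
    return total / n

# Exercise 1: the median of an exponential with theta = 2 is ln(2)/2.
median_draw = draw_exponential(2.0, 0.5)

# Exercise 2: estimates of the function at two correlations.
estimate_rho0 = simulate_mean(0.0)
estimate_rho5 = simulate_mean(0.5)
print(median_draw, estimate_rho0, estimate_rho5)
```

For this particular function the expectation can also be worked out from the normal moment generating function as 2(1 + ρ²)exp(1/2), which gives a convenient check on the simulated values.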
Application
?================================================================
? Application 17.1. Monte Carlo Simulation
?================================================================
? Set seed of RNG for replicability
Calc ; Ran(123579) $
? Sample size is 50. Generate x(i) and z(i) held fixed
Sample ; 1 - 50 $
Create ; xi = rnn(0,1) ; zi = rnn(0,1) $
Namelist ; X = one,xi,zi ; X0 = one,xi $
? Moment Matrices
Matrix ; XXinv = <X'X> ; X0X0inv = <X0'X0> $
Matrix ; Waldi = init(1000,1,0) $
Matrix ; LMi = init(1000,1,0) $
?****************************************************************
? Procedure studies the LM statistic
?****************************************************************
Proc = LM (c) $
? Three kinds of disturbances
Create
?; Eps = Rnt(5) ? Nonnormal distribution
; vi=exp(.2*xi) ; eps = vi*rnn(0,1) ? Heteroscedasticity
?;eps= Rnn(0,1) ? Standard normal distribution
; y = 0 + xi + c*zi +eps $
Matrix ; b0 = X0X0inv*X0'y $
Create ; e0 = y - X0'b0 $
Matrix ; g = X'e0 $
Calc ; lmstat = qfr(g,xxinv)/(e0'e0/n) ; i = i + 1 $
Matrix ; Lmi (i) = lmstat $
EndProc $
Calc ; i = 0 ; gamma = 1 $
Exec ; Proc=LM(gamma) ; n = 1000 $
samp;1-1000$
create;LMv=lmi $
create;reject=lmv>3.84$
Calc ; List ; Type1 = xbr(reject) ; pwr = 1-Type1 $
?****************************************************************
? Procedure studies the Wald statistic
?****************************************************************
Proc = Wald(c) $
Create ; if(type=1)Eps = Rnn(0,1) ? Standard normal distribution
; if(type=2)vi=exp(.2*xi) ? eps = vi*rnn(0,1) ? Heteroscedasticity
; if(type=3)eps= Rnt(5) ? Nonnormal distribution
; y = 0 + xi + c*zi +eps $
Matrix ; b0=XXinv*X'y $
Create ; e0=y-X'b0$
Calc ; ss0 = e0'e0/(47) ; v0 = ss0*xxinv(3,3) ; wald0=(b0(3))^2/v0 ; i=i+1 $
Matrix ; Waldi(i)=Wald0 $
EndProc $
? Set the values for the simulation
Calc ; i = 0 ; gamma = 0 ; type=1 $
Sample ; 1-50 $
Exec ; Proc=Wald(gamma) ; n = 1000 $
samp;1-1000$
create;Waldv=Waldi $
create;reject=Waldv > 3.84$
Calc ; List ; Type1 = xbr(reject) ; pwr = 1-Type1 $
To carry out the simulation, execute the procedure for different values of “gamma” and “type.” Summarize the results with a table or plot of the rejection probabilities as a function of gamma.
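For readers without LIMDEP, the Wald part of the experiment can be sketched in Python with only the standard library. This is a rough analogue rather than a translation: it covers only the standard normal disturbance case (type = 1), the helper names `invert` and `wald_rejection_rate` are invented for the illustration, and the random draws will not match those of the program above even with the same seed.

```python
import random

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def wald_rejection_rate(gamma, reps=1000, n=50, seed=123579, crit=3.84):
    """Share of replications in which the Wald test rejects H0: gamma = 0."""
    rng = random.Random(seed)
    # Regressors (constant, x, z) are drawn once and held fixed across replications.
    X = [[1.0, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xtx_inv = invert(xtx)
    rejections = 0
    for _ in range(reps):
        y = [row[1] + gamma * row[2] + rng.gauss(0.0, 1.0) for row in X]
        xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
        b = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
        ssr = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
                  for row, yi in zip(X, y))
        s2 = ssr / (n - k)                       # n - 3 = 47 degrees of freedom
        wald = b[2] ** 2 / (s2 * xtx_inv[2][2])
        if wald > crit:
            rejections += 1
    return rejections / reps

size = wald_rejection_rate(gamma=0.0)    # rejection rate when H0 is true
power = wald_rejection_rate(gamma=1.0)   # rejection rate when gamma = 1
print(f"size = {size:.3f}, power = {power:.3f}")
```

Calling `wald_rejection_rate` over a grid of gamma values gives the rejection probabilities that the table or plot would summarize.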
Chapter 18 Bayesian Estimation and Inference
Exercise
a. The likelihood function is
\[ L(y|\lambda) = \prod_{i=1}^n f(y_i|\lambda) = \prod_{i=1}^n \frac{\exp(-\lambda)\lambda^{y_i}}{\Gamma(y_i+1)} = \exp(-n\lambda)\,\lambda^{\Sigma_i y_i}\prod_{i=1}^n \frac{1}{\Gamma(y_i+1)}. \]
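b. The posterior is
\[ p(\lambda|y_1,\ldots,y_n) = \frac{p(y_1,\ldots,y_n|\lambda)\,p(\lambda)}{\int_0^\infty p(y_1,\ldots,y_n|\lambda)\,p(\lambda)\,d\lambda}. \]
The product of factorials will fall out. This leaves
\[ \begin{aligned}
p(\lambda|y_1,\ldots,y_n) &= \frac{\exp(-n\lambda)\lambda^{\Sigma_i y_i}(1/\lambda)}{\int_0^\infty \exp(-n\lambda)\lambda^{\Sigma_i y_i}(1/\lambda)\,d\lambda} \\
&= \frac{\exp(-n\lambda)\lambda^{(\Sigma_i y_i)-1}}{\int_0^\infty \exp(-n\lambda)\lambda^{(\Sigma_i y_i)-1}\,d\lambda} \\
&= \frac{\exp(-n\lambda)\lambda^{n\bar{y}-1}}{\int_0^\infty \exp(-n\lambda)\lambda^{n\bar{y}-1}\,d\lambda} \\
&= \frac{n^{n\bar{y}}\exp(-n\lambda)\lambda^{n\bar{y}-1}}{\Gamma(n\bar{y})},
\end{aligned} \]
where we have used the gamma integral at the last step. The posterior defines a two parameter gamma distribution, $G(n, n\bar{y})$.
c. The estimator of λ is the mean of the posterior. There is no need to do the integration. This falls simply out of the posterior density, $E[\lambda|y] = n\bar{y}/n = \bar{y}$.
d. The posterior variance also drops out simply; it is $n\bar{y}/n^2 = \bar{y}/n$.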
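As a small numerical illustration of parts b through d, the sketch below computes the gamma posterior's parameters, mean and variance for a sample of counts. The data are hypothetical and the function name is an assumption made for the example; the prior is the one used above, p(λ) proportional to 1/λ.

```python
def poisson_posterior(y):
    """Gamma posterior for a Poisson mean under the prior p(lambda) ~ 1/lambda.

    Returns (shape, rate, mean, variance), where shape = n * ybar and rate = n.
    """
    n = len(y)
    shape = float(sum(y))   # n * ybar
    rate = float(n)
    return shape, rate, shape / rate, shape / rate ** 2

counts = [2, 0, 3, 1, 4]    # hypothetical sample, ybar = 2
shape, rate, post_mean, post_var = poisson_posterior(counts)
print(post_mean, post_var)  # ybar and ybar / n
```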
Application
a. $p(F_i|K_i,\theta) = \binom{K_i}{F_i}\theta^{F_i}(1-\theta)^{K_i-F_i}$, so the log likelihood function is
\[ \ln L(\theta|y) = \sum_{i=1}^n \left[\ln\binom{K_i}{F_i} + F_i\ln\theta + (K_i - F_i)\ln(1-\theta)\right]. \]
The MLE is obtained by setting $\partial\ln L(\theta|y)/\partial\theta = \Sigma_i[F_i/\theta - (K_i - F_i)/(1-\theta)] = 0$. Multiply both sides by $\theta(1-\theta)$ to obtain $\Sigma_i[(1-\theta)F_i - \theta(K_i - F_i)] = 0$. A line of algebra reveals that the solution is $\theta = (\Sigma_i F_i)/(\Sigma_i K_i) = 0.651596$.
b. The posterior density is
\[ p(\theta|y) = \frac{\left[\prod_{i=1}^n \binom{K_i}{F_i}\theta^{F_i}(1-\theta)^{K_i-F_i}\right]\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}}{\int_0^1\left[\prod_{i=1}^n \binom{K_i}{F_i}\theta^{F_i}(1-\theta)^{K_i-F_i}\right]\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}\,d\theta}. \]
This simplifies considerably. The combinatorials and gamma functions fall out, leaving
\[ \begin{aligned}
p(\theta|y) &= \frac{\left[\prod_{i=1}^n \theta^{F_i}(1-\theta)^{K_i-F_i}\right]\theta^{a-1}(1-\theta)^{b-1}}{\int_0^1\left[\prod_{i=1}^n \theta^{F_i}(1-\theta)^{K_i-F_i}\right]\theta^{a-1}(1-\theta)^{b-1}\,d\theta} \\
&= \frac{\left[\theta^{\Sigma_i F_i}(1-\theta)^{\Sigma_i(K_i-F_i)}\right]\theta^{a-1}(1-\theta)^{b-1}}{\int_0^1\left[\theta^{\Sigma_i F_i}(1-\theta)^{\Sigma_i(K_i-F_i)}\right]\theta^{a-1}(1-\theta)^{b-1}\,d\theta} \\
&= \frac{\theta^{(\Sigma_i F_i)+(a-1)}(1-\theta)^{[\Sigma_i(K_i-F_i)]+(b-1)}}{\int_0^1 \theta^{(\Sigma_i F_i)+(a-1)}(1-\theta)^{[\Sigma_i(K_i-F_i)]+(b-1)}\,d\theta}.
\end{aligned} \]
The denominator is a beta integral, which equals $\Gamma(a+\Sigma_i F_i)\,\Gamma[b+\Sigma_i(K_i-F_i)]/\Gamma[a+\Sigma_i F_i+b+\Sigma_i(K_i-F_i)]$, so the posterior density is
\[ p(\theta|y) = \frac{\Gamma[(a+\Sigma_i F_i)+(b+\Sigma_i(K_i-F_i))]}{\Gamma(a+\Sigma_i F_i)\,\Gamma[b+\Sigma_i(K_i-F_i)]}\,\theta^{(a+\Sigma_i F_i)-1}(1-\theta)^{[b+\Sigma_i(K_i-F_i)]-1}. \]
The argument of the gamma function in the numerator simplifies slightly, since $\Sigma_i F_i + \Sigma_i(K_i - F_i) = \Sigma_i K_i$;
\[ p(\theta|y) = \frac{\Gamma(a+b+\Sigma_i K_i)}{\Gamma(a+\Sigma_i F_i)\,\Gamma[b+\Sigma_i(K_i-F_i)]}\,\theta^{(a+\Sigma_i F_i)-1}(1-\theta)^{[b+\Sigma_i(K_i-F_i)]-1}. \]
c-e. The posterior distribution is a beta distribution with parameters $a^* = a + \Sigma_i F_i$ and $b^* = b + \Sigma_i(K_i - F_i)$. The mean of this beta random variable is $a^*/(a^*+b^*) = (a + \Sigma_i F_i)/(a + b + \Sigma_i K_i)$. In the data, $\Sigma_i F_i = 49$ and $\Sigma_i K_i = 75$. For the values given, the posterior means are
(a=1,b=1): Result = .647668
(a=2,b=2): Result = .643939
(a=1,b=2): Result = .639386
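The MLE and the beta posterior mean are simple functions of the two sums, which the following sketch makes explicit. The data in the example are hypothetical, chosen only so the arithmetic is easy to follow, and the function names are illustrative.

```python
def binomial_mle(F, K):
    """MLE of theta: total successes over total trials."""
    return sum(F) / sum(K)

def beta_posterior_mean(F, K, a, b):
    """Mean of the Beta(a*, b*) posterior, a* = a + sum(F), b* = b + sum(K - F)."""
    a_star = a + sum(F)
    b_star = b + sum(k - f for f, k in zip(F, K))
    return a_star / (a_star + b_star)

# Hypothetical data: successes F_i out of K_i trials.
F = [3, 5]
K = [5, 8]
mle = binomial_mle(F, K)                        # 8 / 13
mean_flat = beta_posterior_mean(F, K, 1, 1)     # (1 + 8) / (2 + 13)
mean_22 = beta_posterior_mean(F, K, 2, 2)       # (2 + 8) / (4 + 13)
print(mle, mean_flat, mean_22)
```

With a = b the prior mean is 1/2, so the posterior mean is pulled from the MLE toward 1/2, more strongly as a + b grows relative to the number of trials.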
Chapter 19 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Serial Correlation
Exercises
1. For the first order autoregressive model, the autocorrelation is ρ. Consider the first difference, $v_t = \varepsilon_t - \varepsilon_{t-1}$, which has $\mathrm{Var}[v_t] = 2\mathrm{Var}[\varepsilon_t] - 2\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}] = 2\sigma_u^2[1/(1-\rho^2) - \rho/(1-\rho^2)] = 2\sigma_u^2/(1+\rho)$ and $\mathrm{Cov}[v_t,v_{t-1}] = 2\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}] - \mathrm{Var}[\varepsilon_t] - \mathrm{Cov}[\varepsilon_t,\varepsilon_{t-2}] = \sigma_u^2[1/(1-\rho^2)][2\rho - 1 - \rho^2] = \sigma_u^2[(\rho-1)/(1+\rho)]$. Therefore, the autocorrelation of the differenced process is $\mathrm{Cov}[v_t,v_{t-1}]/\mathrm{Var}[v_t] = (\rho-1)/2$. As the figure below on the left shows, first differencing reduces the absolute value of the autocorrelation coefficient when ρ is greater than 1/3. For economic data, this is likely to be fairly common.
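The AR(1) result can be checked numerically from the autocovariances themselves. The sketch below is illustrative only (the function names are not from the text): it builds the autocovariances of ε, forms the variance and first autocovariance of the difference, and compares the ratio with (ρ − 1)/2.

```python
def ar1_autocov(rho, k, sigma2_u=1.0):
    """Autocovariance at lag k of a stationary AR(1) with innovation variance sigma2_u."""
    return sigma2_u * rho ** k / (1.0 - rho ** 2)

def diff_autocorr(rho):
    """First-order autocorrelation of v_t = e_t - e_{t-1} when e_t is AR(1)."""
    g0, g1, g2 = (ar1_autocov(rho, k) for k in range(3))
    var_v = 2.0 * g0 - 2.0 * g1
    cov_v = 2.0 * g1 - g0 - g2
    return cov_v / var_v

for rho in (0.2, 1.0 / 3.0, 0.6, 0.9):
    r_diff = diff_autocorr(rho)
    # Differencing lowers |autocorrelation| only when rho exceeds 1/3.
    print(f"rho = {rho:.3f}  differenced = {r_diff:.3f}  reduced: {abs(r_diff) < abs(rho)}")
```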
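For the moving average process, $\varepsilon_t = u_t - \lambda u_{t-1}$, the first order autocorrelation is $\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}]/\mathrm{Var}[\varepsilon_t] = -\lambda/(1+\lambda^2)$. To obtain the autocorrelation of the first difference, write $\varepsilon_t - \varepsilon_{t-1} = u_t - (1+\lambda)u_{t-1} + \lambda u_{t-2}$ and $\varepsilon_{t-1} - \varepsilon_{t-2} = u_{t-1} - (1+\lambda)u_{t-2} + \lambda u_{t-3}$. The variance of the difference is $\mathrm{Var}[\varepsilon_t - \varepsilon_{t-1}] = \sigma_u^2[(1+\lambda)^2 + (1+\lambda^2)]$. The covariance can be found by taking the expected product of terms with equal subscripts. Thus, $\mathrm{Cov}[\varepsilon_t - \varepsilon_{t-1},\varepsilon_{t-1} - \varepsilon_{t-2}] = -\sigma_u^2(1+\lambda)^2$. The autocorrelation is $\mathrm{Cov}[\varepsilon_t - \varepsilon_{t-1},\varepsilon_{t-1} - \varepsilon_{t-2}]/\mathrm{Var}[\varepsilon_t - \varepsilon_{t-1}] = -(1+\lambda)^2/[(1+\lambda)^2 + (1+\lambda^2)]$. A plot of the relationship between the differenced and undifferenced series is shown in the right panel above. The horizontal axis plots the autocorrelation of the original series. The values plotted are the absolute values of the difference between the autocorrelation of the differenced series and the original series. The results are similar to those for the AR(1) model. For most of the range of the autocorrelation of the original series, differencing increases autocorrelation. But, for most of the range of values that are economically meaningful, differencing reduces autocorrelation.
2. Derive the disturbance covariance matrix for the model $y_t = \beta'x_t + \varepsilon_t$, $\varepsilon_t = \rho\varepsilon_{t-1} + u_t - \lambda u_{t-1}$. What parameter is estimated by the regression of the ordinary least squares residuals on their lagged values?
Solve the disturbance process in its moving average form. Write the process as $\varepsilon_t - \rho\varepsilon_{t-1} = u_t - \lambda u_{t-1}$ or, using the lag operator, $\varepsilon_t(1 - \rho L) = u_t - \lambda u_{t-1}$ or $\varepsilon_t = u_t/(1 - \rho L) - \lambda u_{t-1}/(1 - \rho L)$. After multiplying these out, we obtain
\[ \begin{aligned}
\varepsilon_t &= u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \rho^3 u_{t-3} + \ldots - \lambda u_{t-1} - \rho\lambda u_{t-2} - \rho^2\lambda u_{t-3} - \ldots \\
&= u_t + (\rho-\lambda)u_{t-1} + \rho(\rho-\lambda)u_{t-2} + \rho^2(\rho-\lambda)u_{t-3} + \ldots
\end{aligned} \]
Therefore, $\mathrm{Var}[\varepsilon_t] = \sigma_u^2[1 + (\rho-\lambda)^2(1 + \rho^2 + \rho^4 + \ldots)] = \sigma_u^2[1 + (\rho-\lambda)^2/(1-\rho^2)] = \sigma_u^2(1 + \lambda^2 - 2\rho\lambda)/(1-\rho^2)$ and $\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}] = \rho\mathrm{Var}[\varepsilon_{t-1}] + \mathrm{Cov}[\varepsilon_{t-1},u_t] - \lambda\mathrm{Cov}[\varepsilon_{t-1},u_{t-1}]$. To evaluate this expression, write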
$\varepsilon_{t-1} = u_{t-1} + (\rho-\lambda)u_{t-2} + \rho(\rho-\lambda)u_{t-3} + \rho^2(\rho-\lambda)u_{t-4} + \ldots$ Therefore, the middle term is zero and the third is simply $-\lambda\sigma_u^2$. Thus, $\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}] = \sigma_u^2\{[\rho(1 + \lambda^2 - 2\rho\lambda)]/(1-\rho^2) - \lambda\} = \sigma_u^2[(\rho-\lambda)(1-\lambda\rho)/(1-\rho^2)]$. For lags greater than 1, $\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-j}] = \rho\mathrm{Cov}[\varepsilon_{t-1},\varepsilon_{t-j}] + \mathrm{Cov}[\varepsilon_{t-j},u_t] - \lambda\mathrm{Cov}[\varepsilon_{t-j},u_{t-1}]$. Since $\varepsilon_{t-j}$ involves only u's up to its current period, $\varepsilon_{t-j}$ is uncorrelated with $u_t$ and $u_{t-1}$ if j is greater than 1. Therefore, after the first lag, the autocovariances behave in the familiar fashion, $\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-j}] = \rho\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-j+1}]$. The autocorrelation coefficient of the residuals estimates $\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}]/\mathrm{Var}[\varepsilon_t] = (\rho-\lambda)(1-\rho\lambda)/(1 + \lambda^2 - 2\rho\lambda)$.
3. Since the regression contains a lagged dependent variable, we cannot use the Durbin-Watson statistic directly. The h statistic in (15-34) would be $h = (1 - 1.21/2)[21/(1 - 21(.18^2))]^{1/2} = 3.201$. The 95% critical value from the standard normal distribution for this one-tailed test would be 1.645. Therefore, we would reject the hypothesis of no autocorrelation.
4. It is commonly asserted that the Durbin-Watson statistic is only appropriate for testing for first order autoregressive disturbances. What combination of the coefficients of the model is estimated by the Durbin-Watson statistic in each of the following cases: AR(1), AR(2), MA(1)? In each case, assume that the regression model does not contain a lagged dependent variable. Comment on the impact on your results of relaxing this assumption.
In each case, plim $d = 2 - 2\rho_1$ where $\rho_1 = \mathrm{Corr}[\varepsilon_t,\varepsilon_{t-1}]$. The first order autocorrelations are as follows: AR(1): ρ (see (15-9)) and AR(2): $\theta_1/(1-\theta_2)$. For the AR(2), a proof is as follows: First, $\varepsilon_t = \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + u_t$. Denote $\mathrm{Var}[\varepsilon_t]$ as $c_0$ and $\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}]$ as $c_1$. Then, it follows immediately that $c_1 = \theta_1c_0 + \theta_2c_1$ since $u_t$ is independent of $\varepsilon_{t-1}$. Therefore $\rho_1 = c_1/c_0 = \theta_1/(1-\theta_2)$. For the MA(1): $-\lambda/(1+\lambda^2)$ (see (15-43)). To prove this, write $\varepsilon_t = u_t - \lambda u_{t-1}$.
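Then, since the u's are independent, the result follows just by multiplying out $\rho_1 = \mathrm{Cov}[\varepsilon_t,\varepsilon_{t-1}]/\mathrm{Var}[\varepsilon_t] = -\lambda\mathrm{Var}[u_{t-1}]/\{\mathrm{Var}[u_t] + \lambda^2\mathrm{Var}[u_{t-1}]\} = -\lambda/(1+\lambda^2)$.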
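Two of the results in Exercises 2 and 3 lend themselves to a quick numerical check. The sketch below, with illustrative function names, computes Durbin's h from the figures used in Exercise 3 (d = 1.21, T = 21, standard error .18) and compares the closed form for the first autocorrelation of the ARMA(1,1) disturbance with the value implied by a truncated moving average representation.

```python
import math

def durbin_h(d, T, se_lag):
    """Durbin's h: (1 - d/2) * sqrt(T / (1 - T * se_lag^2)); needs T * se_lag^2 < 1."""
    return (1.0 - d / 2.0) * math.sqrt(T / (1.0 - T * se_lag ** 2))

def arma11_rho1_formula(rho, lam):
    """Closed form: (rho - lam)(1 - rho*lam) / (1 + lam^2 - 2*rho*lam)."""
    return (rho - lam) * (1.0 - rho * lam) / (1.0 + lam ** 2 - 2.0 * rho * lam)

def arma11_rho1_from_ma(rho, lam, terms=500):
    """Same quantity from the MA(infinity) weights 1, (rho-lam), rho(rho-lam), ..."""
    psi = [1.0] + [(rho - lam) * rho ** j for j in range(terms)]
    g0 = sum(p * p for p in psi)                        # variance, up to sigma_u^2
    g1 = sum(p * q for p, q in zip(psi[:-1], psi[1:]))  # lag-1 autocovariance
    return g1 / g0

h = durbin_h(d=1.21, T=21, se_lag=0.18)
gap = abs(arma11_rho1_formula(0.7, 0.3) - arma11_rho1_from_ma(0.7, 0.3))
print(f"h = {h:.3f}, formula vs. MA expansion gap = {gap:.2e}")
```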
Applications
1. Phillips Curve
> date;1950.1$
> peri;1950.1-2000.4$
> crea;dp=infl-infl[-1]$
> crea;dy=loggdp-loggdp[-1]$
> peri;1950.3-2000.4$
> regr;lhs=dp;rhs=one,unemp$;ar1;res=u$
++  Ordinary least squares regression Weighting variable = none   Dep. var. = DP Mean= .1926996283E01, S.D.= 2.818214558   Model size: Observations = 202, Parameters = 2, Deg.Fr.= 200   Residuals: Sum of squares= 1592.321197 , Std.Dev.= 2.82163   Fit: Rsquared= .002561, Adjusted Rsquared = .00243   Model test: F[ 1, 200] = .51, Prob value = .47449   Diagnostic: LogL = 495.1583, Restricted(b=0) LogL = 495.4173   LogAmemiyaPrCrt.= 2.084, Akaike Info. Crt.= 4.922   Autocorrel: DurbinWatson Statistic = 2.82755, Rho = .41378  ++ +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .4918922148 .74047944 .664 .5073 UNEMP .9013159906E01 .12578616 .717 .4745 5.6712871 > peri;1951.22000.4$ > regr;lhs=u;rhs=one,u[1],u[2]$
++  Ordinary least squares regression Weighting variable = none   Dep. var. = U Mean= .3890391012E01, S.D.= 2.799476915   Model size: Observations = 199, Parameters = 3, Deg.Fr.= 196   Residuals: Sum of squares= 1079.052269 , Std.Dev.= 2.34635   Fit: Rsquared= .304618, Adjusted Rsquared = .29752   Model test: F[ 2, 196] = 42.93, Prob value = .00000   Diagnostic: LogL = 450.5769, Restricted(b=0) LogL = 486.7246   LogAmemiyaPrCrt.= 1.721, Akaike Info. Crt.= 4.559   Autocorrel: DurbinWatson Statistic = 1.99273, Rho = .00363  ++ +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .5048615289E01 .16633422 .304 .7618 U[1] .5946344724 .65920584E01 9.020 .0000 .10234931E01 U[2] .3824653303 .65904378E01 5.803 .0000 .14370453E01 (Note: E+nn or Enn means multiply by 10 to + or nn power.) > calc;list;lm=n*rsqrd$ LM = .60618960968412850D+02 ++  AR(1) Model: e(t) = rho * e(t1) + u(t)   Initial value of rho = .41378   Maximum iterations = 100   Method = Prais  Winsten   Iter= 1, SS= 1299.275, LogL=474.710175   Final value of Rho = .413779   Iter= 1, SS= 1299.275, LogL=474.710175   DurbinWatson: e(t) = 2.827557   Std. Deviation: e(t) = 2.799716   Std. Deviation: u(t) = 2.548799   DurbinWatson: u(t) = 2.340706   Autocorrelation: u(t) = .170353   N[0,1] used for significance levels  ++ +++++++ Variable  Coefficient  Standard Error b/St.Er.P[Z>z]  Mean of X +++++++ Constant .4704274598 .47671946 .987 .3237 UNEMP .8709854633E01 .80962277E01 1.076 .2820 5.6712871 RHO .4137785986 .64213081E01 6.444 .0000
Regression results are almost unchanged. The autocorrelation of the transformed residuals is −.17, smaller in absolute value than the −.41 in the original model (both Durbin-Watson statistics exceed 2).
2. (Improved Phillips curve model) > crea;newecon=dmy(1974.1,2000.4)$ > regr;lhs=dp;rhs=one,unemp,newecon;plot$ ++  Ordinary least squares regression Weighting variable = none   Dep. var. = DP Mean= .1926996283E01, S.D.= 2.818214558   Model size: Observations = 202, Parameters = 3, Deg.Fr.= 199   Residuals: Sum of squares= 1586.260338 , Std.Dev.= 2.82332   Fit: Rsquared= .006357, Adjusted Rsquared = .00363   Model test: F[ 2, 199] = .64, Prob value = .53017   Diagnostic: LogL = 494.7731, Restricted(b=0) LogL = 495.4173   LogAmemiyaPrCrt.= 2.091, Akaike Info. Crt.= 4.928   Autocorrel: DurbinWatson Statistic = 2.83473, Rho = .41737  ++ +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .5507626279 .74399306 .740 .4600 UNEMP .9835166981E01 .12621412 .779 .4368 5.6712871 NEWECON 2.474910396 2.8382661 .872 .3843 .49504950E02
3. (GARCH Models) a. We used LIMDEP with the macroeconomics data in Table F5.1. The rate of inflation was computed with all observations, then observations 6 to 204 were used to remove the missing data due to lags. Least squares results were obtained first. The residuals were then computed and squared. Using observations 15-204, we then computed a regression of the squared residual on a constant and 8 lagged values. The chi-squared statistic with 8 degrees of freedom is 28.24. The critical value from the table for 95% significance and 8 degrees of freedom is 15.51, so at this level of significance, the hypothesis of no GARCH effects is rejected.
crea;pt=100*log(cpi_u/cpi_u[-1])$
crea;pt1=pt[-1];pt2=pt[-2];pt3=pt[-3];pt4=pt[-4]$
samp;6-204$
regr;lhs=pt;rhs=one,pt1,pt2,pt3,pt4;res=et$
crea;vt=et*et$
crea;vt1=vt[-1];vt2=vt[-2];vt3=vt[-3];vt4=vt[-4];vt5=vt[-5];vt6=vt[-6];vt7=vt[-7];vt8=vt[-8]$
samp;15-204$
regr;lhs=vt;rhs=one,vt1,vt2,vt3,vt4,vt5,vt6,vt7,vt8$
calc;list;lm=n*rsqrd$
++  Ordinary least squares regression Weighting variable = none   Dep. var. = PT Mean= .9589185961 , S.D.= .8318268241   Model size: Observations = 199, Parameters = 5, Deg.Fr.= 194   Residuals: Sum of squares= 61.97028507 , Std.Dev.= .56519   Fit: Rsquared= .547673, Adjusted Rsquared = .53835   Model test: F[ 4, 194] = 58.72, Prob value = .00000   Diagnostic: LogL = 166.2871, Restricted(b=0) LogL = 245.2254   LogAmemiyaPrCrt.= 1.116, Akaike Info. Crt.= 1.721   Autocorrel: DurbinWatson Statistic = 1.80740, Rho = .09630  ++ +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .1296044455 .67521735E01 1.919 .0564 PT1 .2856136998 .69863942E01 4.088 .0001 .97399582 PT2 .1237760914 .70647061E01 1.752 .0813 .98184918 PT3 .2516837602 .70327318E01 3.579 .0004 .99074774 PT4 .1824670634 .69251374E01 2.635 .0091 .98781131 LM = .28240022492847690D+02
For the second step, we need an estimate of α0, which is the unconditional variance if there are no ARCH effects. We computed this based on the ARCH specification by a regression of $e_t^2 - (8/36)e_{t-1}^2 - \ldots - (1/36)e_{t-8}^2$ on just a constant term. This produces a negative estimate of α0, but this is not the variance, so we retain the result. We note that the problem this reflects is probably the specific, doubtless unduly restrictive, ARCH structure assumed.
samp;6-204$
crea;vt=et*et$
crea;ht=vt-8/36*vt[-1]-7/36*vt[-2]-6/36*vt[-3]-5/36*vt[-4]-4/36*vt[-5]-3/36*vt[-6]-2/36*vt[-7]-1/36*vt[-8]$
samp;15-204$
calc;list;a0=xbr(ht)$
samp;6-204$
crea;qt=a0+8/36*vt[-1]+7/36*vt[-2]+6/36*vt[-3]+5/36*vt[-4]+4/36*vt[-5]+3/36*vt[-6]+2/36*vt[-7]+1/36*vt[-8]$
samp;15-204$
plot;rhs=qt$
crea;wt=1/qt$
regr;lhs=pt;rhs=one,pt1,pt2,pt3,pt4;wts=wt$
regr;lhs=pt;rhs=one,pt1,pt2,pt3,pt4;model=garch(1,1)$
Once we have an estimate of α0 in hand, we compute the set of variances according to the ARCH(8) model, using the lagged squared residuals. We then use these variance estimators to compute a weighted least squares regression accounting for the heteroscedasticity. This regression is based on observations 15-204, again because of the lagged values. Finally, using the same sample, a GARCH(1,1) model is fit by maximum likelihood.
++  Ordinary least squares regression Weighting variable = WT   Dep. var. = PT Mean= .8006997687 , S.D.= .6327877239   Model size: Observations = 190, Parameters = 5, Deg.Fr.= 185   Residuals: Sum of squares= 38.67492770 , Std.Dev.= .45722   Fit: Rsquared= .488964, Adjusted Rsquared = .47791   Model test: F[ 4, 185] = 44.25, Prob value = .00000   Diagnostic: LogL = 147.7324, Restricted(b=0) LogL = 211.5074   LogAmemiyaPrCrt.= 1.539, Akaike Info. Crt.= 1.608   Autocorrel: DurbinWatson Statistic = 1.90310, Rho = .04845  +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .1468553158 .60127085E01 2.442 .0155 PT1 .9760051110E01 .88469908E01 1.103 .2714 .77755556 PT2 .3328520370 .86772549E01 3.836 .0002 .76745308 PT3 .1428889148 .85420554E01 1.673 .0961 .76271761 PT4 .2878686524 .84090832E01 3.423 .0008 .74173558
The 8 period ARCH model produces quite a substantial change in the estimates. Once again, this probably results from the restrictive assumption about the lag weights in the ARCH model. The GARCH model follows.
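The variance construction used for the weighted regression can be written out compactly. The sketch below is an illustration with made-up inputs: it forms the linearly declining ARCH(8) lag weights 8/36, ..., 1/36, builds q_t = α0 + Σ_j w_j e²_{t−j} from a series of squared residuals, and converts the result to regression weights 1/q_t. The names `ARCH8_WEIGHTS`, `arch8_variances` and `wls_weights` are assumptions made for the example.

```python
# Linearly declining lag weights for lags 1..8: 8/36, 7/36, ..., 1/36.
ARCH8_WEIGHTS = [(9 - j) / 36.0 for j in range(1, 9)]

def arch8_variances(squared_resid, alpha0):
    """q_t = alpha0 + sum_j w_j * e_{t-j}^2 for every t with 8 lags available."""
    variances = []
    for t in range(8, len(squared_resid)):
        lagged = sum(w * squared_resid[t - j]
                     for j, w in enumerate(ARCH8_WEIGHTS, start=1))
        variances.append(alpha0 + lagged)
    return variances

def wls_weights(variances):
    """Weights for the weighted least squares step: w_t = 1 / q_t."""
    return [1.0 / q for q in variances]

# Hypothetical squared residuals; with constant inputs every q_t equals alpha0 + 1.
q = arch8_variances([1.0] * 12, alpha0=0.5)
w = wls_weights(q)
print(sum(ARCH8_WEIGHTS), q, w)
```

The first eight observations are lost to the lags, which is why the weighted regression in the text starts at observation 15 rather than 6.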
++  GARCH MODEL   Maximum Likelihood Estimates   Model estimated: Jul 31, 2002 at 01:19:14PM.  Dependent variable PT   Weighting variable None   Number of observations 190   Iterations completed 22   Log likelihood function 135.5043   Restricted log likelihood 147.6465   Chi squared 24.28447   Degrees of freedom 2   Prob[ChiSqd > value] = .5328953E05   GARCH Model, P = 1, Q = 1   Wald statistic for GARCH = 521.483  ++ +++++++ Variable  Coefficient  Standard Error b/St.Er.P[Z>z]  Mean of X +++++++ Regression parameters Constant .1308478127 .61887183E01 2.114 .0345 PT1 .1749239917 .70912277E01 2.467 .0136 .98810078 PT2 .2532191617 .73228319E01 3.458 .0005 .98160455 PT3 .1552879436 .68274176E01 2.274 .0229 .97782066 PT4 .2751467919 .63910272E01 4.305 .0000 .97277700 Unconditional Variance Alpha(0) .1005125676E01 .11653271E01 .863 .3884 Lagged Variance Terms Delta(1) .8556879884 .89322732E01 9.580 .0000 Lagged Squared Disturbance Terms Alpha(1) .1077364862 .60761132E01 1.773 .0762 Equilibrium variance, a0/[1D(1)A(1)] EquilVar .2748082674 2.0559946 .134 .8937
Chapter 20 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Models with Lagged Variables
Exercises
1. For the first, the mean lag is .55(.02)(0) + .55(.15)(1) + ... + .55(.17)(4) = 1.31 periods. The impact multiplier is .55(.02) = .011 while the long run multiplier is the sum of the coefficients, .55. For the second, the coefficient on $x_t$ is .6, so this is the impact multiplier. The mean lag is found by applying (18-9) to $B(L) = [.6 + 2L]/[1 - .6L + .5L^2] = A(L)/D(L)$. Then, $B'(1)/B(1) = \{[D(1)A'(1) - A(1)D'(1)]/[D(1)]^2\}/[A(1)/D(1)] = A'(1)/A(1) - D'(1)/D(1) = (2/2.6) - (.4/.9) = 0.325$ periods. The long run multiplier is $B(1) = 2.6/.9 = 2.889$. For the third, since we are interested only in the coefficients on $x_t$, write the model as $y_t = \alpha + \beta x_t[1 + \gamma L + \gamma^2L^2 + \ldots] + \delta z_t^* + u_t$. The lag coefficients on $x_t$ are simply β times powers of γ.
2. We would regress $y_t$ on a constant, $x_t, x_{t-1}, \ldots, x_{t-6}$. 1 5 10 10 5 1 0 R = 0 1 5 10 10 5 1 0 0 1 5 10 10 5 would produce the PDL estimates.
Constrained least squares using 0 0 0 , q = 0 0 1
3. The ratio of polynomials will equal $B(L) = [.6 + 2L]/[1 - .6L + .5L^2]$. This will expand to $B(L) = \beta_0 + \beta_1L + \beta_2L^2 + \ldots$. Multiply both sides of the equation by $(1 - .6L + .5L^2)$ to obtain $(\beta_0 + \beta_1L + \beta_2L^2 + \ldots)(1 - .6L + .5L^2) = .6 + 2L$. Since the two sides must be equal, it follows that $\beta_0 = .6$ (the only term not involving L) and $-.6\beta_0 + \beta_1 = 2$ (the only term involving only L). Therefore, $\beta_1 = 2.36$. All remaining terms, involving $L^2, L^3, \ldots$, must equal zero. Therefore, $\beta_j - .6\beta_{j-1} + .5\beta_{j-2} = 0$ for all j > 1, or $\beta_j = .6\beta_{j-1} - .5\beta_{j-2}$. This provides a recursion for all remaining coefficients. For the specified coefficients, $\beta_2 = .6(2.36) - .5(.6) = 1.116$, $\beta_3 = .6(1.116) - .5(2.36) = -.5104$, $\beta_4 = .6(-.5104) - .5(1.116) = -.86424$ and so on.
4. By multiplying through by the denominator of the lag function, we obtain an autoregressive form
\[ \begin{aligned}
y_t &= \alpha(1+\delta_1+\delta_2) + \beta x_t + \gamma x_{t-1} - \delta_1y_{t-1} - \delta_2y_{t-2} + \varepsilon_t + \delta_1\varepsilon_{t-1} + \delta_2\varepsilon_{t-2} \\
&= \alpha(1+\delta_1+\delta_2) + \beta x_t + \gamma x_{t-1} - \delta_1y_{t-1} - \delta_2y_{t-2} + v_t.
\end{aligned} \]
The model cannot be estimated consistently by ordinary least squares because there is autocorrelation in the presence of a lagged dependent variable. There are two approaches possible. Nonlinear least squares could be applied to the moving average (distributed lag) form. This would be fairly complicated, though a method of doing so is described by Maddala and Rao (1973). A much simpler approach would be to estimate the model in the autoregressive form using an instrumental variables estimator. The lagged variables $x_{t-2}$ and $x_{t-3}$ can be used as instruments for the lagged dependent variables.
5. The model can be estimated as an autoregressive or distributed lag equation. Consider, first, the autoregressive form. Multiply through by $(1 - \gamma L)(1 - \phi L)$ to obtain $y_t = \alpha(1-\gamma)(1-\phi) + \beta x_t - (\beta\phi)x_{t-1} + \delta z_t - (\delta\gamma)z_{t-1} + (\gamma+\phi)y_{t-1} - (\gamma\phi)y_{t-2} + \varepsilon_t - (\gamma+\phi)\varepsilon_{t-1} + (\gamma\phi)\varepsilon_{t-2}$. Clearly, the model cannot be estimated by ordinary least squares, since there is an autocorrelated disturbance and a lagged dependent variable.
The parameters can be estimated consistently, but inefficiently, by linear instrumental variables. The inefficiency arises from the fact that the parameters are overidentified. The linear estimator estimates seven functions of the five underlying parameters. One possibility is a GMM estimator. Let $v_t = \varepsilon_t - (\gamma+\phi)\varepsilon_{t-1} + (\gamma\phi)\varepsilon_{t-2}$. Then, a GMM estimator can be defined in terms of, say, a set of moment equations of the form $E[v_tw_t] = 0$, where $w_t$ is current and lagged values of x and z. A minimum distance estimator could then be used for estimation.
The distributed lag approach might be taken, instead. Each of the two regressors produces a recursion, $x_t^* = x_t + \gamma x_{t-1}^*$ and $z_t^* = z_t + \phi z_{t-1}^*$. Thus, values of the moving average regressors can be built up recursively. Note that the model is linear in 1, $x_t^*$, and $z_t^*$. Therefore, an approach is to search a grid of values of (γ,φ) to minimize the sum of squares.
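The recursion in Exercise 3, and the multipliers in Exercise 1, can be checked by expanding the rational lag polynomial numerically. The following is a minimal sketch with an illustrative function name; it expands B(L) = A(L)/D(L) for A(L) = .6 + 2L and D(L) = 1 − .6L + .5L², then approximates the long run multiplier and the mean lag from the first 200 coefficients.

```python
def expand_lag_polynomial(numer, denom, n_terms=200):
    """Coefficients b_0, b_1, ... of B(L) = A(L)/D(L), assuming denom[0] == 1.

    Uses b_j = a_j - sum_{k>=1} d_k * b_{j-k}, which is the recursion obtained
    by matching powers of L in B(L) * D(L) = A(L).
    """
    beta = []
    for j in range(n_terms):
        a_j = numer[j] if j < len(numer) else 0.0
        carry = sum(denom[k] * beta[j - k]
                    for k in range(1, min(j, len(denom) - 1) + 1))
        beta.append(a_j - carry)
    return beta

beta = expand_lag_polynomial(numer=[0.6, 2.0], denom=[1.0, -0.6, 0.5])
long_run = sum(beta)                                          # B(1)
mean_lag = sum(j * b for j, b in enumerate(beta)) / long_run  # B'(1) / B(1)
print([round(b, 5) for b in beta[:5]], round(long_run, 4), round(mean_lag, 4))
```

Because the roots of D(L) lie outside the unit circle, the coefficients die out geometrically and 200 terms are far more than the sums need.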
Applications 1. The long run multiplier is β0 + β1 + ... + β6. The model is a classical regression, so it can be estimated by ordinary least squares. The estimator of the long run multiplier would be the sum of the least squares coefficients. If the sixth lag is omitted, then the standard omitted variable result applies, and all the coefficients are biased. The orthogonality result needed to remove the bias explicitly fails here, since xt is an AR(1) process. All the lags are correlated. Since the form of the relationship is, in fact, known, we can derive the omitted variable formula. In particular, by construction, xt will have mean zero. By implication, yt will also, so we lose nothing by assuming that the constant term is zero. To save some cumbersome algebra, we’ll also assume with no loss of generality that the unconditional variance of xt is 1. Let X1 = [xt,xt1,...,xt5] and X2 = xt6. Then, for the regression of y on X1, we have by the omitted variable formula,
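\[
E\left[\left.\begin{pmatrix} b_0\\ b_1\\ b_2\\ b_3\\ b_4\\ b_5\end{pmatrix}\right|\mathbf{X}_1\right] =
\begin{pmatrix}\beta_0\\ \beta_1\\ \beta_2\\ \beta_3\\ \beta_4\\ \beta_5\end{pmatrix} +
\begin{pmatrix}
1 & r & r^2 & r^3 & r^4 & r^5\\
r & 1 & r & r^2 & r^3 & r^4\\
r^2 & r & 1 & r & r^2 & r^3\\
r^3 & r^2 & r & 1 & r & r^2\\
r^4 & r^3 & r^2 & r & 1 & r\\
r^5 & r^4 & r^3 & r^2 & r & 1
\end{pmatrix}^{-1}
\begin{pmatrix} r^6\\ r^5\\ r^4\\ r^3\\ r^2\\ r\end{pmatrix}\beta_6
\]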
We can derive a formal solution to the bias in this estimator. Note that the column to the right of the inverse matrix is r times the last column of the matrix being inverted. Therefore, the matrix product is r times the last column of an identity matrix. This gives us the complete result,
\[
E\left[\left.\begin{pmatrix} b_0\\ b_1\\ b_2\\ b_3\\ b_4\\ b_5\end{pmatrix}\right|\mathbf{X}_1\right] =
\begin{pmatrix}\beta_0\\ \beta_1\\ \beta_2\\ \beta_3\\ \beta_4\\ \beta_5\end{pmatrix} +
\begin{pmatrix} 0\\ 0\\ 0\\ 0\\ 0\\ r\end{pmatrix}\beta_6 .
\]
Therefore, the first 5 coefficients are unbiased, and the last one is an estimator of $\beta_5 + r\beta_6$. Adding these up, we see that when the last lag is omitted from the model, the estimator of the long run multiplier is biased downward by $(1-r)\beta_6$. For part d, we will use a similar construction. But, now there are five variables in $X_1$ and $x_{t-5}$ and $x_{t-6}$ in $X_2$. The same kind of computation will show that the first four coefficients are unbiased while the fifth now estimates $\beta_4 + r\beta_5 + r^2\beta_6$. The long run multiplier is estimated with downward bias equal to $(1-r)\beta_5 + (1-r^2)\beta_6$.

Variable  Coefficient   Standard Error  t-ratio  P[|T|>t]  Mean of X
XT        .9726595701   1.9258818       .505     .6141     8.3384522
XT1       .7709686332   3.1555811       .244     .8072     8.3301663
XT2       .5450409860   3.1761465       .172     .8639     8.3218191
XT3       .6061007409   3.1903388       .190     .8495     8.3134324
XT4       .2272352746   3.1729930       .072     .9430     8.3050260
XT5       1.916555094   3.1414210       .610     .5425     8.2964570
XT6       1.218771893   1.8814874       .648     .5179     8.2878393
Matrix LRM has 1 rows and 1 columns: .7575

Variable  Coefficient   Standard Error  t-ratio  P[|T|>t]  Mean of X
XT        1.101551478   1.9126777       .576     .5653     8.3384522
XT1       .6941982792   3.1485851       .220     .8257     8.3301663
XT2       .5287939572   3.1712435       .167     .8677     8.3218191
XT3       .7300170198   3.1797815       .230     .8187     8.3134324
XT4       .5552651191   3.1275848       .178     .8593     8.3050260
XT5       .2826674399   1.8697065       .151     .8800     8.2964570
Matrix LRM has 1 rows and 1 columns: .7566

Variable  Coefficient   Standard Error  t-ratio  P[|T|>t]  Mean of X
XT        1.077633667   1.9012923       .567     .5715     8.3384522
XT1       .7070443138   3.1394606       .225     .8221     8.3301663
XT2       .5633400685   3.1549830       .179     .8585     8.3218191
XT3       .6608149939   3.1386871       .211     .8335     8.3134324
XT4       .9304013056   1.8990464       .490     .6247     8.3050260
Matrix LRM has 1 rows and 1 columns: .7568

> calc;list;cor(xt,xt1)$
Result = .99978740920470700D+00
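The key step in the omitted variable derivation, that the inverse of the correlation matrix times the vector (r⁶, ..., r)′ equals r times the last column of an identity matrix, can be confirmed without inverting anything: it is enough to multiply the correlation matrix by the candidate solution and compare. The sketch below does that; the function names are illustrative, and the value β6 = 1 in the second part is a hypothetical choice used only to show the size of the bias.

```python
def toeplitz(r, k):
    """k x k matrix with (i, j) element r^|i-j| (AR(1) correlations, unit variance)."""
    return [[r ** abs(i - j) for j in range(k)] for i in range(k)]

def max_residual(r, k=6):
    """Largest |V * (r * e_k) - c| over the rows, where c = (r^k, ..., r)'.

    A zero residual confirms that V^{-1} c = r * e_k.
    """
    V = toeplitz(r, k)
    c = [r ** (k - i) for i in range(k)]
    candidate = [0.0] * (k - 1) + [r]
    product = [sum(V[i][j] * candidate[j] for j in range(k)) for i in range(k)]
    return max(abs(p - ci) for p, ci in zip(product, c))

def long_run_bias(r, beta6):
    """Downward bias in the long run multiplier when the sixth lag is omitted."""
    return (1.0 - r) * beta6

residual = max_residual(0.9998)
bias = long_run_bias(0.9998, beta6=1.0)   # beta6 = 1 is hypothetical
print(residual, bias)
```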
The results of the three suggested regressions are shown above. We used observations 7 - 204 of the logged real investment and real GDP data in deviations from the means for all regressions. Note that although there are some large changes in the estimated individual parameters, the long run multiplier is almost identical in all cases. Looking at the analytical results we can see why this would be the case. The correlation between current and lagged log GDP is r = 0.9998. Therefore, the biases that we found, $(1-r)\beta_6$ and $(1-r)\beta_5 + (1-r^2)\beta_6$, are trivial.
2. Because the model has both lagged dependent variables and autocorrelated disturbances, ordinary least squares will be inconsistent. Consistent estimates could be obtained by the method of instrumental variables. We can use $x_{t-1}$ and $x_{t-2}$ as the instruments for $y_{t-1}$ and $y_{t-2}$. Efficient estimates can be obtained by a two step procedure. We write the model as $y_t - \rho y_{t-1} = \alpha(1-\rho) + \beta(x_t - \rho x_{t-1}) + \gamma(y_{t-1} - \rho y_{t-2}) + \delta(y_{t-2} - \rho y_{t-3}) + u_t$. With a consistent estimator of ρ, we could use FGLS. The residuals from the IV estimator can be used to estimate ρ. Then OLS using the transformed data is asymptotically equivalent to GLS. The method of Hatanaka discussed in the text is another possibility. Using the real consumption and real disposable income data in Table F5.1, we obtain the following results. Estimated standard errors are shown in parentheses. (The estimated autocorrelation based on the IV estimates is .9172.) All three sets of estimates are based on the last 201 observations, 1950.4 to 2000.4.

                 OLS                  IV                   2 Step FGLS
$\hat\alpha$     1.4946 (3.8291)      64.5073 (46.1075)    4.6614 (3.2041)
$\hat\beta$      .007569 (.001662)    .7003 (.4910)        .3477 (.0432)
$\hat\gamma$     1.1977 (.006921)     .5726 (.9043)        .2332 (.05933)
$\hat\delta$     0.1988 (.07109)      .3324 (.4962)        .4072 (.05500)
Chapter 21 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Time Series Models There are no exercises or applications in Chapter 21.
Chapter 22 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Nonstationary Data
Exercise
1. The autocorrelations are simple to obtain just by multiplying out $v_t^2$, $v_tv_{t-1}$ and so on. The autocovariances are $1 + \theta_1^2 + \theta_2^2$, $\theta_2(1 - \theta_2)$, $\theta_2$, 0, 0, 0, ..., which provide the autocorrelations by division by the first of these. The partial autocorrelations are messy, and can be obtained by the Yule-Walker equations. Alternatively (and much more simply), we can make use of the observation in Section 21.2.3 that the partial autocorrelations for the MA(2) process mirror the autocorrelations for an AR(2). Thus, the results in Section 21.2.3 for the AR(2) can be used directly.
Applications 1. ADF Test ++  Ordinary least squares regression Weighting variable = none   Dep. var. = R Mean= 8.212678571 , S.D.= .7762719558   Model size: Observations = 56, Parameters = 6, Deg.Fr.= 50   Residuals: Sum of squares= .9651001703 , Std.Dev.= .13893   Fit: Rsquared= .970881, Adjusted Rsquared = .96797   Model test: F[ 5, 50] = 333.41, Prob value = .00000   Diagnostic: LogL = 34.2439, Restricted(b=0) LogL = 64.7739   LogAmemiyaPrCrt.= 3.846, Akaike Info. Crt.= 1.009   Autocorrel: DurbinWatson Statistic = 1.91589, Rho = .04205  ++ +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .2565690959 .47172815 .544 .5889 T .4401352136E03 .25092142E02 .175 .8615 32.500000 R1 .9653227410 .48183346E01 20.034 .0000 8.2305357 DR1 .5600009441 .14342088 3.905 .0003 .12321429E01 DR2 .1739775168 .14781417 1.177 .2448 .20535714E01 DR3 .7792177815E03 .11072916 .007 .9944 .11607143E01 (Note: E+nn or Enn means multiply by 10 to + or nn power.) > wald;fn1=b_r11$ ++  WALD procedure. Estimates and standard errors   for nonlinear functions and joint test of   nonlinear restrictions.   Wald Statistic = .51796   Prob. from Chisquared[ 1] = .47171  ++ ++++++ Variable  Coefficient  Standard Error b/St.Er.P[Z>z]  ++++++ Fncn(1) .3467725900E01 .48183346E01 .720 .4717
The unit root hypothesis is definitely not rejected.
2. Macroeconomic Model > samp;1204$ > crea;c=log(realcons);y=log(realdpi)$ > crea;c1=c[1];c2=c[2]$ > samp;3204$ > regr;lhs=c;rhs=one,y,c1,c2$ ++  Ordinary least squares regression Weighting variable = none   Dep. var. = C Mean= 7.889033683 , S.D.= .5102401315   Model size: Observations = 202, Parameters = 4, Deg.Fr.= 198   Residuals: Sum of squares= .1519097328E01, Std.Dev.= .00876   Fit: Rsquared= .999710, Adjusted Rsquared = .99971   Model test: F[ 3, 198] =********, Prob value = .00000   Diagnostic: LogL = 672.4019, Restricted(b=0) LogL = 150.2038   LogAmemiyaPrCrt.= 9.456, Akaike Info. Crt.= 6.618   Autocorrel: DurbinWatson Statistic = 1.89384, Rho = .05308  ++ +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .8165780259E03 .10779352E01 .076 .9397 Y .7869591065E01 .29020268E01 2.712 .0073 7.9998985 C1 .9680839747 .72732869E01 13.310 .0000 7.8802520 C2 .4701660339E01 .70076193E01 .671 .5030 7.8714299 > crea;e1=e[1];e2=e[3];e3=e[3]$ > crea;e1=e[1];e2=e[2];e3=e[3]$ > regr;lhs=e;rhs=one,e1,e2,e3$ ++  Ordinary least squares regression Weighting variable = none   Dep. var. = E Mean= .6947138134E15, S.D.= .8693502258E02   Model size: Observations = 202, Parameters = 4, Deg.Fr.= 198   Residuals: Sum of squares= .1339943625E01, Std.Dev.= .00823   Fit: Rsquared= .117934, Adjusted Rsquared = .10457   Model test: F[ 3, 198] = 8.82, Prob value = .00002   Diagnostic: LogL = 685.0763, Restricted(b=0) LogL = 672.4019   LogAmemiyaPrCrt.= 9.581, Akaike Info. Crt.= 6.743   Autocorrel: DurbinWatson Statistic = 1.85371, Rho = .07314  ++ +++++++ Variable  Coefficient  Standard Error tratio P[T>t]  Mean of X +++++++ Constant .2437121418E04 .57884755E03 .042 .9665 E1 .2553462753E01 .70917392E01 .360 .7192 .21497022E04 E2 .3385045374 .66904365E01 5.060 .0000 .56959898E04 E3 .6894158132E01 .71101163E01 .970 .3334 .81793147E04 > calc;list;chisq=n*rsqrd$ CHISQ = .23822731697405480D+02 Matrix Result
has 2 rows and 2 columns. 1 2 +1 1.0688 .0000000D+00 2 19.8378 .0000000D+00
Short run multiplier is β = .07869. Long run is β/(1 − γ1 − γ2) = 12.669. (Not very plausible.)
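The statistic CHISQ in the output above is n times the R-squared from the regression of the residuals on a constant and their three lags. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the 5% critical value for three degrees of freedom (7.815) is taken from a standard chi-squared table rather than from the output.

```python
# n * R-squared check for residual autocorrelation, using the values
# printed in the auxiliary regression of e on one, e1, e2, e3.
n_obs = 202            # "Observations = 202" in the auxiliary regression
r_squared = 0.117934   # "Rsquared= .117934" in the auxiliary regression

lm_stat = n_obs * r_squared

# Assumption: 5% critical value of chi-squared with 3 degrees of freedom
# (one per lagged residual), from a standard table.
CHI2_CRIT_95_DF3 = 7.815

reject_no_autocorrelation = lm_stat > CHI2_CRIT_95_DF3

print(f"n*R^2 = {lm_stat:.4f}")
print(f"reject no autocorrelation at 5%: {reject_no_autocorrelation}")
```

The product agrees with the reported CHISQ to the digits available in the printed R-squared.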
3. ADF Test. To carry out the test, the rate of inflation is regressed on a constant, a time trend, the previous year’s value of the rate of inflation, and three lags of the first difference. The test statistic for the ADF is (0.7290534455 − 1)/.11419140 = −2.373. The critical value in the lower part of Table 20.4 with about 100 observations is −3.45. Since our value is larger than this, it follows that the hypothesis of a unit root cannot be rejected.
> samp;1204$
> crea;ddp1=infl[1]infl[2]$
> crea;ddp2=ddp1[1]$
> crea;ddp3=ddp1[2]$
> crea;dp=infl[1]$
> samp;97204$
> crea;t=trn(1,1)$
> regr;lhs=infl;rhs=one,t,dp,ddp1,ddp2,ddp3$
Ordinary least squares regression   Weighting variable = none
Dep. var. = INFL   Mean= 4.907672727 , S.D.= 3.617392978
Model size: Observations = 108, Parameters = 6, Deg.Fr.= 102
Residuals: Sum of squares= 608.5020156 , Std.Dev.= 2.44248
Fit: Rsquared= .565403, Adjusted Rsquared = .54410
Model test: F[ 5, 102] = 26.54, Prob value = .00000
Variable  Coefficient  Standard Error  tratio  P[T>t]  Mean of X
Constant  2.226039717  1.1342702  1.963  .0524
T  .1836785769E01  .11230759E01  1.635  .1050  54.500000
DP  .7290534455  .11419140  6.384  .0000  4.9830886
DDP1  .4744389916  .12707255  3.734  .0003  .58569323E01
DDP2  .4273030624  .11563482  3.695  .0004  .46827528E01
DDP3  .2248432703  .98954483E01  2.272  .0252  .86558444E02
> wald;fn1=b_dp1$
Variable  Coefficient  Standard Error  b/St.Er.  P[Z>z]
Fncn(1)  .2709465545  .11419140  2.373  .0177
4. Reestimated model in Example 13.1.
> samp;1204$ > crea;ct=realcons;yt=realgdp;gt=realgovt;rt=tbilrate$ > crea;ct1=ct[1];yt1=yt[1]$ > samp;2204$ > samp;1204$ > crea;ct=realcons;yt=realgdp;gt=realgovt;rt=tbilrate;it=realinvs$ > crea;ct1=ct[1];yt1=yt[1]$ > crea;dy=ytyt1$ > samp;2204$ > name;x=one,rt,ct1,yt1,gt$ > 2sls;lhs=ct;rhs=one,yt,ct1;inst=x;res=ec$ > 2sls;lhs=it;rhs=one,rt,dy;inst=x;res=ei$ > iden;rhs=ec;pds=10$ > iden;rhs=ei;pds=10$ ++  Two stage least squares regression Weighting variable = none   Dep. var. = CT Mean= 3008.995074 , S.D.= 1456.900152   Model size: Observations = 203, Parameters = 3, Deg.Fr.= 200   Residuals: Sum of squares= 96595.67529 , Std.Dev.= 21.97677   Fit: Rsquared= .999771, Adjusted Rsquared = .99977   (Note: Not using OLS. Rsquared is not bounded in [0,1]   Model test: F[ 2, 200] =********, Prob value = .00000   Diagnostic: LogL = 913.8005, Restricted(b=0) LogL = 1766.2087   LogAmemiyaPrCrt.= 6.195, Akaike Info. Crt.= 9.033   Autocorrel: DurbinWatson Statistic = 1.61078, Rho = .19461  ++ +++++++ Variable  Coefficient  Standard Error b/St.Er.P[Z>z]  Mean of X
+++++++ Constant 6.666079115 8.6211817 .773 .4394 YT .2932041745E01 .35260653E01 .832 .4057 4577.1882 CT1 1.051478712 .51482187E01 20.424 .0000 2982.9744 ++  Two stage least squares regression Weighting variable = none   Dep. var. = IT Mean= 654.5295567 , S.D.= 391.3705005   Model size: Observations = 203, Parameters = 3, Deg.Fr.= 200   Residuals: Sum of squares= 54658669.31 , Std.Dev.= 522.77466   Fit: Rsquared= .793071, Adjusted Rsquared = .81100   (Note: Not using OLS. Rsquared is not bounded in [0,1]   Diagnostic: LogL = 1557.1409, Restricted(b=0) LogL = 1499.3832   LogAmemiyaPrCrt.= 12.533, Akaike Info. Crt.= 15.371   Autocorrel: DurbinWatson Statistic = 1.49055, Rho = .25473  ++ +++++++ Variable  Coefficient  Standard Error b/St.Er.P[Z>z]  Mean of X +++++++ Constant 141.8297176 103.57113 1.369 .1709 RT 52.04340559 12.971223 4.012 .0001 5.2499007 DY 13.80361384 1.7499250 7.888 .0000 37.898522 Time series identification for EC BoxPierce Statistic = 40.8498 BoxLjung Statistic = 41.7842 Degrees of freedom = 10 Degrees of freedom = 10 Significance level = .0000 Significance level = .0000 * => coefficient > 2/sqrt(N) or > 95% significant. PACF is computed using YuleWalker equations. 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Lag  Autocorrelation Function Box/Prc Partial Autocorrelations X xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1  .194* **  7.65* .194* ** X 2  .264* ***  21.82* .236* *** X 3  .273* ***  36.93* .207* ** X 4  .067  *  37.85*.063  *  X 5  .054  *  38.44*.068  *  X 6  .073  *  39.52* .018  * X 7  .009  *  39.53* .003  * X 8 .078  *  40.78*.109  *  X 9  .019  *  40.85* .023  * X 10  .002  *  40.85* .050  * X xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Time series identification for EI BoxPierce Statistic = 27.4753 BoxLjung Statistic = 28.3566 Degrees of freedom = 10 Degrees of freedom = 10 Significance level = .0022 Significance level = .0016 * => coefficient > 2/sqrt(N) or > 95% significant. PACF is computed using YuleWalker equations. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Lag  Autocorrelation Function Box/Prc Partial Autocorrelations X xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1  .244* ***  12.13* .244* *** X 2  .143* **  16.27* .096  * X 3  .037  *  16.55*.019  *  X 4 .001  *  16.55*.017  *  X 5 .066  *  17.42*.078  *  X 6  .003  *  17.43* .043  * X 7 .042  *  17.79*.033  *  X 8 .107  *  20.10*.107  *  X 9  .108  *  22.46* .194* ** X 10  .157* **  27.48* .142* ** X xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Chapter 23 Models for Discrete Choice
Exercises
1. The log-likelihood is
lnL = Σ0,0 lnProb[y=0,d=0] + Σ0,1 lnProb[y=0,d=1] + Σ1,0 lnProb[y=1,d=0] + Σ1,1 lnProb[y=1,d=1]
where Σi,j indicates the sum over observations for which y = i and d = j. Since there are no other regressors, this reduces to
lnL = 24ln(1 − F(α)) + 32ln(1 − F(δ)) + 28lnF(α) + 16lnF(δ),
where δ = α + β. Although it is straightforward to maximize the log-likelihood directly in terms of α and δ, an alternative, convenient approach is to estimate F(α) and F(δ). These functions can then be inverted to estimate the original parameters. The invariance of maximum likelihood estimators to transformation will justify this approach. One virtue of this approach is that the same procedure is used for both probit and logit models. Let A = F(α) and D = F(δ). Then, the log likelihood is simply
lnL = 24ln(1 − A) + 32ln(1 − D) + 28lnA + 16lnD.
The necessary conditions are
∂lnL/∂A = −24/(1 − A) + 28/A = 0
∂lnL/∂D = −32/(1 − D) + 16/D = 0.
Simple manipulations produce the two solutions A = 28/(24+28) = .539 and D = 16/(32+16) = .333. Then, these functions can be inverted to produce the MLEs of α and β. Thus, α̂ = F⁻¹(A) and β̂ = F⁻¹(D) − α̂. The two inverse functions are Φ⁻¹(A) for the probit model, which must be approximated, and ln[F/(1−F)] for the logit model. The estimates are
      Probit   Logit
α      .098     .156
δ     −.431    −.694
β     −.529    −.850
(Notice the proportionality relationship, .156/.098 = 1.592 and .850/.529 = 1.607.) We will compute the asymptotic covariance matrix for α̂ and β̂ directly using (19-24) for the probit model and (19-22) for the logit model. We will require hi = ∂²lnL/∂(α+βd)² for the four cells. For the computation, we will require φ(c)/Φ(c) and φ(c)/[1−Φ(c)]. These are listed in the table below.
y  d  α+βd    Φ     φ    φ/Φ (λ1)  φ/(1−Φ) (λ0)  λ0λ1
0  0   .098  .539  .397    .737       .861       .636
1  0   .098  .539  .397    .737       .861       .636
0  1  −.431  .333  .364   1.093       .546       .597
1  1  −.431  .333  .364   1.093       .546       .597
The estimated asymptotic covariance matrix is the inverse of the estimate of −E[H].
$$-\hat{H} = 24(.636)\begin{bmatrix}1&0\\0&0\end{bmatrix} + 28(.636)\begin{bmatrix}1&0\\0&0\end{bmatrix} + 32(.597)\begin{bmatrix}1&1\\1&1\end{bmatrix} + 16(.597)\begin{bmatrix}1&1\\1&1\end{bmatrix}.$$
Then,
$$\left[-\hat{H}\right]^{-1} = \begin{bmatrix}61.728 & 28.656\\ 28.656 & 28.656\end{bmatrix}^{-1} = \begin{bmatrix}.03024 & -.03024\\ -.03024 & .06513\end{bmatrix}.$$
The asymptotic standard errors are the square roots of the diagonal elements, which are .1739 and .2552, respectively. To test the hypothesis that β = 0, we would refer z = −.529/.2552 = −2.073 to the standard normal table. This is larger in absolute value than the 1.96 critical value, so we would reject the hypothesis. To compute the likelihood ratio statistic, we will require the two log-likelihoods. The restricted log-likelihood (for both the probit and logit models) is given in (19-28). This would be lnL0 = 100[.44ln.44 + .56ln.56] = −68.593. Let the predicted values above be denoted
P00 = Prob[y=0,d=0] = .461 (i.e., 1 − .539)
P10 = Prob[y=1,d=0] = .539
P01 = Prob[y=0,d=1] = .667
P11 = Prob[y=1,d=1] = .333
and let nij equal the number of observations in each cell. Then, the unrestricted log-likelihood is lnL = 24ln.461 + 28ln.539 + 32ln.667 + 16ln.333 = −66.442. The likelihood ratio statistic would be λ = 2(−66.442 − (−68.593)) = 4.302. The critical value from the chi-squared distribution with one degree of freedom is 3.84, so once again, the test statistic is slightly larger than the table value.
We now compute the Hessian for the logit model. The predicted probabilities are
Prob[y = 0, d = 0] = P00 = 1/(1 + e^{.156}) = .462
Prob[y = 1, d = 0] = P10 = 1 − P00 = .538
Prob[y = 0, d = 1] = P01 = 1/(1 + e^{−.694}) = .667
Prob[y = 1, d = 1] = P11 = 1 − P01 = .333.
Notice that in spite of the quite different coefficients, these are identical to the results for the probit model. Remember that we originally estimated the probabilities, not the parameters, and these were independent of the distribution. Then, the Hessian is computed in the same manner as for the probit model using hij = Fij(1−Fij) instead of λ0λ1 in each cell. The asymptotic covariance matrix is the inverse of
$$(28+24)(.462)(.538)\begin{bmatrix}1&0\\0&0\end{bmatrix} + (32+16)(.667)(.333)\begin{bmatrix}1&1\\1&1\end{bmatrix}.$$
The standard errors are .2782 and .4137. For testing the hypothesis that β equals zero, the t-statistic is z = −.850/.4137 = −2.055, which is almost the same as that for the probit model. The unrestricted log-likelihood is lnL = 24ln.462 + ... + 16ln.333 = −66.442 (again). The chi-squared statistic is 4.302, as before.
2. Using the usual regression statistics, we would have
a = ȳ − b x̄, b = Σi(xi − x̄)(yi − ȳ) / Σi(xi − x̄)².
For data in which y is a binary variable, we can decompose the numerator somewhat further. First, divide both numerator and denominator by the sample size. Second, since only one variable need be in deviation form, drop the deviation in x. That leaves
b = [Σi xi(yi − ȳ)/n] / [Σi(xi − x̄)²/n].
The denominator is the sample variance of x. Since yi is only 0s and 1s, ȳ is the proportion of 1s in the sample, P. Thus, the numerator is
(1/n)Σi xiyi − (1/n)Σi xi ȳ = (1/n)Σ1 xi − P x̄ = (n1/n) x̄1 − P[P x̄1 + (1−P) x̄0] = P(1 − P)(x̄1 − x̄0),
where Σ1 denotes the sum over the n1 observations with yi = 1 and x̄1 and x̄0 are the means of x in the two groups. Therefore, the regression is essentially measuring how much the mean of x varies across the two groups of observations. The constant term does not simplify into any intuitively useful form.
3. The model was estimated using Newton's method as described in the text. The estimated coefficients and their standard errors are shown below:
ŷ* = .51274 + .15964X
     (1.042)  (.202)
Log-likelihood = −6.403, Restricted log-likelihood = −6.9315.
The t-ratio for testing the hypothesis is .15964/.202 = .79. The chi-squared for the likelihood ratio test is 1.057. Neither is large enough to lead to rejection of the hypothesis.
4. The derivatives of the log-likelihood are given in (23-18)-(23-21). If all coefficients except the constant term are zero, then the first order condition for maximizing the log-likelihood would be ∂lnL/∂β = Σi(yi − λ)(1) = 0 since with no regressors, λi will not vary with i. This leads to the constrained maximum λ̂ = Σi yi/n = P, which might be expected. Thus, we estimate the constant term such that P = exp(α̂)/[1 + exp(α̂)], or α̂ = logit(P). The LM statistic based on the BHHH estimator of the covariance matrix of the first derivatives would be LM = [Σigi]′[Σigigi′]⁻¹[Σigi] where gi = (yi − P)xi. In full, the statistic is
LM = [Σi(yi − P)xi]′[Σi(yi − P)²xixi′]⁻¹[Σi(yi − P)xi].
The actual (and expected) Hessian can be used instead by replacing (yi − P)² with P(1 − P) in the inverse matrix. The statistic could then be written
LM = [X′(y − Pi)]′[(X′X)⁻¹][X′(y − Pi)]/P(1 − P) = e′X(X′X)⁻¹X′e/P(1 − P),
where i is a column of ones and e = y − Pi. In the preceding, e′e = Σi(yi − P)² = nP(1 − P). Therefore, LM = n[e′X(X′X)⁻¹X′e/e′e], which establishes the result. To compute the statistic, we regress (yi − P) on the xs, then carry nR² into the chi-squared table.
5.
Since there is no regressor, we may write the log-likelihood as
lnL = 50lnΦ(−α) + 40ln[Φ(μ1−α) − Φ(−α)] + 45ln[Φ(μ2−α) − Φ(μ1−α)] + 80ln[Φ(μ3−α) − Φ(μ2−α)] + 35ln[1 − Φ(μ3−α)].
There are four unknown parameters to estimate and four free probabilities. Suppose, then, we treat Φ(−α), Φ(μ1−α), Φ(μ2−α), and Φ(μ3−α) as the unknown parameters, π0, π1, π2, and π3, respectively. If we can find estimators of these, we can solve for the underlying parameters. We may write the log-likelihood as
lnL = 50lnπ0 + 40ln(π1 − π0) + 45ln(π2 − π1) + 80ln(π3 − π2) + 35ln(1 − π3).
The necessary conditions are
∂lnL/∂π0 = 50/π0 − 40/(π1 − π0) = 0
∂lnL/∂π1 = 40/(π1 − π0) − 45/(π2 − π1) = 0
∂lnL/∂π2 = 45/(π2 − π1) − 80/(π3 − π2) = 0
∂lnL/∂π3 = 80/(π3 − π2) − 35/(1 − π3) = 0.
By a simple rearrangement, these can be recast as a set of linear equations. Thus,
90π0 − 50π1 = 0
45π0 − 85π1 + 40π2 = 0
80π1 − 125π2 + 45π3 = 0
−35π2 + 115π3 = 80
or
$$\begin{bmatrix}90 & -50 & 0 & 0\\ 45 & -85 & 40 & 0\\ 0 & 80 & -125 & 45\\ 0 & 0 & -35 & 115\end{bmatrix}\begin{bmatrix}\pi_0\\ \pi_1\\ \pi_2\\ \pi_3\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\\ 80\end{bmatrix}.$$
The solution (as might be expected) is
π0 = .2 (50/250)
π1 = .36 ((50+40)/250)
π2 = .54 ((50+40+45)/250)
π3 = .86 ((50+40+45+80)/250).
Now, we can solve for the underlying parameters.
−α = Φ⁻¹(.2) = −.841, so α = .841.
μ1 − α = Φ⁻¹(.36) = −.358, so μ1 = .483
μ2 − α = Φ⁻¹(.54) = .101, so μ2 = .942
μ3 − α = Φ⁻¹(.86) = 1.081, so μ3 = 1.922.
6. To estimate the coefficients, we will use a two step FGLS procedure. Ordinary least squares estimates based on Section 19.4.3 are consistent, but inefficient. The OLS regression produces
Φ⁻¹(Pi) = ẑi = −2.18098 + .0098898T
               (.7404)    (.002883).
The predicted values from this regression can then be used to compute the weights in (21-39). The weighted least squares regression produces
ẑi = −2.3116 + .010646T
     (.8103)   (.003322).
In order to achieve a predicted proportion of 95%, we will require zi = 1.645. The T required to achieve this is T = (1.645 + 2.3116)/.010646 = 372. The zi which corresponds to 90% is 1.282. Doing the same calculation as above, this requires T = 338 trucks. At $20,000 per truck, this requires $6.751 million, so the budget is inadequate. The marginal effect is ∂Φi/∂T = .010646φ(zi). At T = 300, zi = .8822, so φ(zi) = .2703 and the marginal effect is .00288.
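The truck calculation in Exercise 6 can be retraced in a few lines of Python. This is only a sketch: the function and variable names are illustrative, the coefficients are the weighted least squares values reported above, and exact normal quantiles are used in place of the rounded 1.645 and 1.282, so the results differ from the text in the last digit or so.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Weighted least squares estimates reported in the solution.
CONST, SLOPE = -2.3116, 0.010646
COST_PER_TRUCK = 20_000


def trucks_needed(target_proportion):
    """Solve CONST + SLOPE*T = z, where z is the normal quantile of the target."""
    z_target = std_normal.inv_cdf(target_proportion)
    return (z_target - CONST) / SLOPE


t95 = trucks_needed(0.95)
t90 = trucks_needed(0.90)
cost_90 = COST_PER_TRUCK * t90

# Marginal effect of one more truck at T = 300: SLOPE * phi(z).
z_300 = CONST + SLOPE * 300
marginal_300 = SLOPE * std_normal.pdf(z_300)

print(f"T for 95%: {t95:.1f}, T for 90%: {t90:.1f}")
print(f"cost for 90%: ${cost_90 / 1e6:.3f} million")
print(f"z at T=300: {z_300:.4f}, marginal effect: {marginal_300:.5f}")
```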
7. This is similar to Exercise 1. It is simplest to prove it in that framework. Since the model has only a dummy variable, we can use the same log likelihood as in Exercise 1. But, in this exercise, there are no observations in the cell (y=1,x=0). The resulting log likelihood is, therefore,
lnL = Σ0,0 lnProb[y=0,x=0] + Σ0,1 lnProb[y=0,x=1] + Σ1,1 lnProb[y=1,x=1]
or
lnL = n3 lnProb[y=0,x=0] + n2 lnProb[y=0,x=1] + n1 lnProb[y=1,x=1].
Now, let δ = α + β. The log likelihood function is lnL = n3ln(1 − F(α)) + n2ln(1 − F(δ)) + n1lnF(δ). For estimation, let A = F(α) and D = F(δ). We can estimate A and D, then α = F⁻¹(A) and β = F⁻¹(D) − α. The first order condition for estimation of A is ∂lnL/∂A = −n3/(1 − A) = 0, which obviously has no solution. If A cannot be estimated then α cannot either, nor, in turn, can β. This applies to both probit and logit models.
8. We’ll do this more generally for any model F(α). Since the ‘model’ contains only a constant, the log likelihood is logL = Σ0 log[1−F(α)] + Σ1 logF(α) = n0 log[1−F(α)] + n1 logF(α). The likelihood equation is ∂logL/∂α = −Σ0 f(α)/[1−F(α)] + Σ1 f(α)/F(α) = 0, where f(α) is the density (the derivative of F(α)), so that at the solution, n0f(α)/[1−F(α)] = n1f(α)/F(α). Divide both sides of this equation by f(α) and solve it for F(α) = n1/(n0+n1), as might be expected. You can then insert this solution for F(α) back into the log likelihood, and (23-28) follows immediately.
9. Look at the two cases. Neither case has an estimator which is consistent in both cases. In both cases, the unconditional fixed effects estimator is inconsistent, so the rest of the analysis falls apart. This is the incidental parameters problem at work. Note that the fixed effects estimator is inconsistent because in both models, the estimator of the constant terms is a function of 1/T. Certainly in both cases, if the fixed effects model is appropriate, then the random effects estimator is inconsistent, whereas if the random effects model is appropriate, the maximum likelihood random effects estimator is both consistent and efficient. Thus, in this instance, the random effects estimator satisfies the requirements of the test. In fact, there does exist a consistent estimator for the logit model with fixed effects; see the text. However, this estimator must be based on a restricted sample; observations with the sum of the ys equal to zero or T must be discarded, so the mechanics of the Hausman test are problematic. This does not fall into the template of computations for the Hausman test.
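Returning to Exercise 1, the closed-form estimates, the logit standard errors and the likelihood ratio statistic can be checked numerically from the four cell counts alone. The following Python sketch uses illustrative variable names and unrounded intermediate values, so its probit and logit coefficients differ from the rounded figures in the solution in the third decimal place.

```python
from math import log, sqrt
from statistics import NormalDist

# Cell counts n_{y,d} used in Exercise 1.
n00, n10, n01, n11 = 24, 28, 32, 16

# MLEs of the two probabilities A = F(alpha) and D = F(alpha + beta).
A = n10 / (n00 + n10)
D = n11 / (n01 + n11)

# Invert F to recover the parameters under each model.
probit_alpha = NormalDist().inv_cdf(A)
probit_beta = NormalDist().inv_cdf(D) - probit_alpha
logit_alpha = log(A / (1 - A))
logit_beta = log(D / (1 - D)) - logit_alpha

# Logit information matrix: w0*[[1,0],[0,0]] + w1*[[1,1],[1,1]].
# Its inverse has diagonal elements 1/w0 and 1/w0 + 1/w1.
w0 = (n00 + n10) * A * (1 - A)
w1 = (n01 + n11) * D * (1 - D)
logit_se_alpha = sqrt(1 / w0)
logit_se_beta = sqrt(1 / w0 + 1 / w1)

# Restricted and unrestricted log-likelihoods (same for probit and logit).
n = n00 + n10 + n01 + n11
p = (n10 + n11) / n
lnL0 = n * (p * log(p) + (1 - p) * log(1 - p))
lnL = n00 * log(1 - A) + n10 * log(A) + n01 * log(1 - D) + n11 * log(D)
lr_stat = 2 * (lnL - lnL0)

print(f"probit: alpha={probit_alpha:.3f}, beta={probit_beta:.3f}")
print(f"logit:  alpha={logit_alpha:.3f}, beta={logit_beta:.3f}")
print(f"logit s.e.: {logit_se_alpha:.4f}, {logit_se_beta:.4f}")
print(f"lnL0={lnL0:.3f}, lnL={lnL:.3f}, LR={lr_stat:.3f}")
```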
Applications 1. Binary Choice for Extramarital Affairs using Redbook data
?======================================================== ? Application 23.1 ?======================================================== ? Create ; A = (Yrb > 0) $ Namelist ; X = one,v1,v2,v5,v6 $ Probit ; Lhs = A ; Rhs = X ; marginal Effects $ Logit ; Lhs = A ; Rhs = X ; marginal Effects $ ++  Binomial Probit Model   Maximum Likelihood Estimates   Dependent variable A   Number of observations 6366   Log likelihood function 3547.865   Number of parameters 5   Info. Criterion: AIC = 1.11620   Info. Criterion: BIC = 1.12151   Restricted log likelihood 4002.530  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ +Index function for probability
Constant 1.43453507 .15493583 9.259 .0000 V1  .42595261 .01807583 23.565 .0000 4.10964499 V2  .02797013 .00254409 10.994 .0000 29.0828621 V5  .20942202 .02015534 10.390 .0000 2.42617028 V6  .03522668 .00801808 4.393 .0000 14.2098649 ++  Partial derivatives of E[y] = F[*] with   respect to the vector of characteristics.   They are computed at the means of the Xs.   Observations used for means are All Obs.  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z]Elasticity +++++++ + Constant .27876593 .01081795 25.769 .0000 V1  .14911732 .00634679 23.495 .0000 2.01181601 V2  .00979177 .00088860 11.019 .0000 .93487672 V5  .07331438 .00703451 10.422 .0000 .58393740 V6  .01233214 .00280535 4.396 .0000 .57528664 ++  Binary Logit Model for Binary Choice   Maximum Likelihood Estimates   Dependent variable A   Number of observations 6366   Log likelihood function 3549.741   Number of parameters 5   Info. Criterion: AIC = 1.11679   Info. Criterion: BIC = 1.12210   Restricted log likelihood 4002.530  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ +Characteristics in numerator of Prob[Y = 1] Constant 2.41622262 .26160831 9.236 .0000 V1  .70802698 .03091557 22.902 .0000 4.10964499 V2  .04624150 .00426119 10.852 .0000 29.0828621 V5  .35139771 .03413337 10.295 .0000 2.42617028 V6  .05899324 .01354756 4.355 .0000 14.2098649 ++  Partial derivatives of probabilities with   respect to the vector of characteristics.   They are computed at the means of the Xs.   Observations used are All Obs.  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z]Elasticity +++++++ +Marginal effect for variable in probability Constant .50898166 .05554126 9.164 .0000 V1  .14914716 .00650799 22.918 .0000 2.03205673 V2  .00974086 .00089378 10.898 .0000 .93918419 V5  .07402256 .00714156 10.365 .0000 .59539053 V6  .01242703 .00285019 4.360 .0000 .58542862
2. Ordered Choice For Self Reported Marriage Rating ++  Ordered Probability Model   Maximum Likelihood Estimates   Dependent variable MARRIAGE   Weighting variable None 
 Number of observations 6366   Iterations completed 15   Log likelihood function 7720.145   Number of parameters 12   Info. Criterion: AIC = 2.42920   Info. Criterion: BIC = 2.44194   Restricted log likelihood 7926.487   Underlying probabilities based on Normal  ++ ++  Ordered Probability Model   Cell frequencies for outcomes   Y Count Freq Y Count Freq Y Count Freq   0 99 .015 1 348 .054 2 993 .155   3 2242 .352 4 2684 .421  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ +Index function for probability Constant 1.87997564 .12760529 14.733 .0000 YRB  .09669427 .00649907 14.878 .0000 .70537389 V2  .00624520 .00471646 1.324 .1855 29.0828621 V3  .00952932 .00506534 1.881 .0599 9.00942507 V4  .05879586 .01520251 3.868 .0001 1.39687402 V5  .10524384 .01624338 6.479 .0000 2.42617028 V6  .02526318 .00727002 3.475 .0005 14.2098649 V7  .02069865 .01614318 1.282 .1998 3.42412818 V8  .02725715 .01072244 2.542 .0110 3.85014138 +Threshold parameters for index Mu(1)  .71088354 .02219910 32.023 .0000 Mu(2)  1.47186849 .01737814 84.697 .0000 Mu(3)  2.46392113 .01923976 128.064 .0000 ++  Summary of Marginal Effects for Ordered Probability Model (probit)  ++ Variable Y=00 Y=01 Y=02 Y=03 Y=04 Y=05 Y=06 Y=07  + YRB .0031 .0087 .0167 .0093 .0377 V2 .0002 .0006 .0011 .0006 .0024 V3 .0003 .0009 .0016 .0009 .0037 V4 .0019 .0053 .0101 .0056 .0229 V5 .0033 .0095 .0182 .0101 .0411 V6 .0008 .0023 .0044 .0024 .0099 V7 .0007 .0019 .0036 .0020 .0081 V8 .0009 .0025 .0047 .0026 .0106 ++  Cross tabulation of predictions. Row is actual, column is predicted.   Model = Probit . Prediction is number of the most probable cell.  +++++++++++++  ActualRow Sum 0  1  2  3  4  5  6  7  8  9  +++++++++++++  0 99 0 0 0 68 31  1 348 2 0 5 170 171  2 993 7 0 7 453 526  3 2242 3 0 10 674 1555  4 2684 2 0 5 593 2084 +++++++++++++ Col Sum 6366 14 0 27 1958 4367 0 0 0 0 0 +++++++++++++
Chapter 24 Truncation, Censoring and Sample Selection
Exercises
1. The sample mean of all 20 observations is 4.18222. For the 14 nonzero observations, the mean is (20/14)4.18222 = 5.9746. Both of these should overestimate μ. In the first case, all negative values have been transformed to zeroes. Therefore, if we had had the original data, our estimator would include the negative values as well as the positive ones. Since we have only the zeroes, instead, our estimator includes, for every negative y*, a number which is larger than the true y*. This will inflate the estimate. Likewise, for the truncated mean, whereas a complete sample might include some negative values, the observed one will not. Once again, this will serve to inflate the estimator of the mean.
2. The log-likelihood for the Tobit model is given in (24-13). With only a constant term, this is
lnL = −(n1/2)[ln(2π) + lnσ²] − (1/(2σ²))Σ1(yi − μ)² + Σ0 lnΦ(−μ/σ).
In terms of γ = μ/σ and θ = 1/σ, this is
lnL = −(n1/2)[ln(2π) − lnθ²] − (1/2)Σ1(θyi − γ)² + Σ0 lnΦ(−γ)
    = −(n1/2)ln(2π) + n1lnθ − (1/2)Σ1(θyi − γ)² + Σ0 lnΦ(−γ).
The necessary conditions for maximizing this with respect to γ and θ are
∂lnL/∂γ = Σ1(θyi − γ) − Σ0 φ(−γ)/Φ(−γ) = θΣ1yi − n1γ − n0[φ(−γ)/Φ(−γ)] = 0
∂lnL/∂θ = n1/θ − Σ1yi(θyi − γ) = n1/θ − θΣ1yi² + γΣ1yi = 0.
There are a few different ways one might solve these two equations. A grid search over the values of γ and θ is a possibility. A direct maximum likelihood estimator for the tobit model is the simpler choice if one is available. The model with only a constant term is otherwise the same as the usual model. Using the data above, the tobit maximum likelihood estimates are μ̂ = 3.2731, σ̂ = 5.0303.
3. The log-likelihood for the truncated regression with only a constant term is
lnL = −(n/2)[ln(2π) + lnσ²] − (1/(2σ²))Σi(yi − μ)² − Σi lnΦ(μ/σ).
Once again transforming to γ and θ, this is
lnL = −(n/2)ln(2π) + nlnθ − (1/2)Σi(θyi − γ)² − nlnΦ(γ).
The necessary conditions for maximizing this are
∂lnL/∂γ = Σi(θyi − γ) − nφ(γ)/Φ(γ) = 0
∂lnL/∂θ = n/θ − Σiyi(θyi − γ) = 0.
The first of the two equations can be written ȳ = γ/θ + λ/θ, where λ = φ(γ)/Φ(γ). Now, reverting back to μ and σ, this is ȳ = μ + σλ, which is (24-6). The second equation can be manipulated to produce Σyi²/n − μȳ = σ². Once again, trial and error could be used to find a solution. As before, estimating the model as a truncated regression with only a constant term will also produce a solution. The solution by this method is μ̂ = 3.3439, σ̂ = 5.6368.
With the data of the first problem, we would have the following: Estimated Prob[y* > 0] = 14/20 = .7. This is an estimate of Φ(μ/σ), so we would have μ/σ = Φ⁻¹(.7) = .525 or μ = .525σ. Now, we can use the relationship E[y|y > 0] = μ + σφ(μ/σ)/Φ(μ/σ) = μ + σλ. Since μ/σ is now known, we have λ = φ(.525)/Φ(.525) = .496, so a second equation is 5.9746 = μ + .496σ. The joint solution is μ̂ = 3.0697, σ̂ = 5.8470. The three solutions are surprisingly close.
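The last of the three solutions, which uses only the share of positive observations and their mean, is simple enough to retrace directly. The Python sketch below uses illustrative names and the exact normal quantile instead of the rounded .525 and .496, so it lands within a few thousandths of the values quoted above rather than reproducing them digit for digit.

```python
from statistics import NormalDist

std_normal = NormalDist()

share_positive = 14 / 20      # estimate of Phi(mu/sigma)
mean_positive = 5.9746        # mean of the 14 nonzero observations

# Step 1: mu/sigma from the share of positive observations.
ratio = std_normal.inv_cdf(share_positive)

# Step 2: lambda = phi(mu/sigma) / Phi(mu/sigma).
lam = std_normal.pdf(ratio) / std_normal.cdf(ratio)

# Step 3: E[y | y > 0] = mu + sigma*lambda = sigma*(ratio + lambda).
sigma_hat = mean_positive / (ratio + lam)
mu_hat = ratio * sigma_hat

print(f"mu/sigma = {ratio:.4f}, lambda = {lam:.4f}")
print(f"mu_hat = {mu_hat:.4f}, sigma_hat = {sigma_hat:.4f}")
```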
4. Using Theorem 24.5, we have 1 − Φ(αz) = 14/35 = .4, αz = Φ⁻¹(.6) = .253, λ(αz) = .9659, δ(αz) = .6886. The two moment equations are based on the mean and variance of y in the observed data, 5.9746 and 9.869, respectively. The equations would be 5.9746 = μ + σ(.7)(.9659) and 9.869 = σ²(1 − .7²(.6886)). The joint solution is μ̂ = 3.3651, σ̂ = 3.8594.
5. The conditional mean function is E[y|x] = Φ(β′xi/σi)β′xi + σiφ(β′xi/σi) using the equation before (24-12). Suppose that σi = σexp(α′xi) for the same vector xi. (We’ll relax that assumption shortly.) Now, differentiate this expression with respect to x. We differentiate the two parts, first with respect to β′x then with respect to σi.
$$\frac{\partial E[y_i|x_i]}{\partial x_i} = \Phi\left(\frac{\beta'x_i}{\sigma_i}\right)\beta + \beta'x_i\,\phi\left(\frac{\beta'x_i}{\sigma_i}\right)\frac{1}{\sigma_i}\beta + \sigma_i\left[-\left(\frac{\beta'x_i}{\sigma_i}\right)\phi\left(\frac{\beta'x_i}{\sigma_i}\right)\right]\frac{1}{\sigma_i}\beta$$
$$\qquad + \beta'x_i\,\phi\left(\frac{\beta'x_i}{\sigma_i}\right)\left(\frac{-1}{\sigma_i}\right)\left(\frac{\beta'x_i}{\sigma_i}\right)\sigma_i\alpha + \phi\left(\frac{\beta'x_i}{\sigma_i}\right)\sigma_i\alpha + \sigma_i\left[-\left(\frac{\beta'x_i}{\sigma_i}\right)\phi\left(\frac{\beta'x_i}{\sigma_i}\right)\right]\left(\frac{-1}{\sigma_i}\right)\left(\frac{\beta'x_i}{\sigma_i}\right)\sigma_i\alpha$$
After collecting the terms, we obtain ∂E[yi|xi]/∂xi = Φ(ai)β + σiφ(ai)α where ai = β′xi/σi. Thus, the marginal effect has two parts, one for β and one for α. Now, if a variable appears in σi but not in xi, then only the second term appears, while if a variable appears only in xi and not in σi, then only the first term appears in the marginal effect.
6. The transformed log likelihood function is
logL = Σy>0 −(1/2)[log2π − logθ² + (θy − x′γ)²] + Σy=0 log[1 − Φ(x′γ)].
It will be convenient to define ai = xi′γ. Note also that 1 − Φ(ai) = Φ(−ai). The first derivatives and Hessian in the transformed parameters are
$$\frac{\partial \log L}{\partial \gamma} = \sum_{y_i>0} x_i(\theta y_i - a_i) + \sum_{y_i=0}\frac{\phi(-a_i)}{\Phi(-a_i)}(-x_i)$$
$$\frac{\partial \log L}{\partial \theta} = \sum_{y_i>0}\left[\frac{1}{\theta} - y_i(\theta y_i - a_i)\right]$$
$$\frac{\partial^2 \log L}{\partial\gamma\,\partial\gamma'} = \sum_{y_i>0} -x_ix_i' + \sum_{y_i=0} -\frac{\phi(-a_i)}{\Phi(-a_i)}\left\{-a_i + \frac{\phi(-a_i)}{\Phi(-a_i)}\right\}x_ix_i'$$
$$\frac{\partial^2 \log L}{\partial\gamma\,\partial\theta} = \sum_{y_i>0} x_iy_i$$
$$\frac{\partial^2 \log L}{\partial\theta^2} = \sum_{y_i>0}\left(-\frac{1}{\theta^2} - y_i^2\right)$$
The second derivatives can be collected in a matrix format:
$$\frac{\partial^2 \log L}{\partial\begin{pmatrix}\gamma\\ \theta\end{pmatrix}\partial\begin{pmatrix}\gamma\\ \theta\end{pmatrix}'} = \sum_{y_i>0}\left[-\begin{pmatrix}x_i\\ -y_i\end{pmatrix}\begin{pmatrix}x_i\\ -y_i\end{pmatrix}' - \begin{pmatrix}0\\ 1/\theta\end{pmatrix}\begin{pmatrix}0\\ 1/\theta\end{pmatrix}'\right] + \sum_{y_i=0}\delta_i\begin{pmatrix}x_i\\ 0\end{pmatrix}\begin{pmatrix}x_i\\ 0\end{pmatrix}'$$
where δi is the last scalar term in ∂²logL/∂γ∂γ′. By Theorem 22.2 (see (24-4)), we know that δi is negative. Thus, all three parts of the matrix are negative semidefinite. Assuming the data are not linearly dependent and there are more than K observations, the Hessian will have full rank and be negative definite.
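The negative definiteness claimed in Exercise 6 can be spot-checked numerically in the simplest case, a model with only a constant term, so that xi = 1 and ai = γ and the Hessian is 2×2. The Python sketch below is illustrative only: the sample values are hypothetical, the function names are not from the text, and a handful of trial points is a check, not a proof.

```python
from statistics import NormalDist

std_normal = NormalDist()


def transformed_tobit_hessian(y, gamma, theta):
    """Elements of the 2x2 Hessian in (gamma, theta) for a constant-only tobit.

    Uncensored observations contribute -1, y_i and -(1/theta^2 + y_i^2);
    censored observations contribute delta_i to the (gamma, gamma) element.
    """
    h_gg = h_gt = h_tt = 0.0
    for yi in y:
        if yi > 0:
            h_gg += -1.0
            h_gt += yi
            h_tt += -1.0 / theta**2 - yi**2
        else:
            lam = std_normal.pdf(-gamma) / std_normal.cdf(-gamma)
            h_gg += -lam * (-gamma + lam)   # delta_i, always negative
    return h_gg, h_gt, h_tt


def is_negative_definite(h_gg, h_gt, h_tt):
    """A symmetric 2x2 matrix is negative definite iff h_gg < 0 and det > 0."""
    return h_gg < 0 and h_gg * h_tt - h_gt**2 > 0


# Hypothetical censored sample: zeros are the limit observations.
sample = [0.0, 0.0, 1.2, 3.4, 0.7, 2.5, 0.0, 4.1]

checks = [
    is_negative_definite(*transformed_tobit_hessian(sample, g, t))
    for g in (-2.0, 0.0, 1.5)
    for t in (0.2, 1.0, 3.0)
]
print(all(checks))
```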
Applications 1. Tobit model for Redbook data ?============================================================ ? Applications in Chapter 24 ?============================================================ ? 1. Tobit, Scaled Tobit, Probit and Truncated Regression. ? In principle, all are estimating the same paramter. ? For consistency and convenience, we are going to use the ? sample with YRB <= 5 only. ?============================================================ Sample ; All $ Reject ; YRB > 5 $ Namelist ; X = one,v1,v2,v3,v4,v5$ Tobit ; Lhs = yrb ; Rhs = x ; marginal $ Matrix ; list ; scaled_b = 1/s * b $ Probit ; Lhs = a ; Rhs = x $ reject ; yrb <= 0 $ Truncation ; Lhs = yrb ; Rhs = x $ ++  Limited Dependent Variable Model  CENSORED   Maximum Likelihood Estimates   Dependent variable YRB   Weighting variable None   Number of observations 6217   Iterations completed 6   Log likelihood function 6118.089   Number of parameters 7   Info. Criterion: AIC = 1.97043   Finite Sample: AIC = 1.97044   Info. Criterion: BIC = 1.97802   Info. Criterion:HQIC = 1.97306   Threshold values for the model:   Lower= .0000 Upper=+infinity   LM test [df] for tobit= 622.887[ 6]   Normality Test, LM = 150.850[ 2]   ANOVA based fit measure = .293201   DECOMP based fit measure = .438743  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ +Primary Index Equation for Model Constant 4.13828429 .31908252 12.969 .0000 V1  .80415431 .03782416 21.260 .0000 4.12272800 V2  .06923599 .01229186 5.633 .0000 29.1829661 V3  .10402446 .01325380 7.849 .0000 9.12329098 V4  .02190617 .03898707 .562 .5742 1.41499115 V5  .43110692 .04356398 9.896 .0000 2.43670581 +Disturbance standard deviation Sigma  2.27697641 .04212836 54.049 .0000 ++  Partial derivatives of expected val. with   respect to the vector of characteristics.   They are computed at the means of the Xs.   Observations used for means are All Obs.   
Conditional Mean at Sample Point .3941   Scale Factor for Marginal Effects .2796  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X
+++++++ Constant 1.15697490 .09110678 12.699 .0000 V1  .22482418 .01048093 21.451 .0000 4.12272800 V2  .01935689 .00342807 5.647 .0000 29.1829661 V3  .02908299 .00367661 7.910 .0000 9.12329098 V4  .00612449 .01090115 .562 .5742 1.41499115 V5  .12052818 .01207702 9.980 .0000 2.43670581 Sigma  .000000 ......(Fixed Parameter)....... Matrix SCALED_B has 6 rows and 1 columns. 1 +1 1.81745 2 .35317 3 .03041 4 .04569 5 .00962 6 .18933 ++  Binomial Probit Model   Maximum Likelihood Estimates   Dependent variable A   Weighting variable None   Number of observations 6217   Iterations completed 5   Log likelihood function 3310.310   Number of parameters 6   Info. Criterion: AIC = 1.06685   Info. Criterion: BIC = 1.07335   Restricted log likelihood 3830.126  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ +Index function for probability Constant 2.03641060 .15678428 12.989 .0000 V1  .41449474 .01860450 22.279 .0000 4.12272800 V2  .03568737 .00593540 6.013 .0000 29.1829661 V3  .07215336 .00640693 11.262 .0000 9.12329098 V4  .00241124 .01891503 .127 .8986 1.41499115 V5  .21212886 .02089864 10.150 .0000 2.43670581 ++  Limited Dependent Variable Model  TRUNCATE   Maximum Likelihood Estimates   Dependent variable YRB   Weighting variable None   Number of observations 1904   Iterations completed 8   Log likelihood function 2437.473   Number of parameters 7   Info. Criterion: AIC = 2.56772   Info. Criterion: BIC = 2.58813   Threshold values for the model:   Lower= .0000 Upper=+infinity   Observations after truncation 1904  ++ +++++++ Variable Coefficient  Standard Error b/St.Er.P[Z>z] Mean of X +++++++ +Primary Index Equation for Model Constant 5.22651388 .94010948 5.559 .0000 V1  .45753380 .10715203 4.270 .0000 3.65388655 V2  .04779763 .03766086 1.269 .2044 30.9776786 V3  .25376184 .04622853 5.489 .0000 11.6919643
V4  .37961397 .12878071 2.948 .0032 1.81407563
V5  .22780476 .13328147 1.709 .0874 2.28308824
+Disturbance standard deviation
Sigma  2.38479704 .13327563 17.894 .0000
2. Two part Model. The three estimated models appear above. The test statistic is ++  Listed Calculator Results  ++ TEST2 = 740.610758
This is much larger than the chi squared critical value for 5 degrees of freedom. We conclude that the participation equation (probit) is different from the intensity equation (yrb).
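The reported TEST2 is reproduced, to rounding, by twice the difference between the sum of the probit and truncated regression log likelihoods and the tobit log likelihood. The Python sketch below makes that arithmetic explicit. Two assumptions are labeled in the comments: the three log likelihoods are entered as negative numbers, and the 5% critical value for five degrees of freedom (11.07) comes from a standard chi-squared table, not from the output.

```python
# Log likelihoods from the three models estimated above.
# Assumption: the printed magnitudes are entered with a negative sign.
lnl_tobit = -6118.089
lnl_probit = -3310.310
lnl_truncated = -2437.473

# Likelihood ratio statistic: the two-part model (probit + truncated
# regression) is the unrestricted model, the tobit is the restricted one.
test2 = 2.0 * (lnl_probit + lnl_truncated - lnl_tobit)

# Assumption: 5% critical value, chi-squared with 5 degrees of freedom.
CHI2_CRIT_95_DF5 = 11.07

print(f"TEST2 = {test2:.3f}")
print(f"exceeds critical value: {test2 > CHI2_CRIT_95_DF5}")
```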
Chapter 25 Models for Event Counts and Duration
Exercises
1. a. Conditional variance in the ZIP model. The essential ingredients that are needed for this derivation are
$$E[y^*|y^*>0, x_i] = \frac{\lambda_i}{1-\exp(-\lambda_i)} = E_i^*$$
and
$$\mathrm{Var}[y^*|y^*>0, x_i] = \left(\frac{\lambda_i}{1-\exp(-\lambda_i)}\right)\left(1-\frac{\lambda_i}{\exp(\lambda_i)-1}\right) = E_i^*\left(1-\frac{\lambda_i}{\exp(\lambda_i)-1}\right) = E_i^*V_i^*.$$
[See, e.g., Winkelmann (2003, pp. 33-34).] We found the conditional mean in the text to be
$$E[y_i|x_i,w_i] = \frac{F_i\lambda_i}{1-\exp(-\lambda_i)} = F_iE_i^*.$$
To obtain the variance, we will use the variance decomposition,
Var[yi|xi,wi] = Ez[Var[yi|xi,z]] + Varz[E[yi|xi,z]].
The expectation of the conditional variance is
$$E_z[\mathrm{Var}[y_i|x_i,z]] = (1-F_i)\times 0 + F_i\times\left(\frac{\lambda_i}{1-\exp(-\lambda_i)}\right)\left(1-\frac{\lambda_i}{\exp(\lambda_i)-1}\right) = F_i\times E_i^*\times V_i^*.$$
The variance of the conditional mean is
$$(1-F_i)\left(0-\frac{F_i\lambda_i}{1-\exp(-\lambda_i)}\right)^2 + F_i\left(\frac{\lambda_i}{1-\exp(-\lambda_i)}-\frac{F_i\lambda_i}{1-\exp(-\lambda_i)}\right)^2 = F_i(1-F_i)\left(\frac{\lambda_i}{1-\exp(-\lambda_i)}\right)^2 = F_i(1-F_i)E_i^{*2}.$$
The unconditional variance is thus Fi Ei* [Vi* + (1 − Fi)Ei*]. To obtain τi we divide by the conditional mean, which is Fi Ei*, so τi = [Vi* + (1 − Fi)Ei*]. Is this greater than Ei*? Not necessarily. The figure below plots Fi(1 − Fi)Ei*² for Fi = .9 and various values of λ from .1 to about 12. There is a large range over which the function is less than one.
[Figure: plot of the function F (vertical axis, 0 to 2.25) against AL (horizontal axis, up to 12).]
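The closed form for the unconditional variance in part (a) can be checked numerically. The following is only an illustrative sketch, not part of the original solution: it sums the truncated Poisson probabilities directly, mixes them with a point mass at zero, and compares the result with $F_i E_i^*[V_i^* + (1 - F_i)E_i^*]$. The values F = .9 and λ = 2, and all function names, are assumptions made for the example.

```python
import math

def truncated_poisson_moments(lam, terms=200):
    """Mean and variance of a Poisson(lam) variable conditioned on y* > 0,
    obtained by summing the probabilities term by term."""
    p0 = math.exp(-lam)
    p = p0                      # P(0); updated recursively to P(j)
    m1 = m2 = 0.0
    for j in range(1, terms):
        p = p * lam / j         # P(j) = P(j-1) * lam / j
        m1 += j * p
        m2 += j * j * p
    m1 /= (1.0 - p0)
    m2 /= (1.0 - p0)
    return m1, m2 - m1 * m1

def mixture_moments(F, lam):
    """Mean and variance of y = 0 with probability 1-F, truncated Poisson with probability F."""
    e_star, var_star = truncated_poisson_moments(lam)
    mean = F * e_star
    second_moment = F * (var_star + e_star ** 2)
    return mean, second_moment - mean ** 2

def closed_form(F, lam):
    """Mean F*E* and variance F*E*[V* + (1-F)E*] from the derivation."""
    e_star = lam / (1.0 - math.exp(-lam))
    v_star = 1.0 - lam / (math.exp(lam) - 1.0)
    return F * e_star, F * e_star * (v_star + (1.0 - F) * e_star)

F, lam = 0.9, 2.0               # hypothetical values for illustration
mean_direct, var_direct = mixture_moments(F, lam)
mean_formula, var_formula = closed_form(F, lam)
tau = var_formula / mean_formula   # equals V* + (1-F)E*
print(mean_direct, mean_formula)
print(var_direct, var_formula, tau)
```

The two routes should agree to rounding error for any F between 0 and 1 and any positive λ.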
b. Partial Effects. The mean is $E_i = F_i E_i^*$. We suppose that $w_i$ and $x_i$ are the same for the moment. Then
\[ \partial E_i/\partial x_i = E_i^*\,\partial F_i/\partial x_i + F_i\,\partial E_i^*/\partial x_i. \]
The first term is $E_i^* \times f_i \times \gamma$. The second term is $F_i\,(\partial E_i^*/\partial \lambda_i)\,\lambda_i \beta$. The missing element is
\[ \frac{\partial E_i^*}{\partial \lambda_i} = \frac{1}{1 - \exp(-\lambda_i)}\left[1 - \frac{\lambda_i \exp(-\lambda_i)}{1 - \exp(-\lambda_i)}\right]. \]
Combining terms produces the marginal effects.
2. Let $y^*$ denote the unobserved random variable that is distributed as Poisson with probability $\mathrm{Prob}(y^* = j \mid x) = P(j) = \exp(-\lambda)\lambda^j/j!$. The observed random variable before the censoring is $y = y^* \mid y^* > 0$. The probabilities are $\mathrm{Prob}(y = j \mid x) = P(j)/[1 - P(0)]$. Let $y_c$ = the censored random variable. Then, $y_c = y$ for $y = 1,2,3,4$, and $y_c = 5$ when $y \ge 5$. The probabilities associated with the observed $y_c$ are
Prob($y_c$ = 1|x) = Prob(y = 1|x) = P(1)/[1 - P(0)]
Prob($y_c$ = 2|x) = Prob(y = 2|x) = P(2)/[1 - P(0)]
Prob($y_c$ = 3|x) = Prob(y = 3|x) = P(3)/[1 - P(0)]
Prob($y_c$ = 4|x) = Prob(y = 4|x) = P(4)/[1 - P(0)]
Prob($y_c$ = 5|x) = Prob(y = 5|x) + Prob(y = 6|x) + Prob(y = 7|x) + ...
The last term is an infinite sum. But,
Prob(y = 5|x) + Prob(y = 6|x) + Prob(y = 7|x) + ... = 1 - Prob(y = 1|x) - Prob(y = 2|x) - Prob(y = 3|x) - Prob(y = 4|x).
Therefore, Prob($y_c$ = 5|x) = [1 - P(0) - P(1) - P(2) - P(3) - P(4)]/[1 - P(0)]. These are the probabilities used to construct the log likelihood function for the observed values of $y_c$, 1,2,3,4,5.
3. The hazard function is easily obtained as $h(t) = -d\ln S(t)/dt$. For the Weibull model, $\ln S(t) = -(\lambda t)^P$, so the hazard function is $(\lambda P)(\lambda t)^{P-1}$. The median survival time occurs where the survival function equals .5. Thus,
$\exp(-(\lambda t)^P) = .5$
$-(\lambda t)^P = \ln .5$
$(\lambda t)^P = -\ln .5 = \ln 2$
$P\ln\lambda + P\ln t = \ln\ln 2$
$P\ln t = \ln\ln 2 - P\ln\lambda$
$\ln t = (1/P)[\ln\ln 2 - P\ln\lambda]$
$t = \exp\{(1/P)[\ln\ln 2 - P\ln\lambda]\}$.
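The median formula in Exercise 3 is easy to verify numerically. The sketch below is an illustration only; the parameter values λ = .5 and P = 1.5 and the function names are assumptions, not values from the text. It confirms that the survival function equals .5 at the computed median and that the hazard matches a numerical derivative of $-\ln S(t)$.

```python
import math

def weibull_survival(t, lam, p):
    """S(t) = exp(-(lam*t)^p)."""
    return math.exp(-((lam * t) ** p))

def weibull_hazard(t, lam, p):
    """h(t) = -d ln S(t)/dt = lam * p * (lam*t)^(p-1)."""
    return lam * p * (lam * t) ** (p - 1)

def weibull_median(lam, p):
    """t = exp{(1/p)[ln ln 2 - p ln lam]}, the last line of the derivation."""
    return math.exp((1.0 / p) * (math.log(math.log(2.0)) - p * math.log(lam)))

lam, p = 0.5, 1.5                      # hypothetical parameter values
t_med = weibull_median(lam, p)
s_at_median = weibull_survival(t_med, lam, p)

# Numerical derivative of -ln S(t) at the median, for comparison with the hazard.
h = 1e-6
numeric_hazard = -(math.log(weibull_survival(t_med + h, lam, p))
                   - math.log(weibull_survival(t_med - h, lam, p))) / (2 * h)
print(t_med, s_at_median, weibull_hazard(t_med, lam, p), numeric_hazard)
```

The same median can be written more compactly as $(\ln 2)^{1/P}/\lambda$, which the function reproduces.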
Applications
1.
?===================================================
? Application 25.1
?===================================================
Namelist ; x = age,educ,hhninc,hsat $
Poisson ; Lhs = HospVis ; Rhs = One,X ; Marginal effects $
Calc ; Lp = logl $
Regress ; Lhs = HospVis ; Rhs = One,X $
Negbin ; Lhs = HospVis ; Rhs = One,X ; Marginal effects $
Calc ; Ln = logl $
Calc ; List ; LRstat = 2*(ln - lp) $
?===================================================
? Application 25.2
?===================================================
Sample ; All $
Regress ; Lhs = one ; Rhs = one ; Str = ID ; Panel $
Poisson ; Lhs = HospVis ; Rhs = One,X ; Marginal effects ; Pds = _Groupti $
Poisson ; Lhs = HospVis ; Rhs = One,X ; Marginal effects ; Pds = _Groupti ; Random $

Poisson Regression
Maximum Likelihood Estimates
Dependent variable            HOSPVIS
Weighting variable            None
Number of observations        27326
Iterations completed          9
Log likelihood function       12636.40
Number of parameters          5
Info. Criterion: AIC =        .92523
Info. Criterion: BIC =        .92673
Restricted log likelihood     13433.21

Poisson Regression
Chi squared = 124476.35621   RsqP = .1947
G squared   =  20025.66932   RsqD = .0737
Overdispersion tests: g=mu(i)  : 5.279
Overdispersion tests: g=mu(i)^2: 5.468

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   .12613692     .12567036         1.004     .3155
AGE        .00340754     .00149685         2.276     .0228      43.5256898
EDUC       .05295428     .00834958         6.342     .0000      11.3206310
HHNINC     .39889043     .08982355         4.441     .0000      .35208362
HSAT       .24901310     .00634000        39.277     .0000      6.78542607

Partial derivatives of expected val. with respect to the vector of characteristics.
Effects are averaged over individuals.
Observations used for means are All Obs.
Conditional Mean at Sample Point    .1383
Scale Factor for Marginal Effects   .1383

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   .01743926     .02183573          .799     .4245
AGE        .00047111     .00025979         1.813     .0698      43.5256898
EDUC       .00732128     .00149415         4.900     .0000      11.3206310
HHNINC     .05514924     .01579375         3.492     .0005      .35208362
HSAT       .03442771     .00220148        15.638     .0000      6.78542607

Ordinary least squares regression
LHS=HOSPVIS    Mean                 = .1382566
               Standard deviation   = .8843390
WTS=none       Number of observs.   = 27326
Model size     Parameters           = 5
               Degrees of freedom   = 27321
Residuals      Sum of squares       = 21121.96
               Standard error of e  = .8792630
Fit            R-squared            = .1159150E-01
               Adjusted R-squared   = .1144679E-01
Model test     F[ 4, 27321] (prob)  = 80.10 (.0000)

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   .49839670     .04097910        12.162     .0000
AGE        .00064393     .00048945         1.316     .1883      43.5256898
EDUC       .00619390     .00241633         2.563     .0104      11.3206310
HHNINC     .04936160     .03122845         1.581     .1140      .35208362
HSAT       .04117251     .00240443        17.124     .0000      6.78542607

Negative Binomial Regression
Dependent variable            HOSPVIS
Number of observations        27326
Iterations completed          9
Log likelihood function       10044.46
Number of parameters          6
Info. Criterion: AIC =        .73560
Info. Criterion: BIC =        .73740
Restricted log likelihood     12636.40

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   .10394982     .12631220          .823     .4105
AGE        .00369348     .00143149         2.580     .0099      43.5256898
EDUC       .05795593     .00826247         7.014     .0000      11.3206310
HHNINC     .38542430     .09259876         4.162     .0000      .35208362
HSAT       .23323713     .00651715        35.788     .0000      6.78542607
Dispersion parameter for count data model
Alpha      6.70461029    .17537071        38.231     .0000

Partial derivatives of expected val. with respect to the vector of characteristics.
Effects are averaged over individuals.
Observations used for means are All Obs.
Conditional Mean at Sample Point    .1367
Scale Factor for Marginal Effects   .1367

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   .01421398     .02120646          .670     .5027
AGE        .00050504     .00024071         2.098     .0359      43.5256898
EDUC       .00792483     .00146645         5.404     .0000      11.3206310
HHNINC     .05270247     .01588312         3.318     .0009      .35208362
HSAT       .03189257     .00226820        14.061     .0000      6.78542607

Listed Calculator Results
LRSTAT = 5183.862874
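The likelihood ratio statistic computed by the Calc command can be reproduced by hand from the two reported log likelihoods. The sketch below is illustrative and rests on one assumption that should be stated plainly: the log likelihoods are entered as negative numbers (-12636.40 for the Poisson model and -10044.46 for the negative binomial model), since the minus signs do not appear in the printed output. The 3.841 benchmark is the conventional 5 percent chi squared critical value for one degree of freedom.

```python
# Log likelihoods from the Poisson and negative binomial outputs.
# Assumption: the printed values are magnitudes and the true values are negative.
logl_poisson = -12636.40
logl_negbin = -10044.46

# LRstat = 2*(ln - lp), as in the Calc command.
lr_stat = 2.0 * (logl_negbin - logl_poisson)

# Conventional 5 percent critical value, chi squared with 1 degree of freedom.
CHI2_1DF_5PCT = 3.841
reject_poisson = lr_stat > CHI2_1DF_5PCT

print(round(lr_stat, 2), reject_poisson)
```

The small difference from the reported 5183.862874 arises because the values used here are rounded to two decimals, while the program uses all digits.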
2.
Panel Model with Group Effects
Dependent variable            HOSPVIS
Weighting variable            None
Number of observations        27326
Log likelihood function       4198.145
Number of parameters          4
Info. Criterion: AIC =        .30756
Info. Criterion: BIC =        .30876
Unbalanced panel has 7293 individuals.
Missing or sumY=0, Skipped 5640 groups.
Poisson Regression - Fixed Effects

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
AGE        .00020613     .00705126          .029     .9767      43.5256898
EDUC       .04033708     .09220144          .437     .6618      11.3206310
HHNINC     .49927712     .18484588         2.701     .0069      .35208362
HSAT       .16686419     .01027579        16.239     .0000      6.78542607

Partial derivatives of expected val. with respect to the vector of characteristics.
They are computed at the means of the Xs.
Observations used for means are All Obs.
Conditional Mean at Sample Point    .1383
Scale Factor for Marginal Effects   .1383

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
AGE        .284995D-04   .00097488          .029     .9767      1.00000000
EDUC       .00557687     .01274746          .437     .6618      43.5256898
HHNINC     .06902836     .02555616         2.701     .0069      11.3206310
HSAT       .02307008     .00142070        16.239     .0000      .35208362

Panel Model with Group Effects
Dependent variable            HOSPVIS
Number of observations        27326
Log likelihood function       10200.91
Number of parameters          6
Info. Criterion: AIC =        .74705
Info. Criterion: BIC =        .74885
Unbalanced panel has 7293 individuals.
Poisson Regression - Random Effects

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   .22178663     .13617622         1.629     .1034
AGE        .00170639     .00145901         1.170     .2422      43.5256898
EDUC       .05399730     .01001912         5.389     .0000      11.3206310
HHNINC     .40499179     .06938275         5.837     .0000      .35208362
HSAT       .20075292     .00400154        50.169     .0000      6.78542607
Alpha      3.59227655    .11685254        30.742     .0000

Partial derivatives of expected val. with respect to the vector of characteristics.
They are computed at the means of the Xs.
Observations used for means are All Obs.
Conditional Mean at Sample Point    .1383
Scale Factor for Marginal Effects   .1383

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   .03066347     .01882726         1.629     .1034
AGE        .00023592     .00020172         1.170     .2422      43.5256898
EDUC       .00746548     .00138521         5.389     .0000      11.3206310
HHNINC     .05599279     .00959262         5.837     .0000      .35208362
HSAT       .02775542     .00055324        50.169     .0000      6.78542607
3. Ship Accidents
Create ; logmth = log(months) $
Name ; X=logmth,one,ta,tb,tc,td,t6064,t6569,t7074,o6074$
Reject ; acc < 0 $
Pois ; lhs = acc ; Rhs = x $
Pois ; lhs = acc ; Rhs = x ; Rst = 1,9_b $
Negb ; lhs = acc ; Rhs = x ; Rst = 1,9_b,alpha $

Poisson Regression
Dependent variable            ACC
Number of observations        34
Log likelihood function       67.99930
Number of parameters          10
Info. Criterion: AIC =        4.58819
Info. Criterion: BIC =        5.03712
Restricted log likelihood     356.2029

Poisson Regression
Chi squared = 39.70580   RsqP = .9491
G squared   = 38.13211   RsqD = .9380
Overdispersion tests: g=mu(i)  : .853
Overdispersion tests: g=mu(i)^2: .760

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
LOGMTH     .90617018     .10174566         8.906     .0000      7.04925451
Constant   4.61752968    .72938865         6.331     .0000
TA         .26966656     .24189066         1.115     .2649      .20588235
TB         .62826604     .32582681         1.928     .0538      .20588235
TC         1.03179604    .34039236         3.031     .0024      .20588235
TD         .40106977     .30540945         1.313     .1891      .20588235
T6064      .36146212     .24726698         1.462     .1438      .23529412
T6569      .30035782     .21325393         1.408     .1590      .29411765
T7074      .39874282     .20053445         1.988     .0468      .29411765
O6074      .36986273     .11821010         3.129     .0018      .41176471

Poisson Regression
Maximum Likelihood Estimates
Dependent variable            ACC
Number of observations        34
Log likelihood function       68.41456
Number of parameters          9
Info. Criterion: AIC =        4.55380
Info. Criterion: BIC =        4.95783
Restricted log likelihood     356.2029

Poisson Regression
Chi squared = 42.44145   RsqP = .9456
G squared   = 38.96262   RsqD = .9366
Overdispersion tests: g=mu(i)  : .934
Overdispersion tests: g=mu(i)^2: .613

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
LOGMTH     1.00000000    ......(Fixed Parameter).......
Constant   5.25351861    .24642858        21.319     .0000
TA         .32052881     .23575203         1.360     .1740      .20588235
TB         .86524026     .19852119         4.358     .0000      .20588235
TC         1.00929327    .33950071         2.973     .0030      .20588235
TD         .39483795     .30680184         1.287     .1981      .20588235
T6064      .44497064     .23323916         1.908     .0564      .23529412
T6569      .25087485     .20875483         1.202     .2295      .29411765
T7074      .37248476     .19930193         1.869     .0616      .29411765
O6074      .38385913     .11826046         3.246     .0012      .41176471

There is no evidence of overdispersion. The tests from the Poisson model are both insignificant, and the estimate of α in the negative binomial model is essentially zero.

Negative Binomial Regression
Dependent variable            ACC
Weighting variable            None
Number of observations        34
Log likelihood function       68.42007
Number of parameters          10
Info. Criterion: AIC =        4.61295
Finite Sample: AIC =          4.89428
Info. Criterion: BIC =        5.06188
Info. Criterion: HQIC =       4.76604
NegBin form 2; Psi(i) = theta

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
LOGMTH     1.00000000    ......(Fixed Parameter).......
Constant   5.25074235    .26830333        19.570     .0000
TA         .32296435     .39695609          .814     .4159      .20588235
TB         .86731524     .20092395         4.317     .0000      .20588235
TC         1.01171406    .24980570         4.050     .0001      .20588235
TD         .39875463     .23889734         1.669     .0951      .20588235
T6064      .44585250     .31679943         1.407     .1593      .23529412
T6569      .25060358     .27552926          .910     .3631      .29411765
T7074      .37073607     .25504806         1.454     .1461      .29411765
O6074      .38364155     .15800844         2.428     .0152      .41176471
Dispersion parameter for count data model
Alpha      .648724D-04   .02406424          .003     .9978
4. Strikes. There are 9 years of data. The number of strikes is 8,6,11,3,3,2,19,2,9. The Poisson regression is shown below. It does appear that the number of strikes is significantly related to the PROD variable. However, with only 9 observations, use of the asymptotic distribution for the test is probably overly optimistic. The result is probably borderline.

Poisson Regression
Dependent variable            _GROUPTI
Weighting variable            None
Number of observations        9
Log likelihood function       28.99317
Number of parameters          2
Info. Criterion: AIC =        6.88737
Info. Criterion: BIC =        6.93120
Restricted log likelihood     31.19884

Poisson Regression
Chi squared = 25.08061   RsqP = .2317
G squared   = 26.13767   RsqD = .1444
Overdispersion tests: g=mu(i)  : 1.954
Overdispersion tests: g=mu(i)^2: 2.618

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant   1.90854253    .12998621        14.683     .0000
PROD       5.16576744    2.51306610        2.056     .0398      .00302000
Appendix A
Matrix Algebra
1. For the matrices $A = \begin{bmatrix} 1 & 3 & 3 \\ 2 & 4 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 4 \\ 1 & 5 \\ 6 & 2 \end{bmatrix}$, compute AB, A′B′, and BA.
\[ AB = \begin{bmatrix} 23 & 25 \\ 14 & 30 \end{bmatrix}, \quad BA = \begin{bmatrix} 10 & 22 & 10 \\ 11 & 23 & 8 \\ 10 & 26 & 20 \end{bmatrix}, \quad A'B' = (BA)' = \begin{bmatrix} 10 & 11 & 10 \\ 22 & 23 & 26 \\ 10 & 8 & 20 \end{bmatrix}. \]
2. Prove that tr(AB) = tr(BA) where A and B are any two matrices that are conformable for both multiplications. They need not be square.
The ith diagonal element of AB is $\sum_j a_{ij}b_{ji}$. Summing over i produces tr(AB) = $\sum_i\sum_j a_{ij}b_{ji}$. The jth diagonal element of BA is $\sum_i b_{ji}a_{ij}$. Summing over j produces tr(BA) = $\sum_j\sum_i b_{ji}a_{ij}$, which is the same double sum.
3. Prove that tr(A′A) = $\sum_i\sum_j a_{ij}^2$.
The jth diagonal element of A′A is the inner product of the jth column of A with itself, or $\sum_i a_{ij}^2$. Summing over j produces tr(A′A) = $\sum_j\sum_i a_{ij}^2 = \sum_i\sum_j a_{ij}^2$.
4. Expand the matrix product $X = \{[AB + (CD)'][(EF)^{-1} + GH]\}'$. Assume that all matrices are square and E and F are nonsingular.
In parts, $(CD)' = D'C'$ and $(EF)^{-1} = F^{-1}E^{-1}$. Then, the product is
\[ \{[AB + (CD)'][(EF)^{-1} + GH]\}' = (ABF^{-1}E^{-1} + ABGH + D'C'F^{-1}E^{-1} + D'C'GH)' \]
\[ = (E^{-1})'(F^{-1})'B'A' + H'G'B'A' + (E^{-1})'(F^{-1})'CD + H'G'CD. \]
5. Prove that for K×1 column vectors, $x_i$, i = 1,...,n, and some nonzero vector, a,
\[ \sum_{i=1}^n (x_i - a)(x_i - a)' = X'M^0X + n(\bar{x} - a)(\bar{x} - a)'. \]
Write $x_i - a$ as $[(x_i - \bar{x}) + (\bar{x} - a)]$. Then, the sum is
\[ \sum_{i=1}^n [(x_i - \bar{x}) + (\bar{x} - a)][(x_i - \bar{x}) + (\bar{x} - a)]' = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})' + \sum_{i=1}^n (\bar{x} - a)(\bar{x} - a)' + \sum_{i=1}^n (x_i - \bar{x})(\bar{x} - a)' + \sum_{i=1}^n (\bar{x} - a)(x_i - \bar{x})'. \]
Since $(\bar{x} - a)$ is a vector of constants, it may be moved out of the summations. Thus, the fourth term is $(\bar{x} - a)\sum_{i=1}^n (x_i - \bar{x})' = 0$. The third term is likewise. The first term is $X'M^0X$ by the definition while the second is $n(\bar{x} - a)(\bar{x} - a)'$.
6. Let A be any square matrix whose columns are $[a_1, a_2, ..., a_M]$ and let B be any rearrangement of the columns of the M×M identity matrix. What operation is performed by the multiplication AB? What about BA?
B is called a permutation matrix. Each column of B, say, $b_j$, is a column of an identity matrix, say the ith. The jth column of the matrix product AB is $Ab_j$, which is then the ith column of A. Therefore, post multiplication of A by B simply rearranges (permutes) the columns of A (hence the name). Each row of the product BA is one of the rows of A, so the product BA is a rearrangement of the rows of A. Of course, A need not be square for us
to permute its rows or columns. If not, the applicable permutation matrix will be of different orders for the rows and columns.
7. Consider the 3×3 case of the matrix B in Exercise 6. For example, $B = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$. Compute $B^2$ and $B^3$. Repeat for a 4×4 matrix. Can you generalize your finding?
\[ B^2 = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad B^3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
Since each power of B is a rearrangement of I, some power of B will equal I. If n is this power, we also find, therefore, that $B^{n-1} = B^{-1}$. This will hold generally.
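The claim that some power of a permutation matrix equals I, and that the preceding power is the inverse, can be explored with a few lines of code. This is a sketch for illustration, and the matrix names are assumptions: `exchange3` is the matrix printed in the statement of Exercise 7, while `cycle3` and `cycle4` are cyclic permutations chosen here as examples. The square and cube of `cycle3` reproduce the $B^2$ and $B^3$ displayed in the solution, whereas `exchange3` already returns to I at the second power.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def order(p):
    """Smallest n >= 1 such that the nth power of p equals the identity."""
    n, power = 1, p
    while power != identity(len(p)):
        power = matmul(power, p)
        n += 1
    return n

exchange3 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]   # as printed in the exercise
cycle3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]      # assumed cyclic example
cycle4 = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

cycle3_sq = matmul(cycle3, cycle3)
cycle3_cu = matmul(cycle3_sq, cycle3)
cycle4_cu = matmul(matmul(cycle4, cycle4), cycle4)

print(order(exchange3), order(cycle3), order(cycle4))
print(cycle3_sq == transpose(cycle3))   # B^(n-1) equals B' = B^(-1) when B^n = I
```

Because a permutation matrix is orthogonal, its inverse is its transpose, which is why the comparison with `transpose` is a check on $B^{n-1} = B^{-1}$.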
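8. Calculate |A|, tr(A), and $A^{-1}$ for $A = \begin{bmatrix} 1 & 4 & 7 \\ 3 & 2 & 5 \\ 5 & 2 & 8 \end{bmatrix}$.
|A| = 1(2)(8) + 4(5)(5) + 3(2)(7) - 5(2)(7) - 1(5)(2) - 3(4)(8) = -18, tr(A) = 1 + 2 + 8 = 11, and
\[ A^{-1} = \frac{1}{-18}\begin{bmatrix} \begin{vmatrix} 2 & 5 \\ 2 & 8 \end{vmatrix} & -\begin{vmatrix} 4 & 7 \\ 2 & 8 \end{vmatrix} & \begin{vmatrix} 4 & 7 \\ 2 & 5 \end{vmatrix} \\ -\begin{vmatrix} 3 & 5 \\ 5 & 8 \end{vmatrix} & \begin{vmatrix} 1 & 7 \\ 5 & 8 \end{vmatrix} & -\begin{vmatrix} 1 & 7 \\ 3 & 5 \end{vmatrix} \\ \begin{vmatrix} 3 & 2 \\ 5 & 2 \end{vmatrix} & -\begin{vmatrix} 1 & 4 \\ 5 & 2 \end{vmatrix} & \begin{vmatrix} 1 & 4 \\ 3 & 2 \end{vmatrix} \end{bmatrix} = \begin{bmatrix} -6/18 & 18/18 & -6/18 \\ -1/18 & 27/18 & -16/18 \\ 4/18 & -18/18 & 10/18 \end{bmatrix}. \]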
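The determinant and the adjoint-based inverse in Exercise 8 can be reproduced exactly with rational arithmetic. The following is a minimal sketch, not the method used to produce the manual's results; the helper names are assumptions, and cofactor expansion is used only because it mirrors the hand calculation.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [[m[r][c] for c in range(len(m)) if c != j]
            for r in range(len(m)) if r != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def inverse(m):
    """Inverse as the transposed cofactor matrix divided by the determinant."""
    d = det(m)
    n = len(m)
    return [[Fraction((-1) ** (i + j) * det(minor(m, j, i)), d) for j in range(n)]
            for i in range(n)]

A = [[1, 4, 7], [3, 2, 5], [5, 2, 8]]
det_A = det(A)
trace_A = sum(A[i][i] for i in range(3))
A_inv = inverse(A)
print(det_A, trace_A)
for row in A_inv:
    print([str(v) for v in row])
```

The fractions print in lowest terms, so -6/18 appears as -1/3 and 27/18 as 3/2.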
9. Obtain the Cholesky decomposition of the matrix $A = \begin{bmatrix} 25 & 7 \\ 7 & 13 \end{bmatrix}$.
Recall that the Cholesky decomposition of a matrix, A, is the matrix product LU = A where L is a lower triangular matrix and U = L′. Write the decomposition as
\[ \begin{bmatrix} 25 & 7 \\ 7 & 13 \end{bmatrix} = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & \lambda_{22} \end{bmatrix}\begin{bmatrix} \lambda_{11} & \lambda_{21} \\ 0 & \lambda_{22} \end{bmatrix}. \]
By direct multiplication, $25 = \lambda_{11}^2$ so $\lambda_{11} = 5$. Then, $\lambda_{11}\lambda_{21} = 7$, so $\lambda_{21} = 7/5 = 1.4$. Finally, $\lambda_{21}^2 + \lambda_{22}^2 = 13$, so $\lambda_{22} = 3.322$.
10. A symmetric positive definite matrix, A, can also be written as A = UL, where U is an upper triangular matrix and L = U′. This is not the Cholesky decomposition, however. Obtain this decomposition of the matrix in Exercise 9.
Using the same logic as in the previous problem,
\[ \begin{bmatrix} 25 & 7 \\ 7 & 13 \end{bmatrix} = \begin{bmatrix} \mu_{11} & \mu_{12} \\ 0 & \mu_{22} \end{bmatrix}\begin{bmatrix} \mu_{11} & 0 \\ \mu_{12} & \mu_{22} \end{bmatrix}. \]
Working from the bottom up, $\mu_{22} = \sqrt{13} = 3.606$. Then, $7 = \mu_{12}\mu_{22}$ so $\mu_{12} = 7/\sqrt{13} = 1.941$. Finally, $25 = \mu_{11}^2 + \mu_{12}^2$, so $\mu_{11}^2 = 25 - 49/13 = 21.23$, or $\mu_{11} = 4.61$.
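Both factorizations in Exercises 9 and 10 follow from the same element-by-element matching, one working from the top left and the other from the bottom right. The sketch below, with assumed function names and restricted to the 2×2 case, computes both sets of elements for the matrix in Exercise 9.

```python
import math

def lower_upper_2x2(a):
    """A = L L' with L lower triangular (the Cholesky factor)."""
    l11 = math.sqrt(a[0][0])
    l21 = a[1][0] / l11
    l22 = math.sqrt(a[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def upper_lower_2x2(a):
    """A = U U' with U upper triangular, working from the bottom up."""
    u22 = math.sqrt(a[1][1])
    u12 = a[0][1] / u22
    u11 = math.sqrt(a[0][0] - u12 ** 2)
    return [[u11, u12], [0.0, u22]]

A = [[25.0, 7.0], [7.0, 13.0]]
L = lower_upper_2x2(A)
U = upper_lower_2x2(A)

# Rebuild A from U U' as a check on the second factorization.
rebuilt = [[U[0][0] ** 2 + U[0][1] ** 2, U[0][1] * U[1][1]],
           [U[0][1] * U[1][1], U[1][1] ** 2]]
print(L)
print(U)
print(rebuilt)
```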
11. What operation is performed by postmultiplying a matrix by a diagonal matrix? What about premultiplication? The columns are multiplied by the corresponding diagonal element. Premultiplication multiplies the rows by the corresponding diagonal element.
12. Are the following quadratic forms positive for all values of x?
(a) $y = x_1^2 - 28x_1x_2 + 11x_2^2$,
(b) $y = 5x_1^2 + x_2^2 + 7x_3^2 + 4x_1x_2 + 6x_1x_3 + 8x_2x_3$?
The first may be written $[x_1 \;\; x_2]\begin{bmatrix} 1 & -14 \\ -14 & 11 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. The determinant of the matrix is 11 - 196 = -185, so it is not positive definite. Thus, the first quadratic form need not be positive. The second uses the matrix $\begin{bmatrix} 5 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 7 \end{bmatrix}$. There are several ways to check the definiteness of a matrix. One way is to check the signs of the principal minors, which must be positive. The first two are 5 and 5(1) - 2(2) = 1, but the third, the determinant, is -34. Therefore, the matrix is not positive definite. Its three characteristic roots are 11.1, 2.9, and -1. It follows, therefore, that there are values of $x_1$, $x_2$, and $x_3$ for which the quadratic form is negative.
13. Prove that tr(A⊗B) = tr(A)tr(B).
The jth diagonal block of the product is $a_{jj}B$. Its ith diagonal element is $a_{jj}b_{ii}$. If we sum in the jth block, we obtain $\sum_i a_{jj}b_{ii} = a_{jj}\sum_i b_{ii}$. Summing down the diagonal blocks gives the trace, $\sum_j a_{jj}\sum_i b_{ii}$ = tr(A)tr(B).
14. A matrix, A, is nilpotent if $\lim_{k\to\infty} A^k = 0$. Prove that a necessary and sufficient condition for a symmetric
matrix to be nilpotent is that all of its characteristic roots be less than one in absolute value.
Use the spectral decomposition to write A as CΛC′ where Λ is the diagonal matrix of characteristic roots. Then, the Kth power of A is $C\Lambda^KC'$. Sufficiency is obvious. Also, since if some λ is greater than or equal to one in absolute value, $\Lambda^K$ cannot converge to 0, the condition is necessary as well.
15. Compute the characteristic roots of $A = \begin{bmatrix} 2 & 4 & 3 \\ 4 & 8 & 6 \\ 3 & 6 & 5 \end{bmatrix}$.
The roots are determined by |A - λI| = 0. For the matrix above, this is
|A - λI| = (2-λ)(8-λ)(5-λ) + 72 + 72 - 9(8-λ) - 36(2-λ) - 16(5-λ) = $-\lambda^3 + 15\lambda^2 - 5\lambda$ = $-\lambda(\lambda^2 - 15\lambda + 5)$ = 0.
One solution is obviously zero. (This might have been apparent. The second column of the matrix is twice the first, so it has rank no more than two, and therefore no more than two nonzero roots.) The other two roots are $(15 \pm \sqrt{205})/2$ = .341 and 14.659.
16. Suppose A = A(z) where z is a scalar. What is ∂x′Ax/∂z? Now, suppose each element of x is also a function of z. Once again, what is ∂x′Ax/∂z?
The quadratic form is $\sum_i\sum_j x_ix_ja_{ij}$, so ∂x′A(z)x/∂z =
$\sum_i\sum_j x_ix_j(\partial a_{ij}/\partial z)$ = x′(∂A(z)/∂z)x where ∂A(z)/∂z is a matrix of partial derivatives.
Now, if each element of x is also a function of z, then,
\[ \partial x'Ax/\partial z = \sum_i\sum_j x_ix_j(\partial a_{ij}/\partial z) + \sum_i\sum_j (\partial x_i/\partial z)x_ja_{ij} + \sum_i\sum_j x_i(\partial x_j/\partial z)a_{ij} \]
\[ = x'(\partial A(z)/\partial z)x + (\partial x(z)/\partial z)'A(z)x(z) + x(z)'A(z)(\partial x(z)/\partial z). \]
If A is symmetric, this simplifies a bit to x′(∂A(z)/∂z)x + 2(∂x(z)/∂z)′A(z)x(z).
17. Show that the solutions to the determinantal equations |B - λA| = 0 and $|A^{-1}B - \lambda I| = 0$ are the same. How do the solutions to this equation relate to those of the equation $|B^{-1}A - \mu I| = 0$?
Since A is assumed to be nonsingular, we may write
$B - \lambda A = A(A^{-1}B - \lambda I)$. Then, $|B - \lambda A| = |A| \times |A^{-1}B - \lambda I|$. The determinant of A is nonzero if A is nonsingular, so the solutions to the two determinantal equations must be the same. $B^{-1}A$ is the inverse of $A^{-1}B$, so its characteristic roots must be the reciprocals of those of $A^{-1}B$. There might seem to be a problem here since these two matrices need not be symmetric, so the roots could be complex. But, for the application noted, both A and B are symmetric and positive definite. As such, it can be shown that the solution is the same as that of a third determinantal equation involving a symmetric matrix.
18. Using the matrix A in Exercise 9, find the vector x that minimizes $y = x'Ax + 2x_1 + 3x_2 - 10$. What is the value of y at the minimum? Now, minimize y subject to the constraint $x_1 + x_2 = 1$. Compare the two solutions.
The solution which minimizes y = x′Ax + b′x + d will satisfy ∂y/∂x = 2Ax + b = 0. For this problem, $A = \begin{bmatrix} 25 & 7 \\ 7 & 13 \end{bmatrix}$, $b = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, and $A^{-1} = \begin{bmatrix} 13/276 & -7/276 \\ -7/276 & 25/276 \end{bmatrix}$, so the solution is $x_1 = -5/552 = -.00905797$ and $x_2 = -61/552 = -.110507$.
The constrained minimization problem may be set up as a Lagrangean,
L* = x′Ax + b′x + d + λ(c′x - 1) where c = [1,1]′.
The necessary conditions for the solution are
∂L*/∂x = 2Ax + b + λc = 0
∂L*/∂λ = c′x - 1 = 0,
or, $\begin{bmatrix} 2A & c \\ c' & 0 \end{bmatrix}\begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -b \\ 1 \end{bmatrix}$.
Inserting A, b, and c produces the system $\begin{bmatrix} 50 & 14 & 1 \\ 14 & 26 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} -2 \\ -3 \\ 1 \end{bmatrix}$. The solution to the three equations is obtained by premultiplying the vector on the right by the inverse of the matrix on the left. The solutions are 0.27083, 0.72917, and -25.75. The function value at the constrained solution is 4.240, which is larger than the unconstrained value of -10.17482.
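The two solutions in Exercise 18 can be reproduced exactly with rational arithmetic. The sketch below is an illustration with assumed variable names: the unconstrained minimizer solves 2Ax = -b using the 2×2 inverse, and the constrained one subtracts the two first order conditions to eliminate λ and then imposes $x_1 + x_2 = 1$.

```python
from fractions import Fraction as Fr

A = [[Fr(25), Fr(7)], [Fr(7), Fr(13)]]
b = [Fr(2), Fr(3)]
d = Fr(-10)

def objective(x):
    """y = x'Ax + b'x + d."""
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return quad + sum(b[i] * x[i] for i in range(2)) + d

# Unconstrained: x = -(1/2) A^{-1} b.
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]            # 276
A_inv = [[A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A, A[0][0] / det_A]]
x_free = [-(A_inv[i][0] * b[0] + A_inv[i][1] * b[1]) / 2 for i in range(2)]

# Constrained: subtracting the two first order conditions removes lambda,
# leaving c1*x1 + c2*x2 = b2 - b1, which is combined with x2 = 1 - x1.
c1 = 2 * A[0][0] - 2 * A[1][0]
c2 = 2 * A[0][1] - 2 * A[1][1]
x1 = (b[1] - b[0] - c2) / (c1 - c2)
x_con = [x1, 1 - x1]
lam = -b[0] - 2 * A[0][0] * x_con[0] - 2 * A[0][1] * x_con[1]

print([float(v) for v in x_free], float(objective(x_free)))
print([float(v) for v in x_con], float(lam), float(objective(x_con)))
```

As expected, imposing the constraint cannot lower the minimized value, so the constrained function value exceeds the unconstrained one.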
19. What is the Jacobian for the following transformations? $y_1 = x_1/x_2$, $\ln y_2 = \ln x_1 - \ln x_2 + \ln x_3$, and $y_3 = x_1x_2x_3$.
Let capital letters denote logarithms. Then, the three transformations can be written as
$Y_1 = X_1 - X_2$
$Y_2 = X_1 - X_2 + X_3$
$Y_3 = X_1 + X_2 + X_3$.
This linear transformation is $Y = \begin{bmatrix} 1 & -1 & 0 \\ 1 & -1 & 1 \\ 1 & 1 & 1 \end{bmatrix}X = JX$. The inverse transformation is $X = \begin{bmatrix} 1 & -1/2 & 1/2 \\ 0 & -1/2 & 1/2 \\ -1 & 1 & 0 \end{bmatrix}Y = J^{-1}Y$. In terms of the original variables, then, $x_1 = y_1(y_3/y_2)^{1/2}$, $x_2 = (y_3/y_2)^{1/2}$, and $x_3 = y_2/y_1$. The matrix of partial derivatives can be obtained directly, but an algebraic shortcut will prove useful for obtaining the Jacobian. Note first that $\partial x_i/\partial y_j = (x_i/y_j)(\partial\log x_i/\partial\log y_j)$. Therefore, the elements of the partial derivatives of the inverse transformations are obtained by multiplying the ith row by $x_i$, where we will substitute the expression for $x_i$ in terms of the ys, then multiplying the jth column by $(1/y_j)$. Thus, the result of Exercise 11 will be useful here. The matrix of partial derivatives will be
\[ \begin{bmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 & \partial x_1/\partial y_3 \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 & \partial x_2/\partial y_3 \\ \partial x_3/\partial y_1 & \partial x_3/\partial y_2 & \partial x_3/\partial y_3 \end{bmatrix} = \begin{bmatrix} x_1 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_3 \end{bmatrix}\begin{bmatrix} 1 & -1/2 & 1/2 \\ 0 & -1/2 & 1/2 \\ -1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1/y_1 & 0 & 0 \\ 0 & 1/y_2 & 0 \\ 0 & 0 & 1/y_3 \end{bmatrix}. \]
The determinant of the product matrix is the product of the three determinants. The determinant of the center matrix is -1/2. The determinants of the diagonal matrices are the products of the diagonal elements. Therefore, the Jacobian is J = abs(|∂x/∂y′|) = $\tfrac{1}{2}(x_1x_2x_3)/(y_1y_2y_3) = 1/(2y_1y_2)$ (after making the substitutions for $x_i$).
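The Jacobian in Exercise 19 can be checked against a numerical derivative of the inverse transformation. The sketch below is illustrative only; the evaluation point (2, 3, 5), the step size, and the function names are assumptions. It differentiates $x(y)$ by central differences, takes the absolute determinant, and compares it with $1/(2y_1y_2)$.

```python
import math

def inverse_map(y):
    """x as a function of y: x1 = y1*sqrt(y3/y2), x2 = sqrt(y3/y2), x3 = y2/y1."""
    y1, y2, y3 = y
    root = math.sqrt(y3 / y2)
    return [y1 * root, root, y2 / y1]

def forward_map(x):
    """y as a function of x: y1 = x1/x2, y2 = x1*x3/x2, y3 = x1*x2*x3."""
    x1, x2, x3 = x
    return [x1 / x2, x1 * x3 / x2, x1 * x2 * x3]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def numerical_jacobian(func, y, step=1e-6):
    """Central-difference approximation to the matrix of partials dx_i/dy_j."""
    columns = []
    for j in range(3):
        up, down = list(y), list(y)
        up[j] += step
        down[j] -= step
        f_up, f_down = func(up), func(down)
        columns.append([(f_up[i] - f_down[i]) / (2 * step) for i in range(3)])
    return [[columns[j][i] for j in range(3)] for i in range(3)]

y_point = [2.0, 3.0, 5.0]                       # hypothetical evaluation point
jacobian = abs(det3(numerical_jacobian(inverse_map, y_point)))
closed_form = 1.0 / (2.0 * y_point[0] * y_point[1])
round_trip = forward_map(inverse_map(y_point))
print(jacobian, closed_form, round_trip)
```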
20. Prove that exchanging two columns of a square matrix reverses the sign of its determinant. (Hint: use a permutation matrix. See Exercise 6.)
Exchanging the first two columns of a matrix is equivalent to postmultiplying it by a permutation matrix $B = [e_2, e_1, e_3, e_4, ...]$ where $e_i$ is the ith column of an identity matrix. Thus, the determinant of the matrix is |AB| = |A||B|. The question turns on the determinant of B. Assume that A and B have n columns. To obtain the determinant of B, merely expand it along the first row. The only nonzero term in the determinant is $(-1)|I_{n-1}| = -1$, where $I_{n-1}$ is the (n-1)×(n-1) identity matrix. This completes the proof.
21. Suppose x = x(z) where z is a scalar. What is ∂[(x′Ax)/(x′Bx)]/∂z?
The required derivatives are given in Exercise 16. Let g = ∂x/∂z and let the numerator and denominator be a and b, respectively. Then,
∂(a/b)/∂z = [b(∂a/∂z) - a(∂b/∂z)]/$b^2$ = [x′Bx(2x′Ag) - x′Ax(2x′Bg)]/(x′Bx)$^2$ = 2[x′Ax/x′Bx][x′Ag/x′Ax - x′Bg/x′Bx].
22. Suppose y is an n×1 vector and X is an n×K matrix. The projection of y into the column space of X is defined in the text after equation (2-55), $\hat{y}$ = Xb. Now, consider the projection of y* = cy into the column space of X* = XP where c is a scalar and P is a nonsingular K×K matrix. Find the projection of y* into the column space of X*. Prove that the cosine of the angle between y* and its projection into the column space of X* is the same as that between y and its projection into the column space of X. How do you interpret this result?
The projection of y* into the column space of X* is X*b* where b* is the solution to the set of equations X*′y* = X*′X*b* or P′X′(cy) = P′X′XPb*. Since P is nonsingular, P′ has an inverse. Premultiplying the equation by $(P')^{-1}$, we have cX′y = X′X(Pb*) or X′y = X′X[(1/c)Pb*]. Therefore, in terms of the original y and X, we see that b = (1/c)Pb* which implies b* = $cP^{-1}b$. The projection is X*b* = (XP)($cP^{-1}b$) = cXb.
We conclude, therefore, that the projection of y* into the column space of X* is a multiple c of the projection of y into the space of X. This makes some sense, since, if P is a nonsingular matrix, the column space of X* is exactly the same as that of X. The cosine of the angle between y* and its projection is that between cy and cXb. Of course, this is the same as that between y and Xb since the length of the two vectors is unrelated to the cosine of the angle between them. Thus, cos θ = (cy)′(cXb)/(||cy|| × ||cXb||) = (y′Xb)/(||y|| × ||Xb||).
23. For the matrix $X' = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 4 & -2 & 3 & -5 \end{bmatrix}$, compute $P = X(X'X)^{-1}X'$ and M = (I - P). Verify that MP = 0. Let $Q = \begin{bmatrix} 1 & 3 \\ 2 & 8 \end{bmatrix}$. (Hint: Show that M and P are idempotent.)
(a) Compute the P and M based on XQ instead of X.
(b) What are the characteristic roots of M and P?
First, $X'X = \begin{bmatrix} 4 & 0 \\ 0 & 54 \end{bmatrix}$, $(X'X)^{-1} = \begin{bmatrix} 1/4 & 0 \\ 0 & 1/54 \end{bmatrix}$,
\[ X(X'X)^{-1}X' = \begin{bmatrix} 1 & 4 \\ 1 & -2 \\ 1 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 1/4 & 0 \\ 0 & 1/54 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 4 & -2 & 3 & -5 \end{bmatrix} = \frac{1}{108}\begin{bmatrix} 59 & 11 & 51 & -13 \\ 11 & 35 & 15 & 47 \\ 51 & 15 & 45 & -3 \\ -13 & 47 & -3 & 77 \end{bmatrix} = P, \]
\[ M = I - P = \frac{1}{108}\begin{bmatrix} 49 & -11 & -51 & 13 \\ -11 & 73 & -15 & -47 \\ -51 & -15 & 63 & 3 \\ 13 & -47 & 3 & 31 \end{bmatrix}. \]
(a) There is no need to recompute the matrices M and P for XQ; they are the same. Proof: The counterpart to P is $(XQ)[(XQ)'(XQ)]^{-1}(XQ)' = XQ[Q'X'XQ]^{-1}Q'X' = XQQ^{-1}(X'X)^{-1}(Q')^{-1}Q'X' = X(X'X)^{-1}X'$. The M matrix would be the same as well. This is an application of the result found in the previous exercise. The P matrix is the projection matrix, and, as we found, the projection into the space of X is the same as the projection into the space of XQ.
(b) Since M and P are idempotent, their characteristic roots must all be either 0 or 1. The trace of the matrix equals the sum of the roots, which tells how many are 1 and 0. For the matrices above, the traces of both M and P are 2, so each has 2 unit roots and 2 zero roots.
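The results of Exercise 23 are straightforward to confirm with exact arithmetic. The following sketch uses assumed helper names and a 2×2 inverse written out by hand; it builds P and M from X, checks MP = 0 and idempotency, and repeats the projection with XQ in place of X.

```python
from fractions import Fraction

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(m):
    det = Fraction(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def projection(x):
    """P = X (X'X)^{-1} X' for a matrix X with two columns."""
    xt = transpose(x)
    return matmul(matmul(x, inverse_2x2(matmul(xt, x))), xt)

X = [[1, 4], [1, -2], [1, 3], [1, -5]]
Q = [[1, 3], [2, 8]]

P = projection(X)
M = [[(1 if i == j else 0) - P[i][j] for j in range(4)] for i in range(4)]
P_from_XQ = projection(matmul(X, Q))

MP = matmul(M, P)
trace_P = sum(P[i][i] for i in range(4))
trace_M = sum(M[i][i] for i in range(4))
print([str(v * 108) for v in P[0]], trace_P, trace_M)
print(P_from_XQ == P, matmul(P, P) == P, matmul(M, M) == M)
```

Since the roots of an idempotent matrix are 0 or 1, the two traces give the number of unit roots directly.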
24. Suppose that A is an n×n matrix of the form A = (1-ρ)I + ρii′, where i is a column of 1s and 0 < ρ < 1. Write out the format of A explicitly for n = 4. Find all of the characteristic roots and vectors of A. (Hint: There are only two distinct characteristic roots, which occur with multiplicity 1 and n-1. Every c of a certain type is a characteristic vector of A.) For an application which uses a matrix of this type, see Section 14.5 on the random effects model.
For n = 4, $A = \begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}$. There are several ways to analyze this matrix. Here is a simple shortcut. The characteristic roots and vectors satisfy [(1-ρ)I + ρii′]c = λc. Multiply this out to obtain (1-ρ)c + ρii′c = λc or ρii′c = [λ - (1-ρ)]c. Let μ = λ - (1-ρ), so ρii′c = μc. We need only find the characteristic roots of ρii′, μ. The characteristic roots of the original matrix are just λ = μ + (1-ρ). Now, ρii′ is a matrix with rank one, since every column is identical. Therefore, n-1 of the μs are zero. Thus, the original matrix has n-1 roots equal to 0 + (1-ρ) = (1-ρ). We can find the remaining root by noting that the sum of the roots of ρii′ equals the trace of ρii′. Since ρii′ has only one nonzero root, that root is the trace, which is nρ. Thus, the remaining root of the original matrix is (1 - ρ + nρ). The characteristic vectors satisfy the equation ρii′c = μc. For the nonzero root, we have ρii′c = nρc. Divide by nρ to obtain i(1/n)i′c = c. This equation states that for each element in the vector, $c_i = (1/n)\sum_i c_i$. This implies that every element in the characteristic vector corresponding to the root (1-ρ+nρ) is the same, or c is a multiple of a column of ones. In particular, so that it will have unit length, the vector is $(1/\sqrt{n})i$. For the remaining zero roots, the characteristic vectors must satisfy ρi(i′c) = 0c = 0. If the characteristic vector is not to be a column of zeroes, the only way to make this an equality is to require i′c to be zero.
Therefore, for the remaining n-1 characteristic vectors, we may use any set of orthogonal vectors whose elements sum to zero and whose lengths are one. There are an infinite number of such vectors. For example, let D be any arbitrary set of n-1 vectors containing n elements. Transform all columns of D into deviations from their own column means. Thus, we let F = $M^0D$ where $M^0$ is defined in Section 2.3.6. Now, let C = $F(F'F)^{-1/2}$. C is a linear combination of the columns of F, so its columns sum to zero. By multiplying it out and using the results of Section 2.7.10, you will find that C′C = I, so the columns are orthogonal and have unit length.
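The two distinct roots found in Exercise 24 can be confirmed directly by multiplying A into candidate vectors. The sketch below is an illustration under assumed values (n = 4, ρ = 3/10) and assumed names; the three contrast vectors are one arbitrary choice of mutually orthogonal vectors whose elements sum to zero, left unnormalized for simplicity.

```python
from fractions import Fraction

def equicorrelated(n, rho):
    """A = (1 - rho) I + rho ii'."""
    return [[Fraction(1) if i == j else rho for j in range(n)] for i in range(n)]

def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

n, rho = 4, Fraction(3, 10)                   # hypothetical values
A = equicorrelated(n, rho)

ones = [1] * n
contrasts = [[1, -1, 0, 0], [1, 1, -2, 0], [1, 1, 1, -3]]

large_root = 1 - rho + n * rho                # multiplicity 1
small_root = 1 - rho                          # multiplicity n - 1

print(matvec(A, ones) == [large_root * v for v in ones])
print(all(matvec(A, c) == [small_root * v for v in c] for c in contrasts))
```

Dividing each vector by its length would give the unit length characteristic vectors described in the solution.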
25. Find the inverse of the matrix in Exercise 24. [Hint: Use (A-66).]
Using the hint, the inverse is
\[ [(1-\rho)I]^{-1} - \frac{[(1-\rho)I]^{-1}[\rho ii'][(1-\rho)I]^{-1}}{1 + \rho i'[(1-\rho)I]^{-1}i} = \frac{1}{1-\rho}\left\{I - [\rho/(1 - \rho + n\rho)]ii'\right\}. \]
26. Prove that every matrix in the sequence of matrices $H_{i+1} = H_i + d_id_i'$, where $H_0 = I$, is positive definite. For an extension, prove that every matrix in the sequence of matrices in (E-22) is positive definite if $H_0 = I$.
By repeated substitution, we find $H_{i+1} = I + \sum_{j=1}^{i} d_jd_j'$. A quadratic form in $H_{i+1}$ is, therefore,
\[ x'H_{i+1}x = x'x + \sum_{j=1}^{i}(x'd_j)(d_j'x) = x'x + \sum_{j=1}^{i}(x'd_j)^2. \]
This is obviously positive for all nonzero x. A simple way to establish this for the matrix in (E-22) is to note that in spite of its complexity, it is of the form $H_{i+1} = H_i + d_id_i' + f_if_i'$. If this starts with a positive definite matrix, such as I, then the identical argument establishes its positive definiteness.
27. What is the inverse matrix of $P = \begin{bmatrix}\cos(x) & \sin(x)\\ -\sin(x) & \cos(x)\end{bmatrix}$? What are the characteristic roots of P?
The determinant of P is $\cos^2(x) + \sin^2(x) = 1$, so the inverse just reverses the signs of the two off diagonal elements. The two roots are the solutions to $|P - \lambda I| = 0$, which is $\cos^2(x) + \sin^2(x) - 2\lambda\cos(x) + \lambda^2 = 0$. This simplifies because $\cos^2(x) + \sin^2(x) = 1$. Using the quadratic formula, then, $\lambda = \cos(x) \pm (\cos^2(x) - 1)^{1/2}$. But, $\cos^2(x) - 1 = -\sin^2(x)$. Therefore, the imaginary solutions to the resulting quadratic are $\lambda_1, \lambda_2 = \cos(x) \pm i\sin(x)$.
28. Derive the off diagonal block of $A^{-1}$ in Section B.6.4.
For the simple 2×2 case, $F_2$ is derived explicitly in the text, as $F_2 = (x'M^0x)^{-1} = 1/\sum_i (x_i - \bar{x})^2$. Using (2-74), the off diagonal element is just $-F_2(\sum_i x_i)/n = -\bar{x}/\sum_i (x_i - \bar{x})^2$. To extend this to a matrix containing a constant and K − 1 variables, use the result at the end of the section. The off diagonal vector in $A^{-1}$ when there is a constant and K − 1 other variables is $-F_2A_{21}(A_{11})^{-1} = -[X'M^0X]^{-1}\bar{x}$. In all cases, $A_{11}$ is just n, so $(A_{11})^{-1}$ is 1/n.
29. (This requires a computer.) For the X′X matrix at the end of Section 2.4.1, (a) Compute the characteristic roots of X′X. (b) Compute the condition number of X′X. (Do not forget to scale the columns of the matrix so that the diagonal elements are 1.)
The matrix is
$$X'X = \begin{bmatrix} 15.000 & 120.00 & 19.310 & 111.79 & 99.770\\ 120.00 & 1240.0 & 164.30 & 1035.9 & 875.60\\ 19.310 & 164.30 & 25.218 & 148.98 & 131.22\\ 111.79 & 1035.9 & 148.98 & 943.86 & 799.02\\ 99.770 & 875.60 & 131.22 & 799.02 & 716.67 \end{bmatrix}.$$
Its characteristic roots are 2846, 72.96, 19.55, 2.027, and .007354. (The roots must sum to the trace of the matrix, 2940.75.) To compute the condition number, we first extract D = diag(15, 1240, 25.218, 943.86, 716.67). To scale the matrix, we compute $V = D^{-1/2}X'XD^{-1/2}$. The resulting matrix is
$$V = \begin{bmatrix} 1 & .879883 & .992845 & .939515 & .962265\\ .879883 & 1 & .929119 & .957532 & .928828\\ .992845 & .929119 & 1 & .965648 & .976079\\ .939515 & .957532 & .965648 & 1 & .971503\\ .962265 & .928828 & .976079 & .971503 & 1 \end{bmatrix}.$$
The characteristic roots of this matrix are 4.801, .1389, .03716, .02183, and .0003527. The square root of the largest divided by the smallest is 116.675. These data are highly collinear by this measure.
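The scaling and condition number computation in Exercise 29 can be sketched as follows. This assumes NumPy rather than the NLOGIT package used for the manual, and it starts from the matrix entries as printed, which are rounded to five significant digits. Because the smallest root of the scaled matrix is tiny, the rounding can move it noticeably, so the printed values need not match the figures in the text digit for digit.

```python
import numpy as np

# X'X as printed in the solution (entries rounded to five significant digits)
XtX = np.array([
    [15.000, 120.00, 19.310, 111.79, 99.770],
    [120.00, 1240.0, 164.30, 1035.9, 875.60],
    [19.310, 164.30, 25.218, 148.98, 131.22],
    [111.79, 1035.9, 148.98, 943.86, 799.02],
    [99.770, 875.60, 131.22, 799.02, 716.67],
])

# Scale so the diagonal is 1: V = D^(-1/2) X'X D^(-1/2), D = diag(X'X)
scale = 1.0 / np.sqrt(np.diag(XtX))
V = XtX * np.outer(scale, scale)

roots_raw = np.linalg.eigvalsh(XtX)    # ascending characteristic roots of X'X
roots_scaled = np.linalg.eigvalsh(V)   # ascending characteristic roots of V

# Condition number: square root of largest root over smallest root of V.
# The guard covers the case where rounding pushes the smallest root below zero.
ratio = roots_scaled[-1] / roots_scaled[0]
condition_number = np.sqrt(ratio) if ratio > 0 else float("nan")

print(roots_raw[::-1])
print(roots_scaled[::-1])
print(condition_number)
```

A useful sanity check on any reported set of roots is that they sum to the trace: 2940.748 for X′X and 5 for the scaled matrix.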
Appendix B Probability and Distribution Theory
1. How many different 5 card poker hands can be dealt from a deck of 52 cards?
There are $\binom{52}{5}$ = (52×51×50×...×1)/[(5×4×3×2×1)(47×46×...×1)] = 2,598,960 possible hands.
2. Compute the probability of being dealt 4 of a kind in a poker hand.
There are 48(13) possible hands containing 4 of a kind and any of the remaining 48 cards. Thus, given the answer to the previous problem, the probability of being dealt one of these hands is 48(13)/2598960 = .00024, or less than one chance in 4000.
3. Suppose a lottery ticket costs $1 per play. The game is played by drawing 6 numbers without replacement from the numbers 1 to 48. If you guess all six numbers, you win the prize. Now, suppose that N = the number of tickets sold and P = the size of the prize. N and P are related by
N = 5 + 1.2P
P = 1 + .4N
N and P are in millions. What is the expected value of a ticket in this game? (Don't forget that you might have to share the prize with other winners.)
The size of the prize and number of tickets sold are jointly determined. The solutions to the two equations are N = 11.92 million tickets and P = $5.77 million. The number of possible combinations of 48 numbers without replacement is $\binom{48}{6}$ = (48×47×46×...×1)/[(6×5×4×3×2×1)(42×41×...×1)] = 12,271,512, so the probability of making the right choice is 1/12271512 = .000000081. The expected number of winners is the expected value of a binomial random variable with N trials and this success probability, which is N times the probability, or 11.92/12.27 = .97, or roughly 1. Thus, one would not expect to have to share the prize. Now, the expected value of a ticket is Prob[win](5.77 million − 1) + Prob[lose](−1) ≈ −53 cents.
4. If x has a normal distribution with mean 1 and standard deviation 3, what are (a) Prob[|x| > 2]. (b) Prob[x > −1 | x < 1.5].
Using the normal table,
(a) Prob[|x| > 2] = 1 − Prob[|x| < 2] = 1 − Prob[−2 < x < 2] = 1 − Prob[(−2−1)/3 < z < (2−1)/3] = 1 − [F(1/3) − F(−1)] = 1 − .6306 + .1587 = .5281.
(b) Prob[x > −1 | x < 1.5] = Prob[−1 < x < 1.5] / Prob[x < 1.5]. Prob[−1 < x < 1.5] = Prob[(−1−1)/3 < z < (1.5−1)/3] = Prob[z < 1/6] − Prob[z < −2/3] = .5662 − .2525 = .3137. The conditional probability is .3137/.5662 = .5540.
5. Approximately what is the probability that a random variable with chi-squared distribution with 264 degrees of freedom is less than 297?
We use the approximation in (3-37), $z = [2(297)]^{1/2} - [2(264) - 1]^{1/2} = 1.4155$, so the probability is approximately .9215. To six digits, the approximation is .921539 while the correct value is .921559.
6. Chebychev Inequality For the following two probability distributions, find the lower limit of the probability of the indicated event using the Chebychev inequality and the exact probability using the appropriate table:
(a) x ~ Normal[0, 3²], and −4 < x < 4. (b) x ~ chi-squared, 8 degrees of freedom, 0 < x < 16.
The inequality given in (3-18) states that Prob[|x − μ| < kσ] > 1 − 1/k². Note that the result is not informative if k is less than or equal to 1.
(a) The range is 4/3 standard deviations, so the lower limit is 1 − (3/4)² or 7/16 = .4375. From the standard normal table, the actual probability is 1 − 2Prob[z < −4/3] = .8175.
(b) The mean of the distribution is 8 and the standard deviation is 4. The range is, therefore, μ ± 2σ. The lower limit according to the inequality is 1 − (1/2)² = .75. The actual probability is the cumulative chi-squared(8) at 16, which is a bit larger than .95. (The actual value is .9576.)
7. Given the following joint probability distribution,
              X
           0     1     2
     0    .05   .10   .03
  Y  1    .21   .11   .19
     2    .08   .15   .08
(a) Compute the following probabilities: Prob[Y < 2], Prob[Y < 2, X > 0], Prob[Y = 1, X > 1]. (b) Find the marginal distributions of X and Y. (c) Calculate E[X], E[Y], Var[X], Var[Y], Cov[X,Y], and E[X²Y³]. (d) Calculate Cov[Y,X²]. (e) What are the conditional distributions of Y given X = 2 and of X given Y > 0? (f) Find E[Y|X] and Var[Y|X]. Obtain the two parts of the variance decomposition Var[Y] = Ex[Var[Y|X]] + Varx[E[Y|X]].
We first obtain the marginal probabilities. For the joint distribution, these will be
X: P(0) = .34, P(1) = .36, P(2) = .30
Y: P(0) = .18, P(1) = .51, P(2) = .31
Then,
(a) Prob[Y < 2] = .18 + .51 = .69. Prob[Y < 2, X > 0] = .1 + .03 + .11 + .19 = .43. Prob[Y = 1, X ≥ 1] = .11 + .19 = .30.
(b) They are shown above.
(c) E[X] = 0(.34) + 1(.36) + 2(.30) = .96
E[Y] = 0(.18) + 1(.51) + 2(.31) = 1.13
E[X²] = 0²(.34) + 1²(.36) + 2²(.30) = 1.56
E[Y²] = 0²(.18) + 1²(.51) + 2²(.31) = 1.75
Var[X] = 1.56 − .96² = .6384
Var[Y] = 1.75 − 1.13² = .4731
E[XY] = 1(1)(.11) + 1(2)(.15) + 2(1)(.19) + 2(2)(.08) = 1.11
Cov[X,Y] = 1.11 − .96(1.13) = .0252
E[X²Y³] = .11 + 8(.15) + 4(.19) + 32(.08) = 4.63.
(d) E[YX²] = 1(1²).11 + 1(2²).19 + 2(1²).15 + 2(2²).08 = 1.81
Cov[Y,X²] = 1.81 − 1.13(1.56) = .0472.
(e) Prob[Y = 0 | X = 2] = .03/.3 = .1
Prob[Y = 1 | X = 2] = .19/.3 = .633
Prob[Y = 2 | X = 2] = .08/.3 = .267
Prob[X = 0 | Y > 0] = (.21 + .08)/(.51 + .31) = .3537
Prob[X = 1 | Y > 0] = (.11 + .15)/(.51 + .31) = .3171
Prob[X = 2 | Y > 0] = (.19 + .08)/(.51 + .31) = .3292.
(f) E[Y | X=0] = 0(.05/.34) + 1(.21/.34) + 2(.08/.34) = 1.088
E[Y² | X=0] = 1²(.21/.34) + 2²(.08/.34) = 1.559
Var[Y | X=0] = 1.559 − 1.088² = .3751
E[Y | X=1] = 0(.1/.36) + 1(.11/.36) + 2(.15/.36) = 1.139
E[Y² | X=1] = 1²(.11/.36) + 2²(.15/.36) = 1.972
Var[Y | X=1] = 1.972 − 1.139² = .6749
E[Y | X=2] = 0(.03/.30) + 1(.19/.30) + 2(.08/.30) = 1.167
E[Y² | X=2] = 1²(.19/.30) + 2²(.08/.30) = 1.700
Var[Y | X=2] = 1.700 − 1.167² = .3381
E[Var[Y|X]] = .34(.3751) + .36(.6749) + .30(.3381) = .4719
Var[E[Y|X]] = .34(1.088²) + .36(1.139²) + .30(1.167²) − 1.13² = 1.2781 − 1.2769 = .0012
E[Var[Y|X]] + Var[E[Y|X]] = .4719 + .0012 = .4731 = Var[Y].
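The moments and the variance decomposition in Exercise 7 can be reproduced directly from the joint table. The following is a minimal sketch in plain Python; the variable names are illustrative only. Because it carries full precision rather than the three-digit conditional means used above, the two parts of the decomposition can differ from the hand calculation in the fourth decimal place while still summing to Var[Y].

```python
# Joint probabilities p[y][x] from the table in Exercise 7
p = [[.05, .10, .03],
     [.21, .11, .19],
     [.08, .15, .08]]
vals = [0, 1, 2]

px = [sum(p[y][x] for y in vals) for x in vals]   # marginal of X
py = [sum(p[y][x] for x in vals) for y in vals]   # marginal of Y

ex = sum(x * px[x] for x in vals)
ey = sum(y * py[y] for y in vals)
var_x = sum(x**2 * px[x] for x in vals) - ex**2
var_y = sum(y**2 * py[y] for y in vals) - ey**2
cov_xy = sum(x * y * p[y][x] for x in vals for y in vals) - ex * ey

# Conditional mean and variance of Y for each value of X
cmean = [sum(y * p[y][x] for y in vals) / px[x] for x in vals]
cvar = [sum(y**2 * p[y][x] for y in vals) / px[x] - cmean[x]**2 for x in vals]

e_var = sum(px[x] * cvar[x] for x in vals)                # E_x[Var[Y|X]]
var_e = sum(px[x] * cmean[x]**2 for x in vals) - ey**2    # Var_x[E[Y|X]]

print(round(var_y, 4), round(e_var, 4), round(var_e, 4))
```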
8. Minimum mean squared error predictor. For the joint distribution in Exercise 7, compute $E[y - E[y|x]]^2$. Now, find the a and b which minimize the function $E[y - a - bx]^2$. Given the solutions, verify that $E[y - E[y|x]]^2 < E[y - a - bx]^2$. The result is fundamental in least squares theory. Verify that the a and b which you found satisfy (3-68) and (3-69).
Summing over the cells of the joint distribution (columns are x = 0, x = 1, x = 2),
E[y − E[y|x]]² =
(y=0) .05(0 − 1.088)² + .10(0 − 1.139)² + .03(0 − 1.167)²
(y=1) + .21(1 − 1.088)² + .11(1 − 1.139)² + .19(1 − 1.167)²
(y=2) + .08(2 − 1.088)² + .15(2 − 1.139)² + .08(2 − 1.167)²
= .4719 = E[Var[y|x]].
The necessary conditions for minimizing the function with respect to a and b are
∂E[y − a − bx]²/∂a = 2E{[y − a − bx](−1)} = 0
∂E[y − a − bx]²/∂b = 2E{[y − a − bx](−x)} = 0.
First dividing by −2, then taking expectations produces
E[y] − a − bE[x] = 0
E[xy] − aE[x] − bE[x²] = 0.
Solve the first for a = E[y] − bE[x] and substitute this in the second to obtain E[xy] − E[x](E[y] − bE[x]) − bE[x²] = 0, or (E[xy] − E[x]E[y]) = b(E[x²] − (E[x])²), or b = Cov[x,y] / Var[x] = .0252 / .6384 = .03947 and a = E[y] − bE[x] = 1.13 − (.03947)(.96) = 1.0921, using the moments obtained in Exercise 7. The linear function compared to the conditional mean produces
            x=0      x=1      x=2
E[y|x]     1.088    1.139    1.167
a + bx     1.0921   1.1316   1.1711
Now, repeating the calculation above using a + bx instead of E[y|x] produces
E[y − a − bx]² =
(y=0) .05(0 − 1.0921)² + .10(0 − 1.1316)² + .03(0 − 1.1711)²
(y=1) + .21(1 − 1.0921)² + .11(1 − 1.1316)² + .19(1 − 1.1711)²
(y=2) + .08(2 − 1.0921)² + .15(2 − 1.1316)² + .08(2 − 1.1711)²
= .4721 > .4719.
9. Suppose x has an exponential distribution, $f(x) = \theta e^{-\theta x}$, x > 0. Find the mean, variance, skewness, and kurtosis of x. (The Gamma integral will be useful for finding the raw moments.)
In order to find the central moments, we will use the raw moments, $E[x^r] = \int_0^\infty \theta x^r e^{-\theta x}\,dx$. These can be obtained by using the gamma integral. Making the appropriate substitutions, we have $E[x^r] = [\theta\Gamma(r+1)]/\theta^{r+1} = r!/\theta^r$. The first four moments are: E[x] = 1/θ, E[x²] = 2/θ², E[x³] = 6/θ³, and E[x⁴] = 24/θ⁴. The mean is, thus, 1/θ and the variance is 2/θ² − (1/θ)² = 1/θ². For the skewness and kurtosis coefficients, we have E[x − 1/θ]³ = E[x³] − 3E[x²]/θ + 3E[x]/θ² − 1/θ³ = 2/θ³. The normalized skewness coefficient is 2. The kurtosis coefficient is E[x − 1/θ]⁴ = E[x⁴] − 4E[x³]/θ + 6E[x²]/θ² − 4E[x]/θ³ + 1/θ⁴ = 9/θ⁴. The degree of excess is 6.
10. For the random variable in Exercise 9, what is the probability distribution of the random variable $y = e^{-x}$? What is E[y]? Prove that the distribution of this y is a special case of the beta distribution in (3-40).
If $y = e^{-x}$, then x = −ln y, so the Jacobian is |dx/dy| = 1/y. The distribution of y is, therefore, $f(y) = \theta e^{-\theta(-\ln y)}(1/y) = (\theta y^{\theta})/y = \theta y^{\theta-1}$ for 0 < y < 1. This is in the form of (3-40) with y instead of x, c = 1, β = 1, and α = θ. The mean is $E[y] = \int_0^1 \theta y^{\theta}\,dy = \theta/(\theta+1)$.
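The exponential moments derived in Exercise 9 can be checked by expanding each central moment in raw moments with the binomial theorem. The sketch below assumes an arbitrary value θ = 2 for illustration; the skewness and excess kurtosis do not depend on that choice.

```python
from math import comb, factorial

theta = 2.0  # hypothetical rate parameter; any positive value works

def raw_moment(r):
    """E[x^r] = r!/theta^r for the exponential density theta*exp(-theta*x)."""
    return factorial(r) / theta**r

def central_moment(r):
    """E[(x - mu)^r] expanded with the binomial theorem in raw moments."""
    mu = raw_moment(1)
    return sum(comb(r, k) * raw_moment(k) * (-mu)**(r - k) for k in range(r + 1))

variance = central_moment(2)
skewness = central_moment(3) / variance**1.5
kurtosis = central_moment(4) / variance**2
print(variance, skewness, kurtosis - 3)
```

The last printed value is the degree of excess, kurtosis minus 3.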
11. If the probability density of y is $\alpha y^2(1-y)^3$ for y between 0 and 1, what is α? What is the probability that y is between .25 and .75?
This is a beta distribution of the form in (3-40) with α = 3 and β = 4. Therefore, the constant is Γ(3+4)/(Γ(3)Γ(4)) = 60. The probability is
$$\int_{.25}^{.75} 60y^2(1-y)^3\,dy = 60\int_{.25}^{.75}(y^2 - 3y^3 + 3y^4 - y^5)\,dy = 60\left(\frac{y^3}{3} - \frac{3y^4}{4} + \frac{3y^5}{5} - \frac{y^6}{6}\right)\Bigg|_{.25}^{.75} = .79296.$$
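The integral in Exercise 11 can be evaluated exactly with rational arithmetic, which avoids any rounding in the intermediate powers. This is a sketch using only the Python standard library; the function name is illustrative.

```python
from fractions import Fraction
from math import factorial

# Normalizing constant Gamma(3+4)/(Gamma(3)Gamma(4)); Gamma(n) = (n-1)! for integers
const = Fraction(factorial(6), factorial(2) * factorial(3))

def antiderivative(y):
    """Antiderivative of y^2 (1 - y)^3 = y^2 - 3y^3 + 3y^4 - y^5."""
    return y**3 / 3 - 3 * y**4 / 4 + 3 * y**5 / 5 - y**6 / 6

lo, hi = Fraction(1, 4), Fraction(3, 4)
prob = const * (antiderivative(hi) - antiderivative(lo))
print(const, float(prob))
```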
12. Suppose x has the following discrete probability distribution:
  x             1    2    3    4
  Prob[X = x]  .1   .2   .4   .3
Find the exact mean and variance of X. Now, suppose Y = 1/X. Find the exact mean and variance of Y. Find the mean and variance of the linear and quadratic approximations to Y = f(X). Are the mean and variance of the quadratic approximation closer to the true mean than those of the linear approximation?
We will require a number of moments of x, which we derive first:
E[x] = .1(1) + .2(2) + .4(3) + .3(4) = 2.9 = μ
E[x²] = .1(1) + .2(4) + .4(9) + .3(16) = 9.3
Var[x] = 9.3 − 2.9² = .89 = σ².
For later use, we also obtain
E[x − μ]³ = .1(1 − 2.9)³ + ... = −.432
E[x − μ]⁴ = .1(1 − 2.9)⁴ + ... = 1.8737.
The function is y = 1/x. The exact mean and variance are
E[y] = .1(1) + .2(1/2) + .4(1/3) + .3(1/4) = .40833
Var[y] = .1(1²) + .2(1/4) + .4(1/9) + .3(1/16) − .40833² = .04645.
The linear Taylor series approximation around μ is y ≈ 1/μ − (1/μ²)(x − μ). The mean of the linear approximation is 1/μ = .3448 while its variance is (1/μ⁴)Var[x − μ] = σ²/μ⁴ = .01258. The quadratic approximation is
y ≈ 1/μ − (1/μ²)(x − μ) + (1/2)(2/μ³)(x − μ)² = 1/μ − (1/μ²)(x − μ) + (1/μ³)(x − μ)².
The mean of this approximation is E[y] ≈ 1/μ + σ²/μ³ = .3813 while the variance is approximated by the variance of the right hand side,
(1/μ⁴)Var[x − μ] + (1/μ⁶)Var[(x − μ)²] − (2/μ⁵)Cov[(x − μ),(x − μ)²] = (1/μ⁴)σ² + (1/μ⁶)(E[x − μ]⁴ − σ⁴) − (2/μ⁵)E[x − μ]³ = .01258 + .00182 + .00421 = .01861.
Neither approximation provides a close estimate of the variance. Note that in both cases, it would be possible simply to evaluate the approximations at the four values of x and compute the means and variances directly. The virtue of the approach above is that it can be applied when there are many values of x, and is necessary when the distribution of x is continuous.
13. Interpolation in the chi-squared table. In order to find a percentage point in the chi-squared table which is between two values, we interpolate linearly between the reciprocals of the degrees of freedom. The chi-squared distribution is defined for noninteger values of the degrees of freedom parameter [see (3-39)], but your table does not contain critical values for noninteger values. Using linear interpolation, find the 99% critical value for a chi-squared variable with degrees of freedom parameter 11.3.
The 99% critical values for 11 and 12 degrees of freedom are 24.725 and 26.217. To interpolate linearly between these values for the value corresponding to 11.3 degrees of freedom, we use
$$c = 26.217 + \frac{(1/11.3 - 1/12)}{(1/11 - 1/12)}(24.725 - 26.217) = 25.2003.$$
14. Suppose x has a standard normal distribution. What is the pdf of the following random variable?
$$y = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad 0 < y < \frac{1}{\sqrt{2\pi}}.$$
[Hints: You know the distribution of z = x² from (C-30). The density of this z is given in (C-39). Solve the problem in terms of y = g(z).]
We know that z = x² is distributed as chi-squared with 1 degree of freedom. We seek the density of $y = ke^{-z/2}$ where $k = (2\pi)^{-1/2}$. The inverse transformation is z = 2ln k − 2ln y, so the Jacobian is |−2/y| = 2/y. The density of z is that of Gamma with parameters 1/2 and 1/2. [See (C-39) and the succeeding discussion.] Thus,
$$f(z) = \frac{(1/2)^{1/2}}{\Gamma(1/2)}e^{-z/2}z^{-1/2}, \quad z > 0.$$
Note, $\Gamma(1/2) = \sqrt{\pi}$. Making the substitution for z and multiplying by the Jacobian produces
$$f(y) = \frac{(1/2)^{1/2}}{\Gamma(1/2)}\,\frac{2}{y}\,e^{(-1/2)(2\ln k - 2\ln y)}(2\ln k - 2\ln y)^{-1/2}.$$
The exponential term reduces to y/k. The scale factor is equal to 2k/y. Therefore, the density is simply
$$f(y) = 2(2\ln k - 2\ln y)^{-1/2} = \sqrt{2}\,(\ln k - \ln y)^{-1/2} = \left\{2/\ln[1/(y(2\pi)^{1/2})]\right\}^{1/2}, \quad 0 < y < (2\pi)^{-1/2}.$$
15. The fundamental probability transformation. Suppose that the continuous random variable x has cumulative distribution F(x). What is the probability distribution of the random variable y = F(x)? (Observation: This result forms the basis of the simulation of draws from many continuous distributions.)
The inverse transformation is $x(y) = F^{-1}(y)$, so the Jacobian is $dx/dy = F^{-1\prime}(y) = 1/f(x(y))$ where f(.) is the density of x. The density of y is $f(y) = f[F^{-1}(y)] \times 1/f(x(y)) = 1$, 0 < y < 1. Thus, y has a continuous uniform distribution. Note, then, for purposes of obtaining a random sample from the distribution, we can sample $y_1, \ldots, y_n$ from the distribution of y, the continuous uniform, then obtain $x_1 = x(y_1), \ldots, x_n = x(y_n)$.
16. Random number generators. Suppose x is distributed uniformly between 0 and 1, so f(x) = 1, 0 < x < 1. Let θ be some positive constant. What is the pdf of y = −(1/θ)ln x? (Hint: See Section 3.5.) Does this suggest a means of simulating draws from this distribution if one has a random number generator which will produce draws from the uniform distribution? To continue, suggest a means of simulating draws from a logistic distribution, $f(x) = e^{-x}/(1+e^{-x})^2$.
The inverse transformation is $x = e^{-\theta y}$ so the Jacobian is $|dx/dy| = \theta e^{-\theta y}$. Since f(x) = 1, this Jacobian is also the density of y. One can simulate draws y from any exponential distribution with parameter θ by drawing observations x from the uniform distribution and computing y = −(1/θ)ln x. Likewise, for the logistic distribution, the CDF is $F(x) = 1/(1 + e^{-x})$. Thus, draws y from the uniform distribution may be taken as draws on F(x). Then, we may obtain x as x = ln[F(x)/(1 − F(x))] = ln[y/(1 − y)].
17. Suppose that x1 and x2 are distributed as independent standard normal. What is the joint distribution of y1 = 2 + 3x1 + 2x2 and y2 = 4 + 5x1? Suppose you were able to obtain two samples of observations from independent standard normal distributions. How would you obtain a sample from the bivariate normal distribution with means 1 and 2, variances 4 and 9, and covariance 3?
We may write the pair of transformations as
$$y = \begin{bmatrix} y_1\\ y_2\end{bmatrix} = \begin{bmatrix} 2\\ 4\end{bmatrix} + \begin{bmatrix} 3 & 2\\ 5 & 0\end{bmatrix}\begin{bmatrix} x_1\\ x_2\end{bmatrix} = b + Ax.$$
The problem also states that x ~ N[0, I]. From (C-103), therefore, we have y ~ N[b + A0, AIA′] where
$$E[y] = b + A0 = b = \begin{bmatrix} 2\\ 4\end{bmatrix}, \quad \text{Var}[y] = AA' = \begin{bmatrix} 13 & 15\\ 15 & 25\end{bmatrix}.$$
For the second part of the problem, using our result above, we would require the A and b such that b + A0 = (1,2)′ and $AA' = \begin{bmatrix} 4 & 3\\ 3 & 9\end{bmatrix}$. The vector is obviously b = (1,2)′. In order to find the elements of A, there are a few ways to proceed. The Cholesky factorization used in Exercise 9 is probably the simplest. Let y1 = 1 + 2x1. Thus, y1 has mean 1 and variance 4 as required. Now, let y2 = 2 + w1x1 + w2x2. The covariance between y1 and y2 is 2w1, since x1 and x2 are uncorrelated. Thus, 2w1 = 3, or w1 = 1.5. Now, Var[y2] = w1² + w2² = 9, so w2² = 9 − 1.5² = 6.75. The transformation matrix is, therefore, $A = \begin{bmatrix} 2 & 0\\ 1.5 & 2.598\end{bmatrix}$. This is the Cholesky factorization of the desired AA′ above. It is worth noting, this provides a simple method of finding the requisite A matrix for any number of variables. Finally, an alternative method would be to use the
characteristic roots and vectors of AA′. The inverse square root defined in Section B.7.12 would also provide a method of transforming x to obtain the desired covariance matrix.
18. The density of the standard normal distribution, denoted φ(x), is given in (C-28). The function based on the ith derivative of the density given by $H_i = [(-1)^i d^i\phi(x)/dx^i]/\phi(x)$, i = 0,1,2,... is called a Hermite polynomial. By definition, $H_0 = 1$. (a) Find the next three Hermite polynomials. (b) A useful device in this context is the differential equation $d^r\phi(x)/dx^r + x\,d^{r-1}\phi(x)/dx^{r-1} + (r-1)\,d^{r-2}\phi(x)/dx^{r-2} = 0$. Use this result and the results of part a. to find $H_4$ and $H_5$.
The crucial result to be used in the derivations is dφ(x)/dx = −xφ(x). Therefore, $d^2\phi(x)/dx^2 = (x^2 - 1)\phi(x)$ and $d^3\phi(x)/dx^3 = (3x - x^3)\phi(x)$. The polynomials are $H_1 = x$, $H_2 = x^2 - 1$, and $H_3 = x^3 - 3x$. For part (b), we solve for $d^r\phi(x)/dx^r = -x\,d^{r-1}\phi(x)/dx^{r-1} - (r-1)\,d^{r-2}\phi(x)/dx^{r-2}$. Therefore, $d^4\phi(x)/dx^4 = -x(3x - x^3)\phi(x) - 3(x^2 - 1)\phi(x) = (x^4 - 6x^2 + 3)\phi(x)$ and $d^5\phi(x)/dx^5 = (-x^5 + 10x^3 - 15x)\phi(x)$. Thus, $H_4 = x^4 - 6x^2 + 3$ and $H_5 = x^5 - 10x^3 + 15x$.
19. Continuation: orthogonal polynomials: The Hermite polynomials are orthogonal if x has a standard normal distribution. That is, $E[H_iH_j] = 0$ if i ≠ j. Prove this for the $H_1$, $H_2$, and $H_3$ which you obtained above.
$E[H_1(x)H_2(x)] = E[x(x^2 - 1)] = E[x^3 - x] = 0$ since the normal distribution is symmetric. Then, $E[H_1(x)H_3(x)] = E[x(x^3 - 3x)] = E[x^4 - 3x^2] = 0$. The fourth moment of the standard normal distribution is 3 times the variance. Finally, $E[H_2(x)H_3(x)] = E[(x^2 - 1)(x^3 - 3x)] = E[x^5 - 4x^3 + 3x] = 0$ because all odd order moments of the normal distribution are zero. (The general result for extending the preceding is that in a product of Hermite polynomials, if the sum of the subscripts is odd, the product will be a sum of odd powers of x, and if even, a sum of even powers. This provides a method of determining the higher moments of the normal distribution if they are needed. For example, $E[H_1H_3] = 0$ implies that $E[x^4] = 3E[x^2]$.)
20. If x and y have means $\mu_x$ and $\mu_y$ and variances $\sigma_x^2$ and $\sigma_y^2$ and covariance $\sigma_{xy}$, what is the approximation of the covariance matrix of the two random variables $f_1 = x/y$ and $f_2 = xy$?
The elements of JΣJ′ are
$$(1,1) = \frac{\sigma_x^2}{\mu_y^2} + \frac{\sigma_y^2\mu_x^2}{\mu_y^4} - \frac{2\sigma_{xy}\mu_x}{\mu_y^3}$$
$$(1,2) = \sigma_x^2 - \sigma_y^2\mu_x^2/\mu_y^2$$
$$(2,2) = \sigma_x^2\mu_y^2 + \sigma_y^2\mu_x^2 + 2\sigma_{xy}\mu_x\mu_y.$$
21. Factorial Moments. For finding the moments of a distribution such as the Poisson, a useful device is the factorial moment. (The Poisson distribution is given in Example 3.1.) The density is $f(x) = e^{-\lambda}\lambda^x/x!$, x = 0,1,2,... To find the mean, we can use
$$E[x] = \sum_{x=0}^{\infty} x f(x) = \sum_{x=0}^{\infty} x e^{-\lambda}\lambda^x/x! = \lambda\sum_{x=1}^{\infty} e^{-\lambda}\lambda^{x-1}/(x-1)! = \lambda\sum_{y=0}^{\infty} e^{-\lambda}\lambda^{y}/y! = \lambda,$$
since the probabilities sum to 1. To find the variance, we will extend this method by finding E[x(x−1)], and likewise for other moments. Use this method to find the variance and third central moment of the Poisson distribution. (Note that this device is used to transform the factorial in the denominator in the probability.)
Using the same technique,
$$E[x(x-1)] = \sum_{x=0}^{\infty} x(x-1)f(x) = \sum_{x=0}^{\infty} x(x-1)e^{-\lambda}\lambda^x/x! = \lambda^2\sum_{x=2}^{\infty} e^{-\lambda}\lambda^{x-2}/(x-2)! = \lambda^2\sum_{y=0}^{\infty} e^{-\lambda}\lambda^{y}/y! = \lambda^2 = E[x^2] - E[x].$$
So, E[x²] = λ² + λ. Since E[x] = λ, it follows that Var[x] = (λ² + λ) − λ² = λ. Following the same pattern, the preceding produces
E[x(x−1)(x−2)] = E[x³] − 3E[x²] + 2E[x] = λ³.
Therefore, E[x³] = λ³ + 3(λ + λ²) − 2λ = λ³ + 3λ² + λ.
Then, E[x − E[x]]³ = E[x³] − 3λE[x²] + 3λ²E[x] − λ³ = λ.
22. If x has a normal distribution with mean μ and standard deviation σ, what is the probability distribution of $y = e^x$?
If $y = e^x$, then x = ln y and the Jacobian is dx/dy = 1/y. Making the substitution,
$$f(y) = \frac{1}{\sigma y\sqrt{2\pi}}\,e^{-\frac{1}{2}[(\ln y - \mu)/\sigma]^2}.$$
This is the density of the lognormal distribution.
23. If y has a lognormal distribution, what is the probability distribution of y²?
Let z = y². Then, $y = \sqrt{z}$ and $dy/dz = 1/(2\sqrt{z})$. Inserting these in the density above, we find
$$f(z) = \frac{1}{\sigma\sqrt{2\pi}}\,\frac{1}{\sqrt{z}}\,\frac{1}{2\sqrt{z}}\,e^{-\frac{1}{2}\left[\left(\frac{1}{2}\ln z - \mu\right)/\sigma\right]^2} = \frac{1}{(2\sigma)z\sqrt{2\pi}}\,e^{-\frac{1}{2}[(\ln z - 2\mu)/(2\sigma)]^2}, \quad z > 0.$$
Thus, z has a lognormal distribution with parameters 2μ and 2σ. The general result is that if y has a lognormal distribution with parameters μ and σ, $y^r$ has a lognormal distribution with parameters rμ and rσ.
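The factorial moment results for the Poisson distribution in Exercise 21 can be verified by direct summation. The sketch below assumes an arbitrary parameter value λ = 2.5 and truncates the infinite sums at 100 terms, where the omitted tail is negligible for this λ.

```python
from math import exp

lam = 2.5          # hypothetical Poisson parameter
terms = 100        # truncation point; the omitted tail is negligible here

# Build the probabilities recursively: f(0) = e^{-lam}, f(x) = f(x-1) * lam / x
pmf = [exp(-lam)]
for x in range(1, terms):
    pmf.append(pmf[-1] * lam / x)

def expect(g):
    """E[g(x)] by direct summation over the truncated support."""
    return sum(g(x) * p for x, p in enumerate(pmf))

fact2 = expect(lambda x: x * (x - 1))             # should equal lam^2
fact3 = expect(lambda x: x * (x - 1) * (x - 2))   # should equal lam^3
mean = expect(lambda x: x)
variance = expect(lambda x: (x - mean) ** 2)
third_central = expect(lambda x: (x - mean) ** 3)
print(fact2, fact3, mean, variance, third_central)
```

The mean, variance, and third central moment should all come out equal to λ, as derived above.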
24. Suppose y, x1, and x2 have a joint normal distribution with parameters μ′ = [1, 2, 4] and covariance matrix
$$\Sigma = \begin{bmatrix} 2 & 3 & 1\\ 3 & 5 & 2\\ 1 & 2 & 6\end{bmatrix}.$$
(a) Compute the intercept and slope in the function E[y|x1], Var[y|x1], and the coefficient of determination in this regression. (Hint: See Section 3.10.1.) (b) Compute the intercept and slopes in the conditional mean function, E[y|x1,x2]. What is E[y|x1=2.5,x2=3.3]? What is Var[y|x1=2.5,x2=3.3]?
First, for normally distributed variables, we have from (3-102),
$E[y|x] = \mu_y + \text{Cov}[y,x]\{\text{Var}[x]\}^{-1}(x - \mu_x)$
and $\text{Var}[y|x] = \text{Var}[y] - \text{Cov}[y,x]\{\text{Var}[x]\}^{-1}\text{Cov}[x,y]$
and $\text{COD} = \text{Var}[E[y|x]]/\text{Var}[y] = \text{Cov}[y,x]\{\text{Var}[x]\}^{-1}\text{Cov}[x,y]/\text{Var}[y]$.
We may just insert the figures above to obtain the results.
E[y|x1] = 1 + (3/5)(x1 − 2) = −.2 + .6x1,
Var[y|x1] = 2 − 3(1/5)3 = 1/5 = .2
COD = .6²(5) / 2 = .9
$$E[y|x_1,x_2] = 1 + [3\;\;1]\begin{bmatrix} 5 & 2\\ 2 & 6\end{bmatrix}^{-1}\begin{bmatrix} x_1 - 2\\ x_2 - 4\end{bmatrix} = -.0769 + .6154x_1 - .03846x_2,$$
since [3 1] times the inverse is (16/26, −1/26) = (.6154, −.03846) and the intercept is 1 − .6154(2) + .03846(4) = −.0769.
Var[y|x1,x2] = 2 − (.6154, −.03846)(3,1)′ = .1923.
E[y|x1=2.5,x2=3.3] = −.0769 + .6154(2.5) − .03846(3.3) = 1.3346. The conditional variance is not a function of x1 or x2.
25. What is the density of y = 1/x if x has a chi-squared distribution?
The density of a chi-squared variable is a gamma variable with parameters 1/2 and n/2 where n is the degrees of freedom of the chi-squared variable. Thus,
$$f(x) = \frac{(1/2)^{n/2}}{\Gamma(n/2)}\,e^{-\frac{1}{2}x}\,x^{\frac{n}{2}-1}, \quad x > 0.$$
If y = 1/x then x = 1/y and |dx/dy| = 1/y². Therefore, after multiplying by the Jacobian,
$$f(y) = \frac{(1/2)^{n/2}}{\Gamma(n/2)}\,e^{-\frac{1}{2y}}\left(\frac{1}{y}\right)^{\frac{n}{2}+1}, \quad y > 0.$$
26. What is the density and what are the mean and variance of y = 1/x if x has the gamma distribution described in Section C.4.5?
The density of x is $f(x) = \frac{\lambda^P}{\Gamma(P)}e^{-\lambda x}x^{P-1}$, x > 0. If y = 1/x, then x = 1/y, and the Jacobian is |dx/dy| = 1/y². Using the change of variable formula, as usual, the density of y is
$$f(y) = \frac{\lambda^P}{\Gamma(P)}\,\frac{1}{y^2}\,e^{-\lambda/y}\left(\frac{1}{y}\right)^{P-1}, \quad y > 0.$$
The mean is
$$E(y) = \int_0^\infty y\,\frac{\lambda^P}{\Gamma(P)}\,\frac{1}{y^2}\,e^{-\lambda/y}\left(\frac{1}{y}\right)^{P-1}dy.$$
This is a gamma integral (see Section 5.2.4b). Combine terms to obtain
$$E(y) = \int_0^\infty \frac{\lambda^P}{\Gamma(P)}\,e^{-\lambda/y}\left(\frac{1}{y}\right)^{P}dy.$$
Now, in order to use the results for the gamma integral, we will have to make a change of variable. Let z = 1/y, so |dy/dz| = 1/z². Making the change of variable, we find
$$E(y) = \int_0^\infty \frac{\lambda^P}{\Gamma(P)}\,e^{-\lambda z}z^{P}\left(\frac{1}{z^2}\right)dz = \int_0^\infty \frac{\lambda^P}{\Gamma(P)}\,e^{-\lambda z}z^{P-2}dz.$$
Now, we can use the gamma integral directly, to find
$$E(y) = \frac{\lambda^P}{\Gamma(P)} \times \frac{\Gamma(P-1)}{\lambda^{P-1}} = \frac{\lambda}{P-1}.$$
Note that for this to exist, P must be greater than one. We can use the same approach to find the variance. We start by finding E[y²]. First,
$$E(y^2) = \int_0^\infty y^2\,\frac{\lambda^P}{\Gamma(P)}\,\frac{1}{y^2}\,e^{-\lambda/y}\left(\frac{1}{y}\right)^{P-1}dy = \int_0^\infty \frac{\lambda^P}{\Gamma(P)}\,e^{-\lambda/y}\left(\frac{1}{y}\right)^{P-1}dy.$$
Once again, this is a gamma integral, which we can evaluate by first making the change of variable to z = 1/y. The integral is
$$E(y^2) = \int_0^\infty \frac{\lambda^P}{\Gamma(P)}\,e^{-\lambda z}z^{P-1}\left(\frac{1}{z^2}\right)dz = \int_0^\infty \frac{\lambda^P}{\Gamma(P)}\,e^{-\lambda z}z^{P-3}dz.$$
This is
$$\frac{\lambda^P}{\Gamma(P)} \times \frac{\Gamma(P-2)}{\lambda^{P-2}} = \frac{\lambda^2}{(P-1)(P-2)}.$$
Now,
$$\text{Var}[y] = E[y^2] - E^2[y] = \frac{\lambda^2}{(P-1)^2(P-2)}, \quad P > 2.$$
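The mean and variance in Exercise 26 follow from the general inverse moment $E[y^r] = \lambda^r\Gamma(P-r)/\Gamma(P)$, which is what the two gamma integrals above deliver for r = 1 and r = 2. The following sketch checks the closed forms against that expression for one assumed pair of parameter values (λ = 2, P = 5, chosen only for illustration).

```python
from math import gamma

lam, P = 2.0, 5.0   # hypothetical parameter values; the variance needs P > 2

def inverse_moment(r):
    """E[y^r] = E[x^(-r)] = lam^r * Gamma(P - r) / Gamma(P), valid for P > r."""
    return lam**r * gamma(P - r) / gamma(P)

mean = inverse_moment(1)
variance = inverse_moment(2) - mean**2

print(mean, lam / (P - 1))
print(variance, lam**2 / ((P - 1) ** 2 * (P - 2)))
```

Each printed pair should agree: the first entry comes from the gamma function, the second from the closed form.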
27. Suppose x1 and x2 have the bivariate normal distribution described in Section 3.8. Consider an extension of Example 3.4, where the bivariate normal distribution is obtained by transforming two independent standard normal variables. Obtain the distribution of z = exp(y1)exp(y2) where y1 and y2 have a bivariate normal distribution and are correlated. Solve this problem in two ways. First, use the
transformation approach described in Section C.6.4. Second, note that z = exp(y1+y2) = exp(w), so you can first find the distribution of w, then use the results of Section 3.5 (and, in fact, Section 3.4.4 as well).
The (extremely) hard way to proceed is to define the joint transformations z1 = exp(y1)exp(y2) and z2 = exp(y2). The Jacobian is 1/(z1z2). The joint distribution is the Jacobian times the bivariate normal distribution, evaluated at y1 = log z1 − log z2 and y2 = log z2, from which it is now necessary to integrate out z2. Obviously, this is going to be tedious, but the hint gives a much simpler way to proceed. The variable w = y1+y2 has a normal distribution with mean μ = μ1+μ2 and variance $\sigma^2 = (\sigma_1^2 + \sigma_2^2 + 2\sigma_{12})$. We already have a simple result for exp(w) in Exercise 22; this has a lognormal distribution.
28. Probability Generating Function. For a discrete random variable, x, the function
$$E[t^x] = \sum_{x=0}^{\infty} t^x\,\text{Prob}[X = x]$$
is called the probability generating function because in the function, the coefficient on $t^i$ is Prob[X=i]. Suppose that x is the number of the repetitions of an experiment with probability π of success upon which the first success occurs. The density of x is the geometric distribution, $\text{Prob}[X=x] = (1-\pi)^{x-1}\pi$. What is the probability generating function?
The first success cannot occur before the first trial, so the support is x = 1, 2, ... and the sum begins at x = 1:
$$E[t^x] = \sum_{x=1}^{\infty} t^x(1-\pi)^{x-1}\pi = \frac{\pi}{1-\pi}\sum_{x=1}^{\infty}[t(1-\pi)]^x = \frac{\pi}{1-\pi}\cdot\frac{t(1-\pi)}{1 - t(1-\pi)} = \frac{\pi t}{1 - t(1-\pi)}.$$
At t = 1 this equals 1, as it must since the probabilities sum to 1.
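The closed form for the probability generating function of the geometric distribution with support x = 1, 2, ... can be compared with a truncated version of the defining sum. In this sketch the success probability and the argument t are assumed values chosen only so that the series converges quickly; the function names are illustrative.

```python
pi_success = 0.3   # hypothetical success probability
t = 0.8            # hypothetical argument with |t(1 - pi)| < 1

def pgf_series(t, p, terms=2000):
    """Truncated sum of t^x (1-p)^(x-1) p over x = 1, 2, ..., terms."""
    return sum(t**x * (1 - p) ** (x - 1) * p for x in range(1, terms + 1))

def pgf_closed(t, p):
    """Closed form p t / (1 - t(1-p))."""
    return p * t / (1 - t * (1 - p))

print(pgf_series(t, pi_success), pgf_closed(t, pi_success))
print(pgf_closed(1.0, pi_success))  # equals 1 because the probabilities sum to one
```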
29. Moment Generating Function. For the random variable X, with probability density function f(x), if the function $M(t) = E[e^{tx}]$ exists, it is the moment generating function. Assuming the function exists, it can be shown that $d^rM(t)/dt^r|_{t=0} = E[x^r]$. Find the moment generating functions for (a) The Exponential distribution of Exercise 9. (b) The Poisson distribution of Exercise 21.
For the continuous variable in (a), with f(x) = θexp(−θx),
$$M(t) = \int_0^\infty e^{tx}\theta e^{-\theta x}dx = \int_0^\infty \theta e^{-(\theta - t)x}dx.$$
This is θ times a Gamma integral (see Section 5.4.2b) with p = 1, c = 1, and a = (θ − t). Therefore, M(t) = θ/(θ − t), which exists for t < θ.
For the Poisson distribution,
$$M(t) = \sum_{x=0}^{\infty} e^{tx}e^{-\lambda}\lambda^x/x! = \sum_{x=0}^{\infty} e^{-\lambda}(\lambda e^t)^x/x! = e^{-\lambda}e^{\lambda e^t}\sum_{x=0}^{\infty} e^{-\lambda e^t}(\lambda e^t)^x/x! = e^{-\lambda+\lambda e^t}\sum_{x=0}^{\infty} e^{-\lambda e^t}(\lambda e^t)^x/x!.$$
The sum is the sum of probabilities for a Poisson distribution with parameter $\lambda e^t$, which equals 1, so the term before the summation sign is the moment generating function, $M(t) = \exp[\lambda(e^t - 1)]$.
30. Moment generating function for a sum of variables. When it exists, the moment generating function has a one to one correspondence with the distribution. Thus, for example, if we begin with some random variable and find that a transformation of it has a particular MGF, we may infer that the function of the random variable has the distribution associated with that MGF. A useful application is the following: If x and y are independent, the MGF of x + y is $M_x(t)M_y(t)$. (a) Use this result to prove that the sum of Poisson random variables has a Poisson distribution. (b) Use the result to prove that the sum of chi-squared variables has a chi-squared distribution. [Note, you must first find the MGF for a chi-squared variate. The density is given in (3-39).] (c) The MGF for the standard normal distribution is $M_z = \exp(t^2/2)$. Find the MGF for the N[μ,σ²] distribution, then find the distribution of a sum of normally distributed variables.
(a) From the previous problem, $M_x(t) = \exp[\lambda(e^t - 1)]$. Suppose y is distributed as Poisson with parameter μ. Then, $M_y(t) = \exp[\mu(e^t - 1)]$. The product of these two moment generating functions is $M_x(t)M_y(t) = \exp[\lambda(e^t - 1)]\exp[\mu(e^t - 1)] = \exp[(\lambda+\mu)(e^t - 1)]$, which is the moment generating function of the Poisson distribution with parameter λ+μ. Therefore, on the basis of the theorem given in the problem, it follows that x+y has a Poisson distribution with parameter λ+μ.
(b) The density of the chi-squared distribution with n degrees of freedom is [from (C-39)]
$$f(x) = \frac{(1/2)^{n/2}}{\Gamma(n/2)}\,e^{-\frac{1}{2}x}\,x^{\frac{n}{2}-1}, \quad x > 0.$$
Let the constant term be k for the present. The moment generating function is
$$M(t) = k\int_0^\infty e^{tx}e^{-x/2}x^{(n/2)-1}dx = k\int_0^\infty e^{-x(1/2 - t)}x^{(n/2)-1}dx.$$
This is a gamma integral which reduces to $M(t) = k(1/2 - t)^{-n/2}\Gamma(n/2)$. Now, reinserting the constant k and simplifying produces the moment generating function $M(t) = (1 - 2t)^{-n/2}$. Suppose that $x_i$ is distributed as chi-squared with $n_i$ degrees of freedom. The moment generating function of $\sum_i x_i$ is $\prod_i M_i(t) = (1 - 2t)^{-\sum_i n_i/2}$, which is the MGF of a chi-squared variable with $n = \sum_i n_i$ degrees of freedom.
(c) We let y = σz + μ. Then,
$$M_y(t) = E[\exp(ty)] = E\left[e^{t(\sigma z + \mu)}\right] = e^{t\mu}E\left[e^{\sigma tz}\right] = e^{t\mu}E\left[e^{(\sigma t)z}\right] = e^{\mu t}e^{(\sigma t)^2/2} = \exp\left[\mu t + (\sigma^2t^2)/2\right].$$
Using the same approach as in part b., it follows that the moment generating function for a sum of independent normal random variables with means $\mu_i$ and standard deviations $\sigma_i$ is
$$M_{\sum_i x_i}(t) = \exp\left[\left(\sum_i \mu_i\right)t + \frac{1}{2}\left(\sum_i \sigma_i^2\right)t^2\right].$$
This is the MGF of a normal distribution with mean $\sum_i \mu_i$ and variance $\sum_i \sigma_i^2$.
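The result in part (a) can be confirmed without generating functions by convolving two Poisson distributions directly and comparing with the Poisson(λ+μ) probabilities. The sketch below uses assumed parameter values (1.5 and 2.0) purely for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Prob[X = k] for a Poisson variable with the given rate."""
    return exp(-rate) * rate**k / factorial(k)

lam, mu = 1.5, 2.0   # hypothetical parameters of the two independent variables

def sum_pmf(k):
    """Prob[x + y = k] by convolution of the two Poisson distributions."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(k - j, mu) for j in range(k + 1))

# Largest discrepancy between the convolution and Poisson(lam + mu) over k = 0..29
max_gap = max(abs(sum_pmf(k) - poisson_pmf(k, lam + mu)) for k in range(30))
print(max_gap)
```

The gap should be at the level of floating point rounding, consistent with the MGF argument.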
Appendix C Estimation and Inference
1. The following sample is drawn from a normal distribution with mean μ and standard deviation σ: x = 1.3, 2.1, .4, 1.3, .5, .2, 1.8, 2.5, 1.9, 3.2. Compute the mean, median, variance, and standard deviation of the sample.
$$\bar{x} = \frac{\sum_{i=1}^n x_i}{n} = 1.52, \quad s^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1} = .9418, \quad s = .97,$$
median = 1.55, midway between 1.3 and 1.8.
2. Using the data in the previous exercise, test the following hypotheses: (a) μ > 2. (b) μ < .7. (c) σ² = .5. (d) Using a likelihood ratio test, test the following hypothesis μ = 1.8, σ² = .8.
(a) We would reject the hypothesis if 1.52 is too small relative to the hypothesized value of 2. Since the data are sampled from a normal distribution, we may use a t test to test the hypothesis. The t ratio is $t[9] = (1.52 - 2)/[.97/\sqrt{10}] = -1.565$. The 95% critical value from the t distribution for a one tailed test is −1.833. Therefore, we would not reject the hypothesis at the 5% significance level.
(b) We would reject the hypothesis if 1.52 is excessively large relative to the hypothesized mean of .7. The t ratio is $t[9] = (1.52 - .7)/[.97/\sqrt{10}] = 2.673$. Using the same critical value as in the previous problem (now 1.833 in the upper tail), we would reject this hypothesis.
(c) The statistic (n−1)s²/σ² is distributed as χ² with 9 degrees of freedom. This is 9(.94)/.5 = 16.920. The 95% critical values from the chi-squared table for a two tailed test are 2.70 and 19.02. Thus we would not reject the hypothesis.
(d) The loglikelihood for a sample from a normal distribution is
$$\ln L = -(n/2)\ln(2\pi) - (n/2)\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2.$$
The sample values are $\hat{\mu} = \bar{x} = 1.52$, $\hat{\sigma}^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n} = .8476$. The maximized loglikelihood for the sample is −13.363. A useful shortcut for computing the loglikelihood at the hypothesized values is
$$\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2.$$
For the hypothesized value of μ = 1.8, this is $\sum_{i=1}^n (x_i - 1.8)^2 = 9.26$. The loglikelihood is −5 ln(2π) − 5 ln(.8) − (1/1.6)9.26 = −13.861. The likelihood ratio statistic is $-2(\ln L_r - \ln L_u) = .996$. The critical value for a chi-squared with 2 degrees of freedom is 5.99, so we would not reject the hypothesis.
3. Suppose that the following sample is drawn from a normal distribution with mean μ and standard deviation σ: y = 3.1, −.1, .3, 1.4, 2.9, .3, 2.2, 1.5, 4.2, .4. Test the hypothesis that the mean of the distribution which produced these data is the same as that which produced the data in Exercise 1. Test the hypothesis assuming that the variances are the same. Test the hypothesis that the variances are the same using an F test and using a likelihood ratio test. (Do not assume that the means are the same.)
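Before turning to the two-sample comparison, the single-sample computations in Exercises 1 and 2 can be reproduced as a check. This is a minimal sketch using only the Python standard library (not the NLOGIT package used for the manual); the helper name `loglik` is illustrative. It carries full precision, so the t ratios and the likelihood ratio statistic can differ from the hand calculations in the third decimal place.

```python
from math import log, pi, sqrt
from statistics import mean, median, variance

x = [1.3, 2.1, .4, 1.3, .5, .2, 1.8, 2.5, 1.9, 3.2]   # sample from Exercise 1
n = len(x)
xbar, s2 = mean(x), variance(x)          # variance() divides by n - 1

# t ratios for the hypothesized means in 2(a) and 2(b)
t_a = (xbar - 2.0) / sqrt(s2 / n)
t_b = (xbar - 0.7) / sqrt(s2 / n)

def loglik(mu, sigma2):
    """Normal loglikelihood evaluated at (mu, sigma2) for the sample x."""
    ss = sum((xi - mu) ** 2 for xi in x)
    return -n / 2 * log(2 * pi) - n / 2 * log(sigma2) - ss / (2 * sigma2)

sigma2_mle = (n - 1) * s2 / n            # maximum likelihood estimate divides by n
lr = -2 * (loglik(1.8, 0.8) - loglik(xbar, sigma2_mle))
print(xbar, median(x), s2, t_a, t_b, lr)
```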
If the variances are the same, $\bar{x}_1 \sim N[\mu_1, \sigma^2/n_1]$ and $\bar{x}_2 \sim N[\mu_2, \sigma^2/n_2]$, so $\bar{x}_1 - \bar{x}_2 \sim N[\mu_1 - \mu_2, \sigma^2\{(1/n_1) + (1/n_2)\}]$. Also, $(n_1-1)s_1^2/\sigma^2 \sim \chi^2[n_1-1]$ and $(n_2-1)s_2^2/\sigma^2 \sim \chi^2[n_2-1]$, so $(n_1-1)s_1^2/\sigma^2 + (n_2-1)s_2^2/\sigma^2 \sim \chi^2[n_1+n_2-2]$. Thus, the statistic
$$t = \frac{\{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)\}\big/\sqrt{\sigma^2[(1/n_1) + (1/n_2)]}}{\sqrt{\{(n_1-1)s_1^2/\sigma^2 + (n_2-1)s_2^2/\sigma^2\}/(n_1+n_2-2)}}$$
is the ratio of a standard normal variable to the square root of a chi-squared variable divided by its degrees of freedom, which is distributed as t with $n_1 + n_2 - 2$ degrees of freedom. Under the hypothesis that the means are equal, the statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2)\big/\sqrt{(1/n_1) + (1/n_2)}}{\sqrt{\{(n_1-1)s_1^2 + (n_2-1)s_2^2\}/(n_1+n_2-2)}}.$$
The sample statistics are
$n_1 = 10$, $\bar{x}_1 = 1.52$, $s_1^2 = .9418$
$n_2 = 10$, $\bar{x}_2 = 1.62$, $s_2^2 = 2.0907$
so $t[18] = -.1816$. This is quite small in absolute value, so we would not reject the hypothesis of equal means.

For random sampling from two normal distributions, under the hypothesis of equal variances, the statistic
$$F[n_1-1, n_2-1] = \frac{[(n_1-1)s_1^2/\sigma^2]/(n_1-1)}{[(n_2-1)s_2^2/\sigma^2]/(n_2-1)}$$
is the ratio of two independent chi-squared variables, each divided by its degrees of freedom. This has the F distribution with $n_1-1$ and $n_2-1$ degrees of freedom. The statistic reduces to $F[n_1-1, n_2-1] = s_1^2/s_2^2$. For our purposes, it is more convenient to put the larger variance in the numerator. Thus, for our sample data, $F[9,9] = 2.0907/.9418 = 2.2199$. The 95% critical value from the F table is 3.18. Thus, we would not reject the hypothesis of equal variances.

The likelihood ratio test is based on the test statistic $\lambda = -2(\ln L_r - \ln L_u)$. The log-likelihood for the joint sample of 20 observations is the sum of the two separate log-likelihoods if the samples are assumed to be independent. A useful shortcut for computing the log-likelihood arises when the maximum likelihood estimates are inserted: at the maximum likelihood estimates, $\ln L = -(n/2)[1 + \ln(2\pi) + \ln\hat\sigma^2]$. So, the log-likelihood for the second sample is $\ln L_2 = -(10/2)[1 + \ln(2\pi) + \ln((9/10)2.0907)] = -17.35007$. (Remember, we don't make the degrees of freedom correction for the variance estimator.) The unrestricted log-likelihood function is, thus, $-13.363 + (-17.35007) = -30.713$. To compute the restricted log-likelihood function, we need the pooled estimator which does not assume that the means are identical. This would be $\hat\sigma^2 = [(n_1-1)s_1^2 + (n_2-1)s_2^2]/[n_1 + n_2] = [9(.9418) + 9(2.0907)]/20 = 1.36463$. So, the restricted log-likelihood is $\ln L_r = -(20/2)[1 + \ln(2\pi) + \ln(1.36463)] = -31.4876$. Minus twice the difference is $\lambda = -2[-31.4876 - (-30.713)] = 1.549$. This is distributed as chi-squared with one degree of freedom. The critical value is 3.84, so we would not reject the hypothesis.

4. A common method of simulating random draws from the standard normal distribution is to compute the sum of 12 draws from the uniform [0,1] distribution and subtract 6. Can you justify this procedure?

The uniform distribution has mean 1/2 and variance 1/12. Therefore, with $n = 12$, the statistic $12(\bar{x} - 1/2) = \sum_{i=1}^{12} x_i - 6$ is equivalent to $z = \sqrt{n}(\bar{x} - \mu)/\sigma$. As $n \to \infty$, this converges to a standard normal variable. Experience suggests that a sample of 12 is large enough to approximate this result. However, more recently developed random number generators usually use different procedures based on the truncation error which occurs in representing real numbers in a digital computer.

5. Using the data in Exercise 1, form confidence intervals for the mean and standard deviation.
Since the underlying distribution is normal, we may use the t distribution. Using (4-57), we obtain a 95% confidence interval for the mean of $1.52 - 2.262[.97/\sqrt{10}] < \mu < 1.52 + 2.262[.97/\sqrt{10}]$ or $.826 < \mu < 2.214$. Using the procedure in Example 4.30, we obtain a 95% confidence interval for $\sigma^2$ of $9(.941)/19.02 < \sigma^2 < 9(.941)/2.70$ or $.445 < \sigma^2 < 3.137$. Taking square roots gives the confidence interval for σ, $.667 < \sigma < 1.771$.

6. Based on a sample of 65 observations from a normal distribution, you obtain a median of 34 and a standard deviation of 13.3. Form a confidence interval for the mean. (Hint: Use the asymptotic distribution. See Example 4.15.) Compare your confidence interval to the one you would have obtained had the estimate of 34 been the sample mean instead of the sample median.

The asymptotic variance of the median is $\pi\sigma^2/(2n)$. Using the asymptotic normal distribution instead of the t distribution, the confidence interval is $34 - 1.96(13.3^2\pi/130)^{1/2} < \mu < 34 + 1.96(13.3^2\pi/130)^{1/2}$ or $29.95 < \mu < 38.05$. Had the estimator been the mean instead of the median, the appropriate asymptotic variance would be $\sigma^2/n$ instead, which we would estimate with $13.3^2/65 = 2.72$ compared to 4.274 for the median. The confidence interval would have been (30.77, 37.24), which is somewhat narrower.

7. The random variable x has a continuous distribution f(x) and cumulative distribution function F(x). What is the probability distribution of the sample maximum? (Hint: In a random sample of n observations, $x_1, x_2, \ldots, x_n$, if z is the maximum, then every observation in the sample is less than or equal to z. Use the cdf.)

If z is the maximum, then every sample observation is less than or equal to z. The probability of this is $\text{Prob}[x_1 \le z, x_2 \le z, \ldots, x_n \le z] = F(z)F(z)\cdots F(z) = [F(z)]^n$. The density is the derivative, $n[F(z)]^{n-1}f(z)$.

8. Assume the distribution of x is $f(x) = 1/\theta$, $0 < x < \theta$. In random sampling from this distribution, prove that the sample maximum is a consistent estimator of θ.
Note: You can prove that the maximum is the maximum likelihood estimator of θ, but the usual properties do not apply here. Why not? (Hint: Attempt to verify that the expected first derivative of the log-likelihood with respect to θ is zero.)

Using the result of the previous problem, the density of the maximum is $n[z/\theta]^{n-1}(1/\theta)$, $0 < z < \theta$. Therefore, the expected value is
$$E[z] = \int_0^\theta z\, n (z/\theta)^{n-1}(1/\theta)\,dz = \frac{n}{\theta^n}\int_0^\theta z^n\,dz = [\theta^{n+1}/(n+1)][n/\theta^n] = n\theta/(n+1).$$
The variance is found likewise:
$$E[z^2] = \int_0^\theta z^2\, n (z/\theta)^{n-1}(1/\theta)\,dz = n\theta^2/(n+2),$$
so $\text{Var}[z] = E[z^2] - (E[z])^2 = n\theta^2/[(n+1)^2(n+2)]$. Using mean squared convergence, we see that $\lim_{n\to\infty} E[z] = \theta$ and $\lim_{n\to\infty} \text{Var}[z] = 0$, so that plim z = θ.
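As a rough numerical illustration of the consistency argument for the sample maximum, the following Python sketch evaluates the exact mean and variance formulas for increasing n and compares them with a small Monte Carlo experiment. The function names, the seed, and the number of replications are assumptions made for this illustration; they are not part of the original solution.

```python
import random

def exact_moments_of_max(n, theta):
    """Exact E[z] and Var[z] for z = max of n draws from U(0, theta)."""
    mean = n * theta / (n + 1)
    var = n * theta ** 2 / ((n + 1) ** 2 * (n + 2))
    return mean, var

def simulated_moments_of_max(n, theta, reps=20000, seed=12345):
    """Monte Carlo estimate of the same two moments (illustrative only)."""
    rng = random.Random(seed)
    maxima = [max(rng.uniform(0.0, theta) for _ in range(n)) for _ in range(reps)]
    mean = sum(maxima) / reps
    var = sum((z - mean) ** 2 for z in maxima) / (reps - 1)
    return mean, var

# The mean approaches theta and the variance approaches zero as n grows.
for n in (5, 50, 500):
    print(n, exact_moments_of_max(n, 1.0))
```

The exact formulas carry the argument; the simulation is only a sanity check and its output depends on the seed.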
9. In random sampling from the exponential distribution, $f(x) = \frac{1}{\theta}e^{-x/\theta}$, $x > 0$, $\theta > 0$, find the maximum likelihood estimator of θ and obtain the asymptotic distribution of this estimator.

The log-likelihood is $\ln L = -n\ln\theta - (1/\theta)\sum_{i=1}^n x_i$. The maximum likelihood estimator is obtained as the solution to $\partial\ln L/\partial\theta = -n/\theta + (1/\theta^2)\sum_{i=1}^n x_i = 0$, or $\hat\theta_{ML} = (1/n)\sum_{i=1}^n x_i = \bar{x}$. The asymptotic variance of the MLE is $\{-E[\partial^2\ln L/\partial\theta^2]\}^{-1} = \{-E[n/\theta^2 - (2/\theta^3)\sum_{i=1}^n x_i]\}^{-1}$. To find the expected value of this random variable, we need $E[x_i] = \theta$. Therefore, the asymptotic variance is $\theta^2/n$. The asymptotic distribution is normal with mean θ and this variance.
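A minimal Python sketch of this result follows. It computes the estimator and the estimated asymptotic variance, and evaluates the log-likelihood so that the maximum can be checked numerically. The three-point sample and the function names are hypothetical and serve only to exercise the formulas.

```python
import math

def exponential_loglik(theta, sample):
    """lnL = -n ln(theta) - (1/theta) * sum(x_i) for the exponential density."""
    return -len(sample) * math.log(theta) - sum(sample) / theta

def exponential_mle(sample):
    """Return (theta_hat, estimated asymptotic variance theta_hat^2 / n)."""
    n = len(sample)
    theta_hat = sum(sample) / n
    return theta_hat, theta_hat ** 2 / n

sample = [1.0, 2.0, 3.0]  # hypothetical data, not from the text
theta_hat, asy_var = exponential_mle(sample)
print(theta_hat, asy_var)
```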
10. Suppose in a sample of 500 observations from a normal distribution with mean μ and standard deviation σ, you are told that 35% of the observations are less than 2.1 and 55% of the observations are less than 3.6. Estimate μ and σ.

If 35% of the observations are less than 2.1, we would infer that $\Phi[(2.1 - \mu)/\sigma] = .35$, or $(2.1 - \mu)/\sigma = -.385 \Rightarrow 2.1 - \mu = -.385\sigma$. Likewise, $\Phi[(3.6 - \mu)/\sigma] = .55$, or $(3.6 - \mu)/\sigma = .126 \Rightarrow 3.6 - \mu = .126\sigma$. The joint solution is $\hat\mu = 3.2301$ and $\hat\sigma = 2.9354$.

It might not seem obvious, but we can also derive asymptotic standard errors for these estimates by constructing them as method of moments estimators. Observe, first, that the two estimates are based on moment estimators of the probabilities. Let $x_i$ denote one of the 500 observations drawn from the normal distribution. Then, the two proportions are obtained as follows: Let $z_i(2.1) = 1[x_i < 2.1]$ and $z_i(3.6) = 1[x_i < 3.6]$ be indicator functions. Then, the proportion of 35% has been obtained as $\bar{z}(2.1)$ and .55 is $\bar{z}(3.6)$. So, the two proportions are simply the means of functions of the sample observations. Each $z_i$ is a draw from a Bernoulli distribution with success probability $\pi(2.1) = \Phi((2.1-\mu)/\sigma)$ for $z_i(2.1)$ and $\pi(3.6) = \Phi((3.6-\mu)/\sigma)$ for $z_i(3.6)$. Therefore, $E[\bar{z}(2.1)] = \pi(2.1)$ and $E[\bar{z}(3.6)] = \pi(3.6)$. The variances in each case are $\text{Var}[\bar{z}(\cdot)] = (1/n)[\pi(\cdot)(1-\pi(\cdot))]$. The covariance of the two sample means is a bit trickier, but we can deduce it from the results of random sampling. $\text{Cov}[\bar{z}(2.1), \bar{z}(3.6)] = (1/n)\text{Cov}[z_i(2.1), z_i(3.6)]$, and, since in random sampling sample moments will converge to their population counterparts,
$$\text{Cov}[z_i(2.1), z_i(3.6)] = \text{plim}\left[\left\{(1/n)\sum_{i=1}^n z_i(2.1)z_i(3.6)\right\} - \pi(2.1)\pi(3.6)\right].$$
But $z_i(2.1)z_i(3.6)$ must equal $[z_i(2.1)]^2$ which, in turn, equals $z_i(2.1)$. It follows, then, that $\text{Cov}[z_i(2.1), z_i(3.6)] = \pi(2.1)[1 - \pi(3.6)]$. Therefore, the asymptotic covariance matrix for the two sample proportions is
$$\text{Asy.Var}[p(2.1), p(3.6)] = \Sigma = \frac{1}{n}\begin{bmatrix} \pi(2.1)(1-\pi(2.1)) & \pi(2.1)(1-\pi(3.6)) \\ \pi(2.1)(1-\pi(3.6)) & \pi(3.6)(1-\pi(3.6)) \end{bmatrix}.$$
If we insert our sample estimates, we obtain
$$\text{Est.Asy.Var}[p(2.1), p(3.6)] = S = \begin{bmatrix} 0.000455 & 0.000315 \\ 0.000315 & 0.000495 \end{bmatrix}.$$
Now, ultimately, our estimates of μ and σ are found as functions of p(2.1) and p(3.6), using the method of moments. The moment equations are
$$m_{2.1} = \left[\frac{1}{n}\sum_{i=1}^n z_i(2.1)\right] - \Phi\left[\frac{2.1 - \mu}{\sigma}\right] = 0,$$
$$m_{3.6} = \left[\frac{1}{n}\sum_{i=1}^n z_i(3.6)\right] - \Phi\left[\frac{3.6 - \mu}{\sigma}\right] = 0.$$
Now, let
$$\Gamma = \begin{bmatrix} \partial m_{2.1}/\partial\mu & \partial m_{2.1}/\partial\sigma \\ \partial m_{3.6}/\partial\mu & \partial m_{3.6}/\partial\sigma \end{bmatrix}$$
and let G be the sample estimate of Γ. Then, the estimator of the asymptotic covariance matrix of $(\hat\mu, \hat\sigma)$ is $[GS^{-1}G']^{-1}$. The remaining detail is the derivatives, which are just $\partial m_{2.1}/\partial\mu = (1/\sigma)\phi((2.1-\mu)/\sigma)$ and $\partial m_{2.1}/\partial\sigma = [(2.1-\mu)/\sigma][\partial m_{2.1}/\partial\mu]$, and likewise for $m_{3.6}$. Inserting our sample estimates produces
$$G = \begin{bmatrix} 0.37046 & -0.14259 \\ 0.39579 & 0.04987 \end{bmatrix}.$$
Finally, multiplying the matrices and computing the necessary inverses produces
$$[GS^{-1}G']^{-1} = \begin{bmatrix} 0.10178 & -0.12492 \\ -0.12492 & 0.16973 \end{bmatrix}.$$
The asymptotic distribution would be normal, as usual. Based on these results, a 95% confidence interval for μ would be $3.2301 \pm 1.96(.10178)^{1/2}$ = 2.6048 to 3.8554.
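The point estimates and the matrix S can be reproduced with a short Python sketch. It uses the standard library's inverse normal CDF rather than table values, so the results differ from 3.2301 and 2.9354 in the third decimal place. The function names are hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist

def normal_from_two_quantiles(x1, p1, x2, p2):
    """Solve Phi((x1-mu)/sigma) = p1 and Phi((x2-mu)/sigma) = p2 for (mu, sigma)."""
    z1 = NormalDist().inv_cdf(p1)   # standard normal quantile for p1
    z2 = NormalDist().inv_cdf(p2)   # standard normal quantile for p2
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

def proportion_cov(p1, p2, n):
    """Covariance matrix of two nested sample proportions, assuming p1 <= p2."""
    return [[p1 * (1 - p1) / n, p1 * (1 - p2) / n],
            [p1 * (1 - p2) / n, p2 * (1 - p2) / n]]

mu_hat, sigma_hat = normal_from_two_quantiles(2.1, 0.35, 3.6, 0.55)
S = proportion_cov(0.35, 0.55, 500)
print(mu_hat, sigma_hat)
print(S)
```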
11. For random sampling from a normal distribution with nonzero mean μ and standard deviation σ, find the asymptotic joint distribution of the maximum likelihood estimators of σ/μ and $\mu^2/\sigma^2$.

The maximum likelihood estimators, $\hat\mu = (1/n)\sum_{i=1}^n x_i$ and $\hat\sigma^2 = (1/n)\sum_{i=1}^n (x_i - \bar{x})^2$, were given in (4-49). By the invariance principle, we know that the maximum likelihood estimators of μ/σ and $\mu^2/\sigma^2$ are $\hat\mu/\hat\sigma$ and $\hat\mu^2/\hat\sigma^2$, and the maximum likelihood estimate of σ is $\hat\sigma = \sqrt{\hat\sigma^2}$. To obtain the asymptotic joint distribution of the two functions of $\hat\mu$ and $\hat\sigma^2$, we first require the asymptotic joint distribution of $\hat\mu$ and $\hat\sigma^2$. This is normal with mean vector $(\mu, \sigma^2)$ and covariance matrix equal to the inverse of the information matrix. This is the inverse of
$$-E\begin{bmatrix} \partial^2\ln L/\partial\mu^2 & \partial^2\ln L/\partial\mu\,\partial\sigma^2 \\ \partial^2\ln L/\partial\sigma^2\partial\mu & \partial^2\ln L/\partial(\sigma^2)^2 \end{bmatrix} = -E\begin{bmatrix} -n/\sigma^2 & -(1/\sigma^4)\sum_{i=1}^n (x_i - \mu) \\ -(1/\sigma^4)\sum_{i=1}^n (x_i - \mu) & n/(2\sigma^4) - (1/\sigma^6)\sum_{i=1}^n (x_i - \mu)^2 \end{bmatrix}.$$
The off diagonal term has expected value 0. Each term in the sum in the lower right has expected value $\sigma^2$, so, after collecting terms, taking the negative, and inverting, we obtain the asymptotic covariance matrix,
$$V = \begin{bmatrix} \sigma^2/n & 0 \\ 0 & 2\sigma^4/n \end{bmatrix}.$$
To obtain the asymptotic joint distribution of the two nonlinear functions, we use the multivariate version of Theorem 4.4. Thus, we require H = JVJ′ where
$$J = \begin{bmatrix} \partial(\mu/\sigma)/\partial\mu & \partial(\mu/\sigma)/\partial\sigma^2 \\ \partial(\mu^2/\sigma^2)/\partial\mu & \partial(\mu^2/\sigma^2)/\partial\sigma^2 \end{bmatrix} = \begin{bmatrix} 1/\sigma & -\mu/(2\sigma^3) \\ 2\mu/\sigma^2 & -\mu^2/\sigma^4 \end{bmatrix}.$$
The product is
$$H = \frac{1}{n}\begin{bmatrix} 1 + \mu^2/(2\sigma^2) & 2\mu/\sigma + (\mu/\sigma)^3 \\ 2\mu/\sigma + (\mu/\sigma)^3 & 4\mu^2/\sigma^2 + 2\mu^4/\sigma^4 \end{bmatrix}.$$
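The matrix product H = JVJ′ can be checked numerically. The Python sketch below builds V and J for given values of μ, σ² and n and multiplies them out; the function name and the parameter values are hypothetical and chosen only so that the closed form is easy to verify by hand.

```python
def delta_method_cov(mu, sigma2, n):
    """Return H = J V J' for the functions (mu/sigma, mu^2/sigma^2)."""
    sigma = sigma2 ** 0.5
    # Asymptotic covariance matrix of (mu_hat, sigma2_hat)
    V = [[sigma2 / n, 0.0],
         [0.0, 2.0 * sigma2 ** 2 / n]]
    # Jacobian with respect to (mu, sigma^2)
    J = [[1.0 / sigma, -mu / (2.0 * sigma ** 3)],
         [2.0 * mu / sigma2, -mu ** 2 / sigma2 ** 2]]
    JV = [[sum(J[i][k] * V[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # Multiply by J' (transpose): H[i][j] = sum_k JV[i][k] * J[j][k]
    return [[sum(JV[i][k] * J[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

# With mu = 2, sigma^2 = 4, n = 100 the ratio mu/sigma equals 1.
H = delta_method_cov(2.0, 4.0, 100)
print(H)
```

With μ/σ = 1 the closed form gives (1/n) times [1.5, 3; 3, 6], which is what the product returns.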
12. The random variable x has the following distribution: $f(x) = e^{-\lambda}\lambda^x/x!$, x = 0,1,2,... The following random sample is drawn: 1,1,4,2,0,0,3,2,3,5,1,2,1,0,0. Carry out a Wald test of the hypothesis that λ = 2.

For random sampling from the Poisson distribution, the maximum likelihood estimator of λ is $\bar{x} = 25/15$. (See Example 4.18.) The second derivative of the log-likelihood is $-\sum_{i=1}^n x_i/\lambda^2$, so the asymptotic variance is λ/n. The Wald statistic would be
$$W = \frac{(\bar{x} - 2)^2}{\hat\lambda/n} = [(25/15 - 2)^2]/[(25/15)/15] = 1.0.$$
The 95% critical value from the chi-squared distribution with one degree of freedom is 3.84, so the hypothesis would not be rejected. Alternatively, one might estimate the variance of $\bar{x}$ with $s^2/n = 2.38/15 = 0.159$. Then, the Wald statistic would be $(25/15 - 2)^2/.159 = .70$. The conclusion is the same.
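The two versions of the Wald statistic for the Poisson sample can be computed directly from the data listed in the exercise. The Python sketch below is illustrative only; the function name is hypothetical.

```python
def poisson_wald(sample, lam0):
    """Wald statistics for H0: lambda = lam0 using two variance estimates."""
    n = len(sample)
    lam_hat = sum(sample) / n                      # MLE of lambda is the sample mean
    w_mle = (lam_hat - lam0) ** 2 / (lam_hat / n)  # variance estimated by lam_hat / n
    s2 = sum((x - lam_hat) ** 2 for x in sample) / (n - 1)
    w_s2 = (lam_hat - lam0) ** 2 / (s2 / n)        # variance estimated by s^2 / n
    return lam_hat, w_mle, w_s2

data = [1, 1, 4, 2, 0, 0, 3, 2, 3, 5, 1, 2, 1, 0, 0]
lam_hat, w_mle, w_s2 = poisson_wald(data, 2.0)
print(lam_hat, w_mle, w_s2)
```

Both statistics are compared with the 3.84 critical value of the chi-squared distribution with one degree of freedom.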
13. Based on random sampling of 16 observations from the exponential distribution of Exercise 9, we wish to test the hypothesis that θ = 1. We will reject the hypothesis if $\bar{x}$ is greater than 1.2 or less than .8. We are interested in the power of this test. (a) Using the asymptotic distribution of $\bar{x}$, graph the asymptotic approximation to the true power function. (b) Using the result discussed in Example 4.17, describe how to obtain the true power function for this test.

The asymptotic distribution of $\bar{x}$ is normal with mean θ and variance $\theta^2/n$. Therefore, the power function based on the asymptotic distribution is the probability that a normally distributed variable with mean equal to θ and variance equal to $\theta^2/n$ will be greater than 1.2 or less than .8. That is, Power = $\Phi[(.8 - \theta)/(\theta/4)] + 1 - \Phi[(1.2 - \theta)/(\theta/4)]$. Some values of this power function and a sketch are given below:

θ      Approx. Power   True Power
.4     1.000           1.000
.5     .992            .985
.6     .908            .904
.7     .718            .736
.8     .522            .556
.9     .420            .443
1.0    .423            .421
1.1    .496            .470
1.2    .591            .555
1.3    .685            .647
1.4    .759            .732
1.5    .819            .801
1.6    .864            .855
1.7    .897            .895
1.8    .922            .925
1.9    .940            .946
2.0    .954            .961
2.1    .963            .972

Note that the power function does not have the symmetric shape of Figure 4.7 because both the variance and the mean are changing as θ changes. Moreover, the power is not the lowest at the value of θ = 1, but at about θ = .9. That means (assuming that the normal distribution is appropriate) that the test is slightly biased. The size of the test is its power at the hypothesized value, or .423, and there are points at which the power is less than the size.

According to the example cited, the true distribution of $\bar{x}$ is that of θ/(2n) times a chi-squared variable with 2n degrees of freedom. Therefore, we could find the true power by finding the probability that a chi-squared variable with 2n degrees of freedom is less than .8(2n/θ) or greater than 1.2(2n/θ). Thus, True power = $F(25.6/\theta) + 1 - F(38.4/\theta)$ where F(.) is the CDF of the chi-squared distribution with 32 degrees of freedom. Values for the correct power function are shown above. Given that the sample is only 16 observations, the closeness of the asymptotic approximation is quite impressive.

14. For the normal distribution, $\mu_{2k} = \sigma^{2k}(2k)!/(k!2^k)$ and $\mu_{2k+1} = 0$, k = 0,1,... Use this result to show that in Example 4.27, $\theta_1 = 0$ and $\theta_2 = 3$, and
$$JVJ' = \begin{bmatrix} 6 & 0 \\ 0 & 24 \end{bmatrix}.$$

For $\theta_1$ and $\theta_2$, just plug in the result above using k = 2, 3, and 4. The example involves 3 moments, $m_2$, $m_3$, and $m_4$. The asymptotic covariance matrix for these three moments can be based on the formulas given in Example 4.26. In particular, we note, first, that for the normal distribution, Asy.Cov[$m_2$,$m_3$] and Asy.Cov[$m_3$,$m_4$] will be zero since they involve only odd moments, which are all zero. The necessary even moments are $\mu_2 = \sigma^2$, $\mu_4 = 3\sigma^4$, $\mu_6 = 15\sigma^6$, $\mu_8 = 105\sigma^8$. The three variances will be
$n[\text{Asy.Var}(m_2)] = \mu_4 - \mu_2^2 = 3\sigma^4 - (\sigma^2)^2 = 2\sigma^4$
$n[\text{Asy.Var}(m_3)] = \mu_6 - \mu_3^2 - 6\mu_4\mu_2 + 9\mu_2^3 = 6\sigma^6$
$n[\text{Asy.Var}(m_4)] = \mu_8 - \mu_4^2 - 8\mu_5\mu_3 + 16\mu_2\mu_3^2 = 96\sigma^8$
and
$n[\text{Asy.Cov}(m_2,m_4)] = \mu_6 - \mu_2\mu_4 - 4\mu_3^2 = 12\sigma^6$.
The elements of J are given in Example 4.27. For the normal distribution, this matrix would be
$$J = \begin{bmatrix} 0 & 1/\sigma^3 & 0 \\ -6/\sigma^2 & 0 & 1/\sigma^4 \end{bmatrix}.$$
Multiplying out JVJ′ produces the result given above.

15. Testing for normality. One method that has been suggested for testing whether the distribution underlying a sample is normal is to refer the statistic $L = n\{\text{skewness}^2/6 + (\text{kurtosis} - 3)^2/24\}$ to the chi-squared distribution with 2 degrees of freedom. Using the data in Exercise 1, carry out the test.

The skewness coefficient is .14192 and the kurtosis is 1.8447. (These are the third and fourth moments divided by the third and fourth power of the sample standard deviation.) Inserting these in the expression above produces $L = 10\{.14192^2/6 + (1.8447 - 3)^2/24\} = .59$. The critical value from the chi-squared distribution with 2 degrees of freedom (95%) is 5.99. Thus, the hypothesis of normality cannot be rejected.

16. Suppose the joint distribution of the two random variables x and y is $f(x,y) = \theta e^{-(\beta+\theta)y}(\beta y)^x/x!$, β, θ > 0, y ≥ 0, x = 0,1,2,... (a) Find the maximum likelihood estimators of β and θ and their asymptotic joint distribution. (b) Find the maximum likelihood estimator of θ/(β+θ) and its asymptotic distribution. (c) Prove that f(x) is of the form $f(x) = \gamma(1-\gamma)^x$, x = 0,1,2,... Then, find the maximum likelihood estimator of γ and its asymptotic distribution. (d) Prove that f(y|x) is of the form $\lambda e^{-\lambda y}(\lambda y)^x/x!$. Prove that f(y|x) integrates to 1. Find the maximum likelihood estimator of λ and its asymptotic distribution. (Hint: In the conditional distribution, just carry the xs along as constants.) (e) Prove that $f(y) = \theta e^{-\theta y}$, then find the maximum likelihood estimator of θ and its asymptotic variance. (f) Prove that $f(x|y) = e^{-\beta y}(\beta y)^x/x!$. Based on this distribution, what is the maximum likelihood estimator of β?
The log-likelihood is
$$\ln L = n\ln\theta - (\beta+\theta)\sum_{i=1}^n y_i + \ln\beta\sum_{i=1}^n x_i + \sum_{i=1}^n x_i\ln y_i - \sum_{i=1}^n \ln(x_i!).$$
The first and second derivatives are
$\partial\ln L/\partial\theta = n/\theta - \sum_{i=1}^n y_i$
$\partial\ln L/\partial\beta = -\sum_{i=1}^n y_i + \sum_{i=1}^n x_i/\beta$
$\partial^2\ln L/\partial\theta^2 = -n/\theta^2$
$\partial^2\ln L/\partial\beta^2 = -\sum_{i=1}^n x_i/\beta^2$
$\partial^2\ln L/\partial\beta\,\partial\theta = 0$.
Therefore, the maximum likelihood estimators are $\hat\theta = 1/\bar{y}$ and $\hat\beta = \bar{x}/\bar{y}$, and the asymptotic covariance matrix is the inverse of
$$E\begin{bmatrix} n/\theta^2 & 0 \\ 0 & \sum_{i=1}^n x_i/\beta^2 \end{bmatrix}.$$
In order to complete the derivation, we will require the expected value of $\sum_{i=1}^n x_i = nE[x_i]$. In order to obtain $E[x_i]$, it is necessary to obtain the marginal distribution of $x_i$, which is
$$f(x) = \int_0^\infty \theta e^{-(\beta+\theta)y}(\beta y)^x/x!\,dy = \beta^x(\theta/x!)\int_0^\infty e^{-(\beta+\theta)y}y^x\,dy.$$
This is $\beta^x(\theta/x!)$ times a gamma integral. This is $f(x) = \beta^x(\theta/x!)[\Gamma(x+1)]/(\beta+\theta)^{x+1}$. But, Γ(x+1) = x!, so the expression reduces to
$$f(x) = [\theta/(\beta+\theta)][\beta/(\beta+\theta)]^x.$$
Thus, x has a geometric distribution with parameter π = θ/(β+θ). (This is the distribution of the number of failures before the first success in independent trials, each with success probability π.) Finally, we require the expected value of $x_i$, which is $E[x] = [\theta/(\beta+\theta)]\sum_{x=0}^\infty x[\beta/(\beta+\theta)]^x = \beta/\theta$. Then, the required asymptotic covariance matrix is
$$\begin{bmatrix} n/\theta^2 & 0 \\ 0 & n(\beta/\theta)/\beta^2 \end{bmatrix}^{-1} = \begin{bmatrix} \theta^2/n & 0 \\ 0 & \beta\theta/n \end{bmatrix}.$$

The maximum likelihood estimator of θ/(β+θ) is $\widehat{\theta/(\beta+\theta)} = (1/\bar{y})/[\bar{x}/\bar{y} + 1/\bar{y}] = 1/(1 + \bar{x})$. Its asymptotic variance is obtained using the variance of a nonlinear function:
$$V = [\beta/(\beta+\theta)^2]^2(\theta^2/n) + [\theta/(\beta+\theta)^2]^2(\beta\theta/n) = \beta\theta^2/[n(\beta+\theta)^3].$$
(The asymptotic variance could also be obtained as $[1/(1 + E[x])^2]^2\,\text{Asy.Var}[\bar{x}]$.)
For part (c), we just note that γ = θ/(β+θ). For a sample of observations on x, the log-likelihood would be
$\ln L = n\ln\gamma + \ln(1-\gamma)\sum_{i=1}^n x_i$
$\partial\ln L/\partial\gamma = n/\gamma - \sum_{i=1}^n x_i/(1-\gamma)$.
A solution is obtained by first noting that at the solution, $(1-\gamma)/\gamma = \bar{x} = 1/\gamma - 1$. The solution for γ is, thus, $\hat\gamma = 1/(1 + \bar{x})$. Of course, this is what we found in part (b), which makes sense.

For part (d),
$$f(y|x) = \frac{f(x,y)}{f(x)} = \frac{\theta e^{-(\beta+\theta)y}(\beta y)^x}{x!}\cdot\frac{(\beta+\theta)^x(\beta+\theta)}{\theta\beta^x}.$$
Cancelling terms and gathering the remaining like terms leaves $f(y|x) = (\beta+\theta)[(\beta+\theta)y]^x e^{-(\beta+\theta)y}/x!$, so the density has the required form with λ = (β+θ). The integral is $\{[\lambda^{x+1}]/x!\}\int_0^\infty e^{-\lambda y}y^x\,dy$. This integral is a Gamma integral which equals $\Gamma(x+1)/\lambda^{x+1}$, which is the reciprocal of the leading scalar, so the product is 1. The log-likelihood function is
$\ln L = n\ln\lambda - \lambda\sum_{i=1}^n y_i + \ln\lambda\sum_{i=1}^n x_i + \sum_{i=1}^n x_i\ln y_i - \sum_{i=1}^n \ln x_i!$
$\partial\ln L/\partial\lambda = (\sum_{i=1}^n x_i + n)/\lambda - \sum_{i=1}^n y_i$
$\partial^2\ln L/\partial\lambda^2 = -(\sum_{i=1}^n x_i + n)/\lambda^2$.
Therefore, the maximum likelihood estimator of λ is $(1 + \bar{x})/\bar{y}$ and the asymptotic variance, conditional on the xs, is $\text{Asy.Var}[\hat\lambda] = (\lambda^2/n)/(1 + \bar{x})$.

Part (e). We can obtain f(y) by summing over x in the joint density. First, we write the joint density as $f(x,y) = \theta e^{-\theta y}e^{-\beta y}(\beta y)^x/x!$. The sum is, therefore, $f(y) = \theta e^{-\theta y}\sum_{x=0}^\infty e^{-\beta y}(\beta y)^x/x!$. The sum is that of the probabilities for a Poisson distribution, so it equals 1. This produces the required result. The maximum likelihood estimator of θ and its asymptotic variance are derived from
$\ln L = n\ln\theta - \theta\sum_{i=1}^n y_i$
$\partial\ln L/\partial\theta = n/\theta - \sum_{i=1}^n y_i$
$\partial^2\ln L/\partial\theta^2 = -n/\theta^2$.
Therefore, the maximum likelihood estimator is $1/\bar{y}$ and its asymptotic variance is $\theta^2/n$.

Part (f). Since we found f(y) by factoring f(x,y) into f(y)f(x|y) (apparently, given our result), the answer follows immediately. Just divide the expression used in part (e) by f(y). This is a Poisson distribution with parameter βy. The log-likelihood function and its first derivative are
$\ln L = -\beta\sum_{i=1}^n y_i + \ln\beta\sum_{i=1}^n x_i + \sum_{i=1}^n x_i\ln y_i - \sum_{i=1}^n \ln x_i!$
$\partial\ln L/\partial\beta = -\sum_{i=1}^n y_i + \sum_{i=1}^n x_i/\beta$,
from which it follows that $\hat\beta = \bar{x}/\bar{y}$.

17. Suppose x has the Weibull distribution, $f(x) = \alpha\beta x^{\beta-1}\exp(-\alpha x^\beta)$, x, α, β > 0. (a) Obtain the log-likelihood function for a random sample of n observations. (b) Obtain the likelihood equations for maximum likelihood estimation of α and β. Note that the first provides an explicit solution for α in terms of the data and β. But, after inserting this in the second, we obtain only an implicit solution for β. How would you obtain the maximum likelihood estimators? (c) Obtain the second derivatives matrix of the log-likelihood with respect to α and β. The exact expectations of the elements involving β involve the derivatives of the Gamma function and are quite messy analytically. Of course, your exact result provides an empirical estimator. How would you estimate the asymptotic covariance matrix for your estimators in part (b)? (d) Prove that $\alpha\beta\,\text{Cov}[\ln x, x^\beta] = 1$. (Hint: Use the fact that the expected first derivatives of the log-likelihood function are zero.)

The log-likelihood and its two first derivatives are
$\log L = n\log\alpha + n\log\beta + (\beta-1)\sum_{i=1}^n \log x_i - \alpha\sum_{i=1}^n x_i^\beta$
$\partial\log L/\partial\alpha = n/\alpha - \sum_{i=1}^n x_i^\beta$
$\partial\log L/\partial\beta = n/\beta + \sum_{i=1}^n \log x_i - \alpha\sum_{i=1}^n (\log x_i)x_i^\beta$.
Since the first likelihood equation implies that at the maximum, $\hat\alpha = n/\sum_{i=1}^n x_i^\beta$, one approach would be to scan over the range of β and compute the implied value of α. Two practical complications are the allowable range of β and the starting values to use for the search. The second derivatives are
$\partial^2\ln L/\partial\alpha^2 = -n/\alpha^2$
$\partial^2\ln L/\partial\beta^2 = -n/\beta^2 - \alpha\sum_{i=1}^n (\log x_i)^2 x_i^\beta$
$\partial^2\ln L/\partial\alpha\,\partial\beta = -\sum_{i=1}^n (\log x_i)x_i^\beta$.
If we had estimates in hand, the simplest way to estimate the expected values of the Hessian would be to evaluate the expressions above at the maximum likelihood estimates, then compute the negative inverse. First, since the expected value of ∂lnL/∂α is zero, it follows that $E[x_i^\beta] = 1/\alpha$. Now,
$$E[\partial\ln L/\partial\beta] = n/\beta + E\left[\sum_{i=1}^n \log x_i\right] - \alpha E\left[\sum_{i=1}^n (\log x_i)x_i^\beta\right] = 0$$
as well. Divide by n, and use the fact that every term in a sum has the same expectation to obtain $1/\beta + E[\ln x_i] - E[(\ln x_i)x_i^\beta]/E[x_i^\beta] = 0$. Now, multiply through by $E[x_i^\beta]$ to obtain $E[x_i^\beta]/\beta = E[(\ln x_i)x_i^\beta] - E[\ln x_i]E[x_i^\beta]$ or $1/(\alpha\beta) = \text{Cov}[\ln x_i, x_i^\beta]$.

18. The following data were generated by the Weibull distribution of Exercise 17:

1.3043   .49254   1.2742   1.4019   .32556
.29965   .26423   1.0878   1.9461   .47615
3.6454   .15344   1.2357   .96381   .33453
1.1227   2.0296   1.2797   .96080   2.0070

(a) Obtain the maximum likelihood estimates of α and β and estimate the asymptotic covariance matrix for the estimates. (b) Carry out a Wald test of the hypothesis that β = 1. (c) Obtain the maximum likelihood estimate of α under the hypothesis that β = 1. (d) Using the results of (a) and (c), carry out a likelihood ratio test of the hypothesis that β = 1. (e) Carry out a Lagrange multiplier test of the hypothesis that β = 1.

As suggested in the previous problem, we can concentrate the log-likelihood over α. From ∂logL/∂α = 0, we find that at the maximum, $\alpha = 1/[(1/n)\sum_{i=1}^n x_i^\beta]$. Thus, we scan over different values of β to seek the value which maximizes logL as given above, where we substitute this expression for each occurrence of α. Values of β and the log-likelihood for a range of values of β are listed and shown in the figure below.

β       logL
0.1     −62.386
0.2     −49.175
0.3     −41.381
0.4     −36.051
0.5     −32.122
0.6     −29.127
0.7     −26.829
0.8     −25.098
0.9     −23.866
1.0     −23.101
1.05    −22.891
1.06    −22.863
1.07    −22.841
1.08    −22.823
1.09    −22.809
1.10    −22.800
1.11    −22.796
1.12    −22.797
1.2     −22.984
1.3     −23.693
The maximum occurs at β = 1.11. The implied value of α is 1.179. The negative of the second derivatives matrix at these values and its inverse are
$$I(\hat\alpha, \hat\beta) = \begin{bmatrix} 25.55 & 9.6506 \\ 9.6506 & 27.7552 \end{bmatrix} \quad\text{and}\quad I^{-1}(\hat\alpha, \hat\beta) = \begin{bmatrix} .04506 & -.2673 \\ -.2673 & .04148 \end{bmatrix}.$$
The Wald statistic for the hypothesis that β = 1 is $W = (1.11 - 1)^2/.041477 = .276$. The critical value for a test of size .05 is 3.84, so we would not reject the hypothesis.
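If β = 1, then $\hat\alpha = n/\sum_{i=1}^n x_i = 0.88496$. The distribution specializes to the exponential distribution if β = 1, so the restricted log-likelihood would be $\log L_r = n\log\alpha - \alpha\sum_{i=1}^n x_i = n(\log\alpha - 1)$ at the MLE. $\log L_r$ at α = .88496 is −22.44435. The likelihood ratio statistic is −2logλ = 2(23.10068 − 22.44435) = 1.3126. Once again, this is a small value. To obtain the Lagrange multiplier statistic, we would compute
$$\begin{bmatrix} \partial\log L/\partial\alpha & \partial\log L/\partial\beta \end{bmatrix} \begin{bmatrix} -\partial^2\log L/\partial\alpha^2 & -\partial^2\log L/\partial\alpha\,\partial\beta \\ -\partial^2\log L/\partial\alpha\,\partial\beta & -\partial^2\log L/\partial\beta^2 \end{bmatrix}^{-1} \begin{bmatrix} \partial\log L/\partial\alpha \\ \partial\log L/\partial\beta \end{bmatrix}$$
at the restricted estimates of α = .88496 and β = 1. Making the substitutions from above, at these values, we would have
$\partial\log L/\partial\alpha = 0$
$\partial\log L/\partial\beta = n + \sum_{i=1}^n \log x_i - \frac{1}{\bar{x}}\sum_{i=1}^n x_i\log x_i = 9.400342$
$\partial^2\log L/\partial\alpha^2 = -n\bar{x}^2 = -25.54955$
$\partial^2\log L/\partial\beta^2 = -n - \frac{1}{\bar{x}}\sum_{i=1}^n x_i(\log x_i)^2 = -30.79486$
$\partial^2\log L/\partial\alpha\,\partial\beta = -\sum_{i=1}^n x_i\log x_i = -8.265$.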
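The restricted quantities listed above can be recomputed from the Exercise 18 data with a few lines of Python. This is a sketch for checking the arithmetic only; the variable names are hypothetical, and because the printed data are rounded, the results agree with the values above to about three digits.

```python
import math

# Data from Exercise 18
x = [1.3043, .49254, 1.2742, 1.4019, .32556, .29965, .26423, 1.0878,
     1.9461, .47615, 3.6454, .15344, 1.2357, .96381, .33453, 1.1227,
     2.0296, 1.2797, .96080, 2.0070]
n = len(x)
logx = [math.log(v) for v in x]

# Restricted MLE of alpha when beta = 1 (exponential case)
alpha_r = n / sum(x)

# First derivative of logL with respect to beta at (alpha_r, beta = 1)
score_beta = n + sum(logx) - alpha_r * sum(v * l for v, l in zip(x, logx))

# Second derivatives at (alpha_r, beta = 1)
h_aa = -n / alpha_r ** 2
h_bb = -n - alpha_r * sum(v * l * l for v, l in zip(x, logx))
h_ab = -sum(v * l for v, l in zip(x, logx))

print(alpha_r, score_beta, h_aa, h_bb, h_ab)
```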
The lower right element of the inverse of the negative second derivatives matrix at the restricted estimates is $25.54955/[25.54955(30.79486) - (8.265)^2] = .03556$. The LM statistic is, therefore, $(9.400342)^2(.03556) = 3.142$. This is also under the critical value of 3.84 for the chi-squared distribution with one degree of freedom, so the hypothesis is not rejected on the basis of any of the three tests.

19. We consider forming a confidence interval for the variance of a normal distribution. As shown in Example 4.29, the interval is formed by finding $c_{lower}$ and $c_{upper}$ such that $\text{Prob}[c_{lower} < \chi^2[n-1] < c_{upper}] = 1 - \alpha$. The endpoints of the confidence interval are then $(n-1)s^2/c_{upper}$ and $(n-1)s^2/c_{lower}$. How do we find the narrowest interval? Consider simply minimizing the width of the interval, $c_{upper} - c_{lower}$, subject to the constraint that the probability contained in the interval is (1−α). Prove that for symmetric and asymmetric distributions alike, the narrowest interval will be such that the density is the same at the two endpoints.

The general problem is to minimize Upper − Lower subject to the constraint F(Upper) − F(Lower) = 1 − α, where F(.) is the appropriate chi-squared distribution. We can set this up as a Lagrangean problem,
$$\min_{L,U} L^* = U - L + \lambda\{(F(U) - F(L)) - (1 - \alpha)\}.$$
The necessary conditions are
$\partial L^*/\partial U = 1 + \lambda f(U) = 0$
$\partial L^*/\partial L = -1 - \lambda f(L) = 0$
$\partial L^*/\partial\lambda = (F(U) - F(L)) - (1 - \alpha) = 0$.
It is obvious from the first two that at the minimum, f(U) must equal f(L).

20. Using the results in Example 4.26 and Section 4.7.2, estimate the asymptotic covariance matrix of the method of moments estimators of P and λ based on $m'_{-1}$ and $m'_2$. (Note: You will need to use the data in Table 4.1 to estimate V.)

Using the income data in Table 4.1, (1/n) times the covariance matrix of $1/x_i$ and $x_i^2$ is
$$V = \begin{bmatrix} .000068456 & -2.811 \\ -2.811 & 228050. \end{bmatrix}.$$
The moment equations used to estimate P and λ are $E[m'_{-1} - \lambda/(P-1)] = 0$ and $E[m'_2 - P(P+1)/\lambda^2] = 0$. The matrix of derivatives with respect to P and λ is
$$G = \begin{bmatrix} \lambda/(P-1)^2 & -1/(P-1) \\ -(2P+1)/\lambda^2 & 2P(P+1)/\lambda^3 \end{bmatrix}.$$
The estimated asymptotic covariance matrix is
$$[GV^{-1}G']^{-1} = \begin{bmatrix} .17532 & .0073617 \\ .0073617 & .00041871 \end{bmatrix}.$$
Appendix D Large Sample Distribution Theory There are no exercises for Appendix D.
Appendix E Computation and Optimization

1. Show how to maximize the function
$$f(\beta) = \frac{1}{\sqrt{2\pi}}e^{-(\beta - c)^2/2}$$
with respect to β for a constant, c, using Newton's method. Show that maximizing log f(β) leads to the same solution. Plot f(β) and log f(β).

The necessary condition for maximizing f(β) is
$$df(\beta)/d\beta = \frac{1}{\sqrt{2\pi}}e^{-(\beta - c)^2/2}[-(\beta - c)] = 0 = -(\beta - c)f(\beta).$$
The exponential function can never be zero, so the only solution to the necessary condition is β = c. The second derivative is $d^2f(\beta)/d\beta^2 = -(\beta - c)\,df(\beta)/d\beta - f(\beta) = [(\beta - c)^2 - 1]f(\beta)$. At the stationary value β = c, the second derivative is negative, so this is a maximum. Consider instead the function $g(\beta) = \log f(\beta) = -(1/2)\ln(2\pi) - (1/2)(\beta - c)^2$. The leading constant is obviously irrelevant to the solution, and the quadratic is a negative number everywhere except the point β = c. Therefore, it is obvious that this function has the same maximizing value as f(β). Formally, $dg(\beta)/d\beta = -(\beta - c) = 0$ at β = c, and $d^2g(\beta)/d\beta^2 = -1$, so this is indeed the maximum. A sketch of the two functions appears below.
Note that the transformed function is concave everywhere while the original function has inflection points.
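The difference between the two criterion functions shows up clearly in the Newton iterations themselves. The following Python sketch applies the update β − (first derivative)/(second derivative) to f and to log f, using the derivatives derived above. The function names, the value c = 1, and the starting points are assumptions made for this illustration.

```python
def newton_step_f(beta, c):
    """One Newton step for f: f' = -(b-c) f and f'' = [(b-c)^2 - 1] f, so f cancels."""
    d = beta - c
    return beta - (-d) / (d * d - 1.0)

def newton_step_logf(beta, c):
    """One Newton step for g = log f: g' = -(b-c) and g'' = -1."""
    return beta - (-(beta - c)) / (-1.0)

def iterate(step, beta0, c, n_iter):
    """Apply the given Newton step n_iter times starting from beta0."""
    beta = beta0
    for _ in range(n_iter):
        beta = step(beta, c)
    return beta

c = 1.0
print(newton_step_logf(7.3, c))            # log f: reaches c in one step from any start
print(iterate(newton_step_f, 1.5, c, 10))  # f: converges when started near c
print(newton_step_f(3.0, c))               # f: beyond the inflection point the step moves away from c
```

For the quadratic log f, a single step lands on c from any starting value. For f itself, a start beyond the inflection points at c ± 1 sends the iteration in the wrong direction, which is the practical content of the remark about concavity.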
2. Prove that Newton's method for minimizing the sum of squared residuals in the linear regression model will converge to the minimum in one iteration.

The function to be minimized is f(β) = (y − Xβ)′(y − Xβ). The required derivatives are ∂f(β)/∂β = −2X′(y − Xβ) and ∂²f(β)/∂β∂β′ = 2X′X. Now, consider beginning a Newton iteration at an arbitrary point, $\beta^0$. The iteration is defined in (12-17),
$$\beta^1 = \beta^0 - (2X'X)^{-1}\{-2X'(y - X\beta^0)\} = \beta^0 + (X'X)^{-1}X'y - (X'X)^{-1}X'X\beta^0 = (X'X)^{-1}X'y = b.$$
Therefore, regardless of the starting value chosen, the next value will be the least squares coefficient vector.

3. For the Poisson regression model, $\text{Prob}[Y_i = y_i|x_i] = \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}$ where $\lambda_i = e^{\beta'x_i}$. The log-likelihood function is $\ln L = \sum_{i=1}^n \log\text{Prob}[Y_i = y_i|x_i]$.
(a) Insert the expression for $\lambda_i$ to obtain the log-likelihood function in terms of the observed data.
(b) Derive the first order conditions for maximizing this function with respect to β.
(c) Derive the second derivatives matrix of this criterion function with respect to β. Is this matrix negative definite?
(d) Define the computations for using Newton's method to obtain estimates of the unknown parameters.
(e) Write out the full set of steps in an algorithm for obtaining the estimates of the parameters of this model. Include in your algorithm a test for convergence of the estimates based on Belsley's suggested criterion.
(f) How would you obtain starting values for your iterations?
(g) The following data are generated by the Poisson regression model with logλ = α + βx.

y   6    7    4    10   10   6    4    7    2    3    6    5    3    3    4
x   1.5  1.8  1.8  2.0  1.3  1.6  1.2  1.9  1.8  1.0  1.4  .5   .8   1.1  .7

Use your results from parts (a) to (f) to compute the maximum likelihood estimates of α and β. Also obtain estimates of the asymptotic covariance matrix of your estimates.
$$\log L = \sum_{i=1}^n \left[-\lambda_i + y_i\ln\lambda_i - \ln y_i!\right] = -\sum_{i=1}^n e^{\beta'x_i} + \sum_{i=1}^n y_i(\beta'x_i) - \sum_{i=1}^n \log y_i! = -\sum_{i=1}^n e^{\beta'x_i} + \beta'\sum_{i=1}^n x_i y_i - \sum_{i=1}^n \log y_i!.$$
The necessary condition is $\partial\ln L/\partial\beta = -\sum_{i=1}^n x_i e^{\beta'x_i} + \sum_{i=1}^n x_i y_i = 0$, or $X'y = \sum_{i=1}^n x_i\lambda_i$. It is useful to note that, since $E[y_i \mid x_i] = \lambda_i = e^{\beta'x_i}$, the first order condition is equivalent to $\sum_{i=1}^n x_i y_i = \sum_{i=1}^n x_i E[y_i \mid x_i]$, or $X'y = X'E[y]$, which makes sense. We may write the first order condition as
$$\partial\ln L/\partial\beta = \sum_{i=1}^n x_i(y_i - \lambda_i) = 0,$$
which is quite similar to the counterpart for the classical regression if we view $(y_i - \lambda_i) = (y_i - E[y_i \mid x_i])$ as a residual. The second derivatives matrix is
$$\partial^2\ln L/\partial\beta\,\partial\beta' = -\sum_{i=1}^n (e^{\beta'x_i})x_i x_i' = -\sum_{i=1}^n \lambda_i x_i x_i'.$$
This is a negative definite matrix. To prove this, note, first, that $\lambda_i$ must always be positive. Then, let Ω be a diagonal matrix whose ith diagonal element is $\sqrt{\lambda_i}$ and let Z = ΩX. Then, $\partial^2\ln L/\partial\beta\,\partial\beta' = -Z'Z$, which is clearly negative definite (provided X has full column rank). This implies that the log-likelihood function is globally concave and finding its maximum using Newton's method will be straightforward and reliable.

The iteration for Newton's method is defined in (517). We may apply it directly in this problem. The computations involved in using Newton's method to maximize lnL will be as follows:
(1) Obtain starting values for the parameters. Because the log-likelihood function is globally concave, it will usually not matter what values are used. Most applications simply use zero. One suggestion which does appear in the literature is
$$\beta^0 = \left[\sum_{i=1}^n q_i x_i x_i'\right]^{-1}\left[\sum_{i=1}^n q_i x_i y_i\right], \quad \text{where } q_i = \log(\max(1, y_i)).$$
(2) The iteration is computed as
$$\hat\beta_{t+1} = \hat\beta_t + \left[\sum_{i=1}^n \hat\lambda_i x_i x_i'\right]^{-1}\left[\sum_{i=1}^n x_i(y_i - \hat\lambda_i)\right].$$
(3) Each time we compute $\hat\beta_{t+1}$, we should check for convergence. Some possibilities are
(a) Gradient: Are the elements of $\partial\ln L/\partial\beta$ small?
(b) Change: Is $\hat\beta_{t+1} - \hat\beta_t$ small?
(c) Function rate of change: Check the size of
$$\delta_t = \left[\sum_{i=1}^n x_i(y_i - \hat\lambda_i)\right]'\left[\sum_{i=1}^n \hat\lambda_i x_i x_i'\right]^{-1}\left[\sum_{i=1}^n x_i(y_i - \hat\lambda_i)\right]$$
before computing $\hat\beta_{t+1}$. This measure describes what will happen to the function at the next value of β. This is Belsley's criterion.
(4) When convergence has been achieved, the asymptotic covariance matrix for the estimates is estimated with the inverse matrix used in the iterations.

Using the data given in the problem, the results of the above computations are

Iter.   α          β          lnL         ∂lnL/∂α     ∂lnL/∂β     Change
0       0          0          -102.387      65.         95.1      296.261
1       1.37105    2.17816    -1442.38    -1636.25    -2788.5     1526.36
2       .619874    2.05865    -461.989    -581.966    -996.711    516.92
3       .210347    1.77914    -141.022    -195.953    -399.751    197.652
4       .351893    1.26291    -51.2989    -57.9294    -102.847    30.616
5       .824956    .698768    -33.5530    -12.8702    -23.1932    2.75855
6       1.05288    .453352    -32.0824    -1.28785    -2.29289    .032399
7       1.07777    .425239    -32.0660    -.016067    -.028454    .0000051
8       1.07808    .424890    -32.0660     0           0          0

At the final values, the negative inverse of the second derivatives matrix is
$$\left[\sum_{i=1}^n \hat\lambda_i x_i x_i'\right]^{-1} = \begin{bmatrix} .151044 & -.095961 \\ -.095961 & .0664665 \end{bmatrix}.$$
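Steps (1) through (4) can be sketched in a few lines of code. This is an illustrative Python/NumPy version, not the NLOGIT computation used for the results reported above; the function name `poisson_newton` and the tolerance are assumptions made for the sketch. It starts at zero, uses Belsley's δ as the convergence test, and returns the inverse matrix from the last iteration as the estimated asymptotic covariance matrix.

```python
from math import lgamma
import numpy as np

# Data from part (g) of the exercise.
y = np.array([6, 7, 4, 10, 10, 6, 4, 7, 2, 3, 6, 5, 3, 3, 4], dtype=float)
x = np.array([1.5, 1.8, 1.8, 2.0, 1.3, 1.6, 1.2, 1.9, 1.8, 1.0,
              1.4, 0.5, 0.8, 1.1, 0.7])
X = np.column_stack([np.ones_like(x), x])     # columns: constant, x

def poisson_newton(X, y, tol=1e-10, max_iter=50):
    """Newton's method for the Poisson regression log-likelihood."""
    theta = np.zeros(X.shape[1])              # step (1): start at zero
    for _ in range(max_iter):
        lam = np.exp(X @ theta)               # lambda_i = exp(beta'x_i)
        gradient = X.T @ (y - lam)            # sum of x_i (y_i - lambda_i)
        neg_hessian = (X * lam[:, None]).T @ X  # sum of lambda_i x_i x_i'
        step = np.linalg.solve(neg_hessian, gradient)
        delta = gradient @ step               # step (3c): Belsley's criterion
        if delta < tol:
            break
        theta = theta + step                  # step (2): Newton update
    # step (4): inverse of the matrix used in the last iteration
    return theta, np.linalg.inv(neg_hessian)

theta_hat, cov_hat = poisson_newton(X, y)
lam_hat = np.exp(X @ theta_hat)
log_lik = float(np.sum(-lam_hat + y * np.log(lam_hat))
                - sum(lgamma(v + 1.0) for v in y))

print("alpha, beta:", theta_hat)
print("lnL:", log_lik)
print("asymptotic covariance matrix:\n", cov_hat)
```

Because the log-likelihood is globally concave, the zero starting value is adequate here even though the first step overshoots badly, as the iteration history above shows.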
4. Use Monte Carlo Integration to plot the function $g(r) = E[x^r \mid x > 0]$ for the standard normal distribution.

The expected value from the truncated normal distribution is
$$E[x^r \mid x > 0] = \int_0^\infty x^r f(x \mid x > 0)\,dx = \frac{\int_0^\infty x^r\phi(x)\,dx}{\int_0^\infty \phi(x)\,dx} = \sqrt{\frac{2}{\pi}}\int_0^\infty x^r e^{-x^2/2}\,dx.$$
To evaluate this expectation, we first sampled 1,000 observations from the truncated standard normal distribution using (51). For the standard normal distribution, μ = 0, σ = 1, $P_L = \Phi((0 - 0)/1) = 1/2$, and $P_U = \Phi((+\infty - 0)/1) = 1$. Therefore, the draws are obtained by transforming draws from U(0,1) (denoted $F_i$) to $x_i = \Phi^{-1}[\tfrac{1}{2}(1 + F_i)]$. Since $0 < F_i < 1$, the argument in brackets must be greater than 1/2, so $x_i > 0$, which is to be expected. Using the same 1,000 draws each time (so as to obtain smoothness in the figure), we then plot the values of
$$\bar{x}_r = \frac{1}{1000}\sum_{i=1}^{1000} x_i^r, \quad r = 0, .2, .4, .6, \ldots, 5.0.$$
As an additional experiment, we generated a second sample of 1,000 by drawing observations from the standard normal distribution and discarding them and redrawing if they were not positive. The means and standard deviations of the two samples were (0.8097, 0.6170) for the first and (0.8059, 0.6170) for the second. Drawing the second sample takes approximately twice as long as the first. Why?
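The two sampling schemes described in Exercise 4 can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the seed, the function names, and the use of `statistics.NormalDist.inv_cdf` for $\Phi^{-1}$ are choices made here, so the sample moments will not reproduce the figures quoted above. The rejection sampler also counts how many normal draws it consumes, which bears on the question that closes the exercise.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()   # standard normal: mu = 0, sigma = 1

def inverse_cdf_draws(n, rng):
    """Draw from N(0,1) truncated to x > 0 using x = Phi^{-1}[(1 + F)/2]."""
    return [STD_NORMAL.inv_cdf(0.5 * (1.0 + rng.random())) for _ in range(n)]

def rejection_draws(n, rng):
    """Draw from N(0,1), discarding and redrawing non-positive values.

    Returns the accepted draws and the total number of normal draws used.
    """
    accepted, total = [], 0
    while len(accepted) < n:
        z = rng.gauss(0.0, 1.0)
        total += 1
        if z > 0.0:
            accepted.append(z)
    return accepted, total

def g(r, draws):
    """Monte Carlo estimate of E[x^r | x > 0]: the sample mean of x_i^r."""
    return fmean(x ** r for x in draws)

rng = random.Random(12345)                    # arbitrary seed (assumption)
draws = inverse_cdf_draws(1000, rng)          # reuse the same draws for every r
grid = [0.2 * k for k in range(26)]           # r = 0, .2, ..., 5.0
curve = [(r, g(r, draws)) for r in grid]      # points to plot

second_sample, normal_draws_used = rejection_draws(1000, rng)

for r, value in curve[::5]:
    print(f"r = {r:.1f}  g(r) = {value:.4f}")
print("normal draws needed for 1,000 positive values:", normal_draws_used)
```

The pairs in `curve` can be passed to any plotting routine. Reusing one set of draws for all values of r is what keeps the plotted curve smooth.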
5. For the model in Example 5.10, derive the LM statistic for the test of the hypothesis that μ = 0.

The derivatives of the log-likelihood with μ = 0 imposed are
$$g_\mu = n\bar{x}/\sigma^2 \quad\text{and}\quad g_{\sigma^2} = \frac{-n}{2\sigma^2} + \frac{\sum_{i=1}^n x_i^2}{2\sigma^4}.$$
The estimator for $\sigma^2$ will be obtained by equating the second of these to 0, which will give (of course), $v = x'x/n$. The terms in the Hessian are $H_{\mu\mu} = -n/\sigma^2$, $H_{\mu\sigma^2} = -n\bar{x}/\sigma^4$, and $H_{\sigma^2\sigma^2} = n/(2\sigma^4) - x'x/\sigma^6$. At the MLE, $g_{\sigma^2} = 0$, exactly. The off diagonal term in the expected Hessian is also zero. Therefore, the LM statistic is
$$LM = \begin{bmatrix} n\bar{x}/v & 0 \end{bmatrix}\begin{bmatrix} n/v & 0 \\ 0 & n/(2v^2) \end{bmatrix}^{-1}\begin{bmatrix} n\bar{x}/v \\ 0 \end{bmatrix} = \left[\frac{\bar{x}}{\sqrt{v/n}}\right]^2.$$
This resembles the square of the standard t-ratio for testing the hypothesis that μ = 0. It would be exactly that save for the absence of a degrees of freedom correction in v. However, since we have not estimated μ with $\bar{x}$ in fact, LM is exactly the square of a standard normal variate divided by a chi-squared variate over its degrees of freedom. Thus, in this model, LM is exactly an F statistic with 1 degree of freedom in the numerator and n degrees of freedom in the denominator.
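As a quick numerical check of the algebra in Exercise 5, the quadratic form can be evaluated directly and compared with the closed form $n\bar{x}^2/v$. The sketch below assumes NumPy; the function name `lm_statistic_mu_zero` and the four-point sample are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def lm_statistic_mu_zero(x):
    """LM statistic for H0: mu = 0 in the normal model, built from the
    score and the expected information evaluated at the restricted MLE."""
    x = np.asarray(x, dtype=float)
    n = x.size
    v = x @ x / n                                   # restricted MLE of sigma^2
    score = np.array([n * x.mean() / v, 0.0])       # (g_mu, g_sigma2) at the MLE
    information = np.diag([n / v, n / (2.0 * v ** 2)])  # negative expected Hessian
    return float(score @ np.linalg.solve(information, score))

sample = [1.0, 2.0, 3.0, 4.0]                       # illustrative data (assumption)
lm = lm_statistic_mu_zero(sample)

# Closed form: (xbar / sqrt(v/n))^2 = n * xbar^2 / v
n, xbar, v = 4, 2.5, 30.0 / 4.0
print(lm, n * xbar ** 2 / v)
```

For this sample, n = 4, the mean is 2.5 and v = 7.5, so both expressions equal 4(6.25)/7.5 = 10/3.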
6. In Example 5.10, what is the concentrated over μ log likelihood function?

It is obvious that whatever solution is obtained for $\sigma^2$, the MLE for μ will be $\bar{x}$, so the concentrated log-likelihood function is
$$\log L_c = \frac{-n}{2}\left(\log 2\pi + \log\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^n \left(x_i - \bar{x}\right)^2.$$

7. In Example E.13, suppose that $E[y_i] = \mu$, for a nonzero mean.
(a) Extend the model to include this new parameter. What are the new log likelihood, likelihood equation, Hessian, and expected Hessian?
(b) How are the iterations carried out to estimate the full set of parameters?
(c) Show how the LIMDEP program should be modified to include estimation of μ.
(d) Using the same data set, estimate the full set of parameters.
If $y_i$ has a nonzero mean, μ, then the log-likelihood is
$$\ln L(\gamma,\mu \mid Z) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^n \log\sigma_i^2 - \frac{1}{2}\sum_{i=1}^n \left(\frac{(y_i - \mu)^2}{\sigma_i^2}\right)$$
$$= -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^n z_i'\gamma - \frac{1}{2}\sum_{i=1}^n (y_i - \mu)^2\exp(-z_i'\gamma).$$
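The likelihood equations are
$$\frac{\partial\ln L}{\partial\gamma} = \frac{1}{2}\sum_{i=1}^n z_i\left(\frac{(y_i - \mu)^2}{\sigma_i^2} - 1\right) = -\frac{1}{2}\sum_{i=1}^n z_i + \frac{1}{2}\sum_{i=1}^n (y_i - \mu)^2 z_i\exp(-z_i'\gamma) = g_\gamma(\gamma,\mu) = 0$$
and
$$\frac{\partial\ln L}{\partial\mu} = \sum_{i=1}^n (y_i - \mu)\exp(-z_i'\gamma) = g_\mu(\gamma,\mu) = 0.$$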
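The Hessian is
$$\frac{\partial^2\ln L}{\partial\gamma\,\partial\gamma'} = -\frac{1}{2}\sum_{i=1}^n z_i z_i'\left(\frac{(y_i - \mu)^2}{\sigma_i^2}\right) = -\frac{1}{2}\sum_{i=1}^n (y_i - \mu)^2 z_i z_i'\exp(-z_i'\gamma) = H_{\gamma\gamma},$$
$$\frac{\partial^2\ln L}{\partial\gamma\,\partial\mu} = -\sum_{i=1}^n z_i(y_i - \mu)\exp(-z_i'\gamma) = H_{\gamma\mu},$$
$$\frac{\partial^2\ln L}{\partial\mu\,\partial\mu} = -\sum_{i=1}^n \exp(-z_i'\gamma) = H_{\mu\mu}.$$
The expectations in the Hessian are found as follows: Since $E[y_i] = \mu$, $E[H_{\gamma\mu}] = 0$. There are no stochastic terms in $H_{\mu\mu}$, so $E[H_{\mu\mu}] = H_{\mu\mu} = -\sum_{i=1}^n 1/\sigma_i^2$. Finally, $E[(y_i - \mu)^2] = \sigma_i^2$, so $E[H_{\gamma\gamma}] = -\tfrac{1}{2}(Z'Z)$.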
There is more than one way to estimate the parameters. As in Example 5.13, the method of scoring (using the expected Hessian) will be straightforward in principle, though in our example, it does not work well in practice, so we use Newton's method instead. The iteration, in which we use index 't' to indicate the estimate at iteration t, will be
$$\begin{bmatrix}\mu \\ \gamma\end{bmatrix}(t+1) = \begin{bmatrix}\mu \\ \gamma\end{bmatrix}(t) - E[H(t)]^{-1}g(t).$$
If we insert the expected Hessians and first derivatives in this iteration, we obtain
$$\begin{bmatrix}\mu \\ \gamma\end{bmatrix}(t+1) = \begin{bmatrix}\mu \\ \gamma\end{bmatrix}(t) + \begin{bmatrix}\sum_{i=1}^n \dfrac{1}{\sigma_i^2(t)} & 0 \\ 0 & \dfrac{1}{2}Z'Z\end{bmatrix}^{-1}\begin{bmatrix}\sum_{i=1}^n \dfrac{y_i - \mu(t)}{\sigma_i^2(t)} \\ \dfrac{1}{2}\sum_{i=1}^n z_i\left(\dfrac{(y_i - \mu(t))^2}{\sigma_i^2(t)} - 1\right)\end{bmatrix}.$$
The zero off diagonal elements in the expected Hessian make this convenient, as the iteration may be broken into two parts. We take the iteration for μ first. With current estimates μ(t) and γ(t), the method of scoring produces this iteration:
$$\mu(t+1) = \mu(t) + \frac{\sum_{i=1}^n \dfrac{y_i - \mu(t)}{\sigma_i^2(t)}}{\sum_{i=1}^n \dfrac{1}{\sigma_i^2(t)}}.$$
As will be explored in Chapters 12 and 13, this is generalized least squares. Let i denote an n×1 vector of ones, let $e_i(t) = y_i - \mu(t)$ denote the 'residual' at iteration t and let e(t) denote the n×1 vector of residuals. Let Ω(t) denote a diagonal matrix which has $\sigma_i^2$ on its diagonal (and zeros elsewhere). Then, the iteration for μ is
$$\mu(t+1) = \mu(t) + [i'\Omega(t)^{-1}i]^{-1}[i'\Omega(t)^{-1}e(t)].$$
This shows how to compute μ(t+1). The iteration for γ(t+1) is exactly as was shown in Example 5.13, save for the single change that in the computation, $y_i^2$ is changed to $(y_i - \mu(t))^2$. Otherwise, the computation is identical. Thus, we would have
$$\gamma(t+1) = \gamma(t) + (Z'Z)^{-1}Z'v(\gamma(t),\mu(t)),$$
where $v_i(\gamma(t),\mu(t))$ is the term in parentheses in the iteration shown above. This shows how to compute γ(t+1).
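The two-part scoring iteration can be sketched as follows. This is a Python/NumPy illustration on synthetic data, not the credit card data set and not the LIMDEP program; the function name `harvey_scoring`, the simulated design, and the tolerance are all assumptions. Starting values follow the approach used for Harvey's model: μ starts at the sample mean and γ at the least squares regression of the log squared residuals on Z, with 1.2704 added to the constant term.

```python
import numpy as np

def harvey_scoring(y, Z, tol=1e-8, max_iter=200):
    """Method of scoring for y_i ~ N(mu, exp(z_i'gamma)).
    Assumes the first column of Z is a constant."""
    mu = y.mean()                                   # starting value for mu
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ np.log((y - mu) ** 2))
    gamma[0] += 1.2704                              # correction to the constant
    ZtZ = Z.T @ Z
    for _ in range(max_iter):
        sigma2 = np.exp(Z @ gamma)                  # sigma_i^2(t)
        e = y - mu                                  # residuals e_i(t)
        g_mu = np.sum(e / sigma2)                   # score for mu
        v = e ** 2 / sigma2 - 1.0                   # v_i(gamma(t), mu(t))
        g_gamma = 0.5 * Z.T @ v                     # score for gamma
        mu_step = g_mu / np.sum(1.0 / sigma2)       # GLS-type update for mu
        gamma_step = np.linalg.solve(ZtZ, Z.T @ v)  # (Z'Z)^{-1} Z'v
        delta = g_mu * mu_step + g_gamma @ gamma_step  # g'(update)
        if delta < tol:
            break
        mu, gamma = mu + mu_step, gamma + gamma_step
    return mu, gamma, delta

# Synthetic data (assumption): one regressor in the variance function.
rng = np.random.default_rng(1)
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
true_mu, true_gamma = 2.0, np.array([0.5, 0.8])
y = true_mu + np.exp(0.5 * Z @ true_gamma) * rng.normal(size=n)

mu_hat, gamma_hat, delta = harvey_scoring(y, Z)
print("mu:", mu_hat, "gamma:", gamma_hat, "delta:", delta)
```

Both parameter blocks are updated from the same current estimates μ(t) and γ(t), which is what the block diagonal expected Hessian permits. On well-behaved simulated data the scoring steps typically settle quickly; as noted above, that need not carry over to the application in the text.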
/*================================================================
Program Code for Estimation of Harvey's Model
The data set for this model is 100 observations from Greene (1992)
Variables are: Y  = Average monthly credit card expenditure
               Q1 = Age in years + 12ths of a year
               Q2 = Income, divided by 10,000
               Q3 = OwnRent; individual owns (1) or rents (0) home
               Q4 = Self employed (1=yes, 0=no)
Read     ; Nobs = 200 ; Nvar = 6 ; Names = y,q1,q2,q3,q4
         ; file=d:\DataSets\A51.dat$
Namelist ; Z = One,q1,q2,q3,q4 $
================================================================
Step 1 is to get the starting values and set some values for the
iterations iter=iteration counter, delta=value for convergence.
*/
Create ; y0 = y - Xbr(y) ; ui = log(y0^2) $
Matrix ; gamma0 = <Z'Z> * Z'ui ; EH = 2*<Z'Z> $
Calc   ; c0 = gamma0(1)+1.2704 ? Correction to start value
       ; s20 = y0'y0/n ; delta = 1 ; iter=0 $
Create ; vi0 = y0^2 / s20 - 1 $ (Used in LM statistic)
? Correct first element in gamma, then set starting vector.
Matrix ; Gamma0(1) = c0 ; Gamma = Gamma0 $ Start value for gamma
Calc   ; mu0 = Xbr(y); mu = mu0$ Start value for mu
Procedure [This does the iterations]
Create ; vari = exp(Z'Gamma) ; ei = y-mu ; varinv=1/vari
       ; hi = ei^2 / vari ; gigamma = .5*(hi - 1); gimu = ei/vari
       ; logli = -.5*(log(2*pi) + log(vari) + hi) $
Matrix ; ggamma = Z'gigamma ; gmu= 1'gimu
       ; H = 2*<Z'[hi]Z> ; gupdate = H*ggamma ? scoring, update = EH*ggamma
       ; Gamma = Gamma + gupdate $
Calc   ; muupdate = Sum(gimu)/Sum(varinv) ; mu = mu + muupdate $
Matrix ; update = [gupdate/muupdate] ; g = [ggamma/gmu] $
Calc   ; list ; Iter = Iter+1 ; LogLU = Sum(logli);delta=g'update$
EndProcedure
Execute ; While delta > .00001 $
Matrix ; Stat (Gamma,H) $
Calc   ; list ; mu ; vmu = 1/Sum(varinv) ; tmu = mu/Sqr(Vmu) $
Calc   ; list ; Sigmasq = Exp(Gamma(1)) ; K = Col(Z)
       ; SE = Sigmasq * Sqr(H(1,1)) ; TRSE = Sigmasq/SE
       ; LogLR = -n/2*(1 + log(2*pi)+ log(s20))
       ; LRTest = -2*(LogLR - LogLU) $
Matrix ; Alpha = Gamma(2:K) ; VAlpha = Part(H,2,K,2,K)
       ; list ; WaldTest = Alpha'<VAlpha>Alpha
       ; LMTest = .5* vi0'Z * <Z'Z> * Z'vi0
       ; EH ; H ; VB = BHHH(Z,gi) ; $
In the Example in the text, μ was constrained to equal $\bar{y}$. In the program, μ is allowed to be a free parameter. The comparison of the two sets of results appears below.

             (Unconstrained model)               (Constrained model, μ = ȳ)
Iteration    log likelihood    δ                 log likelihood    δ
 1           -692.2987         22.8406           -698.3888         19.7022
 2           -683.2320         6.9005            -692.2986         4.5494
 3           -680.7028         2.7494            -689.7029         0.406881
 4           -679.7461         0.63453           -689.4980         0.01148798
 5           -679.4856         0.27023           -689.4741         0.0000125995
 6           -679.4856         0.08124           -689.47407        0.000000000016
 7           -679.4648         0.03079
 8           -679.4568         0.0101793
 9           -679.4542         0.00364255
10           -679.4533         0.001240906
11           -679.4530         0.00043431
12           -679.4529         0.0001494193
13           -679.4528         0.00005188501
14           -679.4528         0.00001790973
15           -679.4528         0.00000620193

Estimated Parameters
                (Unconstrained model)              (Constrained model, μ = ȳ)
Variable        Estimate   Std Error   t-ratio     Estimate   Std Error   t-ratio
Age             0.0134     0.0244      0.550       0.013042   0.02310     0.565
Income          0.9953     0.1375      7.236       0.6432     0.120001    5.360
Ownrent         0.0774     0.3004      0.258       0.2159     0.3073      0.703
SelfEmployed    1.3117     0.6719      1.952       0.4273     0.6677      0.640
γ1              7.867                              8.465
σ2              2609.72                            4,745.92
μ               91.874     15.247      6.026       189.02     fixed

Tests of the joint hypothesis that all slope coefficients are zero:
                (Unconstrained model)              (Constrained model, μ = ȳ)
LR              60.759                             40.716
Wald            69.515                             39.024
LM              35.115 (same by construction)      35.115