Econometrics: Joint Bachelor's Programme of HSE and NES (Sovbak), 2020 final


Question 1

Part I: Female labor supply — 21 points

Harvard economist Claudia Goldin attributes much of the rise of professional women in the U.S. labor force to their ability to engage in family planning after the introduction of the birth-control pill. In developing countries, early childbearing is associated with lower education and greater dependence on husbands' earnings.

This part studies the effect of family size on female labor supply using $n=254{,}654$ married women aged 21-35 from the 1980 U.S. Census. The data refer to calendar year 1979.

Table 1. Variables

| Variable | Definition |
|---|---|
| Wife's weeks worked | Number of weeks the wife worked for pay in 1979 |
| Husband's weeks worked | Number of weeks the husband worked for pay in 1979 |
| Same sex | 1 if the first two children have the same sex, 0 otherwise |
| 2 boys | 1 if the first two children are boys, 0 otherwise |
| 2 girls | 1 if the first two children are girls, 0 otherwise |
| Kids > 2 | 1 if the family has more than two children, 0 otherwise |
| Boy first | 1 if the first child is a boy, 0 otherwise |
| Current age of mother | Mother's age in 1979 |
| Age of mother at first birth | Mother's age when her first child was born |
| Black | 1 if Black, 0 otherwise |
| Hispanic | 1 if Hispanic, 0 otherwise |
| Other race | 1 if nonwhite, non-Black, and non-Hispanic, 0 otherwise |

Table 2. Child sex composition, family size, and labor supply

Robust standard errors are in parentheses. All regressions include an intercept (not reported). $^{**}$ denotes significance at the 1% level and $^{*}$ at the 5% level.

| Regressor/statistic | (1) OLS: Kids > 2 | (2) OLS: Kids > 2 | (3) OLS: wife's weeks | (4) TSLS: wife's weeks | (5) TSLS: wife's weeks | (6) TSLS: husband's weeks |
|---|---|---|---|---|---|---|
| Instrument(s) | | | | Same sex | 2 boys, 2 girls | Same sex |
| Same sex | 0.0694**<br>(0.0018) | | | | | |
| 2 boys | | 0.0599**<br>(0.0026) | | | | |
| 2 girls | | 0.0789**<br>(0.0026) | | | | |
| Kids > 2 | | | -8.04**<br>(0.09) | -5.40**<br>(1.21) | -5.16**<br>(1.20) | 1.01<br>(0.63) |
| Boy first | -0.0011<br>(0.0019) | -0.0015<br>(0.0026) | -0.05<br>(0.08) | -0.02<br>(0.08) | -0.02<br>(0.08) | 0.03<br>(0.08) |
| Current age of mother | 0.0304**<br>(0.0003) | 0.0304**<br>(0.0003) | 1.33**<br>(0.01) | 1.25**<br>(0.04) | 1.25**<br>(0.04) | 0.10*<br>(0.04) |
| Age at first birth | -0.0436**<br>(0.0003) | -0.0436**<br>(0.0003) | -1.36**<br>(0.17) | -1.24**<br>(0.05) | -1.24**<br>(0.05) | -0.21**<br>(0.06) |
| Black | 0.0680**<br>(0.0042) | 0.0680**<br>(0.0042) | 10.83**<br>(0.19) | 10.66**<br>(0.21) | 10.64**<br>(0.21) | -4.10**<br>(0.26) |
| Hispanic | 0.1260**<br>(0.0039) | 0.1260**<br>(0.0039) | -0.04<br>(0.18) | -0.38<br>(0.23) | -0.41<br>(0.23) | 2.61**<br>(0.23) |
| Other race | 0.0480**<br>(0.0044) | 0.0480**<br>(0.0044) | 2.82**<br>(0.20) | 2.70**<br>(0.21) | 2.69**<br>(0.21) | 2.02**<br>(0.18) |
| N | 254,654 | 254,654 | 254,654 | 254,654 | 254,654 | 254,654 |
| First-stage F statistic | 1413.0 | 725.9 | | | | |
| J statistic | | | | | 3.24 | |

1 — 3 points

Give the best reason why the OLS estimator of the coefficient on Kids > 2 in column (3) may be biased.

2 — 3 points

Consider the hypothesis that, on average, U.S. parents want children of both sexes. Does Table 2 provide evidence for this hypothesis, against it, or neither? Explain.

3 — 6 points

Consider each proposed instrument for Kids > 2 in regression (3). Is it arguably valid? Explain.

  1. (3 points) Whether the wife came from a large family.
  2. (3 points) The teenage-pregnancy rate in the wife's city or town.

4 — 6 points

Using judgment and the empirical results in Table 2:

  1. (3 points) Is Same sex a valid instrument in regression (4)?
  2. (3 points) Are 2 boys and 2 girls a valid instrument set in regression (5)?
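The statistical part of these checks can be sketched in Python. The sketch relies on the usual large-sample rules. A first-stage F above 10 is taken as evidence against weak instruments. Under the null that all instruments are exogenous, the J statistic is approximately $\chi^2$ with $m-k=2-1=1$ degree of freedom. The helper `chi2_1_sf` is illustrative. It uses the identity $P(\chi^2_1>x)=\operatorname{erfc}(\sqrt{x/2})$ so that SciPy is not needed. Statistics alone cannot establish exogeneity, which is why the question also asks for judgment.

```python
import math

def chi2_1_sf(x: float) -> float:
    """Upper-tail probability P(chi2_1 > x); uses chi2_1 = Z**2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2.0))

# First-stage F statistics reported in Table 2, by instrument set.
first_stage_f = {"Same sex": 1413.0, "2 boys, 2 girls": 725.9}
for instruments, f in first_stage_f.items():
    print(f"{instruments}: first-stage F = {f}, above rule-of-thumb 10: {f > 10}")

# J statistic for regression (5): m = 2 instruments, k = 1 endogenous regressor.
j_stat, df = 3.24, 2 - 1
p_value = chi2_1_sf(j_stat)
print(f"J = {j_stat}, df = {df}, p-value = {p_value:.3f}, reject at 5%: {p_value < 0.05}")
```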

5 — 3 points

The estimated coefficient on Kids > 2 is more negative in OLS regression (3) than in TSLS regression (4). Give a real-world interpretation that could explain this difference.

Question 2

Part II: Female labor supply, continued — 19 points

Consider the hypothetical regression

$$\text{WifeWeeks}_i=\beta_0+\beta_1(\text{Kids}>2)_i+u_i,\tag{7}$$

estimated by TSLS using Same sex as the instrument. For this question, assume Same sex is valid in regression (4) and is independent of every control in regression (4), so that, for example,

$$E(\text{BoyFirst}\mid\text{SameSex})=E(\text{BoyFirst}),$$

and analogously for the other controls.

1 — 7 points

  1. (3.5 points) Explain why Same sex is a valid instrument in regression (7).
  2. (3.5 points) Despite its validity in regression (7), why might regression (4) still be preferable?
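A minimal simulation can illustrate part 1. It uses a hypothetical data-generating process, not the Census data. There is a randomly assigned binary instrument, a control $W$ drawn independently of it, and an unobserved taste for work that makes $X$ endogenous. The omitted control ends up in the error term, but it is independent of the instrument. The just-identified IV (Wald) estimator $\widehat{\operatorname{cov}}(Z,Y)/\widehat{\operatorname{cov}}(Z,X)$ should therefore still be close to the true coefficient, while OLS without controls is not. All coefficients are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta1 = -5.0                                   # hypothetical true effect of Kids > 2

z = rng.integers(0, 2, size=n)                 # binary instrument (think: Same sex)
w = rng.normal(30.0, 4.0, size=n)              # control (think: mother's age), independent of z
taste = rng.normal(size=n)                     # unobserved taste for work: source of endogeneity

# Endogenous binary regressor driven by z, w and the unobserved taste.
latent = 1.0 * z + 0.05 * (w - 30.0) - 0.5 * taste + rng.normal(size=n)
x = (latent > 0.5).astype(float)
y = 20.0 + beta1 * x + 1.0 * w + 5.0 * taste + rng.normal(0.0, 10.0, size=n)

# Just-identified IV (Wald) estimator that leaves w in the error term.
iv_no_control = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
# OLS slope without controls, for comparison.
ols_no_control = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"IV without w:  {iv_no_control:.2f}  (true {beta1})")
print(f"OLS without w: {ols_no_control:.2f}")
```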

2 — 4 points

Suppose the labor-supply effect of having a large family differs across women: the more professionally ambitious a woman is, the smaller the effect, and the most ambitious women work whether or not they have a large family. How does this affect interpretation of regressions (4) and (5)?
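A hypothetical simulation can make this concrete. The usual monotonicity assumption says that no woman is pushed toward a smaller family by the instrument. Under it, the IV estimator converges to the average effect among compliers, meaning the women whose family size responds to the instrument. That average need not equal the average effect over all women. The shares and effect sizes are made up for illustration, as is the assumption that ambitious women are less likely to be compliers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

z = rng.integers(0, 2, size=n)                  # instrument, randomly assigned
ambitious = rng.random(n) < 0.3                 # hypothetical: 30% highly ambitious women
effect = np.where(ambitious, 0.0, -8.0)         # ambitious women work regardless of family size

# Compliance types; there are no defiers, so monotonicity holds.
# Hypothetically, ambitious women are less likely to be compliers.
draw = rng.random(n)
p_complier = np.where(ambitious, 0.05, 0.25)
complier = draw < p_complier
always_taker = (draw >= p_complier) & (draw < p_complier + 0.2)
x = (always_taker | (complier & (z == 1))).astype(float)   # x plays the role of Kids > 2

y = 30.0 + effect * x + rng.normal(0.0, 10.0, size=n)      # weeks worked

wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
print(f"IV (Wald) estimate:         {wald:.2f}")
print(f"Average effect, compliers:  {effect[complier].mean():.2f}")
print(f"Average effect, all women:  {effect.mean():.2f}")
```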

Use Table 2 to assess each statement.

3 — 4 points

Families with many children may be unusual because of religious or ethnic background. Therefore, regressions (4) and (5) do not estimate the effect of family size on labor supply; they only capture religious or ethnic effects. Agree or disagree, with a specific explanation.

4 — 4 points

Even if large families reduce female labor-force participation, husbands work more to compensate for the loss of the wife's earnings. Agree or disagree using Table 2.

Question 3

Part III: Public smoking bans and smoking habits — 19 points

Do smoking bans in bars reduce smoking? The data are a panel of 50 U.S. states observed from 2001 through 2009, for $50\times9=450$ state-year observations.

Table 3. Variable definitions and summary statistics

| Variable | Definition | Mean | Std. dev. |
|---|---|---|---|
| smokingrate | Fraction of adults who currently smoke | 0.242 | 0.044 |
| statebarban | 1 if a bar-smoking ban is in effect | 0.202 | 0.402 |
| staterestban | 1 if a restaurant-smoking ban is in effect | 0.248 | 0.422 |
| stateworkban | 1 if a workplace-smoking ban is in effect | 0.182 | 0.375 |
| all3bans | 1 if bans apply in bars, restaurants, and workplaces | 0.129 | 0.335 |
| drinkingrate | Fraction of adults who drink | 0.596 | 0.098 |
| somehs | Fraction with less than a high-school diploma | 0.068 | 0.028 |
| hsgrad | Fraction with a high-school diploma and no further education | 0.269 | 0.046 |
| somecollege | Fraction with some college but no college degree | 0.287 | 0.035 |
| collegegrad | Fraction with a college degree | 0.376 | 0.073 |
| white | Fraction white | 0.755 | 0.143 |
| black | Fraction Black | 0.098 | 0.098 |
| Hispanic | Fraction Hispanic | 0.081 | 0.092 |
| other | Fraction neither white, Black, nor Hispanic | 0.066 | 0.078 |
| year | Year (2001-2009) | | |

Figure: Number of states with a bar-smoking ban and average smoking rate, 2001-2009.

Table 4. Smoking rates and public smoking bans

Dependent variable: smokingrate. Standard errors are in parentheses. Regressions (1)-(2) use 2009 only; regressions (3)-(6) use all years. Regressions (3)-(6) include state fixed effects. Regressions (4)-(6) also include year fixed effects. Standard errors are heteroskedasticity-robust in (1)-(2) and clustered by state in (3)-(6).

| Regressor/statistic | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| statebarban | -0.0494**<br>(0.0097) | -0.0306**<br>(0.0077) | -0.0187**<br>(0.0045) | -0.0120**<br>(0.0033) | -0.0133**<br>(0.0036) | -0.0028<br>(0.0139) |
| statebarban × drinkingrate | | | | | | -0.0147<br>(0.0233) |
| staterestban | | | -0.0003<br>(0.0044) | 0.0034<br>(0.0040) | 0.0040<br>(0.0042) | 0.0039<br>(0.0038) |
| stateworkban | | | -0.0075*<br>(0.0029) | -0.0032<br>(0.0030) | -0.0041<br>(0.0039) | -0.0035<br>(0.0030) |
| all3bans | | | | | 0.0018<br>(0.0038) | |
| drinkingrate | | | 0.229**<br>(0.052) | 0.015<br>(0.036) | 0.014<br>(0.036) | 0.018<br>(0.038) |
| somehs | | -0.693**<br>(0.236) | 0.209<br>(0.127) | 0.256**<br>(0.092) | 0.256**<br>(0.092) | 0.256**<br>(0.092) |
| somecollege | | -0.926**<br>(0.209) | 0.005<br>(0.119) | -0.046<br>(0.079) | -0.046<br>(0.079) | -0.047<br>(0.080) |
| collegegrad | | -0.642**<br>(0.111) | -0.374**<br>(0.067) | -0.204**<br>(0.049) | -0.203**<br>(0.050) | -0.204**<br>(0.050) |
| black | | | -0.027<br>(0.045) | -0.029<br>(0.037) | -0.028<br>(0.037) | -0.028<br>(0.037) |
| Hispanic | | | -0.193**<br>(0.044) | -0.207**<br>(0.030) | -0.208**<br>(0.030) | -0.208**<br>(0.030) |
| other | | | 0.272**<br>(0.087) | 0.169*<br>(0.070) | 0.169*<br>(0.070) | 0.166*<br>(0.071) |
| N | 50 | 50 | 450 | 450 | 450 | 450 |
| F: statebarban and interaction | | | | | | 7.05, p = 0.002 |
| F: education variables | | 12.32, p = 0.000 | 32.06, p = 0.000 | 24.88, p = 0.000 | 24.97, p = 0.000 | 24.49, p = 0.000 |
| F: race variables | | | 10.63, p = 0.000 | 23.73, p = 0.000 | 23.11, p = 0.000 | 23.96, p = 0.000 |

1

Using regression (2):

  1. (2 points) Interpret the coefficient on statebarban.
  2. (2 points) Construct a 95% confidence interval for the population coefficient.
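The large-sample 95% interval is the estimate plus or minus 1.96 robust standard errors. Below is a minimal sketch of the arithmetic, using the regression (2) values from Table 4. With only 50 observations, the normal approximation is only a rough guide.

```python
coef, se = -0.0306, 0.0077        # statebarban in regression (2), Table 4
z_crit = 1.96                      # two-sided 5% critical value of the standard normal
lower, upper = coef - z_crit * se, coef + z_crit * se
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```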

2 — 2.5 points

Give a reason why the statebarban coefficient changes between regressions (1) and (2), including the direction of the change.

3 — 2.5 points

Give a reason why the statebarban coefficient changes between regressions (3) and (4), including the direction of the change.

4

Using regression (4):

  1. (2 points) Test at the 5% level whether all coefficients on educational-achievement variables are zero.
  2. (2 points) Are the estimated education-related differences in smoking rates large or small in a real-world sense?
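In regression (4) the education regressors are somehs, somecollege, and collegegrad. The category hsgrad is omitted because the four shares sum to one, so $q=3$. The sketch below compares the reported F statistic with the large-sample $F_{3,\infty}$ distribution. It uses the closed form of the $\chi^2_3$ upper tail, $P(\chi^2_3>x)=\operatorname{erfc}(\sqrt{x/2})+\sqrt{2x/\pi}\,e^{-x/2}$. Treating the clustered F statistic as $F_{q,\infty}$ assumes that 50 state clusters are enough. The function name is illustrative.

```python
import math

def chi2_3_sf(x: float) -> float:
    """P(chi2 with 3 df > x), using the closed form for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

q = 3                        # somehs, somecollege, collegegrad (hsgrad omitted)
f_stat = 24.88               # education F statistic, regression (4), Table 4
crit = 7.815 / q             # 5% critical value of F(3, inf): chi2_3 critical value / q
p_value = chi2_3_sf(q * f_stat)
print(f"F = {f_stat}, 5% critical value = {crit:.2f}, approx. p-value = {p_value:.1e}")
print("Reject H0 (all education coefficients zero) at 5%:", f_stat > crit)
```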

5 — 2 points

Regression (5) includes all3bans, which equals the product of statebarban, staterestban, and stateworkban. Does regression (5) suffer from perfect multicollinearity? Explain.
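A quick numerical check of the algebra uses a design matrix with an intercept, the three ban dummies, and their product as columns. Build it from all eight on/off combinations of the bans. The matrix has full column rank exactly when all3bans is not a linear combination of the other columns. This sketch checks only that functional relationship. In the actual panel, multicollinearity would also depend on which combinations of bans occur and on the fixed effects.

```python
import itertools
import numpy as np

rows = []
for bar, rest, work in itertools.product([0, 1], repeat=3):
    rows.append([1, bar, rest, work, bar * rest * work])   # intercept, three bans, all3bans
design = np.array(rows, dtype=float)

rank = np.linalg.matrix_rank(design)
print(f"columns: {design.shape[1]}, rank: {rank}")
```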

6

Using regression (6):

  1. (2 points) Compute the predicted effect of a bar-smoking ban when drinkingrate is 0.70.
  2. (2 points) Explain precisely how to construct a 95% confidence interval for that predicted effect. A numerical interval is not required.
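In regression (6), the effect of a bar ban at drinking rate $d$ is $\beta_{\text{bar}}+\beta_{\text{int}}d$. Its variance is $\operatorname{Var}(\widehat\beta_{\text{bar}})+d^2\operatorname{Var}(\widehat\beta_{\text{int}})+2d\operatorname{Cov}(\widehat\beta_{\text{bar}},\widehat\beta_{\text{int}})$. Table 4 does not report the covariance, so the sketch takes it as an input (`cov_bar_int`, a hypothetical name). An equivalent route is to re-estimate (6) with the interaction term statebarban × (drinkingrate - 0.70). The standard error on statebarban is then read off directly.

```python
import math

def effect_at(b_bar: float, b_int: float, d: float) -> float:
    """Predicted effect of a bar ban when drinkingrate = d."""
    return b_bar + b_int * d

def effect_se(var_bar: float, var_int: float, cov_bar_int: float, d: float) -> float:
    """SE of b_bar + d * b_int. cov_bar_int must come from the estimated
    (clustered) covariance matrix, which Table 4 does not report."""
    return math.sqrt(var_bar + d**2 * var_int + 2.0 * d * cov_bar_int)

b_bar, b_int = -0.0028, -0.0147      # regression (6), Table 4
d = 0.70
point = effect_at(b_bar, b_int, d)
print(f"Predicted effect at drinkingrate = {d}: {point:.4f}")
# 95% CI: point +/- 1.96 * effect_se(0.0139**2, 0.0233**2, cov_bar_int, d),
# where cov_bar_int is the unreported estimated covariance.
```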

Question 4

Part IV: Miscellaneous questions — 41 points

1 — 5 points

Consider

$$Y_{it}=\beta_1X_{it}+\alpha_i+\lambda_i t+u_{it},$$

where $t=1,\ldots,T$, $i=1,\ldots,n$, and $\alpha_i+\lambda_i t$ is an unobserved entity-specific time trend. How would you estimate $\beta_1$?
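One standard approach is sketched here under a hypothetical data-generating process. First-difference the equation, which removes $\alpha_i$ and turns $\lambda_i t$ into the entity-specific constant $\lambda_i$, so that $\Delta Y_{it}=\beta_1\Delta X_{it}+\lambda_i+\Delta u_{it}$. Then estimate $\beta_1$ by entity fixed effects (within-entity demeaning) on the differenced data. This requires $T\ge3$. An alternative is to include entity dummies and entity-specific trend terms directly. In the simulation, $X$ is correlated with both $\alpha_i$ and $\lambda_i$, so pooled OLS in levels is biased.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, beta1 = 500, 8, 1.5                     # hypothetical panel dimensions and true slope

alpha = rng.normal(size=n)                    # entity intercepts
lam = rng.normal(size=n)                      # entity-specific trend slopes
t = np.arange(1, T + 1)

# X is correlated with alpha_i and lambda_i, so omitting them biases OLS.
x = 0.8 * alpha[:, None] + 0.5 * lam[:, None] * t + rng.normal(size=(n, T))
y = beta1 * x + alpha[:, None] + lam[:, None] * t + rng.normal(size=(n, T))

# Step 1: first differences remove alpha_i; lambda_i * t becomes lambda_i.
dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)

# Step 2: entity fixed effects on the differences (demean within each entity).
dx_w = dx - dx.mean(axis=1, keepdims=True)
dy_w = dy - dy.mean(axis=1, keepdims=True)
beta1_hat = (dx_w * dy_w).sum() / (dx_w**2).sum()

pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]   # naive pooled OLS slope in levels
print(f"FD + FE estimate: {beta1_hat:.3f}, pooled OLS: {pooled:.3f}")
```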

2. Linear probability model — 8 points

Consider

$$Y_i=\beta_0+\beta_1X_i+u_i,\qquad E(u_i\mid X_i)=0,$$

where $Y_i$ is binary, taking the values 0 and 1.

  1. (1 point) Show that $P(Y_i=1\mid X_i)=\beta_0+\beta_1X_i$.
  2. (2 points) Show that $\operatorname{Var}(u_i\mid X_i)=(\beta_0+\beta_1X_i)\bigl[1-(\beta_0+\beta_1X_i)\bigr]$.
  3. (1 point) Is $u_i$ heteroskedastic? Explain.
  4. (4 points) Derive the likelihood function.
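These results can be checked by simulation under hypothetical parameter values. Draw a binary $Y$ with $P(Y=1\mid X)=\beta_0+\beta_1X$. Then confirm that the conditional variance of $u=Y-\beta_0-\beta_1X$ equals $p(1-p)$ and therefore changes with $X$. Finally, evaluate the Bernoulli log-likelihood $\sum_i[Y_i\log p_i+(1-Y_i)\log(1-p_i)]$ with $p_i=\beta_0+\beta_1X_i$. The values $\beta_0=0.2$ and $\beta_1=0.15$ and the discrete $X$ grid are assumptions for illustration. The log-likelihood is defined only when every $p_i$ lies strictly between 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(3)
b0, b1 = 0.2, 0.15                           # hypothetical LPM coefficients
x = rng.integers(0, 5, size=200_000)         # X in {0, ..., 4}, so p stays in (0, 1)
p = b0 + b1 * x
y = (rng.random(x.size) < p).astype(float)   # Bernoulli draws with P(Y=1|X) = p
u = y - p

for value in range(5):
    mask = x == value
    pv = b0 + b1 * value
    print(f"X={value}: Var(u|X)={u[mask].var():.4f}, p(1-p)={pv * (1 - pv):.4f}")

def lpm_loglik(beta0: float, beta1: float, x: np.ndarray, y: np.ndarray) -> float:
    """Bernoulli log-likelihood of the LPM; -inf if any fitted probability is outside (0, 1)."""
    prob = beta0 + beta1 * x
    if np.any((prob <= 0) | (prob >= 1)):
        return -np.inf
    return float(np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob)))

print(lpm_loglik(b0, b1, x, y), lpm_loglik(0.3, 0.1, x, y))
```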

3. Two instruments — 5 points

A model has one endogenous regressor $X_i$ and two instruments, $Z_{1i}$ and $Z_{2i}$. There is a strong theoretical reason for $Z_{1i}$ to be exogenous because it results from a random lottery, but $Z_{1i}$ alone is a weak instrument. Instrument $Z_{2i}$ is strongly relevant but less likely to be exogenous. With both instruments, the overidentification statistic is

$$J=7.5.$$

  1. (2.5 points) Does this suggest that $E(u_i\mid Z_{1i},Z_{2i})\neq0$? Explain.
  2. (2.5 points) Does it suggest that $E(u_i\mid Z_{2i})\neq0$? Explain.
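The homoskedasticity-only J statistic is computed in three steps. Estimate the model by TSLS. Regress the TSLS residuals on the instruments and any included exogenous regressors. Then set $J=mF$, where $F$ tests that all instrument coefficients are zero and $m$ is the number of instruments. Under the null, $J$ is approximately $\chi^2_{m-k}$, here $\chi^2_1$ with a 5% critical value of 3.84. The simulation below is hypothetical. $Z_1$ is exogenous with a small first-stage coefficient, and $Z_2$ is strongly relevant but correlated with the error. The `ols` helper is illustrative. $J$ tests the overidentifying restrictions jointly.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients (y may be a vector or a matrix of columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n, beta1 = 50_000, 2.0                         # hypothetical sample size and true slope
z1 = rng.normal(size=n)                        # exogenous (lottery), small first-stage coefficient
z2 = rng.normal(size=n)                        # strongly relevant but invalid
u = rng.normal(size=n) + 0.3 * z2              # error correlated with z2
x = 0.1 * z1 + 1.0 * z2 + 0.5 * u + rng.normal(size=n)
y = 1.0 + beta1 * x + u

const = np.ones(n)
Z = np.column_stack([const, z1, z2])           # constant plus the two instruments
X = np.column_stack([const, x])

# TSLS: project X on Z, then regress y on the fitted values.
X_hat = Z @ ols(Z, X)
b_tsls = ols(X_hat, y)
resid = y - X @ b_tsls                         # TSLS residuals use the actual X

# Regress the residuals on Z and test that the instrument coefficients are zero.
g = ols(Z, resid)
ssr_u = np.sum((resid - Z @ g) ** 2)
ssr_r = np.sum((resid - resid.mean()) ** 2)    # restricted model: constant only
m = 2                                          # number of instruments
f_stat = ((ssr_r - ssr_u) / m) / (ssr_u / (n - Z.shape[1]))
j_stat = m * f_stat                            # approx. chi2 with m - k = 1 df under H0
print(f"TSLS slope: {b_tsls[1]:.3f} (true {beta1}), J = {j_stat:.1f}")
```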

4. Omitted controls in IV — 5 points

One student estimates

$$Y_i=\beta_0+\beta_1X_i+\beta_2W_i+u_i$$

using $Z_i$ as an instrument. Another estimates the same relationship but omits $W_i$.

  1. (2.5 points) The first student says that if $Z_i$ and $W_i$ are correlated, the second student's IV estimator is inconsistent. Is this correct?
  2. (2.5 points) The second student says that if the true $\beta_2=0$, the IV estimator that omits $W_i$ remains consistent. Is this correct?
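The IV estimator that omits $W$ is $\widehat{\operatorname{cov}}(Z,Y)/\widehat{\operatorname{cov}}(Z,X)$. When $Z$ is uncorrelated with $u$, its probability limit is $\beta_1+\beta_2\operatorname{cov}(Z,W)/\operatorname{cov}(Z,X)$. A hypothetical simulation can check this formula in both cases. The estimator is inconsistent when $Z$ and $W$ are correlated and $\beta_2\neq0$. It is consistent when $\beta_2=0$, even though $W$ is correlated with $Z$ and $X$. The coefficients and the function name are assumptions for illustration.

```python
import numpy as np

def iv_omitting_w(beta2: float, seed: int = 5, n: int = 200_000) -> float:
    """IV slope of Y on X only, with Z as instrument (hypothetical DGP, true beta1 = 2)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    w = 0.6 * z + rng.normal(size=n)           # W correlated with Z
    v = rng.normal(size=n)                     # source of endogeneity
    x = 1.0 * z + 0.5 * w + v + rng.normal(size=n)
    u = v + rng.normal(size=n)                 # error correlated with X, not with Z or W
    y = 1.0 + 2.0 * x + beta2 * w + u
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# Theory: plim = 2 + beta2 * 0.6 / 1.3, since cov(Z, W) = 0.6 and cov(Z, X) = 1 + 0.5 * 0.6.
print(f"beta2 = 1.5: IV omitting W = {iv_omitting_w(1.5):.3f}")
print(f"beta2 = 0.0: IV omitting W = {iv_omitting_w(0.0):.3f}")
```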

5. Forecasting an AR(1) — 5 points

Consider the stationary model

$$Y_t=\beta_0+\beta_1Y_{t-1}+u_t,$$

where $u_t$ is i.i.d. with mean zero and variance $\sigma_u^2$. Using observations $t=1,\ldots,T$, the forecast is

$$\widehat Y_{T+1\mid T}=\widehat\beta_0+\widehat\beta_1Y_T.$$

  1. (1 point) Show that $Y_{T+1}-\widehat Y_{T+1\mid T}=u_{T+1}-\bigl[(\widehat\beta_0-\beta_0)+(\widehat\beta_1-\beta_1)Y_T\bigr]$.
  2. (1 point) Show that $u_{T+1}$ is independent of $Y_T$.
  3. (1 point) Show that $u_{T+1}$ is independent of $\widehat\beta_0$ and $\widehat\beta_1$.
  4. (2 points) Show that $\operatorname{Var}(Y_{T+1}-\widehat Y_{T+1\mid T})=\sigma_u^2+\operatorname{Var}\bigl[(\widehat\beta_0-\beta_0)+(\widehat\beta_1-\beta_1)Y_T\bigr]$.
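The decomposition in part 1 is an algebraic identity, and a short simulation can confirm it numerically. Fit the AR(1) by OLS on a simulated sample and take the realized next shock $u_{T+1}$. Then compare the forecast error with $u_{T+1}-[(\widehat\beta_0-\beta_0)+(\widehat\beta_1-\beta_1)Y_T]$. The parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, sigma_u, T = 1.0, 0.6, 1.0, 200   # hypothetical stationary AR(1)

y = np.empty(T + 2)
y[0] = beta0 / (1 - beta1)                     # start at the stationary mean
for t in range(1, T + 2):
    y[t] = beta0 + beta1 * y[t - 1] + sigma_u * rng.normal()

# Estimation sample: regress Y_t on Y_{t-1} for t = 1..T; y[T + 1] is the future value.
Y, Ylag = y[1:T + 1], y[0:T]
X = np.column_stack([np.ones(T), Ylag])
b0_hat, b1_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

u_next = y[T + 1] - beta0 - beta1 * y[T]       # the shock u_{T+1}
forecast = b0_hat + b1_hat * y[T]
lhs = y[T + 1] - forecast
rhs = u_next - ((b0_hat - beta0) + (b1_hat - beta1) * y[T])
print(f"forecast error {lhs:.6f} vs decomposition {rhs:.6f}")
```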

6. Random walk — 6 points

Suppose

$$Y_t=Y_{t-1}+u_t,\qquad Y_0=0,$$

where $u_t$ is i.i.d. with mean zero and variance $\sigma_u^2$.

  1. (2 points) Compute the mean and variance of $Y_t$.
  2. (2 points) Compute $\operatorname{Cov}(Y_t,Y_{t-k})$.
  3. (2 points) Use the results to show that $Y_t$ is nonstationary.
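Because $Y_t=u_1+\cdots+u_t$, the moments follow from summing independent shocks: $E(Y_t)=0$, $\operatorname{Var}(Y_t)=t\sigma_u^2$, and $\operatorname{Cov}(Y_t,Y_{t-k})=(t-k)\sigma_u^2$. A Monte Carlo sketch can check these formulas. It simulates many independent paths with a hypothetical $\sigma_u=1$ and compares sample moments with the formulas. Both the variance and the covariance grow with $t$.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, T, sigma_u = 50_000, 50, 1.0          # hypothetical settings
u = rng.normal(0.0, sigma_u, size=(n_paths, T))
Y = np.cumsum(u, axis=1)                       # column t-1 holds Y_t, with Y_0 = 0

t, k = 40, 10
Yt, Ytk = Y[:, t - 1], Y[:, t - k - 1]
print(f"mean(Y_t) = {Yt.mean():.3f} (theory 0)")
print(f"Var(Y_t) = {Yt.var():.2f} (theory {t * sigma_u**2:.0f})")
print(f"Cov(Y_t, Y_t-k) = {np.cov(Yt, Ytk)[0, 1]:.2f} (theory {(t - k) * sigma_u**2:.0f})")
```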

7. OLS with serially correlated regressors and errors — 7 points

Consider

$$Y_t=\beta_0+\beta_1X_t+u_t,$$

where

$$u_t=\phi_1u_{t-1}+\widetilde u_t,\qquad |\phi_1|<1,$$

and

$$X_t=\gamma_1X_{t-1}+e_t,\qquad |\gamma_1|<1.$$

The innovations $\widetilde u_t$ and $e_t$ are i.i.d. with variances $\sigma_{\widetilde u}^2$ and $\sigma_e^2$, and $e_t$ is independent of $\widetilde u_{t+i}$ for all $t$ and $i$.

  1. (1 point) Show that $\operatorname{Var}(u_t)=\dfrac{\sigma_{\widetilde u}^2}{1-\phi_1^2}$ and $\operatorname{Var}(X_t)=\dfrac{\sigma_e^2}{1-\gamma_1^2}$.
  2. (1 point) Show that $\operatorname{Cov}(u_t,u_{t-j})=\phi_1^j\operatorname{Var}(u_t)$ and $\operatorname{Cov}(X_t,X_{t-j})=\gamma_1^j\operatorname{Var}(X_t)$.
  3. (1 point) Show that the corresponding correlations are $\phi_1^j$ and $\gamma_1^j$.
  4. (4 points) Find the asymptotic variance of $\widehat\beta^{OLS}_1$.
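Because $X_t$ and $u_t$ are independent, the autocovariances of $X_tu_t$ are $(\gamma_1\phi_1)^j\operatorname{Var}(X_t)\operatorname{Var}(u_t)$. Summing over all lags gives the large-$T$ approximation $\operatorname{Var}(\widehat\beta_1)\approx\frac{\operatorname{Var}(u_t)}{T\operatorname{Var}(X_t)}\cdot\frac{1+\phi_1\gamma_1}{1-\phi_1\gamma_1}$. The Monte Carlo sketch below compares this expression with the simulated variance of the OLS slope, for hypothetical parameter values. It is a numerical check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(8)
phi, gamma = 0.5, 0.5                       # hypothetical AR coefficients, |.| < 1
sig_ut, sig_e = 1.0, 1.0                    # innovation standard deviations
T, reps, burn = 400, 4_000, 200             # sample size, replications, burn-in

# Simulate all replications at once, one time step at a time.
u = np.zeros(reps)
x = np.zeros(reps)
xs = np.empty((reps, T))
us = np.empty((reps, T))
for t in range(burn + T):
    u = phi * u + rng.normal(0.0, sig_ut, size=reps)
    x = gamma * x + rng.normal(0.0, sig_e, size=reps)
    if t >= burn:
        xs[:, t - burn], us[:, t - burn] = x, u

# OLS slope error per replication: beta1_hat - beta1 = sum(xd * u) / sum(xd**2),
# so Y itself is not needed.
xd = xs - xs.mean(axis=1, keepdims=True)
slope_err = (xd * us).sum(axis=1) / (xd**2).sum(axis=1)

var_u = sig_ut**2 / (1 - phi**2)
var_x = sig_e**2 / (1 - gamma**2)
theory = var_u / (T * var_x) * (1 + phi * gamma) / (1 - phi * gamma)
print(f"Monte Carlo Var(beta1_hat): {slope_err.var():.5f}, approximation: {theory:.5f}")
```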