Econometrics: HSE and NES Joint Bachelor's Programme (Совбак ВШЭ и РЭШ), 2021 midterm


Question 1

Part 1 — 25 points

Suppose $(Y_i, X_{1i}, X_{2i})$ satisfy the four least-squares assumptions for causal inference in the multiple-regression model discussed in class. In addition,

$$\operatorname{Var}(u_i\mid X_{1i},X_{2i})=4, \qquad \operatorname{Var}(X_{1i})=6.$$

A random sample of size $n=400$ is drawn from the population.

(a) (9 points) Assume that $X_1$ and $X_2$ are uncorrelated. Compute the asymptotic unconditional variance of $\widehat\beta_1$.

(b) (9 points) Assume that

$$\operatorname{corr}(X_1,X_2)=0.5.$$

Compute the asymptotic unconditional variance of $\widehat\beta_1$.
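For parts (a) and (b), one standard large-sample result can serve as a starting point. It is stated here as an assumption, since the problem does not reproduce the formula used in class: with two regressors and homoskedastic errors, the variance of $\widehat\beta_1$ is approximately $\frac{1}{n}\cdot\frac{1}{1-\rho_{X_1,X_2}^2}\cdot\frac{\sigma_u^2}{\sigma_{X_1}^2}$. A minimal Python sketch of that formula follows; `asymptotic_var_beta1` is a hypothetical helper name, not something defined in the exam.

```python
def asymptotic_var_beta1(n, var_u, var_x1, corr_x1_x2):
    """Large-sample variance of the OLS coefficient on X1 with two regressors.

    Assumes homoskedastic errors (assumed textbook formula):
        var_u / (n * (1 - corr^2) * var_x1)
    """
    if not -1.0 < corr_x1_x2 < 1.0:
        raise ValueError("correlation must be strictly between -1 and 1")
    return var_u / (n * (1.0 - corr_x1_x2 ** 2) * var_x1)


# Values given in the problem: n = 400, Var(u | X) = 4, Var(X1) = 6.
var_uncorrelated = asymptotic_var_beta1(400, 4.0, 6.0, 0.0)
var_correlated = asymptotic_var_beta1(400, 4.0, 6.0, 0.5)
print(var_uncorrelated, var_correlated)
```

Under this formula, correlation between the regressors enters only through the factor $1/(1-\rho^2)$, which is the comparison that part (c) asks about.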

(c) (7 points) Comment on the following statements:

When $X_1$ and $X_2$ are correlated, the variance of $\widehat\beta_1$ is larger than it would be if $X_1$ and $X_2$ were uncorrelated. Thus, if you are interested in $\beta_1$, it is best to leave $X_2$ out of the regression if it is correlated with $X_1$.

Question 2

Part 2 — 30 points

Table 2 reports regressions estimated using data on employees in a developing country. The data set contains information on more than 10,000 full-time, full-year workers. Each worker's highest educational qualification is either a high-school diploma or a bachelor's degree. Ages range from 25 to 40. The data also contain region, gender, and age.

Table 1. Variable definitions

| Name | Description |
|---|---|
| $AWE$ | Logarithm of average weekly earnings, in 2007 units |
| High School | Binary variable equal to 1 for a high-school graduate and 0 for a college graduate |
| Male | Binary variable equal to 1 for male and 0 for female |
| Age | Age in years |
| North | 1 if region is North, 0 otherwise |
| East | 1 if region is East, 0 otherwise |
| South | 1 if region is South, 0 otherwise |
| West | 1 if region is West, 0 otherwise |

Table 2. Regressions of log average weekly earnings

Standard errors are in parentheses.

| Regressor | (1) | (2) | (3) |
|---|---|---|---|
| High-school graduate, $X_1$ | 0.352<br>(0.021) | 0.373<br>(0.021) | 0.371<br>(0.021) |
| Male, $X_2$ | 0.458<br>(0.021) | 0.457<br>(0.020) | 0.451<br>(0.020) |
| Age, $X_3$ | | 0.011<br>(0.001) | 0.011<br>(0.001) |
| North, $X_4$ | | | 0.175<br>(0.037) |
| South, $X_5$ | | | 0.103<br>(0.033) |
| East, $X_7$ | | | -0.102<br>(0.043) |
| Intercept | 12.840<br>(0.018) | 12.471<br>(0.049) | 12.390<br>(0.057) |
| $F$ statistic for zero regional effects | | | 21.87 |
| SER | 1.026 | 1.023 | 1.020 |
| $R^2$ | 0.0710 | 0.0761 | 0.0814 |
| $n$ | 10,973 | 10,973 | 10,973 |

(a) (2 points) For every coefficient in all three regressions, add $^{*}$ for significance at the 5% level and $^{**}$ for significance at the 1% level.

Using regression (1):

(b) (3 points) Is the estimated high-school earnings difference statistically significant at the 5% level? Construct a 95% confidence interval.

(c) (3 points) Is the estimated male-female earnings difference statistically significant at the 5% level? Construct a 95% confidence interval.
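The arithmetic behind parts (b) and (c) can be sketched in a few lines of Python. This assumes the usual large-sample normal approximation, so the 5% two-sided critical value is 1.96; `normal_ci` and `t_statistic` are hypothetical helper names introduced for illustration.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t-statistic for H0: coefficient == null_value."""
    return (estimate - null_value) / std_error


def normal_ci(estimate, std_error, critical_value=1.96):
    """Two-sided confidence interval using a normal critical value."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width


# Coefficients and standard errors from column (1) of Table 2.
for name, coef, se in [("High School", 0.352, 0.021), ("Male", 0.458, 0.021)]:
    low, high = normal_ci(coef, se)
    print(f"{name}: t = {t_statistic(coef, se):.2f}, "
          f"95% CI = [{low:.3f}, {high:.3f}]")
```

A coefficient is significant at the 5% level when the absolute t-statistic exceeds 1.96, which is equivalent to the 95% interval excluding zero.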

Using regression (2):

(d) (3 points) Is age an important determinant of earnings? Use an appropriate test and/or confidence interval.

(e) (3 points) Alvo is a 30-year-old male college graduate, and Kal is a 40-year-old male college graduate. Construct a 95% confidence interval for the expected difference between their earnings.

Using regression (3):

(f) (3 points) Are there important regional differences? Use an appropriate hypothesis test.

(g)

Juan is a 32-year-old male high-school graduate from the North. Mel is a 32-year-old male college graduate from the North. Ari is a 32-year-old male college graduate from the East.

  1. (3 points) Construct a 95% confidence interval for the difference in expected earnings between Juan and Mel.
  2. (4 points) Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juan and Ari.

(h) (3 points) Regression (2) was re-estimated using 5,000 randomly selected observations from 1993, with earnings converted into 2007 units using the Consumer Price Index:

$$\widehat{\log AWE} = 9.32 + 0.301\,HighSchool + 0.562\,Male + 0.011\,Age,$$

with standard errors, in the same order as the coefficients,

$$(0.20)\qquad(0.019)\qquad(0.047)\qquad(0.002),$$

and

$$SER=1.25, \qquad \bar R^2=0.08.$$

Compared with the 2007 regression in column (2), was there a statistically significant change in the coefficient on High School?
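One way to approach part (h) is a t-test on the difference between the two High School coefficients. The sketch below assumes the 1993 and 2007 samples are independent, so the variance of the difference is the sum of the two squared standard errors; that independence is an assumption, not something the problem states. `independent_difference_t` is a hypothetical helper name.

```python
import math


def independent_difference_t(b1, se1, b2, se2):
    """t-statistic and standard error for b1 - b2.

    Assumes the two estimates come from independent samples, so
    Var(b1 - b2) = se1^2 + se2^2.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return (b1 - b2) / se_diff, se_diff


# 2007 estimate from column (2) of Table 2 versus the 1993 re-estimate.
t_stat, se_diff = independent_difference_t(0.373, 0.021, 0.301, 0.019)
print(f"difference = {0.373 - 0.301:.3f}, SE = {se_diff:.4f}, t = {t_stat:.2f}")
```

The resulting statistic is compared with the usual normal critical values (1.96 at the 5% level, 2.58 at the 1% level).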

(i) (3 points) In all regressions above, the coefficient on High School is positive, large, and statistically significant. Does this provide strong evidence of high returns to schooling in the labor market? Explain.

Question 3

Part 3 — 20 points

A researcher collects data on houses sold in one neighborhood during the past year and obtains the following regressions.

Table 3. Regressions of house price on house characteristics

Dependent variable: $\ln(Price)$. Standard errors are in parentheses.

| Regressor | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Size | 0.00042<br>(0.000038) | | | | |
| $\ln(Size)$ | | 0.69<br>(0.054) | 0.68<br>(0.087) | 0.57<br>(2.03) | 0.69<br>(0.055) |
| $[\ln(Size)]^2$ | | | | 0.0078<br>(0.14) | |
| Bedrooms | | | 0.0036<br>(0.037) | | |
| Pool | 0.082<br>(0.032) | 0.071<br>(0.034) | 0.071<br>(0.034) | 0.071<br>(0.036) | 0.071<br>(0.035) |
| View | 0.037<br>(0.029) | 0.027<br>(0.028) | 0.026<br>(0.026) | 0.027<br>(0.029) | 0.027<br>(0.030) |
| Pool $\times$ View | | | | | 0.0022<br>(0.10) |
| Condition | 0.13<br>(0.045) | 0.12<br>(0.035) | 0.12<br>(0.035) | 0.12<br>(0.036) | 0.12<br>(0.035) |
| Intercept | 10.97<br>(0.069) | 6.60<br>(0.39) | 6.63<br>(0.53) | 7.02<br>(7.50) | 6.60<br>(0.40) |

SER, as listed: 0.1026, 1.023, 1.020. $R^2$, as listed: 0.0710, 0.0761, 0.0814.

Definitions: $Price$ is sale price in dollars; $Size$ is square feet; Bedrooms is number of bedrooms; Pool is 1 if the house has a swimming pool; View is 1 if the house has a nice view; Condition is 1 if the real-estate agent reports that the house is in excellent condition.

(a) (3 points) Using column (1), what is the expected change in price from building a 1,500-square-foot addition? Construct a 99% confidence interval for the percentage change in price.

(b) (3 points) How is the coefficient on $\ln(Size)$ interpreted in column (2)? What is the effect of doubling house size on price?
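For the doubling question in part (b), a short Python sketch may help separate two quantities that are easy to confuse in a log-log specification: the change in $\ln(Price)$ and the exact proportional change in $Price$. `log_log_effect` is a hypothetical helper name, and the calculation treats the column (2) coefficient as a constant elasticity.

```python
import math


def log_log_effect(elasticity, size_ratio):
    """Effect of multiplying Size by size_ratio in a log-log model.

    Returns (change in ln(Price), exact proportional change in Price).
    """
    delta_log_price = elasticity * math.log(size_ratio)
    return delta_log_price, math.exp(delta_log_price) - 1.0


# Column (2) of Table 3: coefficient on ln(Size) is 0.69; doubling means ratio 2.
delta_log, pct_change = log_log_effect(0.69, 2.0)
print(f"change in ln(Price) = {delta_log:.3f}, "
      f"proportional change in Price = {pct_change:.3f}")
```

For a large change such as doubling, the two numbers differ noticeably, because the approximation "change in logs equals percentage change" is accurate only for small changes.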

(c) (3 points) Using column (2), estimate the effect of a view on price. Construct a 99% confidence interval. Is the effect statistically different from zero?

(d) (3 points) Using column (3), calculate the effect of adding two bedrooms. Is the effect statistically significant? Which variable, size or number of bedrooms, appears more important for determining price?

(e) (3 points) Is the coefficient on Condition significant in column (4)?

(f) (5 points) Is the interaction between Pool and View significant in column (5)? Find the effect of adding a view for a house with a pool and for a house without a pool.

Question 4

Part 4 — 25 points

Demand for a commodity is

$$Q=\beta_0+\beta_1P+u,$$

where $Q$ is log quantity, $P$ is log price, and $u$ collects other demand determinants.

Supply is

$$Q=\gamma_0+\gamma_1P+v,$$

where $v$ collects other supply determinants. Assume

$$E(u)=E(v)=0, \qquad \operatorname{Var}(u)=\sigma_u^2, \qquad \operatorname{Var}(v)=\sigma_v^2, \qquad \operatorname{Cov}(u,v)=0.$$

(a) (5 points) Solve the simultaneous equations to show how $Q$ and $P$ depend on $u$ and $v$.

(b) (5 points) Derive the means of $P$ and $Q$.

(c) (5 points) Derive $\operatorname{Var}(P)$, $\operatorname{Var}(Q)$, and $\operatorname{Cov}(Q,P)$.

(d)

A large random sample of $(Q_i,P_i)$ is collected, and $Q_i$ is regressed on $P_i$.

  1. (5 points) Use your answers to parts (b) and (c) to derive the population regression coefficients.
  2. (5 points) A researcher treats the slope as an estimate of the demand slope $\beta_1$. Is the estimated slope too large or too small?
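A derivation for part (d) can be checked numerically with a small simulation. The sketch below is illustrative only: the parameter values are hypothetical, the shocks are assumed to be independent normal draws (the problem specifies only their means, variances, and zero covariance), and the closed-form slope $(\gamma_1\sigma_u^2+\beta_1\sigma_v^2)/(\sigma_u^2+\sigma_v^2)$ is what solving the two equations for $P$ and $Q$ yields under those assumptions.

```python
import random


def simulate_market(n, beta0, beta1, gamma0, gamma1, sd_u, sd_v, seed=0):
    """Draw n equilibrium (P, Q) pairs from the demand and supply system."""
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_u)  # demand shock (normality is an assumption)
        v = rng.gauss(0.0, sd_v)  # supply shock, drawn independently of u
        # Equate demand and supply and solve for the equilibrium price.
        p = (gamma0 - beta0 + v - u) / (beta1 - gamma1)
        q = beta0 + beta1 * p + u  # quantity read off the demand curve
        prices.append(p)
        quantities.append(q)
    return prices, quantities


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def population_slope(beta1, gamma1, sd_u, sd_v):
    """Cov(Q, P) / Var(P) implied by the system with uncorrelated shocks."""
    return (gamma1 * sd_u ** 2 + beta1 * sd_v ** 2) / (sd_u ** 2 + sd_v ** 2)


# Hypothetical parameters: downward-sloping demand, upward-sloping supply.
prices, quantities = simulate_market(50_000, 10.0, -1.0, 2.0, 0.5, 1.0, 2.0)
slope_hat = ols_slope(prices, quantities)
slope_pop = population_slope(-1.0, 0.5, 1.0, 2.0)
print(f"OLS slope = {slope_hat:.3f}, population slope = {slope_pop:.3f}, "
      f"demand slope = -1.0")
```

Because the population slope is a variance-weighted average of $\gamma_1$ and $\beta_1$, varying `sd_u` and `sd_v` shows how the regression slope moves between the supply slope and the demand slope.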