Econometrics: HSE and NES Joint Bachelor's Programme (Sovbak), 2022 midterm

Question 1

True, false, or maybe true — 20 points

Indicate whether each statement is true, maybe true, or false, with a brief explanation.

(a) (4 points) If the correlation between $X$ and $Y$ is zero, then the slope coefficient from a regression of $Y$ on $X$ is also zero.
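For a regression with an intercept, the OLS slope equals the sample correlation rescaled by the ratio of sample standard deviations, $\widehat\beta=r_{XY}\,s_Y/s_X$. The Python sketch below checks this identity on hypothetical numbers. The helper names `ols_slope` and `sample_corr` are illustrative and do not come from any library.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def sample_corr(x, y):
    """Sample correlation, using n - 1 in the covariance and in both standard deviations."""
    xbar, ybar = mean(x), mean(y)
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope = ols_slope(x, y)
via_corr = sample_corr(x, y) * stdev(y) / stdev(x)
print(slope, via_corr)  # equal up to floating-point rounding

# y = x**2 on a symmetric grid: an exact nonlinear relationship with zero correlation.
x0 = [-1.0, 0.0, 1.0]
y0 = [xi ** 2 for xi in x0]
print(sample_corr(x0, y0), ols_slope(x0, y0))
```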

(b) (4 points) The slope coefficient from a regression of $y_i+c$ on $x_i+c$ is the same as the slope coefficient from a regression of $y_i$ on $x_i$.
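The OLS slope depends on the data only through deviations from the sample means, $\sum(x_i-\bar x)(y_i-\bar y)/\sum(x_i-\bar x)^2$. A short Python check uses hypothetical numbers and an arbitrary constant $c$. The helper `ols_slope` is an illustrative name, not a library function.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data and shift constant, for illustration only.
x = [0.5, 1.7, 2.2, 3.9, 4.4, 6.0]
y = [1.2, 2.9, 3.1, 5.8, 6.5, 8.7]
c = 100.0

b_original = ols_slope(x, y)
b_shifted = ols_slope([xi + c for xi in x], [yi + c for yi in y])
# Adding c to every observation moves each sample mean by c, so the deviations
# (x_i - xbar) and (y_i - ybar) are unchanged; only the intercept can change.
print(b_original, b_shifted)
```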

For parts (c)-(e), suppose

$$E(y_i\mid x_i)=\alpha+\beta x_i,$$

and all standard OLS assumptions hold.

(c) (4 points) The estimator

$$\widehat\beta=\frac{\sum_{i=1}^n(x_i-\bar x)y_i}{\sum_{i=1}^n(x_i-\bar x)^2}$$

is unbiased.
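A Monte Carlo experiment can illustrate unbiasedness, though it cannot prove it. The sketch below assumes a specific data-generating process: normal errors with standard deviation 3, $\alpha=1$, $\beta=2$, and one fixed draw of $x_i$. These values are illustrative choices, not part of the problem. It applies the estimator exactly as written above, with $y_i$ rather than $y_i-\bar y$ in the numerator.

```python
import random

# Assumed data-generating process (illustrative values, not from the problem):
# y_i = ALPHA + BETA * x_i + u_i with u_i ~ N(0, 3^2), independent of x_i.
ALPHA, BETA, N, REPS = 1.0, 2.0, 50, 5000
rng = random.Random(42)  # fixed seed for reproducibility

x = [rng.uniform(0, 10) for _ in range(N)]  # regressor held fixed across replications
xbar = sum(x) / N
sxx = sum((xi - xbar) ** 2 for xi in x)


def beta_hat(y):
    """The estimator sum((x_i - xbar) * y_i) / sum((x_i - xbar)^2)."""
    return sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx


estimates = []
for _ in range(REPS):
    y = [ALPHA + BETA * xi + rng.gauss(0, 3) for xi in x]
    estimates.append(beta_hat(y))

avg_estimate = sum(estimates) / REPS
print(avg_estimate)  # close to BETA when the estimator is unbiased
```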

(d) (4 points) In large samples, inference may be based on

$$\frac{\widehat\beta-\beta}{\operatorname{s.e.}(\widehat\beta)}$$

having a normal distribution.

(e) (4 points) The expression

$$\frac{\frac1n\sum_{i=1}^n e_i^2}{\sum_{i=1}^n(x_i-\bar x)^2},\qquad e_i=y_i-\widehat\alpha-\widehat\beta x_i,$$

is a good large-sample approximation to the sampling variance of $\widehat\beta$.
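One way to explore this claim is to compare the Monte Carlo sampling variance of $\widehat\beta$ with the average value of the expression above. The sketch below assumes homoskedastic normal errors and hypothetical parameter values. With heteroskedastic errors, the two numbers need not agree. The expression divides by $n$ rather than $n-2$, which matters little when $n$ is large.

```python
import random
from statistics import variance

# Assumed DGP (illustrative only): homoskedastic normal errors with sd SIGMA.
ALPHA, BETA, SIGMA, N, REPS = 1.0, 2.0, 3.0, 100, 3000
rng = random.Random(7)

x = [rng.uniform(0, 10) for _ in range(N)]  # fixed design
xbar = sum(x) / N
sxx = sum((xi - xbar) ** 2 for xi in x)

betas, formula_values = [], []
for _ in range(REPS):
    y = [ALPHA + BETA * xi + rng.gauss(0, SIGMA) for xi in x]
    ybar = sum(y) / N
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    betas.append(b)
    formula_values.append((ssr / N) / sxx)  # the expression in part (e)

mc_variance = variance(betas)             # Monte Carlo sampling variance of beta-hat
avg_formula = sum(formula_values) / REPS  # average of the part (e) expression
exact_variance = SIGMA ** 2 / sxx         # exact variance under this assumed DGP
print(mc_variance, avg_formula, exact_variance)
```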

Question 2

Short answers — 40 points

(a) (8 points) Suppose

$$E(y_i\mid x_i)=\beta_0+\beta_1x_i+\beta_2x_i^2,$$

but you do not know this and regress $y_i$ only on $x_i$. Is the slope estimator unbiased for $\beta_1$?
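Substituting the true model into the short-regression slope gives $\beta_1+\beta_2\,\widehat\delta$ plus a term with conditional mean zero. Here $\widehat\delta$ is the slope from regressing $x_i^2$ on $x_i$. The Python sketch below generates $y_i$ without noise so that this $\beta_2\widehat\delta$ term is visible directly. The coefficients and the two samples of $x_i$ are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical coefficients; y is generated without noise so that the
# short-regression slope equals its conditional expectation given x.
B0, B1, B2 = 1.0, 0.5, 0.3


def short_slope(x):
    """Slope from regressing y on x only, when y is truly quadratic in x."""
    y = [B0 + B1 * xi + B2 * xi ** 2 for xi in x]
    return ols_slope(x, y)


for x in ([-2, -1, 0, 1, 2], [0, 1, 2, 3, 4]):
    delta = ols_slope(x, [xi ** 2 for xi in x])  # slope of x^2 on x
    print(x, short_slope(x), B1 + B2 * delta)      # the last two values agree
```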

(b) (8 points) Independent random variables $X$ and $Y$ have variances 9 and 25. You wish to estimate the difference between their means as precisely as possible and can collect at most 200 observations in total. How many observations should be collected for $X$ and how many for $Y$?
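With independent i.i.d. samples, $\operatorname{Var}(\bar X-\bar Y)=9/n_X+25/n_Y$. The Python sketch below does a brute-force search over every split with $n_X+n_Y=200$, assuming both samples are nonempty. The function name `var_of_difference` is illustrative.

```python
VAR_X, VAR_Y, TOTAL = 9, 25, 200


def var_of_difference(n_x):
    """Variance of mean(X) - mean(Y) with independent samples of sizes n_x and TOTAL - n_x."""
    n_y = TOTAL - n_x
    return VAR_X / n_x + VAR_Y / n_y


best_n_x = min(range(1, TOTAL), key=var_of_difference)
best_n_y = TOTAL - best_n_x
print(best_n_x, best_n_y, var_of_difference(best_n_x))
```

The search agrees with the first-order condition, which allocates observations in proportion to the standard deviations, $n_X/n_Y=3/5$.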

(c) (8 points) Show that sample residuals are uncorrelated with fitted values:

$$\sum_{i=1}^n e_i\widehat y_i=0,$$

where

$$\widehat y_i=\widehat\alpha+\widehat\beta x_i,\qquad e_i=y_i-\widehat y_i.$$
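A numerical check on hypothetical data does not replace the algebraic argument. The comments in the sketch below spell out the two OLS first-order conditions that the argument relies on.

```python
from statistics import mean

# Hypothetical data, for illustration only.
x = [1.0, 2.5, 3.0, 4.5, 6.0, 7.5]
y = [2.3, 3.1, 4.8, 5.0, 7.9, 8.4]

xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# OLS first-order conditions: sum(e_i) = 0 and sum(e_i * x_i) = 0.
# Hence sum(e_i * yhat_i) = a * sum(e_i) + b * sum(e_i * x_i) = 0.
sum_e = sum(resid)
sum_ex = sum(e * xi for e, xi in zip(resid, x))
sum_e_fitted = sum(e * f for e, f in zip(resid, fitted))
print(sum_e, sum_ex, sum_e_fitted)  # all zero up to floating-point rounding
```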

(d) (8 points) A regression of student test score on a dummy for whether the student's parents have higher education, using 200 students, gives a highly significant coefficient with $t=10.3$, but $R^2=0.08$. Is this possible?
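In a simple regression with an intercept and conventional (homoskedasticity-only) standard errors, the slope $t$-statistic and $R^2$ are linked exactly by $t^2=(n-2)R^2/(1-R^2)$, equivalently $R^2=t^2/(t^2+n-2)$. Heteroskedasticity-robust standard errors break this exact link. The Python sketch below evaluates the relation and checks it on simulated data from a hypothetical data-generating process. The function names are illustrative.

```python
import math
import random


def r2_from_t(t, n):
    """R^2 implied by the slope t-statistic in a simple regression with an intercept,
    assuming conventional (homoskedasticity-only) standard errors."""
    return t ** 2 / (t ** 2 + n - 2)


def t_from_r2(r2, n):
    """|t| implied by R^2 under the same assumptions."""
    return math.sqrt((n - 2) * r2 / (1 - r2))


print(r2_from_t(10.3, 200), t_from_r2(0.08, 200))

# Check the identity on simulated data with a dummy regressor (hypothetical DGP).
rng = random.Random(0)
n = 200
d = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
y = [50 + 5 * di + rng.gauss(0, 10) for di in d]

dbar, ybar = sum(d) / n, sum(y) / n
sxx = sum((di - dbar) ** 2 for di in d)
b = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y)) / sxx
a = ybar - b * dbar
ssr = sum((yi - a - b * di) ** 2 for di, yi in zip(d, y))
sst = sum((yi - ybar) ** 2 for yi in y)
t_stat = b / math.sqrt(ssr / (n - 2) / sxx)  # conventional standard error
r2 = 1 - ssr / sst
print(r2, r2_from_t(t_stat, n))  # equal up to rounding
```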

(e) (8 points) Suppose

$$E(y_i\mid x_{1i},x_{2i})=\beta_0+\beta_1x_{1i}+\beta_2x_{2i}.$$

Let $\widetilde\beta_1$ be the slope from a regression of $y_i$ on $x_{1i}$ alone. Is $\widetilde\beta_1$ a consistent estimator of $\beta_1$? Suggest a method for consistently estimating $\beta_1$ using only simple regressions, each with one regressor and a constant.
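One possible approach, not necessarily the intended answer, is the partialling-out (Frisch-Waugh-Lovell) result. First, regress $x_{1i}$ on $x_{2i}$ and keep the residuals. Then regress $y_i$ on those residuals. The resulting slope equals the multiple-regression estimate of $\beta_1$. The Python sketch below uses a hypothetical data-generating process with correlated regressors and compares three estimates: the short regression, the two-step estimate, and $\widehat\beta_1$ from the two-regressor normal equations.

```python
import random


def simple_ols(x, y):
    """Intercept and slope from a regression of y on x with a constant."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    return ybar - slope * xbar, slope


# Hypothetical DGP: x1 and x2 are correlated; true beta1 = 2.0, beta2 = -1.5.
rng = random.Random(1)
n = 5000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * v + rng.gauss(0, 1) for v in x2]
y = [1.0 + 2.0 * u - 1.5 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

# Short regression of y on x1 alone (omits x2).
_, beta1_short = simple_ols(x1, y)

# Step 1: regress x1 on x2 and keep the residuals (the part of x1 unrelated to x2).
c0, c1 = simple_ols(x2, x1)
r1 = [u - c0 - c1 * v for u, v in zip(x1, x2)]
# Step 2: regress y on those residuals; the slope estimates beta1.
_, beta1_fwl = simple_ols(r1, y)

# Benchmark: beta1 from the two-regressor normal equations in deviation form.
m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
s11 = sum((u - m1) ** 2 for u in x1)
s22 = sum((v - m2) ** 2 for v in x2)
s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
beta1_multiple = (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

print(beta1_short, beta1_fwl, beta1_multiple)
```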

Question 3

Grants, maternal literacy, and test scores — 16 points

A regression uses data for 2,384 students in rural Kenya. The regressors are:

  • $grants$: a dummy for whether the school received a cash grant that year;
  • $mom\_lit$: a dummy for whether the student's mother is literate.

Grant money was used to improve school quality by building and repairing classrooms and purchasing textbooks, desks, blackboards, and other equipment. Test scores range from 0 to 100.

The estimated regression is

$$\widehat{test\_score}=48.4+1.20\,grants+2.67\,mom\_lit,\qquad R^2=0.008,$$

with standard errors, listed in the same order as the coefficients,

$$(0.6)\qquad(0.66)\qquad(0.72).$$

(a) (4 points) Construct a 95% confidence interval for the effect of receiving a grant. Do students at schools receiving grants perform significantly better than students at schools that do not?
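With 2,384 observations, the normal approximation is reasonable, so a 95% interval is the coefficient $\pm1.96$ standard errors. The Python sketch below uses the `statistics.NormalDist` class from the standard library for the critical value and also reports the $t$-ratio.

```python
from statistics import NormalDist

coef, se = 1.20, 0.66            # grants coefficient and its standard error
z = NormalDist().inv_cdf(0.975)  # about 1.96 (large-sample normal approximation)
lower, upper = coef - z * se, coef + z * se
t_ratio = coef / se
print(round(lower, 3), round(upper, 3), round(t_ratio, 2))
```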

(b) (4 points) The grants in the observed year were approximately \$2.50 per student. What is the predicted effect on test scores of grants worth \$10 per student?

(c) (4 points) What is the estimated test-score difference between a student whose mother is literate and a student whose mother cannot read or write? Suppose the \$10 could instead be spent on educating the mother, with a 70% probability of making her literate. What is the estimated effect of that policy on test scores?
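The arithmetic behind parts (b) and (c) can be sketched in Python. Scaling the grant effect from \$2.50 to \$10 assumes the effect is linear in grant size. Because $grants$ enters the regression as a dummy, the data say nothing directly about other grant sizes. The literacy policy's expected effect is the success probability times the $mom\_lit$ coefficient.

```python
GRANT_COEF = 1.20    # estimated effect of a grant of about $2.50 per student
MOM_LIT_COEF = 2.67  # estimated gap between literate and illiterate mothers
P_LITERATE = 0.70    # probability that spending $10 makes the mother literate

# Part (b): linear extrapolation from $2.50 to $10 (an assumption, see above).
grant_effect_10 = GRANT_COEF * 10 / 2.50

# Part (c): expected effect = probability of success * literacy coefficient.
literacy_effect_10 = P_LITERATE * MOM_LIT_COEF

print(grant_effect_10, literacy_effect_10)
```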

(d) (4 points) What proportion of variation in test scores is explained by the two regressors? Is this a large or small amount?

Question 4

Computer use and the wage structure — 24 points

In Alan Krueger's paper "How Computers Have Changed the Wage Structure" (Quarterly Journal of Economics, 1993), computer use at work is used as a proxy for computer skills. The data come from the 1984 and 1989 U.S. Current Population Surveys.

Table I reports the percentage of workers in different groups who directly use a computer at work. Table II reports OLS regressions of log hourly wages on computer use and other controls. Standard errors are in parentheses.

Table I. Percentage using a computer at work

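| Group | 1984 | 1989 |
|---|---|---|
| All workers | 24.6 | 37.4 |
| Gender | | |
| Men | 21.2 | 32.3 |
| Women | 29.0 | 43.4 |
| Education | | |
| Less than high school | 5.0 | 7.8 |
| High school | 19.3 | 29.3 |
| Some college | 30.6 | 45.3 |
| College | 41.6 | 58.2 |
| Postcollege | 42.8 | 59.7 |
| Race | | |
| White | 25.3 | 38.5 |
| Black | 19.4 | 27.7 |
| Age | | |
| 18-24 | 19.7 | 29.4 |
| 25-39 | 29.2 | 41.5 |
| 40-54 | 23.6 | 39.1 |
| 55-65 | 16.9 | 26.3 |
| Occupation | | |
| Blue-collar | 7.1 | 11.6 |
| White-collar | 33.0 | 48.4 |
| Union status | | |
| Union member | 20.2 | 32.5 |
| Nonunion | 28.0 | 41.1 |
| Hours | | |
| Part-time | 23.7 | 36.3 |
| Full-time | 28.9 | 42.7 |
| Region | | |
| Northeast | 25.5 | 38.0 |
| Midwest | 23.4 | 36.0 |
| South | 23.2 | 36.5 |
| West | 27.0 | 39.9 |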

Sample sizes are 61,712 in 1984 and 62,748 in 1989.

Table II. OLS estimates of the effect of computer use on pay

Dependent variable: $\ln(\text{hourly wage})$.

| Regressor | 1984 (1) | 1984 (2) | 1984 (3) | 1989 (4) | 1989 (5) | 1989 (6) |
|---|---|---|---|---|---|---|
| Intercept | 1.937 (0.005) | 0.750 (0.023) | 0.928 (0.026) | 2.086 (0.006) | 0.905 (0.024) | 1.094 (0.026) |
| Uses computer at work | 0.276 (0.010) | 0.170 (0.008) | 0.140 (0.008) | 0.325 (0.009) | 0.188 (0.008) | 0.162 (0.008) |
| Years of education | | 0.069 (0.001) | 0.048 (0.002) | | 0.075 (0.002) | 0.055 (0.002) |
| Experience | | 0.027 (0.001) | 0.025 (0.001) | | 0.027 (0.001) | 0.025 (0.001) |
| Experience squared / 100 | | -0.041 (0.002) | -0.040 (0.002) | | -0.041 (0.002) | -0.040 (0.002) |
| Black | | -0.098 (0.013) | -0.066 (0.012) | | -0.121 (0.013) | -0.092 (0.012) |
| Other race | | -0.105 (0.020) | -0.079 (0.019) | | -0.029 (0.020) | -0.015 (0.020) |
| Part-time | | -0.256 (0.010) | -0.216 (0.010) | | -0.221 (0.010) | -0.183 (0.010) |
| Lives in SMSA | | 0.111 (0.007) | 0.105 (0.007) | | 0.138 (0.007) | 0.130 (0.007) |
| Veteran | | 0.038 (0.011) | 0.041 (0.011) | | 0.025 (0.012) | 0.031 (0.011) |
| Female | | -0.162 (0.012) | -0.135 (0.012) | | -0.172 (0.012) | -0.151 (0.012) |
| Married | | 0.156 (0.011) | 0.129 (0.011) | | 0.159 (0.012) | 0.143 (0.011) |
| Married × Female | | -0.168 (0.015) | -0.151 (0.015) | | -0.141 (0.015) | -0.131 (0.015) |
| Union member | | 0.181 (0.009) | 0.194 (0.009) | | 0.182 (0.010) | 0.189 (0.010) |
| Eight occupation dummies | No | No | Yes | No | No | Yes |
| $R^2$ | 0.051 | 0.446 | 0.491 | 0.082 | 0.451 | 0.486 |

Columns (2), (3), (5), and (6) also include three regional dummies. Sample sizes are 13,335 in 1984 and 13,379 in 1989.

(a) (5 points) Column (1) regresses log wages on computer use alone. What is the earnings advantage of computer users in 1984? How does it change in 1989, according to column (4)?
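Because the dependent variable is a log wage, a dummy coefficient $b$ corresponds to a gap of roughly $100b$ percent. The exact implied gap is $100(e^{b}-1)$ percent. The Python sketch below computes both versions for the computer-use coefficients in columns (1) and (4).

```python
import math

# Computer-use coefficients on ln(hourly wage), Table II columns (1) and (4).
COEFS = {1984: 0.276, 1989: 0.325}

# Exact percentage wage gap implied by a dummy in a log-wage regression.
gaps = {year: 100 * (math.exp(coef) - 1) for year, coef in COEFS.items()}

for year, coef in COEFS.items():
    # Log-point approximation next to the exact percentage gap.
    print(year, round(100 * coef, 1), round(gaps[year], 1))
```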

(b) (5 points) Does the simple regression imply a causal effect of computer use on wages? Why are additional regressors included in columns (2) and (3)?

(c) (9 points) Does Table I help explain the change in the computer-use coefficient across specifications? Is it surprising that the coefficient falls after adding controls?
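Omitted-variable bias predicts a falling coefficient when the omitted factors are positively correlated with both computer use and wages. Table I suggests education is such a factor: computer use rises steadily with schooling. The Python sketch below is a stylized simulation of this mechanism. Every parameter value is hypothetical, and the code makes no attempt to replicate Krueger's estimates.

```python
import random


def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)


# Stylized, hypothetical DGP (NOT estimated from the CPS): schooling raises both
# the chance of using a computer and the log wage; the true computer effect is 0.10.
rng = random.Random(3)
n = 20000
educ = [rng.gauss(0, 1) for _ in range(n)]
computer = [1.0 if e + rng.gauss(0, 1) > 0 else 0.0 for e in educ]
log_wage = [0.10 * c + 0.50 * e + rng.gauss(0, 0.5) for c, e in zip(computer, educ)]

# Analogue of a regression without controls.
short_coef = slope(computer, log_wage)

# Analogue of adding a control: partial educ out of computer use, then regress.
mean_e, mean_c = sum(educ) / n, sum(computer) / n
g = slope(educ, computer)
computer_resid = [c - mean_c - g * (e - mean_e) for c, e in zip(computer, educ)]
long_coef = slope(computer_resid, log_wage)

print(short_coef, long_coef)
```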

(d) (5 points) Columns (3) and (6) show a significant earnings advantage from computer use even after adding many controls. Is this evidence of a causal effect, or is there still room for doubt?