Econometrics: HSE/NES Joint Bachelor's Programme (Совбак ВШЭ и РЭШ), 2022 midterm
Question 1
True, false, or maybe true — 20 points
Indicate whether each statement is true, maybe true, or false, with a brief explanation.
(a) (4 points) If the correlation between $X$ and $Y$ is zero, then the slope coefficient from a regression of $Y$ on $X$ is also zero.
(b) (4 points) The slope coefficient from a regression of $Y$ on $X$ is the same as the slope coefficient from a regression of $X$ on $Y$.
For parts (c)-(e), suppose
and all standard OLS assumptions hold.
(c) (4 points) The estimator
is unbiased.
(d) (4 points) In large samples, inference may be based on
having a normal distribution.
(e) (4 points) The expression
is a good large-sample approximation to the sampling variance of .
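For parts (a) and (b), it may help to recall that the slope from a simple regression is the sample covariance of the two variables divided by the sample variance of the regressor. The following is a minimal Python sketch of that relationship, not a model answer. The data and the names `x`, `y`, `mean`, and `cov` are illustrative assumptions and do not come from the exam.

```python
# Illustrative data (hypothetical, not taken from the exam).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance with the n - 1 divisor; cov(a, a) is the sample variance."""
    mean_a, mean_b = mean(a), mean(b)
    cross = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return cross / (len(a) - 1)


# Slope from regressing y on x, and from the reverse regression of x on y.
slope_y_on_x = cov(x, y) / cov(x, x)
slope_x_on_y = cov(x, y) / cov(y, y)

# Sample correlation coefficient.
corr = cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

print("slope of y on x:", slope_y_on_x)
print("slope of x on y:", slope_x_on_y)
print("product of slopes:", slope_y_on_x * slope_x_on_y)
print("squared correlation:", corr ** 2)
```

Both slopes share the numerator `cov(x, y)`, so both are zero exactly when the correlation is zero. Their product equals the squared correlation, which suggests when the two slopes can coincide.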
Question 2
Short answers — 40 points
(a) (8 points) Suppose
but you do not know this and regress only on . Is the slope estimator unbiased for ?
(b) (8 points) Independent random variables $X$ and $Y$ have variances 9 and 25. You wish to estimate the difference between their means as precisely as possible and can collect at most 200 observations in total. How many observations should be collected for $X$ and how many for $Y$?
(c) (8 points) Show that sample residuals are uncorrelated with fitted values:
where
(d) (8 points) A regression of student test score on a dummy for whether the student's parents have higher education, using 200 students, gives a highly significant coefficient with , but . Is this possible?
(e) (8 points) Suppose
Let be the slope from a regression of on alone. Is a consistent estimator of ? Suggest a method for consistently estimating using only simple regressions, each with one regressor and a constant.
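The allocation problem in part (b) can be checked numerically. The sketch below assumes the estimator is the difference of two independent sample means, so its variance is the sum of each variance divided by its own sample size, and it assumes the whole budget of 200 observations is used. The names `VAR_FIRST`, `VAR_SECOND`, and `variance_of_difference` are illustrative, not from the exam.

```python
TOTAL = 200          # total number of observations available
VAR_FIRST = 9.0      # variance of the first variable
VAR_SECOND = 25.0    # variance of the second variable


def variance_of_difference(n_first):
    """Variance of (mean of first sample - mean of second sample).

    Assumes independent samples and that all TOTAL observations are used,
    so the second sample has TOTAL - n_first observations.
    """
    n_second = TOTAL - n_first
    return VAR_FIRST / n_first + VAR_SECOND / n_second


# Brute-force search over every feasible split with at least one observation each.
best_n_first = min(range(1, TOTAL), key=variance_of_difference)
best_n_second = TOTAL - best_n_first

print("observations for the first variable:", best_n_first)
print("observations for the second variable:", best_n_second)
print("minimized variance:", variance_of_difference(best_n_first))
```

The brute-force result can be compared with the analytic rule obtained from the first-order condition, under which sample sizes are proportional to the standard deviations rather than the variances.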
Question 3
Grants, maternal literacy, and test scores — 16 points
A regression uses data for 2,384 students in rural Kenya. The regressors are:
- : a dummy for whether the school received a cash grant that year;
- : a dummy for whether the student's mother is literate.
Grant money was used to improve school quality by building and repairing classrooms and purchasing textbooks, desks, blackboards, and other equipment. Test scores range from 0 to 100.
The estimated regression is
with standard errors
(a) (4 points) Construct a 95% confidence interval for the effect of receiving a grant. Do students at schools receiving grants perform significantly better than students at schools that do not?
(b) (4 points) The grants in the observed year were approximately \$2.50 per student. What would be the estimated effect of a grant of \$10 per student?
(c) (4 points) What is the estimated test-score difference between a student whose mother is literate and a student whose mother cannot read or write? Suppose the \$10 could instead be spent on educating the mother, with a 70% probability of making her literate. What is the estimated effect of that policy on test scores?
(d) (4 points) What proportion of variation in test scores is explained by the two regressors? Is this a large or small amount?
Question 4
Computer use and the wage structure — 24 points
In Alan Krueger's paper How Computers Have Changed the Wage Structure (Quarterly Journal of Economics, 1993), computer use at work is used as a proxy for computer skills. The data come from the 1984 and 1989 U.S. Current Population Surveys.
Table I reports the percentage of workers in different groups who directly use a computer at work. Table II reports OLS regressions of log hourly wages on computer use and other controls. Standard errors are in parentheses.
Table I. Percentage using a computer at work
| Group | 1984 | 1989 |
|---|---|---|
| All workers | 24.6 | 37.4 |
| Gender | ||
| Men | 21.2 | 32.3 |
| Women | 29.0 | 43.4 |
| Education | ||
| Less than high school | 5.0 | 7.8 |
| High school | 19.3 | 29.3 |
| Some college | 30.6 | 45.3 |
| College | 41.6 | 58.2 |
| Postcollege | 42.8 | 59.7 |
| Race | ||
| White | 25.3 | 38.5 |
| Black | 19.4 | 27.7 |
| Age | ||
| 18-24 | 19.7 | 29.4 |
| 25-39 | 29.2 | 41.5 |
| 40-54 | 23.6 | 39.1 |
| 55-65 | 16.9 | 26.3 |
| Occupation | ||
| Blue-collar | 7.1 | 11.6 |
| White-collar | 33.0 | 48.4 |
| Union status | ||
| Union member | 20.2 | 32.5 |
| Nonunion | 28.0 | 41.1 |
| Hours | ||
| Part-time | 23.7 | 36.3 |
| Full-time | 28.9 | 42.7 |
| Region | ||
| Northeast | 25.5 | 38.0 |
| Midwest | 23.4 | 36.0 |
| South | 23.2 | 36.5 |
| West | 27.0 | 39.9 |
Sample sizes are 61,712 in 1984 and 62,748 in 1989.
Table II. OLS estimates of the effect of computer use on pay
Dependent variable: $\ln(\text{hourly wage})$.
| Regressor | 1984 (1) | 1984 (2) | 1984 (3) | 1989 (4) | 1989 (5) | 1989 (6) |
|---|---|---|---|---|---|---|
| Intercept | 1.937<br>(0.005) | 0.750<br>(0.023) | 0.928<br>(0.026) | 2.086<br>(0.006) | 0.905<br>(0.024) | 1.094<br>(0.026) |
| Uses computer at work | 0.276<br>(0.010) | 0.170<br>(0.008) | 0.140<br>(0.008) | 0.325<br>(0.009) | 0.188<br>(0.008) | 0.162<br>(0.008) |
| Years of education | — | 0.069<br>(0.001) | 0.048<br>(0.002) | — | 0.075<br>(0.002) | 0.055<br>(0.002) |
| Experience | — | 0.027<br>(0.001) | 0.025<br>(0.001) | — | 0.027<br>(0.001) | 0.025<br>(0.001) |
| Experience squared | — | -0.041<br>(0.002) | -0.040<br>(0.002) | — | -0.041<br>(0.002) | -0.040<br>(0.002) |
| Black | — | -0.098<br>(0.013) | -0.066<br>(0.012) | — | -0.121<br>(0.013) | -0.092<br>(0.012) |
| Other race | — | -0.105<br>(0.020) | -0.079<br>(0.019) | — | -0.029<br>(0.020) | -0.015<br>(0.020) |
| Part-time | — | -0.256<br>(0.010) | -0.216<br>(0.010) | — | -0.221<br>(0.010) | -0.183<br>(0.010) |
| Lives in SMSA | — | 0.111<br>(0.007) | 0.105<br>(0.007) | — | 0.138<br>(0.007) | 0.130<br>(0.007) |
| Veteran | — | 0.038<br>(0.011) | 0.041<br>(0.011) | — | 0.025<br>(0.012) | 0.031<br>(0.011) |
| Female | — | -0.162<br>(0.012) | -0.135<br>(0.012) | — | -0.172<br>(0.012) | -0.151<br>(0.012) |
| Married | — | 0.156<br>(0.011) | 0.129<br>(0.011) | — | 0.159<br>(0.012) | 0.143<br>(0.011) |
| Married × Female | — | -0.168<br>(0.015) | -0.151<br>(0.015) | — | -0.141<br>(0.015) | -0.131<br>(0.015) |
| Union member | — | 0.181<br>(0.009) | 0.194<br>(0.009) | — | 0.182<br>(0.010) | 0.189<br>(0.010) |
| Eight occupation dummies | No | No | Yes | No | No | Yes |
| $R^2$ | 0.051 | 0.446 | 0.491 | 0.082 | 0.451 | 0.486 |
Columns (2), (3), (5), and (6) also include three regional dummies. Sample sizes are 13,335 in 1984 and 13,379 in 1989.
(a) (5 points) Column (1) regresses wages on computer use alone. What is the earnings advantage of computer users in 1984? How does it change in 1989, using column (4)?
(b) (5 points) Does the simple regression imply a causal effect of computer use on wages? Why are additional regressors included in columns (2) and (3)?
(c) (9 points) Does Table I help explain the change in the computer-use coefficient across specifications? Is it surprising that the coefficient falls after adding controls?
(d) (5 points) Columns (3) and (6) show a significant earnings advantage from computer use even after adding many controls. Is this evidence of a causal effect, or is there still room for doubt?
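Because the dependent variable in Table II is a log wage, the coefficient on a dummy is a difference in log points, which only approximates a percentage difference. The sketch below converts the column (1) and column (4) computer-use coefficients into exact percentage differentials using the standard transformation $100(e^{b} - 1)$ and forms a 95% interval with the normal critical value 1.96. The function and variable names are illustrative assumptions; the coefficients and standard errors are taken from Table II.

```python
import math

# Coefficients and standard errors on "Uses computer at work" from Table II.
coef_1984, se_1984 = 0.276, 0.010   # column (1)
coef_1989, se_1989 = 0.325, 0.009   # column (4)

Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def log_points_to_percent(b):
    """Exact percentage differential implied by a log-wage dummy coefficient."""
    return 100.0 * (math.exp(b) - 1.0)


def conf_interval(b, se, z=Z_95):
    """Large-sample confidence interval for the coefficient itself."""
    return (b - z * se, b + z * se)


pct_1984 = log_points_to_percent(coef_1984)
pct_1989 = log_points_to_percent(coef_1989)

print("1984 differential (%):", round(pct_1984, 1))
print("1989 differential (%):", round(pct_1989, 1))
print("1984 95% CI for coefficient:", conf_interval(coef_1984, se_1984))
print("1989 95% CI for coefficient:", conf_interval(coef_1989, se_1989))
```

For coefficients of this size the exact differential is noticeably larger than the raw coefficient times 100, so it is worth stating which of the two is being reported.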