Econometrics, HSE and NES Joint Bachelor's Program (Sovbak), 2021 midterm
Question 1
Part 1 — 25 points
Suppose $(Y_i, X_{1i}, X_{2i})$ satisfy the four least-squares assumptions for causal inference in the multiple-regression model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$ discussed in class. In addition,
A random sample of size is drawn from the population.
(a) (9 points) Assume that $X_1$ and $X_2$ are uncorrelated. Compute the asymptotic unconditional variance of $\hat{\beta}_1$.
(b) (9 points) Assume that
Compute the asymptotic unconditional variance of $\hat{\beta}_1$.
(c) (7 points) Comment on the following statements:
When $X_1$ and $X_2$ are correlated, the variance of $\hat{\beta}_1$ is larger than it would be if $X_1$ and $X_2$ were uncorrelated. Thus, if you are interested in $\beta_1$, it is best to leave $X_2$ out of the regression if it is correlated with $X_1$.
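For the coefficient $\beta_1$ on $X_1$ in a regression of $Y$ on $X_1$ and $X_2$, a standard large-sample result, valid when the errors are homoskedastic (that is, $\operatorname{var}(u_i \mid X_{1i}, X_{2i}) = \sigma_u^2$ is constant), is $\operatorname{var}(\hat\beta_1) \approx \frac{1}{n}\,\frac{1}{1-\rho^2_{X_1,X_2}}\,\frac{\sigma_u^2}{\sigma_{X_1}^2}$. The minimal sketch below just evaluates that formula; the function name `asy_var_beta1` and the inputs in the example are illustrative placeholders, not values taken from the question.

```python
def asy_var_beta1(sigma_u2: float, var_x1: float, rho: float, n: int) -> float:
    """Large-sample variance of the OLS estimator of beta_1 in Y = b0 + b1*X1 + b2*X2 + u.

    Assumes homoskedastic errors with var(u | X1, X2) = sigma_u2.
    rho is corr(X1, X2) and var_x1 is var(X1).
    """
    if not -1.0 < rho < 1.0:
        # |rho| = 1 means perfect multicollinearity: beta_1 is not identified.
        raise ValueError("rho must lie strictly between -1 and 1")
    return sigma_u2 / (n * (1.0 - rho ** 2) * var_x1)


# Illustrative placeholder inputs only.
v_uncorrelated = asy_var_beta1(sigma_u2=1.0, var_x1=2.0, rho=0.0, n=100)
v_correlated = asy_var_beta1(sigma_u2=1.0, var_x1=2.0, rho=0.6, n=100)
print(v_uncorrelated, v_correlated, v_correlated / v_uncorrelated)
```

The ratio of the two variances is $1/(1-\rho^2)$, so correlation between the regressors inflates the variance of $\hat\beta_1$; whether that justifies dropping $X_2$ also depends on the omitted variable bias that dropping it would introduce.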
Question 2
Part 2 — 30 points
Table 2 reports regressions estimated using data on employees in a developing country. The data set contains information on more than 10,000 full-time, full-year workers. Each worker's highest educational qualification is either a high-school diploma or a bachelor's degree, and ages range from 25 to 40. The data also record each worker's region and gender.
Table 1. Variable definitions
| Name | Description |
|---|---|
| Log earnings (dependent variable) | Logarithm of average weekly earnings, in 2007 units |
| High School | Binary variable equal to 1 for a high-school graduate and 0 for a college graduate |
| Male | Binary variable equal to 1 for male and 0 for female |
| Age | Age in years |
| North | 1 if region is North, 0 otherwise |
| East | 1 if region is East, 0 otherwise |
| South | 1 if region is South, 0 otherwise |
| West | 1 if region is West, 0 otherwise |
Table 2. Regressions of log average weekly earnings
Standard errors are in parentheses.
| Regressor | (1) | (2) | (3) |
|---|---|---|---|
| High-school graduate | 0.352<br>(0.021) | 0.373<br>(0.021) | 0.371<br>(0.021) |
| Male | 0.458<br>(0.021) | 0.457<br>(0.020) | 0.451<br>(0.020) |
| Age | — | 0.011<br>(0.001) | 0.011<br>(0.001) |
| North | — | — | 0.175<br>(0.037) |
| South | — | — | 0.103<br>(0.033) |
| East | — | — | -0.102<br>(0.043) |
| Intercept | 12.840<br>(0.018) | 12.471<br>(0.049) | 12.390<br>(0.057) |
| $F$-statistic for zero regional effects | — | — | 21.87 |
| SER | 1.026 | 1.023 | 1.020 |
| $\bar{R}^2$ | 0.0710 | 0.0761 | 0.0814 |
| $n$ | 10,973 | 10,973 | 10,973 |
(a) (2 points) For every coefficient in all three regressions, add * for significance at the 5% level and ** for significance at the 1% level.
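A sketch of the mechanics, assuming the convention * for 5% and ** for 1% and using large-sample two-sided normal critical values (the sample has more than 10,000 observations): a coefficient is flagged when |coefficient / standard error| exceeds the critical value. The dictionary transcribes Table 2; `TABLE2` and `stars` are hypothetical names.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs transcribed from Table 2.
TABLE2 = {
    "(1)": {"High School": (0.352, 0.021), "Male": (0.458, 0.021), "Intercept": (12.840, 0.018)},
    "(2)": {"High School": (0.373, 0.021), "Male": (0.457, 0.020),
            "Age": (0.011, 0.001), "Intercept": (12.471, 0.049)},
    "(3)": {"High School": (0.371, 0.021), "Male": (0.451, 0.020), "Age": (0.011, 0.001),
            "North": (0.175, 0.037), "South": (0.103, 0.033), "East": (-0.102, 0.043),
            "Intercept": (12.390, 0.057)},
}

# Two-sided large-sample critical values from the standard normal distribution.
Z_05 = NormalDist().inv_cdf(0.975)  # about 1.96
Z_01 = NormalDist().inv_cdf(0.995)  # about 2.58


def stars(coef: float, se: float) -> str:
    """Return '**' if significant at 1%, '*' if at 5% only, '' otherwise."""
    t = abs(coef / se)
    if t > Z_01:
        return "**"
    if t > Z_05:
        return "*"
    return ""


for col, rows in TABLE2.items():
    for name, (b, se) in rows.items():
        print(f"{col} {name:12s} {b:8.3f}{stars(b, se):2s}  t = {b / se:7.2f}")
```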
Using regression (1):
(b) (3 points) Is the estimated high-school earnings difference statistically significant at the 5% level? Construct a 95% confidence interval.
(c) (3 points) Is the estimated male-female earnings difference statistically significant at the 5% level? Construct a 95% confidence interval.
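For (b) and (c), a large-sample 95% interval is the coefficient plus or minus 1.96 standard errors, and significance at 5% is equivalent to the interval excluding zero. Because the dependent variable is in logs, each coefficient is roughly a proportional earnings difference. A minimal sketch using column (1); `ci` is a hypothetical helper name.

```python
from statistics import NormalDist


def ci(est: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Large-sample normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est - z * se, est + z * se


for name, b, se in [("High School", 0.352, 0.021), ("Male", 0.458, 0.021)]:  # column (1)
    lo, hi = ci(b, se)
    print(f"{name}: t = {b / se:.2f}, 95% CI ({lo:.3f}, {hi:.3f})")
```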
Using regression (2):
(d) (3 points) Is age an important determinant of earnings? Use an appropriate test and/or confidence interval.
(e) (3 points) Alvo is a 30-year-old male college graduate, and Kal is a 40-year-old male college graduate. Construct a 95% confidence interval for the expected difference between their earnings.
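Parts (d) and (e) both use regression (2). The Age coefficient gets the usual $t$-test and 95% interval. In (e), Alvo and Kal share gender and education and differ by 10 years of age, so the difference in expected log earnings is $10\hat\beta_{Age}$ with standard error $10\,SE(\hat\beta_{Age})$. A sketch under the large-sample normal approximation; variable names are illustrative.

```python
from statistics import NormalDist


def ci(est: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Large-sample normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est - z * se, est + z * se


B_AGE, SE_AGE = 0.011, 0.001  # Table 2, column (2)

# (d) t-test of H0: coefficient on Age = 0, plus a 95% CI.
t_age = B_AGE / SE_AGE
age_ci = ci(B_AGE, SE_AGE)

# (e) Kal (40) minus Alvo (30): only Age differs, by 10 years.
diff = 10 * B_AGE
diff_ci = ci(diff, 10 * SE_AGE)

print(f"t = {t_age:.1f}, 95% CI for Age: ({age_ci[0]:.4f}, {age_ci[1]:.4f})")
print(f"Kal - Alvo, log earnings: {diff:.2f}, 95% CI ({diff_ci[0]:.3f}, {diff_ci[1]:.3f})")
```

Since earnings are in logs, a difference of 0.11 corresponds to roughly 11% higher expected earnings for Kal.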
Using regression (3):
(f) (3 points) Are there important regional differences? Use an appropriate hypothesis test.
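Column (3) includes $q = 3$ regional dummies, with West as the omitted category. Under the null of no regional effects, the large-sample distribution of the $F$-statistic is $\chi^2_3/3$, so $3F$ can be compared with $\chi^2_3$. For three degrees of freedom the upper tail has the closed form $P(\chi^2_3 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which lets this sketch avoid a statistics library. The helper name `chi2_3_sf` is hypothetical.

```python
import math


def chi2_3_sf(x: float) -> float:
    """Upper-tail probability P(chi2 with 3 df > x), closed form for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


Q = 3        # restrictions: North = South = East = 0
F = 21.87    # Table 2, column (3)

p_value = chi2_3_sf(Q * F)
print(f"F = {F}, large-sample p-value = {p_value:.1e}")
# For comparison, the 5% critical value of F(3, infinity) is about 2.60.
```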
(g)
Juan is a 32-year-old male high-school graduate from the North. Mel is a 32-year-old male college graduate from the North. Ari is a 32-year-old male college graduate from the East.
- (3 points) Construct a 95% confidence interval for the difference in expected earnings between Juan and Mel.
- (4 points) Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juan and Ari.
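Both comparisons are linear combinations $c'\beta$ of the column (3) coefficients, with standard error $\sqrt{c'Vc}$, where $V$ is the estimated covariance matrix of the coefficients. Juan and Mel differ only in High School, so the table alone suffices. Juan and Ari differ in High School, North, and East, so $c = (1, 1, -1)$ on those coefficients and the off-diagonal entries of $V$ are needed. Table 2 does not report them, so one would either request $V$ from the regression software or re-specify the regression so that the difference becomes a single coefficient. `lincomb_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def lincomb_ci(coefs, cov, weights, level=0.95):
    """Estimate, SE, and CI for sum_j weights[j] * coefs[j], given the coefficient covariance matrix."""
    k = len(weights)
    est = sum(w * b for w, b in zip(weights, coefs))
    var = sum(weights[i] * weights[j] * cov[i][j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est, se, (est - z * se, est + z * se)


# Juan minus Mel: only High School differs, so its variance alone is the 1x1 covariance matrix.
est_jm, se_jm, ci_jm = lincomb_ci([0.371], [[0.021 ** 2]], [1.0])
print(f"Juan - Mel: {est_jm:.3f}, 95% CI ({ci_jm[0]:.3f}, {ci_jm[1]:.3f})")

# Juan minus Ari: weights (1, 1, -1) on (High School, North, East) from column (3).
coefs_ja = [0.371, 0.175, -0.102]
weights_ja = [1.0, 1.0, -1.0]
est_ja = sum(w * b for w, b in zip(weights_ja, coefs_ja))
print(f"Juan - Ari point estimate: {est_ja:.3f}")
# The CI would be lincomb_ci(coefs_ja, V, weights_ja), where V is the full 3x3
# covariance matrix of these estimates; Table 2 reports only its diagonal.
```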
(h) (3 points) Regression (2) was re-estimated using 5,000 randomly selected observations from 1993, with earnings converted into 2007 units using the Consumer Price Index:
with standard errors
and
Compared with the 2007 regression in column (2), was there a statistically significant change in the coefficient on High School?
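If the 1993 and 2007 samples are drawn independently, the two estimates are (approximately) independent, so the change can be tested with $t = (\hat\beta^{2007} - \hat\beta^{1993}) / \sqrt{SE_{2007}^2 + SE_{1993}^2}$. The sketch takes the 1993 coefficient and standard error as arguments; `diff_t_stat` is a hypothetical name.

```python
import math


def diff_t_stat(b_a: float, se_a: float, b_b: float, se_b: float) -> float:
    """t-statistic for H0: b_a = b_b, assuming the two estimates come from independent samples."""
    return (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# 2007 estimate on High School from Table 2, column (2).
B_HS_2007, SE_HS_2007 = 0.373, 0.021

# Usage: t = diff_t_stat(B_HS_2007, SE_HS_2007, b_1993, se_1993),
# then reject equality at the 5% level if abs(t) > 1.96.
```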
(i) (3 points) In all regressions above, the coefficient on High School is positive, large, and statistically significant. Does this provide strong evidence of high returns to schooling in the labor market? Explain.
Question 3
Part 3 — 20 points
A researcher collects data on houses sold in one neighborhood during the past year and obtains the following regressions.
Table 3. Regressions of house price on house characteristics
Dependent variable: $\ln(\text{Price})$. Standard errors are in parentheses.
| Regressor | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Size | 0.00042<br>(0.000038) | — | — | — | — |
| $\ln(\text{Size})$ | — | 0.69<br>(0.054) | 0.68<br>(0.087) | 0.57<br>(2.03) | 0.69<br>(0.055) |
| $[\ln(\text{Size})]^2$ | — | — | — | 0.0078<br>(0.14) | — |
| Bedrooms | — | — | 0.0036<br>(0.037) | — | — |
| Pool | 0.082<br>(0.032) | 0.071<br>(0.034) | 0.071<br>(0.034) | 0.071<br>(0.036) | 0.071<br>(0.035) |
| View | 0.037<br>(0.029) | 0.027<br>(0.028) | 0.026<br>(0.026) | 0.027<br>(0.029) | 0.027<br>(0.030) |
| Pool × View | — | — | — | — | 0.0022<br>(0.10) |
| Condition | 0.13<br>(0.045) | 0.12<br>(0.035) | 0.12<br>(0.035) | 0.12<br>(0.036) | 0.12<br>(0.035) |
| Intercept | 10.97<br>(0.069) | 6.60<br>(0.39) | 6.63<br>(0.53) | 7.02<br>(7.50) | 6.60<br>(0.40) |
| SER | 0.1026 | 1.023 | — | — | 1.020 |
| $\bar{R}^2$ | 0.0710 | 0.0761 | — | — | 0.0814 |
Definitions: Price is the sale price in dollars; Size is the house size in square feet; Bedrooms is the number of bedrooms; Pool is 1 if the house has a swimming pool; View is 1 if the house has a nice view; Condition is 1 if the real-estate agent reports that the house is in excellent condition.
(a) (3 points) Using column (1), what is the expected change in price from building a 1,500-square-foot addition? Construct a 99% confidence interval for the percentage change in price.
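With $\ln(\text{Price})$ as the dependent variable and Size in levels, a 1,500-square-foot addition changes log price by $1500\,\hat\beta_{Size}$, with standard error $1500\,SE(\hat\beta_{Size})$; multiplying by 100 gives an approximate percentage change. A sketch using the 99% normal critical value (about 2.576); the exact percentage change at the point estimate, $e^{\Delta} - 1$, is printed for comparison.

```python
import math
from statistics import NormalDist

B_SIZE, SE_SIZE = 0.00042, 0.000038   # Table 3, column (1); Size in square feet
DELTA_SIZE = 1500                     # square feet added

z99 = NormalDist().inv_cdf(0.995)     # about 2.576

effect = B_SIZE * DELTA_SIZE          # predicted change in ln(Price)
se_effect = SE_SIZE * DELTA_SIZE      # SE scales linearly with the size change
lo, hi = effect - z99 * se_effect, effect + z99 * se_effect

print(f"Change in ln(Price): {effect:.3f} (about {100 * effect:.0f}%)")
print(f"99% CI, in percent: ({100 * lo:.1f}, {100 * hi:.1f})")
print(f"Exact percentage change at the point estimate: {100 * (math.exp(effect) - 1):.1f}%")
```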
(b) (3 points) How is the coefficient on $\ln(\text{Size})$ interpreted in column (2)? What is the effect of doubling house size on price?
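When both price and size enter in logs, the coefficient on log size is an elasticity. Doubling size raises log size by $\ln 2 \approx 0.693$, so the predicted change in log price is $0.69 \times \ln 2$; the exact proportional change in price is $2^{0.69} - 1$. A minimal sketch:

```python
import math

ELASTICITY = 0.69                      # coefficient on log size, Table 3, column (2)

dlog_price = ELASTICITY * math.log(2)  # doubling Size raises ln(Size) by ln 2
exact_pct = 2 ** ELASTICITY - 1        # exact proportional change in Price

print(f"Log approximation: {100 * dlog_price:.1f}%; exact: {100 * exact_pct:.1f}%")
```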
(c) (3 points) Using column (2), estimate the effect of a view on price. Construct a 99% confidence interval. Is the effect statistically different from zero?
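The view effect in column (2) is the View coefficient, a shift in log price. A 99% interval is the coefficient plus or minus about 2.576 standard errors, and the effect differs from zero at the 1% level exactly when that interval excludes zero. A sketch:

```python
from statistics import NormalDist

B_VIEW, SE_VIEW = 0.027, 0.028        # View, Table 3, column (2)
z99 = NormalDist().inv_cdf(0.995)

t_view = B_VIEW / SE_VIEW
lo, hi = B_VIEW - z99 * SE_VIEW, B_VIEW + z99 * SE_VIEW
print(f"t = {t_view:.2f}; 99% CI for the view effect on ln(Price): ({lo:.3f}, {hi:.3f})")
print("Different from zero at 1%:", not (lo <= 0.0 <= hi))
```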
(d) (3 points) Using column (3), calculate the effect of adding two bedrooms. Is the effect statistically significant? Which variable, size or number of bedrooms, appears more important for determining price?
(e) (3 points) Is the coefficient on Condition significant in column (4)?
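For (d) and (e), each answer is a $t$-statistic compared with the large-sample critical values 1.96 (5%) and 2.58 (1%). In column (3), log size is held fixed, so the bedroom coefficient measures the effect of extra bedrooms for a house of given size; two bedrooms scale both the estimate and its standard error by 2, leaving the $t$-statistic unchanged. The log-size coefficient in column (3) is 0.68 (0.087). A sketch:

```python
Z_05, Z_01 = 1.96, 2.58   # large-sample two-sided critical values

# (d) Column (3): two extra bedrooms, holding log size, Pool, View, Condition fixed.
B_BED, SE_BED = 0.0036, 0.037
effect_two_beds = 2 * B_BED               # change in ln(Price)
t_beds = effect_two_beds / (2 * SE_BED)   # same t-statistic as for B_BED alone

# Log-size coefficient in column (3), for comparison.
t_log_size = 0.68 / 0.087

# (e) Column (4): Condition.
t_condition = 0.12 / 0.036

for label, t in [("two bedrooms", t_beds), ("log size", t_log_size), ("Condition", t_condition)]:
    verdict = "1%" if abs(t) > Z_01 else "5%" if abs(t) > Z_05 else "not significant"
    print(f"{label:12s} t = {t:6.2f} -> {verdict}")
```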
(f) (5 points) Is the interaction between Pool and View significant in column (5)? Find the effect of adding a view for a house with a pool and for a house without a pool.
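With an interaction term, the effect of View on log price is $\hat\beta_{View} + \hat\beta_{Pool \times View}\cdot Pool$: just $\hat\beta_{View}$ without a pool, and the sum of the two coefficients with a pool. A standard error for the with-pool effect would also need the covariance between the two estimates, which Table 3 does not report. A sketch of the point estimates and the interaction $t$-statistic from column (5):

```python
# Table 3, column (5).
B_VIEW = 0.027
B_POOL_X_VIEW, SE_POOL_X_VIEW = 0.0022, 0.10

t_interaction = B_POOL_X_VIEW / SE_POOL_X_VIEW
view_effect_no_pool = B_VIEW                     # Pool = 0
view_effect_with_pool = B_VIEW + B_POOL_X_VIEW   # Pool = 1

print(f"t for Pool x View = {t_interaction:.3f}")
print(f"Effect of a view on ln(Price): no pool {view_effect_no_pool:.4f}, "
      f"with pool {view_effect_with_pool:.4f}")
```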
Question 4
Part 4 — 25 points
Demand for a commodity is
$$Q = \beta_0 + \beta_1 P + u,$$
where $Q$ is log quantity, $P$ is log price, and $u$ collects other demand determinants.
Supply is
$$Q = \gamma_0 + \gamma_1 P + v,$$
where $v$ collects other supply determinants. Assume
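(a) (5 points) Solve the simultaneous equations to show how log quantity and log price depend on the other demand and supply determinants.
(b) (5 points) Derive the means of log price and log quantity.
(c) (5 points) Derive the variance of log price, the variance of log quantity, and the covariance between them.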
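A sketch of the algebra for parts (a) through (c), under the assumption that demand is $Q = \beta_0 + \beta_1 P + u$, supply is $Q = \gamma_0 + \gamma_1 P + v$, and the shocks satisfy $E(u) = E(v) = 0$, $\operatorname{var}(u) = \sigma_u^2$, $\operatorname{var}(v) = \sigma_v^2$, and $\operatorname{cov}(u, v) = 0$. Setting demand equal to supply gives $(\beta_1 - \gamma_1)P = \gamma_0 - \beta_0 + v - u$; substituting $P$ back into the demand equation gives $Q$.

```latex
\begin{aligned}
P &= \frac{\gamma_0 - \beta_0}{\beta_1 - \gamma_1} + \frac{v - u}{\beta_1 - \gamma_1},
\qquad
Q = \frac{\beta_1 \gamma_0 - \beta_0 \gamma_1}{\beta_1 - \gamma_1} + \frac{\beta_1 v - \gamma_1 u}{\beta_1 - \gamma_1},\\[4pt]
E(P) &= \frac{\gamma_0 - \beta_0}{\beta_1 - \gamma_1},
\qquad
E(Q) = \frac{\beta_1 \gamma_0 - \beta_0 \gamma_1}{\beta_1 - \gamma_1},\\[4pt]
\operatorname{var}(P) &= \frac{\sigma_u^2 + \sigma_v^2}{(\beta_1 - \gamma_1)^2},
\qquad
\operatorname{var}(Q) = \frac{\gamma_1^2 \sigma_u^2 + \beta_1^2 \sigma_v^2}{(\beta_1 - \gamma_1)^2},
\qquad
\operatorname{cov}(P, Q) = \frac{\gamma_1 \sigma_u^2 + \beta_1 \sigma_v^2}{(\beta_1 - \gamma_1)^2}.
\end{aligned}
```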
(d)
A large random sample of (log quantity, log price) observations is collected, and log quantity is regressed on log price.
- (5 points) Use your answers to parts (b) and (c) to derive the population regression coefficients.
- (5 points) A researcher treats the slope of this regression as an estimate of the slope of the demand equation. Is the estimated slope too large or too small?
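Under the assumptions that demand is $Q = \beta_0 + \beta_1 P + u$, supply is $Q = \gamma_0 + \gamma_1 P + v$, and $u$, $v$ are mean-zero and uncorrelated with variances $\sigma_u^2$, $\sigma_v^2$, the population slope of the regression of $Q$ on $P$ is $\operatorname{cov}(P,Q)/\operatorname{var}(P) = (\beta_1\sigma_v^2 + \gamma_1\sigma_u^2)/(\sigma_u^2 + \sigma_v^2)$ and the intercept is $E(Q) - \text{slope}\cdot E(P)$. The slope minus $\beta_1$ equals $(\gamma_1 - \beta_1)\sigma_u^2/(\sigma_u^2 + \sigma_v^2)$, which is positive when demand slopes down ($\beta_1 < 0$) and supply slopes up ($\gamma_1 > 0$). The sketch below checks this by simulating market equilibria; all parameter values and helper names are hypothetical.

```python
import random


def population_slope(beta1, gamma1, var_u, var_v):
    """Population OLS slope from regressing Q on P: cov(P, Q) / var(P)."""
    return (beta1 * var_v + gamma1 * var_u) / (var_u + var_v)


def population_intercept(beta0, beta1, gamma0, gamma1, var_u, var_v):
    """Population OLS intercept: E(Q) - slope * E(P)."""
    d = beta1 - gamma1
    mean_p = (gamma0 - beta0) / d
    mean_q = (beta1 * gamma0 - beta0 * gamma1) / d
    return mean_q - population_slope(beta1, gamma1, var_u, var_v) * mean_p


def simulated_slope(beta0, beta1, gamma0, gamma1, sd_u, sd_v, n, seed=0):
    """OLS slope of Q on P in n simulated market equilibria."""
    rng = random.Random(seed)
    ps, qs = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_u)  # demand shock
        v = rng.gauss(0.0, sd_v)  # supply shock, drawn independently of u
        p = (gamma0 - beta0 + v - u) / (beta1 - gamma1)  # equilibrium log price
        ps.append(p)
        qs.append(beta0 + beta1 * p + u)                 # equilibrium log quantity
    p_bar, q_bar = sum(ps) / n, sum(qs) / n
    s_pq = sum((p - p_bar) * (q - q_bar) for p, q in zip(ps, qs))
    s_pp = sum((p - p_bar) ** 2 for p in ps)
    return s_pq / s_pp


# Hypothetical parameters: downward-sloping demand, upward-sloping supply.
BETA0, BETA1, GAMMA0, GAMMA1 = 10.0, -1.0, 2.0, 1.0
SD_U, SD_V = 1.0, 2.0

print("demand slope beta_1:", BETA1)
print("population OLS slope:", population_slope(BETA1, GAMMA1, SD_U ** 2, SD_V ** 2))
print("simulated OLS slope: ", simulated_slope(BETA0, BETA1, GAMMA0, GAMMA1, SD_U, SD_V, 100_000))
```

With these values the population slope is -0.6, above the demand slope of -1, in line with the positive bias formula.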